[Devel] [PATCH RFC] mm: Limit number of busy-looped shrinking processes

Dmitry Monakhov dmonakhov at openvz.org
Tue Sep 5 12:49:35 MSK 2017


Kirill Tkhai <ktkhai at virtuozzo.com> writes:

> When a FUSE process is shrinking memory, it must not wait
> on page writeback. Otherwise, it may meet a page that it is
> itself writing back, and the process will stall.
>
> So our kernel does not wait for writeback since commit a9707947010d
> "mm: vmscan: never wait on writeback pages".
>
> But with a huge number of pages under writeback and
> memory pressure, this leads to a busy loop: many processes
> in the system try to shrink memory without
> success, and the node shows high time spent in the kernel.
>
> This patch reduces the number of processes which may
> busy-loop on shrink. Only one userspace process --
> vstorage -- will be allowed not to sleep on writeback.
> Other processes will sleep up to 5 seconds waiting for
> writeback completion on every page.
>
> The detection of vstorage is very simple: it is based
> on the process name. It seems there is no way to detect
NAK. Detection by name is a very, very bad design style.
fused and others should mark themselves as writeback-proof explicitly
via an API similar to ioctl/madvise/ionice/ulimit, or
maybe it is reasonable to place such an app in a specific cgroup.
You may pick any recipe you like. But please do not do comm-name
matching.
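
To illustrate the point, here is a minimal userspace sketch of the two
approaches. It is only a toy model, not kernel code: struct task stands
in for the couple of task_struct fields involved, and TASK_WB_PROOF and
task_set_wb_proof() are hypothetical names invented for this example.
How the mark is actually set (prctl, ioctl, cgroup attribute) is left
open.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical opt-in flag; NOT an existing kernel task flag. */
#define TASK_WB_PROOF 0x1u

/* Toy stand-in for the few task_struct fields this sketch needs. */
struct task {
	char comm[16];
	unsigned int flags;
};

/* What the patch does: match the first 8 bytes of the process name. */
static bool skip_wait_by_name(const struct task *t)
{
	return strncmp(t->comm, "vstorage", 8) == 0;
}

/* Hypothetical opt-in, e.g. reached from a prctl/ioctl-like call. */
static void task_set_wb_proof(struct task *t)
{
	t->flags |= TASK_WB_PROOF;
}

/* Alternative: trust only the explicit mark, never the name. */
static bool skip_wait_by_flag(const struct task *t)
{
	return (t->flags & TASK_WB_PROOF) != 0;
}
```

With name matching, any process whose comm merely starts with
"vstorage" gets the no-wait treatment, while a FUSE daemon with a
different name does not. With an explicit mark, only tasks that opted
in are exempt, whatever they are called.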

> all FUSE processes, especially from !ve0, because FUSE
> mount is tricky, and a process doing the mount may not be
> a FUSE daemon. So we keep the vanilla kernel behaviour,
> but we don't wait forever, just 5 seconds. This will save
> us from lockup messages from the kernel and will allow
> killing the FUSE daemon if necessary.
>
> https://jira.sw.ru/browse/PSBM-69296
>
> Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
> ---
>  mm/vmscan.c |   19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a5db5940bb1..e72d515c111 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -959,8 +959,16 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  
>  			/* Case 3 above */
>  			} else {
> -				nr_immediate++;
> -				goto keep_locked;
> +				/*
> +				 * Currently, vstorage is the only fuse process,
> +				 * exercising writeback; it mustn't sleep to avoid
> +				 * deadlocks.
> +				 */
> +				if (!strncmp(current->comm, "vstorage", 8) ||
> +				    wait_on_page_bit_killable_timeout(page, PG_writeback, 5 * HZ) != 0) {
> +					nr_immediate++;
> +					goto keep_locked;
> +				}
>  			}
>  		}
>  
> @@ -1592,9 +1600,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  	if (nr_writeback && nr_writeback == nr_taken)
>  		zone_set_flag(zone, ZONE_WRITEBACK);
>  
> -	if (!global_reclaim(sc) && nr_immediate)
> -		congestion_wait(BLK_RW_ASYNC, HZ/10);
> -
> +	/*
> +	 * memcg will stall in page writeback so only consider forcibly
> +	 * stalling for global reclaim
> +	 */
>  	if (global_reclaim(sc)) {
>  		/*
>  		 * Tag a zone as congested if all the dirty pages scanned were

