[Devel] [PATCH rh7] fs: add __GFP_NORETRY in alloc_fdmem

Konstantin Khorenko khorenko at virtuozzo.com
Thu Mar 16 08:03:49 PDT 2017


Andrey, please take a look.

All other patches from Anatoly are applied already, except this one.
Is it worth applying this one as well?

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 10/21/2016 02:42 PM, Anatoly Stepanov wrote:
> This is a backport of upstream (vanilla) commit:
> commit 96c7a2ff21501691587e1ae969b83cbec8b78e08
>
> Under certain conditions there might be a lot of
> alloc_fdmem() invocations with order <= PAGE_ALLOC_COSTLY_ORDER.
>
> For example: an httpd server that makes a lot of fork() calls.
>
> Real-life examples from our customers:
>
> [532506.773243] httpd           D ffff8803f5fecc20     0 939874   6606
> [532506.773257] Call Trace:
> [532506.773261]  [<ffffffff8163ce29>] schedule+0x29/0x70
> [532506.773264]  [<ffffffff8163a9d5>] schedule_timeout+0x175/0x2d0
> [532506.773272]  [<ffffffff8108cc90>] ? internal_add_timer+0x70/0x70
> [532506.773276]  [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
> [532506.773280]  [<ffffffff8119be85>] wait_iff_congested+0x135/0x150
> [532506.773284]  [<ffffffff810a86e0>] ? wake_up_atomic_t+0x30/0x30
> [532506.773288]  [<ffffffff8119071f>] shrink_inactive_list+0x65f/0x6c0
> [532506.773292]  [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
> [532506.773296]  [<ffffffff811914af>] shrink_zone+0xef/0x2d0
> [532506.773300]  [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
> [532506.773310]  [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
> [532506.773315]  [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
> [532506.773320]  [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
> [532506.773324]  [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
> [532506.773327]  [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
> [532506.773332]  [<ffffffff811d8c69>] __kmalloc+0x259/0x270
> [532506.773337]  [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
> [532506.773341]  [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
> [532506.773344]  [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
> [532506.773354]  [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
> [532506.773358]  [<ffffffff8107a641>] do_fork+0xe1/0x320
> [532506.773370]  [<ffffffff8107a906>] SyS_clone+0x16/0x20
> [532506.773376]  [<ffffffff81648299>] stub_clone+0x69/0x90
> [532506.773380]  [<ffffffff81647f49>] ? system_call_fastpath+0x16/0x1b
>
> [513890.005271] httpd           D ffff880425db7230     0 811718   6606
> [513890.005279] Call Trace:
> [513890.005282]  [<ffffffff8163ce29>] schedule+0x29/0x70
> [513890.005284]  [<ffffffff8163aa99>] schedule_timeout+0x239/0x2d0
> [513890.005292]  [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
> [513890.005296]  [<ffffffff8163c448>] io_schedule+0x18/0x20
> [513890.005298]  [<ffffffff812c6268>] get_request+0x218/0x780
> [513890.005303]  [<ffffffff812c8526>] blk_queue_bio+0xc6/0x3a0
> [513890.005309]  [<ffffffffa0002c59>] ? dm_make_request+0x119/0x170 [dm_mod]
> [513890.005311]  [<ffffffff812c3892>] generic_make_request+0xe2/0x130
> [513890.005313]  [<ffffffff812c3957>] submit_bio+0x77/0x1c0
> [513890.005318]  [<ffffffff811bf87e>] __swap_writepage+0x1be/0x260
> [513890.005337]  [<ffffffff811bf959>] swap_writepage+0x39/0x80
> [513890.005340]  [<ffffffff8118f68d>] shrink_page_list+0x4ad/0xa80
> [513890.005343]  [<ffffffff811902bb>] shrink_inactive_list+0x1fb/0x6c0
> [513890.005345]  [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
> [513890.005348]  [<ffffffff811914af>] shrink_zone+0xef/0x2d0
> [513890.005350]  [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
> [513890.005353]  [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
> [513890.005355]  [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
> [513890.005358]  [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
> [513890.005360]  [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
> [513890.005362]  [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
> [513890.005365]  [<ffffffff811d8c69>] __kmalloc+0x259/0x270
> [513890.005367]  [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
> [513890.005369]  [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
> [513890.005371]  [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
> [513890.005376]  [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
> [513890.005378]  [<ffffffff8107a641>] do_fork+0xe1/0x320
> [513890.005380]  [<ffffffff8107a906>] SyS_clone+0x16/0x20
> [513890.005382]  [<ffffffff81648299>] stub_clone+0x69/0x90
>
> We observed that kswapd sometimes cannot keep up with this load,
> which causes many direct reclaim attempts. These in turn:
>
> 1. Increase iowait time due to congestion_wait
> 2. Increase the number of block requests per second due to
> page swapping and writeback
> 3. May induce OOMs
>
> So it is better not to try that hard to allocate a contiguous
> area, and to fall back to vmalloc() as soon as possible.
>
> Signed-off-by: Anatoly Stepanov <astepanov at cloudlinux.com>
> ---
>  fs/file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/file.c b/fs/file.c
> index 366d9bb..3f65ba0 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -36,7 +36,7 @@ static void *alloc_fdmem(size_t size)
>  	 * vmalloc() if the allocation size will be considered "large" by the VM.
>  	 */
>  	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> -		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
> +		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
>  		if (data != NULL)
>  			return data;
>  	}
>
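
For readers who want to see the control flow the patch relies on, here is a
minimal user-space sketch of the "try the contiguous allocation once, then
fall back" pattern. It is not kernel code: malloc() stands in for both
kmalloc() and vmalloc(), and every name in it (sketch_alloc_fdmem,
try_contiguous_once, fallback_alloc, sketch_under_pressure, the SKETCH_*
constants) is hypothetical. The constants assume 4 KiB pages and a costly
order of 3, which may not match a given kernel configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PAGE_SIZE and PAGE_ALLOC_COSTLY_ORDER
 * (assumed values: 4 KiB pages, costly order 3). */
#define SKETCH_PAGE_SIZE    4096u
#define SKETCH_COSTLY_ORDER 3u

/* Records which path satisfied the request, for illustration only. */
enum alloc_path { PATH_NONE, PATH_CONTIGUOUS, PATH_FALLBACK };

/* When true, the contiguous attempt fails immediately. This models a
 * no-retry allocation giving up under memory pressure instead of
 * looping in reclaim. */
static bool sketch_under_pressure = false;

/* Stand-in for the single, non-retrying contiguous attempt. */
static void *try_contiguous_once(size_t size)
{
	if (sketch_under_pressure)
		return NULL;
	return malloc(size);
}

/* Stand-in for the non-contiguous fallback allocator. */
static void *fallback_alloc(size_t size)
{
	return malloc(size);
}

/* Same shape as the patched function: small requests get one cheap
 * contiguous attempt, everything else goes straight to the fallback. */
static void *sketch_alloc_fdmem(size_t size, enum alloc_path *path)
{
	void *data;

	assert(path != NULL);
	*path = PATH_NONE;

	if (size <= ((size_t)SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER)) {
		data = try_contiguous_once(size);
		if (data != NULL) {
			*path = PATH_CONTIGUOUS;
			return data;
		}
	}

	data = fallback_alloc(size);
	if (data != NULL)
		*path = PATH_FALLBACK;
	return data;
}
```

The point of the sketch is only that a failed first attempt costs the caller
nothing beyond the attempt itself: the request is still satisfied by the
fallback path, so giving up early on the contiguous allocation does not turn
into an allocation failure for the caller.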

