[Devel] [PATCH rh7] fs: add __GFP_NORETRY in alloc_fdmem
Konstantin Khorenko
khorenko at virtuozzo.com
Thu Mar 16 08:03:49 PDT 2017
Andrey, please take a look.
All other patches from Anatoly are applied already, except this one.
Is it worth applying this one as well?
--
Best regards,
Konstantin Khorenko,
Virtuozzo Linux Kernel Team
On 10/21/2016 02:42 PM, Anatoly Stepanov wrote:
> This is a backport of upstream (vanilla) commit:
> commit 96c7a2ff21501691587e1ae969b83cbec8b78e08
>
> Under certain conditions there might be a lot of
> alloc_fdmem() invocations with order <= PAGE_ALLOC_COSTLY_ORDER.
>
> For example, httpd doing a lot of fork() calls.
>
> Real-life examples from our customers:
>
> [532506.773243] httpd D ffff8803f5fecc20 0 939874 6606
> [532506.773257] Call Trace:
> [532506.773261] [<ffffffff8163ce29>] schedule+0x29/0x70
> [532506.773264] [<ffffffff8163a9d5>] schedule_timeout+0x175/0x2d0
> [532506.773272] [<ffffffff8108cc90>] ? internal_add_timer+0x70/0x70
> [532506.773276] [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
> [532506.773280] [<ffffffff8119be85>] wait_iff_congested+0x135/0x150
> [532506.773284] [<ffffffff810a86e0>] ? wake_up_atomic_t+0x30/0x30
> [532506.773288] [<ffffffff8119071f>] shrink_inactive_list+0x65f/0x6c0
> [532506.773292] [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
> [532506.773296] [<ffffffff811914af>] shrink_zone+0xef/0x2d0
> [532506.773300] [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
> [532506.773310] [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
> [532506.773315] [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
> [532506.773320] [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
> [532506.773324] [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
> [532506.773327] [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
> [532506.773332] [<ffffffff811d8c69>] __kmalloc+0x259/0x270
> [532506.773337] [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
> [532506.773341] [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
> [532506.773344] [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
> [532506.773354] [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
> [532506.773358] [<ffffffff8107a641>] do_fork+0xe1/0x320
> [532506.773370] [<ffffffff8107a906>] SyS_clone+0x16/0x20
> [532506.773376] [<ffffffff81648299>] stub_clone+0x69/0x90
> [532506.773380] [<ffffffff81647f49>] ? system_call_fastpath+0x16/0x1b
>
> [513890.005271] httpd D ffff880425db7230 0 811718 6606
> [513890.005279] Call Trace:
> [513890.005282] [<ffffffff8163ce29>] schedule+0x29/0x70
> [513890.005284] [<ffffffff8163aa99>] schedule_timeout+0x239/0x2d0
> [513890.005292] [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
> [513890.005296] [<ffffffff8163c448>] io_schedule+0x18/0x20
> [513890.005298] [<ffffffff812c6268>] get_request+0x218/0x780
> [513890.005303] [<ffffffff812c8526>] blk_queue_bio+0xc6/0x3a0
> [513890.005309] [<ffffffffa0002c59>] ? dm_make_request+0x119/0x170 [dm_mod]
> [513890.005311] [<ffffffff812c3892>] generic_make_request+0xe2/0x130
> [513890.005313] [<ffffffff812c3957>] submit_bio+0x77/0x1c0
> [513890.005318] [<ffffffff811bf87e>] __swap_writepage+0x1be/0x260
> [513890.005337] [<ffffffff811bf959>] swap_writepage+0x39/0x80
> [513890.005340] [<ffffffff8118f68d>] shrink_page_list+0x4ad/0xa80
> [513890.005343] [<ffffffff811902bb>] shrink_inactive_list+0x1fb/0x6c0
> [513890.005345] [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
> [513890.005348] [<ffffffff811914af>] shrink_zone+0xef/0x2d0
> [513890.005350] [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
> [513890.005353] [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
> [513890.005355] [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
> [513890.005358] [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
> [513890.005360] [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
> [513890.005362] [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
> [513890.005365] [<ffffffff811d8c69>] __kmalloc+0x259/0x270
> [513890.005367] [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
> [513890.005369] [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
> [513890.005371] [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
> [513890.005376] [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
> [513890.005378] [<ffffffff8107a641>] do_fork+0xe1/0x320
> [513890.005380] [<ffffffff8107a906>] SyS_clone+0x16/0x20
> [513890.005382] [<ffffffff81648299>] stub_clone+0x69/0x90
>
> We observed that kswapd sometimes cannot keep up with this, which
> causes many direct reclaim attempts. These in turn:
>
> 1. Increase iowait time due to congestion_wait
> 2. Increase the number of block requests per second due to
> page swapping and writeback
> 3. May induce OOMs
>
> So it is better not to try that hard to allocate a contiguous
> area, and to fall back to vmalloc() as soon as possible.
>
> Signed-off-by: Anatoly Stepanov <astepanov at cloudlinux.com>
> ---
> fs/file.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/file.c b/fs/file.c
> index 366d9bb..3f65ba0 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -36,7 +36,7 @@ static void *alloc_fdmem(size_t size)
>  	 * vmalloc() if the allocation size will be considered "large" by the VM.
>  	 */
>  	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> -		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
> +		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
>  		if (data != NULL)
>  			return data;
>  	}
>
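For anyone reading the patch without the kernel tree at hand, here is a rough userspace sketch of the pattern the one-line change relies on: make one cheap attempt at a contiguous buffer for sizes up to the "costly" threshold, and fall through to the non-contiguous allocator as soon as that attempt fails. This is not the kernel code. All sketch_* names are hypothetical, malloc() merely stands in for kmalloc() and vmalloc(), the memory_fragmented argument is an invented knob to simulate a failed attempt, and the constants assume 4 KiB pages with PAGE_ALLOC_COSTLY_ORDER equal to 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed values: 4 KiB pages and PAGE_ALLOC_COSTLY_ORDER == 3. */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_COSTLY_ORDER 3
#define SKETCH_COSTLY_BYTES (SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER)

/* Records which path produced the buffer (illustration only). */
enum sketch_source { SKETCH_NONE, SKETCH_CONTIGUOUS, SKETCH_FALLBACK };

/*
 * Hypothetical stand-in for kmalloc(size, ...|__GFP_NORETRY): a single
 * attempt that reports failure instead of looping in reclaim.
 * memory_fragmented simulates the case where no contiguous block exists.
 */
static void *sketch_try_contiguous(size_t size, int memory_fragmented)
{
    if (memory_fragmented)
        return NULL;            /* give up at once, do not retry */
    return malloc(size);
}

/* Hypothetical stand-in for vmalloc(): no contiguity requirement. */
static void *sketch_fallback(size_t size)
{
    return malloc(size);
}

/* Mirrors the shape of alloc_fdmem() from the quoted diff. */
static void *sketch_alloc_fdmem(size_t size, int memory_fragmented,
                                enum sketch_source *src)
{
    void *data;

    assert(size > 0);
    *src = SKETCH_NONE;

    /* Only "small" requests try the contiguous allocator at all. */
    if (size <= SKETCH_COSTLY_BYTES) {
        data = sketch_try_contiguous(size, memory_fragmented);
        if (data != NULL) {
            *src = SKETCH_CONTIGUOUS;
            return data;
        }
    }

    /* Larger requests, or a failed attempt, go to the fallback. */
    data = sketch_fallback(size);
    if (data != NULL)
        *src = SKETCH_FALLBACK;
    return data;
}
```

Under these assumptions the threshold is 4096 << 3 = 32768 bytes, so a table of 8-byte pointers stays on the contiguous path for up to 4096 entries. Calling sketch_alloc_fdmem(1024, 1, &src) reaches the fallback path even though the request is small, which is the behaviour the patch is after when memory is fragmented.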