[Devel] [PATCH rh7] fs: add __GFP_NORETRY in alloc_fdmem
Andrey Ryabinin
aryabinin at virtuozzo.com
Thu Mar 16 08:08:50 PDT 2017
On 03/16/2017 06:03 PM, Konstantin Khorenko wrote:
> Andrey, please take a look.
>
> All other patches from Anatoly are applied already, except this one.
> Is it worth applying this one as well?
>
Yep,
Acked-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
> --
> Best regards,
>
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team
>
> On 10/21/2016 02:42 PM, Anatoly Stepanov wrote:
>> This is a backport of upstream (vanilla) commit:
>> commit 96c7a2ff21501691587e1ae969b83cbec8b78e08
>>
>> Under certain conditions there can be many alloc_fdmem()
>> invocations with order <= PAGE_ALLOC_COSTLY_ORDER.
>>
>> For example: an httpd that makes a lot of fork() calls.
>>
>> Real-life examples from our customers:
>>
>> [532506.773243] httpd D ffff8803f5fecc20 0 939874 6606
>> [532506.773257] Call Trace:
>> [532506.773261] [<ffffffff8163ce29>] schedule+0x29/0x70
>> [532506.773264] [<ffffffff8163a9d5>] schedule_timeout+0x175/0x2d0
>> [532506.773272] [<ffffffff8108cc90>] ? internal_add_timer+0x70/0x70
>> [532506.773276] [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
>> [532506.773280] [<ffffffff8119be85>] wait_iff_congested+0x135/0x150
>> [532506.773284] [<ffffffff810a86e0>] ? wake_up_atomic_t+0x30/0x30
>> [532506.773288] [<ffffffff8119071f>] shrink_inactive_list+0x65f/0x6c0
>> [532506.773292] [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
>> [532506.773296] [<ffffffff811914af>] shrink_zone+0xef/0x2d0
>> [532506.773300] [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
>> [532506.773310] [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
>> [532506.773315] [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
>> [532506.773320] [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
>> [532506.773324] [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
>> [532506.773327] [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
>> [532506.773332] [<ffffffff811d8c69>] __kmalloc+0x259/0x270
>> [532506.773337] [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
>> [532506.773341] [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
>> [532506.773344] [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
>> [532506.773354] [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
>> [532506.773358] [<ffffffff8107a641>] do_fork+0xe1/0x320
>> [532506.773370] [<ffffffff8107a906>] SyS_clone+0x16/0x20
>> [532506.773376] [<ffffffff81648299>] stub_clone+0x69/0x90
>> [532506.773380] [<ffffffff81647f49>] ? system_call_fastpath+0x16/0x1b
>>
>> [513890.005271] httpd D ffff880425db7230 0 811718 6606
>> [513890.005279] Call Trace:
>> [513890.005282] [<ffffffff8163ce29>] schedule+0x29/0x70
>> [513890.005284] [<ffffffff8163aa99>] schedule_timeout+0x239/0x2d0
>> [513890.005292] [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
>> [513890.005296] [<ffffffff8163c448>] io_schedule+0x18/0x20
>> [513890.005298] [<ffffffff812c6268>] get_request+0x218/0x780
>> [513890.005303] [<ffffffff812c8526>] blk_queue_bio+0xc6/0x3a0
>> [513890.005309] [<ffffffffa0002c59>] ? dm_make_request+0x119/0x170 [dm_mod]
>> [513890.005311] [<ffffffff812c3892>] generic_make_request+0xe2/0x130
>> [513890.005313] [<ffffffff812c3957>] submit_bio+0x77/0x1c0
>> [513890.005318] [<ffffffff811bf87e>] __swap_writepage+0x1be/0x260
>> [513890.005337] [<ffffffff811bf959>] swap_writepage+0x39/0x80
>> [513890.005340] [<ffffffff8118f68d>] shrink_page_list+0x4ad/0xa80
>> [513890.005343] [<ffffffff811902bb>] shrink_inactive_list+0x1fb/0x6c0
>> [513890.005345] [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
>> [513890.005348] [<ffffffff811914af>] shrink_zone+0xef/0x2d0
>> [513890.005350] [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
>> [513890.005353] [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
>> [513890.005355] [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
>> [513890.005358] [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
>> [513890.005360] [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
>> [513890.005362] [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
>> [513890.005365] [<ffffffff811d8c69>] __kmalloc+0x259/0x270
>> [513890.005367] [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
>> [513890.005369] [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
>> [513890.005371] [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
>> [513890.005376] [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
>> [513890.005378] [<ffffffff8107a641>] do_fork+0xe1/0x320
>> [513890.005380] [<ffffffff8107a906>] SyS_clone+0x16/0x20
>> [513890.005382] [<ffffffff81648299>] stub_clone+0x69/0x90
>>
>> We observed that kswapd sometimes cannot keep up with this, which
>> causes many direct reclaim attempts, which in turn:
>>
>> 1. Increase iowait time due to congestion_wait
>> 2. Increase the number of block requests per second due to
>> page swapping and writeback
>> 3. May induce OOMs
>>
>> So it is better NOT to try that hard to allocate a contiguous
>> area, and to fall back to vmalloc() as soon as possible.
>>
>> Signed-off-by: Anatoly Stepanov <astepanov at cloudlinux.com>
>> ---
>> fs/file.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/file.c b/fs/file.c
>> index 366d9bb..3f65ba0 100644
>> --- a/fs/file.c
>> +++ b/fs/file.c
>> @@ -36,7 +36,7 @@ static void *alloc_fdmem(size_t size)
>>  	 * vmalloc() if the allocation size will be considered "large" by the VM.
>>  	 */
>>  	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>> -		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
>> +		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
>>  		if (data != NULL)
>>  			return data;
>>  	}
>>
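For readers who want to see which request sizes the patched kmalloc() path covers, the size check in the hunk above can be modelled in plain userspace C. This is only a rough sketch, not kernel code: the MODEL_* constants and all three helper names are hypothetical, and the values assume 4 KiB pages (as on x86_64) and PAGE_ALLOC_COSTLY_ORDER == 3. Under those assumptions the limit works out to 32 KiB, so with 8-byte pointers an fd pointer array of up to 4096 entries would still try kmalloc() first, now with __GFP_NORETRY, before falling back to vmalloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values, not taken from the patch: 4 KiB pages, costly order 3. */
#define MODEL_PAGE_SIZE 4096UL
#define MODEL_PAGE_ALLOC_COSTLY_ORDER 3U

/* Largest request, in bytes, that passes the "size <= ..." check. */
size_t kmalloc_first_limit(size_t page_size, unsigned int costly_order)
{
	assert(costly_order < 16); /* keep the shift well inside size_t */
	return page_size << costly_order;
}

/* Bytes needed for an array of nr_fds pointers (one per descriptor). */
size_t fd_array_bytes(size_t nr_fds)
{
	return nr_fds * sizeof(void *);
}

/* True if a request of this size would try the kmalloc() path first. */
bool tries_kmalloc_first(size_t size)
{
	return size <= kmalloc_first_limit(MODEL_PAGE_SIZE,
					   MODEL_PAGE_ALLOC_COSTLY_ORDER);
}
```

Calling tries_kmalloc_first(fd_array_bytes(n)) for a few values of n shows where the cutover to the vmalloc()-only path would happen under these assumptions.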