[Devel] [PATCH rh7] fs: add __GFP_NORETRY in alloc_fdmem

Andrey Ryabinin aryabinin at virtuozzo.com
Thu Mar 16 08:08:50 PDT 2017



On 03/16/2017 06:03 PM, Konstantin Khorenko wrote:
> Andrey, please take a look.
> 
> All other patches from Anatoly are applied already, except this one.
> Is it worth applying this one as well?
> 

Yep,
	Acked-by: Andrey Ryabinin <aryabinin at virtuozzo.com>


> -- 
> Best regards,
> 
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team
> 
> On 10/21/2016 02:42 PM, Anatoly Stepanov wrote:
>> This is a backport of upstream (vanilla) commit:
>> commit 96c7a2ff21501691587e1ae969b83cbec8b78e08
>>
>> Under certain conditions there can be a lot of
>> alloc_fdmem() invocations with order <= PAGE_ALLOC_COSTLY_ORDER.
>>
>> For example: httpd doing a lot of fork() calls.
>>
>> Real-life examples from our customers:
>>
>> [532506.773243] httpd           D ffff8803f5fecc20     0 939874   6606
>> [532506.773257] Call Trace:
>> [532506.773261]  [<ffffffff8163ce29>] schedule+0x29/0x70
>> [532506.773264]  [<ffffffff8163a9d5>] schedule_timeout+0x175/0x2d0
>> [532506.773272]  [<ffffffff8108cc90>] ? internal_add_timer+0x70/0x70
>> [532506.773276]  [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
>> [532506.773280]  [<ffffffff8119be85>] wait_iff_congested+0x135/0x150
>> [532506.773284]  [<ffffffff810a86e0>] ? wake_up_atomic_t+0x30/0x30
>> [532506.773288]  [<ffffffff8119071f>] shrink_inactive_list+0x65f/0x6c0
>> [532506.773292]  [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
>> [532506.773296]  [<ffffffff811914af>] shrink_zone+0xef/0x2d0
>> [532506.773300]  [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
>> [532506.773310]  [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
>> [532506.773315]  [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
>> [532506.773320]  [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
>> [532506.773324]  [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
>> [532506.773327]  [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
>> [532506.773332]  [<ffffffff811d8c69>] __kmalloc+0x259/0x270
>> [532506.773337]  [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
>> [532506.773341]  [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
>> [532506.773344]  [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
>> [532506.773354]  [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
>> [532506.773358]  [<ffffffff8107a641>] do_fork+0xe1/0x320
>> [532506.773370]  [<ffffffff8107a906>] SyS_clone+0x16/0x20
>> [532506.773376]  [<ffffffff81648299>] stub_clone+0x69/0x90
>> [532506.773380]  [<ffffffff81647f49>] ? system_call_fastpath+0x16/0x1b
>>
>> [513890.005271] httpd           D ffff880425db7230     0 811718   6606
>> [513890.005279] Call Trace:
>> [513890.005282]  [<ffffffff8163ce29>] schedule+0x29/0x70
>> [513890.005284]  [<ffffffff8163aa99>] schedule_timeout+0x239/0x2d0
>> [513890.005292]  [<ffffffff8163c3ae>] io_schedule_timeout+0xae/0x130
>> [513890.005296]  [<ffffffff8163c448>] io_schedule+0x18/0x20
>> [513890.005298]  [<ffffffff812c6268>] get_request+0x218/0x780
>> [513890.005303]  [<ffffffff812c8526>] blk_queue_bio+0xc6/0x3a0
>> [513890.005309]  [<ffffffffa0002c59>] ? dm_make_request+0x119/0x170 [dm_mod]
>> [513890.005311]  [<ffffffff812c3892>] generic_make_request+0xe2/0x130
>> [513890.005313]  [<ffffffff812c3957>] submit_bio+0x77/0x1c0
>> [513890.005318]  [<ffffffff811bf87e>] __swap_writepage+0x1be/0x260
>> [513890.005337]  [<ffffffff811bf959>] swap_writepage+0x39/0x80
>> [513890.005340]  [<ffffffff8118f68d>] shrink_page_list+0x4ad/0xa80
>> [513890.005343]  [<ffffffff811902bb>] shrink_inactive_list+0x1fb/0x6c0
>> [513890.005345]  [<ffffffff81190f55>] shrink_lruvec+0x395/0x800
>> [513890.005348]  [<ffffffff811914af>] shrink_zone+0xef/0x2d0
>> [513890.005350]  [<ffffffff81191a30>] do_try_to_free_pages+0x170/0x530
>> [513890.005353]  [<ffffffff81191ec5>] try_to_free_pages+0xd5/0x160
>> [513890.005355]  [<ffffffff811850ab>] __alloc_pages_nodemask+0x8ab/0xc10
>> [513890.005358]  [<ffffffff811cb2f9>] alloc_pages_current+0xa9/0x170
>> [513890.005360]  [<ffffffff8119f8f8>] kmalloc_order+0x18/0x50
>> [513890.005362]  [<ffffffff8119f956>] kmalloc_order_trace+0x26/0xa0
>> [513890.005365]  [<ffffffff811d8c69>] __kmalloc+0x259/0x270
>> [513890.005367]  [<ffffffff812184d0>] alloc_fdmem+0x20/0x50
>> [513890.005369]  [<ffffffff812185ac>] alloc_fdtable+0x6c/0xe0
>> [513890.005371]  [<ffffffff81218b69>] dup_fd+0x1f9/0x2d0
>> [513890.005376]  [<ffffffff810797cf>] copy_process.part.30+0x87f/0x1510
>> [513890.005378]  [<ffffffff8107a641>] do_fork+0xe1/0x320
>> [513890.005380]  [<ffffffff8107a906>] SyS_clone+0x16/0x20
>> [513890.005382]  [<ffffffff81648299>] stub_clone+0x69/0x90
>>
>> We observed that kswapd sometimes cannot handle this load, which
>> causes many direct reclaim attempts. These in turn:
>>
>> 1. Increase iowait time due to congestion_wait
>> 2. Increase the number of block requests per second due to
>> page swapping and writeback
>> 3. May induce OOMs
>>
>> So it is better not to try that hard to allocate a contiguous
>> area, and to fall back to vmalloc() as soon as possible.
>>
>> Signed-off-by: Anatoly Stepanov <astepanov at cloudlinux.com>
>> ---
>>  fs/file.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/file.c b/fs/file.c
>> index 366d9bb..3f65ba0 100644
>> --- a/fs/file.c
>> +++ b/fs/file.c
>> @@ -36,7 +36,7 @@ static void *alloc_fdmem(size_t size)
>>       * vmalloc() if the allocation size will be considered "large" by the VM.
>>       */
>>      if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>> -        void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
>> +        void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY);
>>          if (data != NULL)
>>              return data;
>>      }
>>

