[Devel] [PATCH rh7] ploop: use GFP_NOIO in ploop_make_request
Kir Kolyshkin
kir at openvz.org
Mon Aug 31 19:40:24 PDT 2015
On 08/31/2015 06:59 AM, Konstantin Khorenko wrote:
> Maxim, please review.
Maxim,
Perhaps the following analysis by Alex Kompel will be helpful for you
while reviewing this patch:
>
> I ran into the same issue. I think the problem arises when ploop code
> calls bio_alloc from ploop_make_request with allocation flags that
> allow I/O. The execution path may end up in blkdev_issue_discard under
> memory pressure. blkdev_issue_discard issues more I/O requests and
> then waits for completion. These requests never complete because there
> is already I/O pending for the current task, resulting in a deadlock (see
> generic_make_request).
>
> generic_make_request => append to current->bio_list =>
> ploop_make_request => bio_alloc(GFP_NOFS, ...) => ...
> try_to_free_pages ... => blkdev_issue_discard => submit_bio =>
> generic_make_request => append to current->bio_list => wait for
> completion => deadlock
>
> Changing GFP_NOFS to GFP_NOIO in preallocate_bio appears to solve the
> problem:
The above is a copy-paste from https://bugs.openvz.org/browse/OVZ-6293
>
> Do we need the same in PCS6?
>
> --
> Best regards,
>
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team
>
> On 08/17/2015 04:30 PM, Vladimir Davydov wrote:
>> Currently, we use GFP_NOFS, which may result in a deadlock as follows:
>>
>> filemap_fault
>>  do_mpage_readpage
>>   submit_bio
>>    generic_make_request                 initializes current->bio_list
>>                                         calls make_request_fn
>>     ploop_make_request
>>      bio_alloc(GFP_NOFS)
>>       kmem_cache_alloc
>>        memcg_charge_kmem
>>         try_to_free_mem_cgroup_pages
>>          swap_writepage
>>           generic_make_request          puts bio on current->bio_list
>>         try_to_free_mem_cgroup_pages
>>          wait_on_page_writeback
>>
>> The wait_on_page_writeback will never complete then, because the
>> corresponding bio is on current->bio_list and for it to get to the queue
>> we must return from ploop_make_request first.
>>
>> The stack trace of a hung task:
>>
>> [<ffffffff8115ae2e>] sleep_on_page+0xe/0x20
>> [<ffffffff8115abb6>] wait_on_page_bit+0x86/0xb0
>> [<ffffffff8116f4b2>] shrink_page_list+0x6e2/0xaf0
>> [<ffffffff8116ff2b>] shrink_inactive_list+0x1cb/0x610
>> [<ffffffff81170ab5>] shrink_lruvec+0x395/0x790
>> [<ffffffff81171031>] shrink_zone+0x181/0x350
>> [<ffffffff811715a0>] do_try_to_free_pages+0x170/0x530
>> [<ffffffff81171b76>] try_to_free_mem_cgroup_pages+0xb6/0x140
>> [<ffffffff811c6b5e>] __mem_cgroup_try_charge+0x1de/0xd70
>> [<ffffffff811c8c4b>] memcg_charge_kmem+0x9b/0x100
>> [<ffffffff811c8e1b>] __memcg_charge_slab+0x3b/0x90
>> [<ffffffff811b3664>] new_slab+0x264/0x3f0
>> [<ffffffff815e97c6>] __slab_alloc+0x315/0x48f
>> [<ffffffff811b49ac>] kmem_cache_alloc+0x1cc/0x210
>> [<ffffffff8115e4b5>] mempool_alloc_slab+0x15/0x20
>> [<ffffffff8115e5f9>] mempool_alloc+0x69/0x170
>> [<ffffffff8120bd42>] bvec_alloc+0x92/0x120
>> [<ffffffff8120bfb8>] bio_alloc_bioset+0x1e8/0x2e0
>> [<ffffffffa0072246>] ploop_make_request+0x2a6/0xac0 [ploop]
>> [<ffffffff81297172>] generic_make_request+0xe2/0x130
>> [<ffffffff81297237>] submit_bio+0x77/0x1c0
>> [<ffffffff8121341f>] do_mpage_readpage+0x37f/0x6e0
>> [<ffffffff8121386b>] mpage_readpages+0xeb/0x160
>> [<ffffffffa01a051c>] ext4_readpages+0x3c/0x40 [ext4]
>> [<ffffffff811683c0>] __do_page_cache_readahead+0x1e0/0x260
>> [<ffffffff81168b11>] ra_submit+0x21/0x30
>> [<ffffffff8115dea1>] filemap_fault+0x321/0x4b0
>> [<ffffffff811864ca>] __do_fault+0x8a/0x560
>> [<ffffffff8118b2d0>] handle_mm_fault+0x3d0/0xd80
>> [<ffffffff815f73ee>] __do_page_fault+0x15e/0x530
>> [<ffffffff815f77da>] do_page_fault+0x1a/0x70
>> [<ffffffff815f3a08>] page_fault+0x28/0x30
>>
>> https://jira.sw.ru/browse/PSBM-38842
>>
>> Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
>> ---
>> drivers/block/ploop/dev.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
>> index 30eb8a7551e5..f37df4dacf8c 100644
>> --- a/drivers/block/ploop/dev.c
>> +++ b/drivers/block/ploop/dev.c
>> @@ -717,7 +717,7 @@ preallocate_bio(struct bio * orig_bio, struct ploop_device * plo)
>>  	}
>>
>>  	if (nbio == NULL)
>> -		nbio = bio_alloc(GFP_NOFS, max(orig_bio->bi_max_vecs, block_vecs(plo)));
>> +		nbio = bio_alloc(GFP_NOIO, max(orig_bio->bi_max_vecs, block_vecs(plo)));
>>  	return nbio;
>>  }
>>
>> @@ -852,7 +852,7 @@ static void ploop_make_request(struct request_queue *q, struct bio *bio)
>>
>>  	if (!current->io_context) {
>>  		struct io_context *ioc;
>> -		ioc = get_task_io_context(current, GFP_NOFS, NUMA_NO_NODE);
>> +		ioc = get_task_io_context(current, GFP_NOIO, NUMA_NO_NODE);
>>  		if (ioc)
>>  			put_io_context(ioc);
>>  	}
>>
> _______________________________________________
> Devel mailing list
> Devel at openvz.org
> https://lists.openvz.org/mailman/listinfo/devel