[CRIU] GSoC: Memory mapping collection

Pavel Emelianov xemul at virtuozzo.com
Fri Jun 7 18:10:44 MSK 2019


On 6/6/19 10:31 AM, abhishek dubey wrote:
> Please find Inline reply.
> On 05/06/19 6:19 PM, Pavel Emelianov wrote:
>> On 6/4/19 7:50 PM, Abhishek Dubey wrote:
>>> Hi,
>>>
>>> On running "zdtm.py run -a", testcases of zdtm/static directory are checked and that of zdtm/transition are skipped. Do all testcases of transition directory need manual run?
>> Yes.
>>
>>> What is specific about transition testcases?
>> They take time to accomplish, so the default run is limited.
> So is checking the default set enough for patch submission?

Yes, but we still ask submitters to test their patches with as many tests as possible :)

>>> For the first step of fetching VMAs between freeze and unfreeze of pre-dump:
>>> - To maintain correspondence between PID and collected VMA, added new entry in struct pstree_item
>>> - Moved out unfreeze code from cr_pre_dump_finish()
>>> - Moved out collect_mappings() from pre_dump_one_task
>>>
>>> I have tested these changes with env00-basic example and zdtm.py test-suite. All test-cases under zdtm/static directory are Passing. Remaining transition test cases are skipped.
>>>
>>> Share your thoughts on this approach. I am attaching patch-file for reference.
>>> For the first step of fetching VMAs between freeze and unfreeze of pre-dump:
>>> - To maintain correspondence between PID and collected VMA, added new entry in struct pstree_item
>>> - Moved out unfreeze code from cr_pre_dump_finish()
>>> - Moved out collect_mappings() from pre_dump_one_task
>> Please, describe why you need this. Current code does "fetching vmas between freeze and unfreeze", 
>> what's wrong with it so you need this change?
> 
> In the current approach, TIME_FROZEN encapsulates TIME_MEMDUMP, like:
> 
>     start(TIME_FROZEN)
>         collect mappings
>         start(TIME_MEMDUMP)
>             step0: page map cache init
>             step1: generate pagemap
>             step2: grab pages into pipes
>         end(TIME_MEMDUMP)
>         unfreeze pstree
>     end(TIME_FROZEN)

Yes

> and the objective I perceived is something like:
> 
>     start(TIME_FROZEN)            ----}
>         collect mappings              }==> Current patch
>         unfreeze pstree               }
>     end(TIME_FROZEN)              ----}
> 
>     start(TIME_MEMDUMP)
>         step0: page map cache init
>         step1: generate pagemap
>         step2: grab pages into pipes
>     end(TIME_MEMDUMP)

Ah, I see. You need to move the dumping of pages out of the tasks-are-frozen state. This single
patch is incorrect in that case: once tasks start doing something with their memory, the current
memory pre-dumping code will fail and exit, while the code we're looking for would handle memory
dump errors gracefully without stopping the whole action.

> With reference to the comment "fetching vmas between freeze and unfreeze", I think this is what is needed:
> 
>     start(TIME_FROZEN)
>         collect mappings
>         start(TIME_MEMDUMP)
>             step0: page map cache init
>             step1: generate pagemap
>         end(TIME_MEMDUMP)
>         unfreeze pstree
>     end(TIME_FROZEN)
> 
>     step2: grab pages into user buffer using process_vm_readv and resolve races
> 
>  Please confirm whether the above design is right?

Yes, with the "grab pages into user buffer" part the patch SET would be correct.

>  So, I don't have to worry about unfreezing after collection? Let it happen the way it is now.

Yes, but once you have unfrozen the tree you must handle memory dump errors gracefully (by
updating the pagemap.img entries accordingly), not the way it is done now (the whole criu just
exits with an error message).

> My immediate task will be getting the pipes out of the picture. I will start with commit bb98a82 as a reference, as
> suggested by Radostin in the other thread.
> 
>>>
>>> -Abhishek
>>>
>>> 0001-freezing-target-task-to-collect-memory-mappings-foll.patch
>>>
>>> From 5b131559ce20ebbdb8a04be2cced321b55d864f4 Mon Sep 17 00:00:00 2001
>> Please mind the https://criu.org/How_to_submit_patches article. Patch should go inline, not attached,
>> and properly formatted. Also comments are inline.
> Sure, I will take care of this when sending upstream patches. This patch was just to discuss the approach and was not meant in any way to go upstream.

Cool :)

>>> From: abhidubey4178 <abhishekdubey4178940 at gmail.com>
>>> Date: Tue, 4 Jun 2019 19:59:21 +0530
>>> Subject: [PATCH] freezing target task to collect memory mappings, followed by
>>>  unfreeze
>>>
>>> ---
>>>  criu/cr-dump.c        | 103 +++++++++++++++++++++++++++++++-----------
>>>  criu/include/pstree.h |   1 +
>>>  2 files changed, 78 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/criu/cr-dump.c b/criu/cr-dump.c
>>> index 7f2e5edf..40bc7594 100644
>>> --- a/criu/cr-dump.c
>>> +++ b/criu/cr-dump.c
>>> @@ -1154,20 +1154,19 @@ static int pre_dump_one_task(struct pstree_item *item, InventoryEntry *parent_ie
>>>  	if (item->pid->state == TASK_DEAD)
>>>  		return 0;
>>>  
>>> -	ret = collect_mappings(pid, &vmas, NULL);
>> Up above there have been left the now unused vmas variable.
> ok.
>>> -	if (ret) {
>>> +	if (!item->vmas) {
>>>  		pr_err("Collect mappings (pid: %d) failed with %d\n", pid, ret);
>>>  		goto err;
>>>  	}
>>>  
>>>  	ret = -1;
>>> -	parasite_ctl = parasite_infect_seized(pid, item, &vmas);
>>> +	parasite_ctl = parasite_infect_seized(pid, item, item->vmas);
>>>  	if (!parasite_ctl) {
>>>  		pr_err("Can't infect (pid: %d) with parasite\n", pid);
>>>  		goto err_free;
>>>  	}
>>>  
>>> -	ret = parasite_fixup_vdso(parasite_ctl, pid, &vmas);
>>> +	ret = parasite_fixup_vdso(parasite_ctl, pid, item->vmas);
>>>  	if (ret) {
>>>  		pr_err("Can't fixup vdso VMAs (pid: %d)\n", pid);
>>>  		goto err_cure;
>>> @@ -1192,14 +1191,14 @@ static int pre_dump_one_task(struct pstree_item *item, InventoryEntry *parent_ie
>>>  	mdc.stat = NULL;
>>>  	mdc.parent_ie = parent_ie;
>>>  
>>> -	ret = parasite_dump_pages_seized(item, &vmas, &mdc, parasite_ctl);
>>> +	ret = parasite_dump_pages_seized(item, item->vmas, &mdc, parasite_ctl);
>>>  	if (ret)
>>>  		goto err_cure;
>>>  
>>>  	if (compel_cure_remote(parasite_ctl))
>>>  		pr_err("Can't cure (pid: %d) from parasite\n", pid);
>>>  err_free:
>>> -	free_mappings(&vmas);
>>> +	free_mappings(item->vmas);
>>>  err:
>>>  	return ret;
>>>  
>>> @@ -1471,26 +1470,9 @@ static int setup_alarm_handler()
>>>  
>>>  static int cr_pre_dump_finish(int status)
>>>  {
>>> -	InventoryEntry he = INVENTORY_ENTRY__INIT;
>>>  	struct pstree_item *item;
>>>  	int ret;
>>>  
>>> -	/*
>>> -	 * Restore registers for tasks only. The threads have not been
>>> -	 * infected. Therefore, the thread register sets have not been changed.
>>> -	 */
>>> -	ret = arch_set_thread_regs(root_item, false);
>>> -	if (ret)
>>> -		goto err;
>>> -
>>> -	ret = inventory_save_uptime(&he);
>>> -	if (ret)
>>> -		goto err;
>>> -
>>> -	pstree_switch_state(root_item, TASK_ALIVE);
>>> -
>>> -	timing_stop(TIME_FROZEN);
>>> -
>>>  	if (status < 0) {
>>>  		ret = status;
>>>  		goto err;
>>> @@ -1540,9 +1522,6 @@ err:
>>>  	if (bfd_flush_images())
>>>  		ret = -1;
>>>  
>>> -	if (write_img_inventory(&he))
>>> -		ret = -1;
>>> -
>> You write inventory too early, I'm not 100% sure this is correct.
>>
>>>  	if (ret)
>>>  		pr_err("Pre-dumping FAILED.\n");
>>>  	else {
>>> @@ -1552,6 +1531,68 @@ err:
>>>  	return ret;
>>>  }
>>>  
>>> +
>>> +static int pre_dump_collect_vmas(struct pstree_item *item)
>>> +{
>>> +       pid_t pid = item->pid->real;
>>> +       struct vm_area_list *vmas;
>>> +       int ret = -1;
>>> +
>>> +       vmas = xmalloc(sizeof(struct vm_area_list));
>> This memory is never free-d.
> Got it.
>>> +       if(!vmas)
>>> +            return -1;
>>> +       INIT_LIST_HEAD(&vmas->h);
>>> +       vmas->nr = 0;
>>> +
>>> +       pr_info("========================================\n");
>>> +       pr_info("Collecting VMAs of (pid: %d)\n", pid);
>>> +       pr_info("========================================\n");
>>> +
>>> +       if (item->pid->state == TASK_STOPPED) {
>>> +               pr_warn("Stopped tasks are not supported\n");
>>> +                    return 0;
>>> +            }
>> Formatting.
>>
>>> +
>>> +       if (item->pid->state == TASK_DEAD)
>>> +                    return 0;
>> Vmas allocation can happen here.
> Got it.
>>> +
>>> +       ret = collect_mappings(pid, vmas, NULL);
>>> +       if (ret) {
>>> +               pr_err("Collect mappings (pid: %d) failed with %d\n", pid, ret);
>>> +               return -1;
>>> +            }
>>> +
>>> +       item->vmas = vmas;
>>> +       return 0;
>>> +}
>>> +
>>> +
>>> +static int unfreeze_pstree()
>>> +{
>>> +	InventoryEntry he = INVENTORY_ENTRY__INIT;
>>> +	int ret;
>>> +
>>> +	/*
>>> +	 * Restore registers for tasks only. The threads have not been
>>> +	 * infected. Therefore, the thread register sets have not been changed.
>>> +	 */
>>> +	ret = arch_set_thread_regs(root_item, false);
>>> +	if (ret)
>>> +		return -1;
>>> +
>>> +	ret = inventory_save_uptime(&he);
>>> +	if (ret)
>>> +		return -1;
>>> +
>>> +	if (write_img_inventory(&he))
>>> +		return -1;
>>> +
>>> +	pstree_switch_state(root_item, TASK_ALIVE);
>>> +	timing_stop(TIME_FROZEN);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  int cr_pre_dump_tasks(pid_t pid)
>>>  {
>>>  	InventoryEntry *parent_ie = NULL;
>>> @@ -1619,6 +1660,16 @@ int cr_pre_dump_tasks(pid_t pid)
>>>  	/* Errors handled later in detect_pid_reuse */
>>>  	parent_ie = get_parent_inventory();
>>>  
>>> +	/* Collecting VMAs of all processes in pstree */
>>> +	for_each_pstree_item(item)
>>> +		if (pre_dump_collect_vmas(item))
>>> +			goto err;
>>> +
>>> +	/* Unfreeze pstree */
>> This is pointless comment as it 100% repeats the subsequent function name :)
> Won't repeat!
>>> +	ret = unfreeze_pstree();
>>> +	if(ret)
>>> +		goto err;
>>> +
>>>  	for_each_pstree_item(item)
>>>  		if (pre_dump_one_task(item, parent_ie))
>>>  			goto err;
>>> diff --git a/criu/include/pstree.h b/criu/include/pstree.h
>>> index 7303c1fe..69562608 100644
>>> --- a/criu/include/pstree.h
>>> +++ b/criu/include/pstree.h
>>> @@ -30,6 +30,7 @@ struct pstree_item {
>>>  		futex_t		task_st;
>>>  		unsigned long	task_st_le_bits;
>>>  	};
>>> +	struct vm_area_list *vmas;
>> This is dump-only field, it should be placed on dmp_info instead.
> Sure. 
>>>  };
>>>  
>>>  static inline pid_t vpid(const struct pstree_item *i)
>>> -- 
>>> 2.17.1
>>>
>> -- Pavel

-- Pavel


