[CRIU] [PATCH] restore: handle exit code of the unlock network script

Pavel Emelyanov xemul at parallels.com
Tue Mar 25 06:57:19 PDT 2014


On 03/25/2014 05:52 PM, Andrey Wagin wrote:
> 2014-03-25 17:45 GMT+04:00 Pavel Emelyanov <xemul at parallels.com>:
>> On 03/25/2014 05:41 PM, Andrew Vagin wrote:
>>> On Tue, Mar 25, 2014 at 05:23:07PM +0400, Pavel Emelyanov wrote:
>>>> On 03/25/2014 05:13 PM, Andrew Vagin wrote:
>>>>> On Tue, Mar 25, 2014 at 05:06:53PM +0400, Pavel Emelyanov wrote:
>>>>>> On 03/25/2014 12:41 PM, Andrew Vagin wrote:
>>>>>>> On Tue, Mar 25, 2014 at 02:27:33AM +0400, Pavel Emelyanov wrote:
>>>>>>>> On 03/24/2014 03:07 PM, Andrey Vagin wrote:
>>>>>>>>> When we are migrating processes from one host to another host,
>>>>>>>>> we need to know the moment when processes can be killed on the source
>>>>>>>>> host.
>>>>>>>>> If a migration script is killed (SIGSEGV, exception, etc.), the process
>>>>>>>>> tree must not end up alive on both nodes, and we need to reduce the
>>>>>>>>> chance of killing the processes.
>>>>>>>>
>>>>>>>> I didn't quite get why the existing scheme used by p.haul is flawed.
>>>>>>>> Can you draw a two-sided diagram of source-destination interaction
>>>>>>>> and show where the problem is and how you propose to solve it?
>>>>>>>
>>>>>>> source                            destination
>>>>>>> criu dump
>>>>>>> post-dump
>>>>>>>                           criu restore
>>>>>>>                           network unlock
>>>>>>>                           post-restore
>>>>>>>                           kill p.haul before receiving cr_rpc.RESTORE
>>>>>>> resume
>>>>>>>
>>>>>>> In this case, both hosts end up with live process trees.
>>>>>>>
>>>>>>> And I want to move post-restore before network_unlock, because we can't
>>>>>>> fail after unlocking the network.
>>>>>>
>>>>>> OK, but this patch does something different.
>>>>>
>>>>> No, it doesn't. It doesn't move post-restore; that will be done in another
>>>>> patch. But network_unlock is the point after which the tree can't be
>>>>> resumed on the source host.
>>>>>
>>>>
>>>> OK, so this is preparatory.
>>>> Show me the resulting 2-sided diagram you want to achieve
>>>
>>> source                                destination
>>> criu dump
>>> post-dump
>>>                               criu restore
>>>                               network unlock
>>>               <--- kill processes
>>> exit from post_dump
>>>               [    window     ]
>>>                               exit from network_unlock
>>>                               resume the process tree
>>>
>>> In this scheme you can kill p.haul at any moment, but the process tree
>>> will be resumed on only one side. There is also a small window during
>>> which the tree is not resumed on either side.
>>>
>>>> or send the full set.
>>>
>>> I want to make sure I have missed nothing before doing anything
>>> else.
>>
>> You said that you want to move post-restore before network-unlock,
>> but that's not in the diagram above. Perhaps that is what you missed.
> 
> You are trying to troll me. post_dump will not be interesting for
> us if it is called before network_unlock. post_dump was put in
> its current place by mistake.

Then rephrase what your intention is w/o saying "post-restore script".
Other than this -- I think that making the network-unlock script notify
the source that "restore is complete" is a bad idea.

>         /*
>          * -------------------------------------------------------------
>          * Below this line nothing can fail, because network is unlocked
>          */
> 
>         ret = restore_switch_stage(CR_STATE_RESTORE_CREDS);
>         BUG_ON(ret);
> 
>         timing_stop(TIME_RESTORE);
> 
>         ret = run_scripts("post-restore");
> 
>>> Thanks.
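The patch in the subject makes restore act on the exit code of the network-unlock script, and the excerpt quoted above marks everything after the unlock as unable to fail. A minimal sketch of that idea, assuming the hook is a shell command run via /bin/sh; run_hook() and finish_restore() are hypothetical names, not CRIU's run_scripts() or its cr-restore.c code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a hook command via /bin/sh. Returns 0 only if it exits with status 0;
 * a non-zero exit, a fatal signal, or a fork/exec failure returns -1.
 * Illustrative helper, not CRIU's run_scripts().
 */
static int run_hook(const char *cmd)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
		_exit(127); /* exec failed */
	}

	while (waitpid(pid, &status, 0) < 0) {
		if (errno != EINTR)
			return -1;
	}

	if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 0;
	return -1;
}

/*
 * Hypothetical tail of a restore: the unlock hook is the last step that is
 * allowed to fail. Once it succeeds, nothing after it may fail.
 */
static int finish_restore(const char *unlock_cmd)
{
	if (run_hook(unlock_cmd) != 0) {
		fprintf(stderr, "network-unlock hook failed, aborting restore\n");
		return -1; /* caller would tear down the restored tree */
	}

	/* Network is unlocked: resume the tree; no failure paths below. */
	return 0;
}
```

Checking both WIFEXITED() and WEXITSTATUS() matters here: a hook killed by a signal (for example, p.haul dying mid-script) must count as a failure, not as success.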




More information about the CRIU mailing list