[CRIU] [PATCH] restore: handle exit code of the unlock network script

Tue Mar 25 07:17:44 PDT 2014

On Tue, Mar 25, 2014 at 05:57:19PM +0400, Pavel Emelyanov wrote:
> On 03/25/2014 05:52 PM, Andrey Wagin wrote:
> > 2014-03-25 17:45 GMT+04:00 Pavel Emelyanov <xemul at parallels.com>:
> >> On 03/25/2014 05:41 PM, Andrew Vagin wrote:
> >>> On Tue, Mar 25, 2014 at 05:23:07PM +0400, Pavel Emelyanov wrote:
> >>>> On 03/25/2014 05:13 PM, Andrew Vagin wrote:
> >>>>> On Tue, Mar 25, 2014 at 05:06:53PM +0400, Pavel Emelyanov wrote:
> >>>>>> On 03/25/2014 12:41 PM, Andrew Vagin wrote:
> >>>>>>> On Tue, Mar 25, 2014 at 02:27:33AM +0400, Pavel Emelyanov wrote:
> >>>>>>>> On 03/24/2014 03:07 PM, Andrey Vagin wrote:
> >>>>>>>>> When we migrate processes from one host to another,
> >>>>>>>>> we need to know the moment when the processes can be killed on the
> >>>>>>>>> source host.
> >>>>>>>>> If a migration script is killed (segv, exception, etc.), the process tree
> >>>>>>>>> must not live on both nodes, and we need to reduce the chance of
> >>>>>>>>> killing processes.
> >>>>>>>>
> >>>>>>>> I didn't quite get why the existing scheme used by p.haul is flawed.
> >>>>>>>> Can you draw a two-sided diagram of source-destination interaction
> >>>>>>>> and show where the problem is and how you propose to solve it?
> >>>>>>>
> >>>>>>> source                            destination
> >>>>>>> criu dump
> >>>>>>> post-dump
> >>>>>>>                           criu restore
> >>>>>>>                           network unlock
> >>>>>>>                           post-restore
> >>>>>>>                           kill p.haul before receiving cr_rpc.RESTORE
> >>>>>>> resume
> >>>>>>>
> >>>>>>> In this case both hosts will have live process trees...
> >>>>>>>
> >>>>>>> And I want to move post-restore before network_unlock, because we can't
> >>>>>>> fail after unlocking network.
> >>>>>>
> >>>>>> OK, but this patch does something different.
> >>>>>
> >>>>> No, it doesn't. It doesn't move post-restore; that will be done in another
> >>>>> patch. But network_unlock is the line after which the tree can't be
> >>>>> resumed on the source host.
> >>>>>
> >>>>
> >>>> OK, so this is preparatory.
> >>>> Show me the resulting 2-sided diagram you want to achieve
> >>>
> >>> source                                destination
> >>> criu dump
> >>> post-dump
> >>>                               criu restore
> >>>                               network unlock
> >>>               <--- kill processes
> >>> exit from post_dump
> >>>               [    window     ]
> >>>                               exit from network_unlock
> >>>                               resume the process tree
> >>>
> >>> In this scheme you can kill p.haul at any moment, and the process tree
> >>> will be resumed on only one side. We do have a small window in which the
> >>> tree will not be resumed at all.
> >>>
> >>>> or send the full set.
> >>>
> >>> I want to make sure that I have missed nothing before doing anything
> >>> else.
> >>
> >> You said that you want to move post-restore before network-unlock,
> >> but it's not in the diagram above. Probably this.
> > 
> > You are trying to troll me. post_dump will not be interesting for
> > us if it is called before network_unlock. post_dump was added to
> > its place by mistake.
> 
> Then rephrase what your intention is w/o saying "post-restore script".

Could you imagine that network_unlock and post_restore are the same type
of scripts and look at my diagram once again?

> Other than this -- I think that making the network-unlock script notify
> the source about "restore is complete" is a bad idea.

Why? Do you have something better?
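
For concreteness, here is a rough sketch of the destination-side behaviour
being discussed: run the unlock script, look at its exit code, and only
resume the tree if the script succeeded. This is not the code from the patch.
run_script(), after_network_unlock() and enum restore_action are hypothetical
names made up for illustration, and the script is run through system() only
to keep the example short.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical outcome of the destination-side decision (not a CRIU type). */
enum restore_action {
	RESTORE_RESUME_TREE,	/* script exited with 0: resume the tree here */
	RESTORE_KILL_TREE,	/* script failed: kill the restored tree */
};

/*
 * Run a script through the shell.
 * Returns its exit code, or -1 if it could not be run or did not
 * exit normally (for example, it was killed by a signal).
 */
static int run_script(const char *cmd)
{
	int status = system(cmd);

	if (status == -1 || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}

/*
 * Assumption: a zero exit code from the unlock script means the source
 * side has been told to kill its copy, so the tree may be resumed only
 * on the destination. Any other result keeps the source as the owner.
 */
static enum restore_action after_network_unlock(const char *unlock_cmd)
{
	if (run_script(unlock_cmd) == 0)
		return RESTORE_RESUME_TREE;

	return RESTORE_KILL_TREE;
}
```

With this shape, after_network_unlock("exit 0") gives RESTORE_RESUME_TREE and
any failing script gives RESTORE_KILL_TREE, so the tree is never resumed on
both sides.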

> 
> >         /*
> >          * -------------------------------------------------------------
> >          * Below this line nothing can fail, because network is unlocked
> >          */
> > 
> >         ret = restore_switch_stage(CR_STATE_RESTORE_CREDS);
> >         BUG_ON(ret);
> > 
> >         timing_stop(TIME_RESTORE);
> > 
> >         ret = run_scripts("post-restore");
> > 
> > 