[CRIU] [PATCH] restore: handle exit code of the unlock network script
Andrew Vagin
avagin at parallels.com
Tue Mar 25 07:17:44 PDT 2014
On Tue, Mar 25, 2014 at 05:57:19PM +0400, Pavel Emelyanov wrote:
> On 03/25/2014 05:52 PM, Andrey Wagin wrote:
> > 2014-03-25 17:45 GMT+04:00 Pavel Emelyanov <xemul at parallels.com>:
> >> On 03/25/2014 05:41 PM, Andrew Vagin wrote:
> >>> On Tue, Mar 25, 2014 at 05:23:07PM +0400, Pavel Emelyanov wrote:
> >>>> On 03/25/2014 05:13 PM, Andrew Vagin wrote:
> >>>>> On Tue, Mar 25, 2014 at 05:06:53PM +0400, Pavel Emelyanov wrote:
> >>>>>> On 03/25/2014 12:41 PM, Andrew Vagin wrote:
> >>>>>>> On Tue, Mar 25, 2014 at 02:27:33AM +0400, Pavel Emelyanov wrote:
> >>>>>>>> On 03/24/2014 03:07 PM, Andrey Vagin wrote:
> >>>>>>>>> When we are migrating processes from one host to another,
> >>>>>>>>> we need to know the moment when the processes can be killed on the
> >>>>>>>>> source host.
> >>>>>>>>> If a migration script is killed (segv, exception, etc), the process tree
> >>>>>>>>> must not live on both nodes and we need to reduce the chance of
> >>>>>>>>> killing processes.
> >>>>>>>>
> >>>>>>>> I didn't quite get why the existing scheme used by p.haul is flawed.
> >>>>>>>> Can you draw a two-sided diagram of source-destination interaction
> >>>>>>>> and show where the problem is and how you propose to solve it?
> >>>>>>>
> >>>>>>> source                          destination
> >>>>>>> criu dump
> >>>>>>> post-dump
> >>>>>>>                                 criu restore
> >>>>>>>                                 network unlock
> >>>>>>>                                 post-restore
> >>>>>>> kill p.haul before receiving cr_rpc.RESTORE
> >>>>>>> resume
> >>>>>>>
> >>>>>>> In this case both hosts will have alive process trees...
> >>>>>>>
> >>>>>>> And I want to move post-restore before network_unlock, because we can't
> >>>>>>> fail after unlocking network.
> >>>>>>
> >>>>>> OK, but this patch does something different.
> >>>>>
> >>>>> No, it doesn't. It doesn't move post-restore; that will be done in
> >>>>> another patch. But network_unlock is the line after which the tree
> >>>>> can't be resumed on the source host.
> >>>>>
> >>>>
> >>>> OK, so this is preparatory.
> >>>> Show me the resulting 2-sided diagram you want to achieve
> >>>
> >>> source                          destination
> >>> criu dump
> >>> post-dump
> >>>                                 criu restore
> >>>                                 network unlock
> >>>            <--- kill processes
> >>> exit from post_dump
> >>>            [ window ]
> >>>                                 exit from network_unlock
> >>>                                 resume the process tree
> >>>
> >>> In this scheme you can kill p.haul at any moment, but the process tree
> >>> will be resumed only on one side. And we have a small window when the
> >>> tree will not be resumed at all.
> >>>
> >>>> or send the full set.
> >>>
> >>> I want to make sure that I have missed nothing before doing anything
> >>> else.
> >>
> >> You said that you want to move post-restore before network-unlock,
> >> but it's not in the diagram above. Probably this.
> >
> > You are trying to troll me. post_dump will not be interesting for
> > us if it is called before network_unlock. post_dump was added to
> > its place by mistake.
>
> Then rephrase what your intention is w/o saying "post-restore script".
Could you imagine that network_unlock and post_restore are the same type
of script and look at my diagram once again?
> Other than this -- I think that making network-unlock script notify
> the source about "restore is complete" is a bad idea.
Why? Do you have smth better?
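
To be concrete, here is a rough sketch of the ordering I have in mind. It is
not the patch itself: run_script() and finish_restore() are made-up names,
and system() only stands in for however the scripts are really started. The
point is that a non-zero exit code from network-unlock is the last failure we
are allowed to act on, and anything after it can only be logged.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: run a shell command and return its exit code,
 * or -1 if it could not be started or was killed by a signal.
 */
int run_script(const char *cmd)
{
	int status = system(cmd);

	if (status == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/*
 * Hypothetical tail of restore. Returns 0 if the tree may be resumed on
 * the destination, -1 if restore must be aborted (the source still owns
 * the tree and may resume it).
 */
int finish_restore(const char *unlock_cmd, const char *post_restore_cmd)
{
	/* Last point where a failure is still allowed. */
	if (run_script(unlock_cmd) != 0) {
		fprintf(stderr, "network-unlock failed, aborting restore\n");
		return -1;
	}

	/* Below this line nothing can fail, because network is unlocked. */
	if (run_script(post_restore_cmd) != 0)
		fprintf(stderr, "post-restore failed, ignoring\n");

	return 0;
}
```

With this, finish_restore("exit 1", "true") returns -1 and the tree stays
with the source, while a failing post-restore after a successful unlock
still returns 0.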
>
> > /*
> > * -------------------------------------------------------------
> > * Below this line nothing can fail, because network is unlocked
> > */
> >
> > ret = restore_switch_stage(CR_STATE_RESTORE_CREDS);
> > BUG_ON(ret);
> >
> > timing_stop(TIME_RESTORE);
> >
> > ret = run_scripts("post-restore");
> >
> >
> >>
> >>> Thanks.
> >>> .
> >>>
> >>
> >>
> > .
> >
>
>