[CRIU] [PATCH] restore: define root_as_sibling before using it

Tycho Andersen tycho.andersen at canonical.com
Tue Sep 9 13:42:41 PDT 2014


On Tue, Sep 09, 2014 at 11:58:00PM +0400, Pavel Emelyanov wrote:
> On 09/09/2014 11:44 PM, Tycho Andersen wrote:
> > On Tue, Sep 09, 2014 at 11:23:47PM +0400, Pavel Emelyanov wrote:
> >> On 09/09/2014 10:15 PM, Tycho Andersen wrote:
> >>> root_as_sibling is used in criu_signals_setup(), but was only defined later
> >>> (when forking the root task for the first time). This meant that the
> >>> SA_NOCLDSTOP was never masked off, which meant SIGCHLD was never delivered
> >>> after ptracing the root task. Thus, when the a child of the root task died
> >>> (e.g. from cr_system), the root task sat in PTRACE_STOP, and the restore task
> >>> never PTRACE_CONT'd, resulting in a deadlock.
> >>>
> >>> We also drop the pdeath_sig constraint from setting root_as_sibling when in
> >>> --restore-detached mode; in --restore-detached we /always/ need to have
> >>> root_as_sibling, but we only need to clone the parent if pdeath_sig is set and
> >>> we want to restore the task as alive.
> >>>
> >>> v2: re-work the condition for CLONE_PARENT
> >>>
> >>> Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
> >>> ---
> >>>  cr-restore.c | 30 ++++++++++++++++++++----------
> >>>  1 file changed, 20 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/cr-restore.c b/cr-restore.c
> >>> index 2735d0d..fe6c798 100644
> >>> --- a/cr-restore.c
> >>> +++ b/cr-restore.c
> >>> @@ -956,25 +956,21 @@ struct cr_clone_arg {
> >>>  static void maybe_clone_parent(struct pstree_item *item,
> >>>  			      struct cr_clone_arg *ca)
> >>>  {
> >>> +	/*
> >>> +	 * zdtm runs in kernel 3.11, which has the problem described below. We
> >>> +	 * avoid this by including the pdeath_sig test. Once users/zdtm migrate
> >>> +	 * off of 3.11, this condition can be simplified to just test
> >>> +	 * root_as_sibling.
> >>> +	 */
> >>>  	if (opts.swrk_restore ||
> >>>  	    (opts.restore_detach && ca->core->thread_core->pdeath_sig)) {
> >>>  		/*
> >>> -		 * This means we're called from lib's criu_restore_child().
> >>> -		 * In that case create the root task as the child one to+
> >>> -		 * the caller. This is the only way to correctly restore the
> >>> -		 * pdeath_sig of the root task. But also looks nice.
> >>> -		 *
> >>> -		 * Alternatively, if we are --restore-detached, a similar trick is
> >>> -		 * needed to correctly restore pdeath_sig and prevent processes from
> >>> -		 * dying once restored.
> >>> -		 *
> >>>  		 * There were a problem in kernel 3.11 -- CLONE_PARENT can't be
> >>>  		 * set together with CLONE_NEWPID, which has been solved in further
> >>>  		 * versions of the kernels, but we treat 3.11 as a base, so at
> >>>  		 * least warn a user about potential problems.
> >>>  		 */
> >>>  		item->rst->clone_flags |= CLONE_PARENT;
> >>> -		root_as_sibling = 1;
> >>>  		if (item->rst->clone_flags & CLONE_NEWPID)
> >>>  			pr_warn("Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,"
> >>>  				"because not all kernels support such clone flags combinations!\n");
> >>> @@ -1792,6 +1788,20 @@ int cr_restore_tasks(void)
> >>>  {
> >>>  	int ret = -1;
> >>>  
> >>> +	if (opts.swrk_restore || opts.restore_detach) {
> >>> +		/*
> >>> +		 * This means we're called from lib's criu_restore_child().
> >>> +		 * In that case create the root task as the child one to+
> >>> +		 * the caller. This is the only way to correctly restore the
> >>> +		 * pdeath_sig of the root task. But also looks nice.
> >>> +		 *
> >>> +		 * Alternatively, if we are --restore-detached, a similar trick is
> >>> +		 * needed to correctly restore pdeath_sig and prevent processes from
> >>> +		 * dying once restored.
> >>> +		 */
> >>> +		root_as_sibling = 1;
> >>
> >> The root_as_sibling and item->rst->clone_flags above should come
> >> in sync, they don't work one w/o another. Thus the whole complex
> >> logic about whether or not to do it should sit here and the
> >> maybe_clone_parent() should just check the root_as_sibling.
> > 
> > But isn't it wrong to --restore-detach and not set root_as_sibling?
> 
> If we --restore-detach and don't set this variable the root task
> would be born as criu's child, then criu would exit and the fresh
> task would get re-parented to init. Why is it wrong from your
> point of view?

I guess I don't think either way is particularly wrong, but I think
it should be one or the other always, not creating the process tree
with some parent based on some internal state of the dumped process,
since that is "hard" to predict.

In LXC we depend (well, it won't break anything except for convention)
on this behavior, so it would be nice to keep it consistent.

> BTW, the --restore-detach is the CLI use-case. We also have the
> scenario when we restore tasks from standalone service, in that
> case restore tasks must get re-parented to init and thus must
> be born as criu child. And we have the criu_restore_child() case
> -- here the task must be born as criu sibling. %)

Yes, although I think all three usecases are solved by dropping the
last part of the &&, except that zdtm won't let us :(

> > Another problem is that we don't actually have the core when we are
> > setting up the signals, those are loaded in fork_with_pid.
> 
> Damn :( And thus we don't have any ideas whether or not we're should
> use the root_as_sibling. It's already midnight in MSK, so I need some
> time-out to think what we can do about it, sorry. But if you have some 
> good idea -- don't wait for me to wake up :)

I think we can just unmask SA_NOCLDSTOP when we PTRACE_SEIZE. I will
send a patch.

Tycho


More information about the CRIU mailing list