[CRIU] [PATCH 2/4] restore: TASK_HELPERs live until RESTORE stage

Andrew Vagin avagin at parallels.com
Fri Sep 12 13:21:06 PDT 2014


On Fri, Sep 12, 2014 at 01:47:11PM -0500, Tycho Andersen wrote:
> On Fri, Sep 12, 2014 at 10:35:00PM +0400, Andrew Vagin wrote:
> > On Fri, Sep 12, 2014 at 01:13:00PM -0500, Tycho Andersen wrote:
> > > In order to use TASK_HELPERS to open files from dead processes, they should
> > > persist until the end of the restore stage, so that the /proc files exist when
> > > setting up the fds.
> > > 
> > > This commit is in preparation for the remap_dead_pid commits.
> > > 
> > > v2: wait() on helpers after restore stage is over
> > 
> > [root@avagin-fc19-cr criu]#  bash test/zdtm.sh  ns/static/session00
> > ================================= CRIU CHECK =================================
> > Error (timerfd.c:56): timerfd: No timerfd support for c/r: Inappropriate ioctl for device
> > ============================= WARNING =============================
> > Not all features needed for CRIU are merged to upstream kernel yet,
> > so for now we maintain our own branch which can be cloned from:
> > git://git.kernel.org/pub/scm/linux/kernel/git/gorcunov/linux-cr.git
> > ===================================================================
> > Execute zdtm/live/static/session00
> > ./session00 --pidfile=session00.pid --outfile=session00.out
> > /root/git/criu/test
> > Dump 11182
> > Restore
> > test/zdtm.sh: line 564: 11220 Segmentation fault      setsid $CRIU restore -D $ddump -o restore.log -v4 -d $gen_args
> > Test: zdtm/live/static/session00, Result: FAIL
> > ==================================== ERROR ====================================
> > Test: zdtm/live/static/session00, Namespace: 1
> > Dump log   : /root/git/criu/test/dump/static/session00/11182/1/dump.log
> > --------------------------------- grep Error ---------------------------------
> > ------------------------------------- END -------------------------------------
> > Restore log: /root/git/criu/test/dump/static/session00/11182/1/restore.log
> > --------------------------------- grep Error ---------------------------------
> > (00.162581) Error (cr-restore.c:1738): BUG at cr-restore.c:1738
> > ------------------------------------- END -------------------------------------
> > ================================= ERROR OVER =================================
> 
> :( I guess this is with all the patches applied? Must still be some
> synchronization issue, I will take a look.

Look at stage_participants(). I think you need something like this:
        case CR_STATE_RESTORE:
+               return task_entries->nr_threads + task_entries->nr_helpers;
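
To illustrate why the participant count matters, here is a minimal standalone
sketch, not CRIU code. It assumes a stage is tracked by a counter that is
preset to the number of participants and decremented once by each process that
finishes the stage. The struct, the function names and the count_helpers flag
are hypothetical; only nr_threads and nr_helpers echo the fields above.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the restore bookkeeping. */
struct entries {
	int nr_threads;      /* threads of the tasks being restored */
	int nr_helpers;      /* TASK_HELPER processes */
	int nr_in_progress;  /* participants that have not finished the stage */
};

/* Number of processes expected to check in for the RESTORE stage. */
static int restore_participants(const struct entries *e, int count_helpers)
{
	return e->nr_threads + (count_helpers ? e->nr_helpers : 0);
}

/* Preset the counter before the stage starts. */
static void start_stage(struct entries *e, int count_helpers)
{
	e->nr_in_progress = restore_participants(e, count_helpers);
	assert(e->nr_in_progress >= 0);
}

/* Called once by every process that finishes the stage. */
static int finish_stage(struct entries *e)
{
	return --e->nr_in_progress;
}

/*
 * Let every thread AND every helper finish the stage, as the helpers
 * would after the patch. Returns the final counter value: 0 means the
 * accounting balanced, a negative value means more processes checked
 * in than were counted as participants.
 */
static int run_restore_stage(struct entries *e, int count_helpers)
{
	int i, total = e->nr_threads + e->nr_helpers;

	start_stage(e, count_helpers);
	for (i = 0; i < total; i++)
		finish_stage(e);
	return e->nr_in_progress;
}
```

With 3 threads and 2 helpers, run_restore_stage() returns -2 when helpers are
not counted and 0 when they are. Whether this is what trips the BUG above is a
guess; the sketch only shows the kind of imbalance to look for.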

> 
> Tycho
> 
> > > 
> > > Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
> > > ---
> > >  cr-restore.c | 13 +++++++------
> > >  1 file changed, 7 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/cr-restore.c b/cr-restore.c
> > > index 4d5ccd5..75d3afa 100644
> > > --- a/cr-restore.c
> > > +++ b/cr-restore.c
> > > @@ -702,7 +702,7 @@ static int pstree_wait_helpers()
> > >  {
> > >  	struct pstree_item *pi;
> > >  
> > > -	list_for_each_entry(pi, &current->children, sibling) {
> > > +	for_each_pstree_item(pi) {
> > >  		int status, ret;
> > >  
> > >  		if (pi->state != TASK_HELPER)
> > > @@ -770,9 +770,6 @@ static int restore_one_alive_task(int pid, CoreEntry *core)
> > >  
> > >  	rst_mem_switch_to_private();
> > >  
> > > -	if (pstree_wait_helpers())
> > > -		return -1;
> > > -
> > >  	if (prepare_fds(current))
> > >  		return -1;
> > >  
> > > @@ -931,9 +928,10 @@ static int restore_one_task(int pid, CoreEntry *core)
> > >  		ret = restore_one_alive_task(pid, core);
> > >  	else if (current->state == TASK_DEAD)
> > >  		ret = restore_one_zombie(pid, core);
> > > -	else if (current->state == TASK_HELPER)
> > > +	else if (current->state == TASK_HELPER) {
> > > +		restore_finish_stage(CR_STATE_RESTORE);
> > >  		ret = 0;
> > > -	else {
> > > +	} else {
> > >  		pr_err("Unknown state in code %d\n", (int)core->tc->task_state);
> > >  		ret = -1;
> > >  	}
> > > @@ -1711,6 +1709,9 @@ static int restore_root_task(struct pstree_item *init)
> > >  	if (ret < 0)
> > >  		goto out_kill;
> > >  
> > > +	if (pstree_wait_helpers() < 0)
> > > +		goto out_kill;
> > > +
> > >  	ret = run_scripts(ACT_POST_RESTORE);
> > >  	if (ret != 0) {
> > >  		pr_err("Aborting restore due to script ret code %d\n", ret);
> > > -- 
> > > 1.9.1
> > > 

