[CRIU] [PATCH 3/4] remap: add a dead pid /proc remap

Tycho Andersen tycho.andersen at canonical.com
Fri Sep 19 06:27:50 PDT 2014


On Fri, Sep 19, 2014 at 11:12:07AM +0400, Pavel Emelyanov wrote:
> On 09/18/2014 09:54 PM, Tycho Andersen wrote:
> > Hi Pavel,
> > 
> > On Thu, Sep 18, 2014 at 08:07:52PM +0400, Pavel Emelyanov wrote:
> >> On 09/18/2014 05:41 PM, Tycho Andersen wrote:
> >>> On Thu, Sep 18, 2014 at 03:03:15PM +0400, Pavel Emelyanov wrote:
> >>>> On 09/18/2014 12:32 AM, Tycho Andersen wrote:
> >>>>> If a file like /proc/20/mountinfo is open, but 20 is a zombie (or doesn't exist
> >>>>> any more), we can't read this file at all, so a link remap won't work. Instead,
> >>>>> we add a new remap, called the dead process remap, which forks a TASK_HELPER as
> >>>>> that dead pid so that the restore task can open the new /proc/20/mountinfo
> >>>>> instead.
> >>>>>
> >>>>> This commit also adds a new stage CR_STATE_RESTORE_SHARED. Since new
> >>>>> TASK_HELPERS are added when loading the shared resource images, we need to wait
> >>>>> to start forking tasks until after these resources are loaded.
> >>>>
> >>>> I don't quite understand this explanation. CRIU forks root task, which calls
> >>>> root_prepare_shared() in which all the remaps are collected. At that point any
> >>>> proc remap is lined into the pstree. After this root task starts forking
> >>>> whatever entries it finds in there. Why do we need one more stage?
> >>>
> >>> Because of restore_root_task. Think of the case when after the
> >>> CR_STATE_RESTORE_NS, the restore_root_task process runs until it
> >>> starts waiting for CR_STATE_FORKING. It enters that wait looking for
> >>> task_entries->nr_tasks + task_entries->nr_helpers. But then the init
> >>> task runs, parses the shared state and adds some more helpers. The
> >>> first thread was waiting on the wrong number of helpers.
> >>>
> >>> Another option would be to get rid of the CR_STATE_RESTORE_NS point
> >>> and just synchronize after restoring the shared state. Then the
> >>> ACT_SETUP_NS scripts wouldn't necessarily run before the tasks are
> >>> forked, though. I didn't know if that was a problem or not, so I left
> >>> it the way it was.
> >>
> >> I think we can just move the call to root_prepare_shared() to be above
> >> the _NS synchronization point. IIRC there's nothing that requires namespaces
> >> to be set up.
> >>
> >> Can you try that?
> > 
> > With that [1], I get a failure running all the tests:
> > 
> > Execute zdtm/live/static/cwd01
> > ./cwd01 --pidfile=cwd01.pid --outfile=cwd01.out --dirname=cwd01.test
> > /tmp/criu/test
> > Dump 9484
> > Restore
> > Check results 9515
> > Waiting...
> > cat: zdtm/live/static/cwd01.out: No such file or directory
> > cat: zdtm/live/static/cwd01.out: No such file or directory
> > Test: zdtm/live/static/cwd01, Result: FAIL
> > ==================================== ERROR ====================================
> > Test: zdtm/live/static/cwd01, Namespace: 1
> > Dump log   : /tmp/criu/test/dump/static/cwd01/9484/1/dump.log
> > --------------------------------- grep Error ---------------------------------
> > ------------------------------------- END -------------------------------------
> > Restore log: /tmp/criu/test/dump/static/cwd01/9484/1/restore.log
> > --------------------------------- grep Error ---------------------------------
> > (00.123315)      1: Error (files-reg.c:97): Can't make ghost dir: No such file or directory
> > (00.123442) Error (cr-restore.c:1785): Restoring FAILED.
> > ------------------------------------- END -------------------------------------
> > ================================= ERROR OVER =================================
> > 
> > I guess this must depend on the results of prepare_namespace, which has to
> > come after the RESTORE_NS synchronize call?
> 
> Hm... It looks like inter-dependencies in criu code became more complex that
> I think they were :) OK, I like the patches. I'll apply them and we'll think
> what to do with extra stage a bit later.

All interesting things are complex, I guess :)

Tycho

> Thanks!
> 
> > [1]: https://github.com/tych0/criu/commit/5bbc76d6e6c67b5dc7e7496a19e5034a7fea82e6
> > 
> > Tycho
> > .
> > 
> 


More information about the CRIU mailing list