[CRIU] problem dumping some kinds of lxc containers

Wed Aug 27 06:43:44 PDT 2014

On Wed, Aug 27, 2014 at 05:11:39PM +0400, Pavel Emelyanov wrote:
> On 08/27/2014 05:08 PM, Tycho Andersen wrote:
> > Hi Pavel,
> > 
> > On Wed, Aug 27, 2014 at 12:35:07PM +0400, Pavel Emelyanov wrote:
> >> On 08/27/2014 03:18 AM, Tycho Andersen wrote:
> >>> Hi all,
> >>>
> >>> I'm trying to dump an lxc container (created with the ubuntu-cloud
> >>> template). I get:
> >>>
> >>> (00.563988) Error (files-reg.c:457): Can't link remap to /proc/20/mountinfo: No such file or directory
> >>>
> > >>> /proc/20 doesn't exist, and when this happens there is no process in the
> > >>> container with pid 20. This is a little confusing, though, since
> > >>> fill_fdlink() takes a struct fd_parms with a pid in the host pid ns,
> > >>> but gives back the path in the container pid ns.
> >>>
> >>> After a bit of debugging, I found that the process that is causing
> >>> this problem is:
> >>>
> >>> root     17593  0.3  0.0  26052  1340 ?        S    17:49   0:00 \_ mountall --daemon
> >>>
> >>> If I try to checkpoint the container after mountall has exited, it all
> >>> works fine.
> >>>
> >>> Any ideas what is going on here?
> >>
> >> Yes. CRIU finds an open file that cannot be opened by the path the kernel
> >> provides. In your case this is because task 20 has died. At the same time,
> >> stat() reports that the link count on that file is not 0 (this is due to how
> >> proc works), which for a disk file would mean that the file "should exist" and
> >> we just have to create some other name for it. This is called "link remap".
> >> For disk files CRIU handles it by creating a hard link to the file. For proc
> >> this obviously will not work, so we have to invent something else.
> > 
> > Thanks for the explanation. Any ideas on what the proper solution is?
> 
> I was thinking that when we meet an open file of a dead task, we could create
> a "fake" task on restore with the desired pid (it can be a lightweight task
> with FS, VM, FILES, etc. shared with its parent), wait for its /proc/pid/smth
> to get opened, then kill it.
> 
> We have the TASK_HELPER state for that in CRIU; such helpers let us restore
> orphaned pgrps and sessions. Probably they can help here too :)

Just to repeat it in my own words so I understand: if we see a
/proc/$pid that doesn't exist, we write it down somewhere (a new
protobuf, or would it fit in an existing one?), and then on restore
we create a task helper for each pid we found. The rest of the restore
process opens /proc/$pid/whatever, and then it is OK for the task
helper to exit immediately once the restore completes and we are
running the actual processes.

Does that sound about right? Any ideas where the "fake" pid list would
go?

Tycho

> Thanks,
> Pavel