[CRIU] c/r of tasks with currently open cgroup files

Tycho Andersen tycho.andersen at canonical.com
Fri Sep 11 14:09:37 PDT 2015


On Fri, Sep 11, 2015 at 08:57:31PM +0000, Serge Hallyn wrote:
> Quoting Tycho Andersen (tycho.andersen at canonical.com):
> > Hi all,
> > 
> > Some tasks may want to open a cgroup directory (or file):
> > 
> > (00.008556) 5826 fdinfo 5: pos: 0x               0 flags:          2304000/0x1
> > (00.008586) Dumping path for 5 fd via self 31 [/sys/fs/cgroup/systemd/lxc/priv]
> > 
> > The problem is that on restore, criu users can pass --cgroup-root=/lxc/priv2
> > (or whatever) to rewrite their root cgroup paths, and this path is not created.
> > 
> > It seems like we can rewrite the paths of any such file to be the new cgroup
> > root path. However, we can't necessarily do things like lseek, because the
> > files may not have the right values yet:
> > 
> > * tasks: it should have the right values, because the file restore happens
> >   after CR_STATE_FORKING
> > * cgroup.procs: this will be wrong in case there are threads, because the
> >   threads are only cloned by the restorer blob at the very end
> > * special properties, e.g. cpuset.cpus: we have to fill in these before any
> >   task is inserted into them so these should be ok
> > * other properties: these have to be restored at the end for speed reasons, so
> >   it happens after the fd restore runs
> > 
> > I think (?) it's probably ok that lseek won't work in most cases since it's
> > unlikely normal people will be seeking around in cgroup files. Even still, if
> > the application has computed a path to open on cgfs, it may have also cached
> > something about that path, so if we do a --cgroup-root, that'll screw things
> > up.
> > 
> > Should I write a patch to rewrite these file paths and hope for the best?
> > What's the right course of action here?
> 
> I can think of a few "clever" things we could do, but I think the only
> sane thing to do is detect if there are any cgroupfs files open and
> refuse --cgroup-root in those cases.  Perhaps a checkpoint option
> which says "error out if it's going to clash with --cgroup-root" would
> be good.  Mind you, this isn't perfect - userspace can still have
> stored the directory pathname and intend to refer to it later.

It looks like whatever version of systemd is in wily opens the "root"
systemd cgroup path and just leaves it open:

root@criu2:/proc/18362/fd# ls -alh /proc/18362/fd/5
lr-x------. 1 root root 64 Sep 11 15:01 /proc/18362/fd/5 -> /sys/fs/cgroup/systemd/lxc/priv

So I suspect the answer is that any modern container will _always_ fail if
you say --dont-dump-if-cgroup-root-conflicts. The other option would
be to never use --cgroup-root, but that seems bad too.
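
For illustration, here is a rough sketch of the kind of check such an
option would have to do: walk a task's /proc/<pid>/fd and report anything
that points into cgroupfs. This is not how criu does it internally; the
function name, the proc_root parameter (only there so it can be pointed at
a fake tree), and the default /sys/fs/cgroup mount point are all
assumptions of mine.

```python
import os


def find_cgroup_fds(pid, cgroup_mount="/sys/fs/cgroup", proc_root="/proc"):
    """Return (fd, target) pairs for fds of `pid` that point into cgroupfs.

    Hypothetical helper: it only inspects the fd symlinks under
    <proc_root>/<pid>/fd and compares their targets against `cgroup_mount`.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    mount = cgroup_mount.rstrip("/")
    hits = []
    for name in sorted(os.listdir(fd_dir), key=int):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the fd went away while we were scanning
        # Match the mount point itself or anything strictly below it.
        if target == mount or target.startswith(mount + "/"):
            hits.append((int(name), target))
    return hits
```

Run against the systemd above, find_cgroup_fds(18362) would include fd 5,
so a "refuse on conflict" option would trip on it.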

> My glib answer would be that userspace should be using cgmanager API
>  to administer cgroups :)  But that's not helpful.
> 
> More helpful is the cgroup namespace.  Doing a next iteration on
> Aditya's previous patch, augmenting it to work with legacy hierarchies,
> is on my todo list.  I'm going to look at the patch a bit this afternoon
> to let it sink into my brain over the weekend.

Yes, cgroup namespaces would solve this problem nicely. When we talked
about it in Seattle it hadn't occurred to me that this would be a
motivating use case.

What about a clever patch that rewrites stuff in the meantime? In some
sense it's nice that systemd is just opening this fd and keeping it
around, which at least means that this rewriting is even possible.
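
To make the rewriting idea concrete, a minimal sketch of the path
translation itself, assuming we know the controller's mount point and the
old and new cgroup roots. The function and parameter names are made up for
this example, and it says nothing about the lseek or cached-path problems
discussed earlier in the thread.

```python
import posixpath


def rewrite_cgroup_path(path, controller_mount, old_root, new_root):
    """Map a dumped cgroupfs path from old_root to new_root.

    Hypothetical helper: paths outside old_root are returned unchanged.
    """
    mount = controller_mount.rstrip("/")
    old = posixpath.join(mount, old_root.strip("/")).rstrip("/")
    new = posixpath.join(mount, new_root.strip("/")).rstrip("/")
    if path == old:
        return new
    # Require a "/" after the old root so /lxc/private is not
    # mistaken for something under /lxc/priv.
    if path.startswith(old + "/"):
        return new + path[len(old):]
    return path


print(rewrite_cgroup_path("/sys/fs/cgroup/systemd/lxc/priv",
                          "/sys/fs/cgroup/systemd", "/lxc/priv", "/lxc/priv2"))
```

With the fd from the dump log above, this prints
/sys/fs/cgroup/systemd/lxc/priv2, which is the directory the restore would
then need to create and open in place of the original.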

Tycho

