[CRIU] c/r of tasks with currently open cgroup files

Serge Hallyn serge.hallyn at ubuntu.com
Fri Sep 11 14:49:40 PDT 2015


Quoting Tycho Andersen (tycho.andersen at canonical.com):
> On Fri, Sep 11, 2015 at 08:57:31PM +0000, Serge Hallyn wrote:
> > Quoting Tycho Andersen (tycho.andersen at canonical.com):
> > > Hi all,
> > > 
> > > Some tasks may want to open a cgroup directory (or file):
> > > 
> > > (00.008556) 5826 fdinfo 5: pos: 0x               0 flags:          2304000/0x1
> > > (00.008586) Dumping path for 5 fd via self 31 [/sys/fs/cgroup/systemd/lxc/priv]
> > > 
> > > The problem is that on restore, criu users can pass --cgroup-root=/lxc/priv2
> > > (or whatever) to rewrite their root cgroup paths, and this path is not created.
> > > 
> > > It seems like we can rewrite the paths of any such file to be the new cgroup
> > > root path. However, we can't necessarily do things like lseek, because the
> > > files may not have the right values yet:
> > > 
> > > * tasks: it should have the right values, because the file restore happens
> > >   after CR_STATE_FORKING
> > > * cgroup.procs: this will be wrong in case there are threads, because the
> > >   threads are only cloned by the restorer blob at the very end
> > > * special properties, e.g. cpuset.cpus: we have to fill in these before any
> > >   task is inserted into them so these should be ok
> > > * other properties: these have to be restored at the end for speed reasons, so
> > >   it happens after the fd restore runs
> > > 
> > > I think (?) it's probably ok that lseek won't work in most cases since it's
> > > unlikely normal people will be seeking around in cgroup files. Even still, if
> > > the application has computed a path to open on cgfs, it may have also cached
> > > something about that path, so if we do a --cgroup-root, that'll screw things
> > > up.
> > > 
> > > Should I write a patch to rewrite these file paths and hope for the best?
> > > What's the right course of action here?
> > 
> > I can think of a few "clever" things we could do, but I think the only
> > sane thing to do is detect if there are any cgroupfs files open and
> > refuse --cgroup-root in those cases.  Perhaps a checkpoint option
> > which says "error out if it's going to clash with --cgroup-root" would
> > be good.  Mind you, this isn't perfect - userspace can still have
> > stored the directory pathname and intend to refer to it later.
> 
> It looks like whatever version of systemd is in wily opens the "root"
> systemd cgroup path and just leaves it open:
> 
> root at criu2:/proc/18362/fd# ls -alh /proc/18362/fd/5
> lr-x------. 1 root root 64 Sep 11 15:01 /proc/18362/fd/5 -> /sys/fs/cgroup/systemd/lxc/priv
> 
> So I suspect the answer is that any modern container will _always_ fail if
> you say --dont-dump-if-cgroup-root-conflicts. The other option would
> be to not use --cgroup-root ever, but that seems bad too.
> 
> > My glib answer would be that userspace should be using cgmanager API
> > to administer cgroups :)  But that's not helpful.
> > 
> > More helpful is the cgroup namespace.  Doing a next iteration on
> > Aditya's previous patch, augmenting it to work with legacy hierarchies,
> > is on my todo list.  I'm going to look at the patch a bit this afternoon
> > to let it sink into my brain over the weekend.
> 
> Yes, cgroup namespaces would solve this problem nicely. When we talked
> about it in Seattle it hadn't occurred to me that this would be a
> motivating use case.
> 
> What about a clever patch that rewrites stuff in the meantime? In some
> sense it's nice that systemd is just opening this fd and keeping it
> around, which at least means that this rewriting is even possible.

Well true, I guess if it will always do openat() then that just might
work.  Worth trying then.
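
For illustration, the path rewriting discussed above might look roughly like
the sketch below. It is not CRIU code: the helper name rewrite_cgroup_path and
its parameters are hypothetical, and it assumes the dumped path, the
controller mount point, and both the old and new cgroup roots are known at
restore time. It only rewrites the path string; it does nothing about file
positions or anything the application may have cached.

```python
import posixpath


def rewrite_cgroup_path(path, mount_point, old_root, new_root):
    """Return `path` with `old_root` swapped for `new_root` under `mount_point`.

    Hypothetical helper, not part of CRIU. Paths that do not live under the
    old cgroup root are returned unchanged.
    """
    # Build absolute prefixes such as /sys/fs/cgroup/systemd/lxc/priv.
    old_prefix = posixpath.join(mount_point, old_root.strip("/")).rstrip("/")
    new_prefix = posixpath.join(mount_point, new_root.strip("/")).rstrip("/")

    if path == old_prefix:
        return new_prefix
    # Compare with a trailing slash so /lxc/priv does not match /lxc/priv2.
    if path.startswith(old_prefix + "/"):
        return new_prefix + path[len(old_prefix):]
    return path


print(rewrite_cgroup_path(
    "/sys/fs/cgroup/systemd/lxc/priv/tasks",
    "/sys/fs/cgroup/systemd", "/lxc/priv", "/lxc/priv2"))
# prints /sys/fs/cgroup/systemd/lxc/priv2/tasks
```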

-serge
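
As a rough illustration of the detection idea raised earlier in the thread
(noticing that a task holds cgroupfs files open), the sketch below walks a
task's fd directory the same way the ls output above does. The function name
open_cgroup_fds and its parameters are hypothetical and not part of CRIU. It
also assumes cgroupfs lives under /sys/fs/cgroup and matches on the path
prefix only; a real check would more likely look at the filesystem type.

```python
import os


def open_cgroup_fds(pid, cgroup_mount="/sys/fs/cgroup", proc_root="/proc"):
    """Return {fd: target} for fds of `pid` whose link target is under `cgroup_mount`.

    Hypothetical helper for illustration; matches on path prefix only.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    found = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd may have been closed between listdir() and readlink().
            continue
        if target == cgroup_mount or target.startswith(cgroup_mount + "/"):
            found[int(name)] = target
    return found
```

A caller could refuse --cgroup-root, or attempt a path rewrite, whenever the
returned dictionary is non-empty.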

