[CRIU] c/r of tasks with currently open cgroup files
Tycho Andersen
tycho.andersen at canonical.com
Mon Sep 14 08:14:04 PDT 2015
On Mon, Sep 14, 2015 at 02:56:52PM +0000, Serge Hallyn wrote:
> Quoting Tycho Andersen (tycho.andersen at canonical.com):
> > On Mon, Sep 14, 2015 at 01:39:50PM +0300, Pavel Emelyanov wrote:
> > > On 09/11/2015 11:36 PM, Tycho Andersen wrote:
> > > > Hi all,
> > > >
> > > > Some tasks may want to open a cgroup directory (or file):
> > > >
> > > > (00.008556) 5826 fdinfo 5: pos: 0x 0 flags: 2304000/0x1
> > > > (00.008586) Dumping path for 5 fd via self 31 [/sys/fs/cgroup/systemd/lxc/priv]
> > > >
> > > > The problem is that on restore, criu users can pass --cgroup-root=/lxc/priv2
> > > > (or whatever) to rewrite their root cgroup paths, and this path is not created.
> > >
> > > The --cgroup-root implicitly implies :) that tasks don't see the full paths of
> > > cgroup files, i.e. they live in a FS tree where /sys/fs/cgroup/anything points
> > > to some /sys/fs/cgroup/anything/foo/ directory on host in which the tasks we
> > > dump actually live. And putting --cgroup-root on restore means that you just
> > > want to move tasks from /sys/.../foo into /sys/.../bar fixing the visible FS
> > > tree accordingly (with the --ext-mount-map I suppose).
> > >
> > > So how can this happen that task in a container sees full cgroup path?
> >
> > Under lxcfs, tasks do see the full path, they just get EACCES if they
> > try to read/write from it. I suppose another option would be to patch
> > lxcfs to function somewhat like cgroup namespaces and do as you say
> > and hide parts of the cgroup tree. Do you know why it doesn't work
> > this way now, Serge?
>
> Not sure I understand the question. lxcfs limits the visibility to
> your container and its descendents, so if before checkpoint you were
> under /sys/fs/cgroup/devices/lxc/foo1, and after restore you are under
> /sys/fs/cgroup/devices/lxc/bar1, then tasks in the container will only
> see bar1 under /sys/fs/cgroup/devices/lxc.

Right, I think the issue is that they see the full path, i.e.:

criu2:~ lxc exec unpriv bash
root@unpriv:~# cat /proc/self/cgroup
10:memory:/lxc/unpriv
...

instead of:

10:memory:/

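To make the difference concrete, here is a rough Python sketch of the two views. The helper names parse_cgroup_line and relative_to_root are made up for illustration and are not part of lxcfs or CRIU; the second one only imitates what a cgroup-namespace-like view would show, assuming the container's root cgroup is known.

```python
def parse_cgroup_line(line):
    """Split one /proc/<pid>/cgroup line: hierarchy-ID:controller-list:path."""
    hier_id, controllers, path = line.rstrip("\n").split(":", 2)
    names = controllers.split(",") if controllers else []
    return int(hier_id), names, path


def relative_to_root(path, root):
    """Return the path as it would look if `root` were hidden (hypothetical view)."""
    root = root.rstrip("/")
    if root == "":
        return path              # root is "/", nothing to hide
    if path == root:
        return "/"               # the task sits exactly at the container root
    if path.startswith(root + "/"):
        return path[len(root):]  # keep only the part below the root
    return path                  # outside the root: leave untouched


hier_id, names, path = parse_cgroup_line("10:memory:/lxc/unpriv")
print(path)                                   # what the task sees today
print(relative_to_root(path, "/lxc/unpriv"))  # what a namespaced view would show
```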
> There's no way right now for lxcfs to know that it should pretend
> that /sys/fs/cgroup/devices/lxc/foo1 now really means
> /sys/fs/cgroup/devices/lxc/bar1.
>
> The way lxcfs mounts are set up is by a post-mount hook script. So
> you could simply, in criu, set up the scripts to set up /sys/fs/cgroup
> so that /sys/fs/cgroup/devices/lxc/foo1 are a path in tmpfs, and
> /var/lib/lxcfs/cgroup/devices/lxc/bar1 is bind-mounted straight onto
> the restored container's /sys/fs/cgroup/devices/lxc/foo1.
This could work. Although perhaps our use of --cgroup-root in LXC is
incorrect since we don't have anything like cgroup namespaces in the
Ubuntu kernels (yet).
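
For reference, the rewrite that an open cgroup fd would seem to need when --cgroup-root changes the root can be sketched as follows. This is only an illustration in Python, not CRIU code: rewrite_cgroup_path and its parameters are hypothetical, and it assumes the controller mountpoint and both roots are already known.

```python
import posixpath


def rewrite_cgroup_path(fd_path, mountpoint, old_root, new_root):
    """Map a dumped fd path under old_root to the same place under new_root."""
    # Build absolute prefixes such as /sys/fs/cgroup/systemd/lxc/priv.
    old_prefix = posixpath.join(mountpoint, old_root.lstrip("/")).rstrip("/")
    new_prefix = posixpath.join(mountpoint, new_root.lstrip("/")).rstrip("/")
    if fd_path == old_prefix:
        return new_prefix
    if fd_path.startswith(old_prefix + "/"):
        return new_prefix + fd_path[len(old_prefix):]
    return fd_path  # not under the old root: leave untouched


print(rewrite_cgroup_path(
    "/sys/fs/cgroup/systemd/lxc/priv",  # path from the dump log above
    "/sys/fs/cgroup/systemd",
    "/lxc/priv",
    "/lxc/priv2",
))
```

The restored path would still have to exist before the fd is reopened, which is the part that is missing today.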
Tycho