[CRIU] [PATCH 1/2] Re-create cgroups if necessary
Serge Hallyn
serge.hallyn at ubuntu.com
Tue Jun 24 13:12:10 PDT 2014
Quoting Pavel Emelyanov (xemul at parallels.com):
> On 06/24/2014 11:34 PM, Saied Kazemi wrote:
> >
> >
> >
> > On Tue, Jun 24, 2014 at 10:05 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> >
> > On 06/24/2014 09:01 PM, Saied Kazemi wrote:
> > >
> > >
> > >
> > > On Tue, Jun 24, 2014 at 9:26 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> > >
> > > On 06/24/2014 06:12 PM, Serge Hallyn wrote:
> > >
> > > >> Yes. Empty cgroups cannot be discovered through the /proc/pid/cgroup file;
> > > >> we have to walk the live cgroup mount instead. But the problem is -- we cannot
> > > >> just take the system /sys/fs/cgroup/ directories, since there will be
> > > >> cgroups from other containers there as well. We should find the root subdir
> > > >> of the container we dump and walk _this_ subtree.
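(For illustration, here is roughly what "walk the live cgroup mount" could look like -- the subtree path and the helper name are hypothetical, not CRIU code. The point is that an empty cgroup never appears in any /proc/<pid>/cgroup, so the mounted hierarchy itself is the only place to find it:)

```shell
# Hypothetical helper: list empty cgroup directories under a container's
# subtree.  A cgroup whose "tasks" file is empty holds no processes, so
# it can only be discovered by walking the directory tree itself.
find_empty_cgroups() {
    # $1 - root of the container's subtree, e.g. /sys/fs/cgroup/cpuset/lxc/u1
    find "$1" -mindepth 1 -type d | while read -r d; do
        [ -s "$d/tasks" ] || echo "$d"
    done
}
```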
> > > >
> > > > I volunteer to work on a proper cgroup c/r implementation, once Tycho
> > > > gets the very basics done.
> > >
> > > Serge, Tycho, I think I need to clarify one more thing.
> > >
> > > I believe that once we do a full cgroup hierarchy restore, all the
> > > mkdirs will go away from the move_in_cgroup() routine. Instead,
> > > we will have some code that constructs the whole cgroup subtree
> > > before criu starts forking tasks. And once we have that,
> > > move_in_cgroup() would (should) never fail. Thus this patch would
> > > effectively be reverted.
> > >
> > > Thanks,
> > > Pavel
> > >
> > >
> > > I agree. Creation of the cgroup and its subtree should be done in one place as opposed
> > > to being split apart (i.e., between prepare_cgroup_sfd() and move_in_cgroup() as is done
> > > currently).
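(As an illustration of the "one place" idea -- recreating the whole subtree up front, before any task is forked into it. The list-file format and the helper name are made up for this sketch:)

```shell
# Hypothetical helper: given a file listing the cgroup paths recorded at
# dump time (parents before children, one per line), recreate the whole
# subtree in a single pass before any task is restored into it.
restore_cgroup_tree() {
    # $1 - file with one cgroup directory path per line
    while read -r path; do
        mkdir -p "$path"    # tolerate directories that already exist
    done < "$1"
}
```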
> > >
> > > Regarding the 4 items to do for cgroups in your earlier email, I believe that we should
> > > have CLI options to tell CRIU what cgroups it needs to restore (almost like the way we
> > > tell it about external bind mounts).
> >
> > I was thinking that if we take the root task, check cgroups it lives in and
> > dump the whole subtree starting from it, this would work properly and would
> > not require any CLI hints.
> >
> > Do you mean, that we need to tell criu where in cgroup hierarchy to start
> > recreating the subtree it dumped?
> >
> > > This way we can handle the empty cgroups as well as dumping and restoring on the same
> > > machine versus on a different machine (i.e., migration). For migration, CRIU definitely
> > > needs to be told how to handle cgroup name collisions.
> >
> > But if we ask criu to restore tasks in a fresh new sub-cgroup, why would this
> > collision happen?
> >
> > > This is not something that it can handle at dump time.
> > >
> > > --Saied
> >
> >
> > I am not sure if I understand what is meant by "fresh new sub-cgroup". Since the process
> > has to be restored in the same cgroup, I assume you mean a new mountpoint. But if the
> > cgroup already exists, giving it a private new mountpoint doesn't mean that it will set
> > up a new hierarchy. Consider the following example:
> >
> > # cat /sys/fs/cgroup/hugetlb/notify_on_release
> > 0
> > # mkdir /mnt/foo
> > # mount -t cgroup -o hugetlb cgroup /mnt/foo
> > # cat /mnt/foo/notify_on_release
> > 0
> > # echo 1 > /sys/fs/cgroup/hugetlb/notify_on_release
> > # cat /mnt/foo/notify_on_release
> > 1
> > # echo 0 > /mnt/foo/notify_on_release
> > # cat /sys/fs/cgroup/hugetlb/notify_on_release
> > 0
> > #
> >
> > So I think we need a mechanism to tell CRIU whether it should expect the cgroup to already exist
> > (e.g., restore on the same machine) or not (e.g., restore after a reboot or on a different machine).
> >
> > I am not a cgroups expert, but I hope it's more clear now.
>
> Yes, thank you :) My understanding of cgroups tells me that we don't need a special option
> for that. AFAIU LXC and OpenVZ don't fail if they create a cgroup that already exists;
> neither should CRIU.
Right. If the taskset was under /cpuset/lxc/u1, for instance, and u1
is running (or /cpuset/lxc/u1 was not cleaned up), then criu should
simply use /cpuset/lxc/u1.1, then u1.2, etc. Under that, since u1.N did
not previously exist, there should be no collisions (and if there are,
it's cause for failing the restore, as we either have a bug or a race
with another criu instance or another toolset).
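A sketch of that fallback naming, assuming a simple mkdir loop (the
helper name and the retry cap are mine, not anything criu actually does):

```shell
# Hypothetical helper: use the desired cgroup path if it is free,
# otherwise fall back to <path>.1, <path>.2, ... as described above.
pick_restore_cgroup() {
    # $1 - desired cgroup path, e.g. /cpuset/lxc/u1
    if mkdir "$1" 2>/dev/null; then
        echo "$1"
        return 0
    fi
    n=1
    while ! mkdir "$1.$n" 2>/dev/null; do
        n=$((n + 1))
        # repeated collisions mean a bug or a race with another criu
        # instance/toolset, which is cause for failing the restore
        [ "$n" -le 16 ] || return 1
    done
    echo "$1.$n"
}
```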
(And of course I agree that we should create and configure all cgroups
before we restart any tasks.)
-serge