[CRIU] [PATCH 1/2] Re-create cgroups if necessary

Pavel Emelyanov xemul at parallels.com
Tue Jun 24 12:49:38 PDT 2014


On 06/24/2014 11:34 PM, Saied Kazemi wrote:
> 
> On Tue, Jun 24, 2014 at 10:05 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> 
>     On 06/24/2014 09:01 PM, Saied Kazemi wrote:
>     >
>     > On Tue, Jun 24, 2014 at 9:26 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
>     >
>     >     On 06/24/2014 06:12 PM, Serge Hallyn wrote:
>     >
>     >     >> Yes. Empty cgroups cannot be discovered through the /proc/pid/cgroup
>     >     >> file; we have to walk a live cgroup mount instead. But the problem is
>     >     >> that we cannot just take the system /sys/fs/cgroup/ directories, since
>     >     >> they will contain cgroups from other containers as well. We should find
>     >     >> the root subdir of the container we dump and walk _this_ subtree.
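>     >     >>
>     >     >> A minimal sketch of such a walk (illustrative only, not actual CRIU
>     >     >> code; "cg_root" stands for the container's root subdir inside the
>     >     >> cgroup mount):
>     >     >>
>     >     >>   #define _XOPEN_SOURCE 500
>     >     >>   #include <ftw.h>
>     >     >>   #include <stdio.h>
>     >     >>
>     >     >>   /* Visits every entry under cg_root; empty cgroups show up
>     >     >>    * here even though no /proc/<pid>/cgroup line mentions them. */
>     >     >>   static int dump_cg_dir(const char *path, const struct stat *st,
>     >     >>                          int type, struct FTW *ftw)
>     >     >>   {
>     >     >>           if (type == FTW_D)
>     >     >>                   printf("cgroup dir: %s\n", path);
>     >     >>           return 0;
>     >     >>   }
>     >     >>
>     >     >>   int dump_cg_subtree(const char *cg_root)
>     >     >>   {
>     >     >>           return nftw(cg_root, dump_cg_dir, 32, FTW_PHYS);
>     >     >>   }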
>     >     >
>     >     > I volunteer to work on a proper cgroup c/r implementation, once Tycho
>     >     > gets the very basics done.
>     >
>     >     Serge, Tycho, I think I need to clarify one more thing.
>     >
>     >     I believe that once we do a full cgroup hierarchy restore, all the
>     >     mkdirs will go away from the move_in_cgroup() routine. Instead, we
>     >     will have some code that constructs the whole cgroup subtree before
>     >     criu starts forking tasks. And once we have that, move_in_cgroup()
>     >     would (should) never fail. Thus this patch would effectively be
>     >     reverted.
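>     >
>     >     Roughly this shape (pseudocode; all helper names here are made up,
>     >     this is not the actual CRIU code):
>     >
>     >       /* Re-create every dumped cgroup dir, empty ones included. */
>     >       static int prepare_cgroup_subtree(void) { return 0; /* mkdirs */ }
>     >
>     >       /* Attach restored pids to the already-existing cgroups. */
>     >       static int fork_and_restore_tasks(void) { return 0; }
>     >
>     >       int restore_container(void)
>     >       {
>     >               /* Build the whole subtree before any task exists... */
>     >               if (prepare_cgroup_subtree())
>     >                       return -1;
>     >               /* ...so move_in_cgroup() only writes pids into the
>     >                * tasks files and never needs to mkdir. */
>     >               return fork_and_restore_tasks();
>     >       }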
>     >
>     >     Thanks,
>     >     Pavel
>     >
>     >
>     > I agree.  Creation of the cgroup and its subtree should be done in one place as opposed
>     > to being split apart (i.e., between prepare_cgroup_sfd() and move_in_cgroup() as is done
>     > currently).
>     >
>     > Regarding the 4 items to do for cgroups in your earlier email, I believe that we should
>     > have CLI options to tell CRIU what cgroups it needs to restore (almost like the way we
>     > tell it about external bind mounts).
> 
>     I was thinking that if we take the root task, check the cgroups it lives in,
>     and dump the whole subtree starting from there, this would work properly and
>     would not require any CLI hints.
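>
>     Just to illustrate (a sketch, not CRIU code): the starting points can be
>     read straight from the root task's /proc/<pid>/cgroup, where each line
>     has the form "<id>:<controllers>:<path>", e.g. "2:cpu,cpuacct:/lxc/ct1":
>
>       #include <stdio.h>
>       #include <string.h>
>
>       int print_root_task_cgroups(int pid)
>       {
>               char path[64], line[4096];
>               FILE *f;
>
>               snprintf(path, sizeof(path), "/proc/%d/cgroup", pid);
>               f = fopen(path, "r");
>               if (!f)
>                       return -1;
>
>               while (fgets(line, sizeof(line), f)) {
>                       /* Skip "<id>:<controllers>:", keep the path. */
>                       char *p = strchr(line, ':');
>                       if (p)
>                               p = strchr(p + 1, ':');
>                       if (p)
>                               printf("subtree root: %s", p + 1);
>               }
>
>               fclose(f);
>               return 0;
>       }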
> 
>     Do you mean that we need to tell criu where in the cgroup hierarchy to start
>     re-creating the subtree it dumped?
> 
>     > This way we can handle the empty cgroups as well as dumping and restoring on the same
>     > machine versus on a different machine (i.e., migration).  For migration, CRIU definitely
>     > needs to be told how to handle cgroup name collisions.
> 
>     But if we ask criu to restore tasks in a fresh new sub-cgroup, why would this
>     collision happen?
> 
>     > This is not something that it can handle at dump time.
>     >
>     > --Saied
> 
> 
> I am not sure if I understand what is meant by "fresh new sub-cgroup".  Since the process 
> has to be restored in the same cgroup, I assume you mean a new mountpoint.  But if the
> cgroup already exists, giving it a private new mountpoint doesn't mean that it will set
> up a new hierarchy.  Consider the following example:
> 
> # cat /sys/fs/cgroup/hugetlb/notify_on_release
> 0
> # mkdir /mnt/foo
> # mount -t cgroup -o hugetlb cgroup /mnt/foo
> # cat /mnt/foo/notify_on_release 
> 0
> # echo 1 > /sys/fs/cgroup/hugetlb/notify_on_release
> # cat /mnt/foo/notify_on_release 
> 1
> # echo 0 > /mnt/foo/notify_on_release 
> # cat /sys/fs/cgroup/hugetlb/notify_on_release 
> 0
> #
> 
> So I think we need a mechanism to tell CRIU whether it should expect the cgroup to already
> exist (e.g., restore on the same machine) or not (e.g., restore after reboot or on a
> different machine).
> 
> I am not a cgroups expert, but I hope it's more clear now.

Yes, thank you :) My understanding of cgroups tells me that we don't need a special option
for that. AFAIU, LXC and OpenVZ don't fail when creating a cgroup that already exists, and
neither should CRIU.
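
I.e., something along these lines (just a sketch of the idea, not the
patch itself):

  #include <errno.h>
  #include <sys/stat.h>

  /* Create a cgroup dir, tolerating one that is already there
   * (restore on the same machine), the way LXC/OpenVZ behave. */
  int create_cgroup_dir(const char *path)
  {
          if (mkdir(path, 0755) < 0 && errno != EEXIST)
                  return -1;      /* a real error */
          return 0;               /* created, or already existed */
  }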

Thanks,
Pavel
