<div dir="ltr">Glad things are clearer now and that we're converging. The only remaining decision is whether to reuse the same cgroup as before (/cpuset/lxc/u1 in your example) or a new one (/cpuset/lxc/u1.1). I would argue that since the state of a process after restore should be the same as before the dump, it should be placed back in /cpuset/lxc/u1.<div>
<br></div><div>With a CLI option, we tell CRIU one of the following:</div><div><br></div><div>1. Expect the cgroup to already exist and just put the process back in it. If the cgroup doesn't exist, fail.</div><div>2. Expect the cgroup not to exist, create it and put the process in it. If the cgroup exists, fail.</div>
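<div><br></div><div>A minimal sketch of how the two modes could be enforced, in C. The enum and prepare_cgroup_dir() are hypothetical names (not existing CRIU options or helpers), and path is assumed to be the full cgroup directory under an already-mounted controller:</div>

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical restore modes mirroring the two proposed CLI choices;
 * these are NOT real CRIU options. */
enum cg_restore_mode {
	CG_EXPECT_EXISTING, /* 1: cgroup must already exist, just reuse it */
	CG_EXPECT_ABSENT,   /* 2: cgroup must not exist, create it */
};

/* Returns 0 if the expectation holds (creating the cgroup in mode 2),
 * -1 with a message on stderr otherwise. */
static int prepare_cgroup_dir(const char *path, enum cg_restore_mode mode)
{
	struct stat st;

	if (mode == CG_EXPECT_EXISTING) {
		if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
			return 0;
		fprintf(stderr, "cgroup %s does not exist\n", path);
		return -1;
	}

	/* mkdir() fails with EEXIST if the directory is already there, so
	 * the "cgroup exists, fail" check and the creation are one step. */
	if (mkdir(path, 0755) == 0)
		return 0;
	if (errno == EEXIST)
		fprintf(stderr, "cgroup %s already exists\n", path);
	else
		perror("mkdir");
	return -1;
}
```

<div>Using mkdir() itself as the existence test in mode 2 avoids a window between checking and creating in which another tool could create the same cgroup.</div>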
<div><br></div><div>Hope this makes sense.</div><div><br></div><div>--Saied</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jun 24, 2014 at 1:12 PM, Serge Hallyn <span dir="ltr"><<a href="mailto:serge.hallyn@ubuntu.com" target="_blank">serge.hallyn@ubuntu.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">Quoting Pavel Emelyanov (<a href="mailto:xemul@parallels.com">xemul@parallels.com</a>):<br>
> On 06/24/2014 11:34 PM, Saied Kazemi wrote:<br>
> ><br>
> ><br>
> ><br>
> > On Tue, Jun 24, 2014 at 10:05 AM, Pavel Emelyanov <<a href="mailto:xemul@parallels.com">xemul@parallels.com</a>> wrote:<br>
> ><br>
> > On 06/24/2014 09:01 PM, Saied Kazemi wrote:<br>
> > ><br>
> > ><br>
> > ><br>
> > > On Tue, Jun 24, 2014 at 9:26 AM, Pavel Emelyanov <<a href="mailto:xemul@parallels.com">xemul@parallels.com</a>> wrote:<br>
> > ><br>
> > > On 06/24/2014 06:12 PM, Serge Hallyn wrote:<br>
> > ><br>
> > > >> Yes. Empty cgroups cannot be discovered through the /proc/pid/cgroup file,<br>
> > > >> so we should walk the live cgroup mount. But the problem is -- we cannot<br>
> > > >> just take the system /sys/fs/cgroup/ directories, since there will be<br>
> > > >> cgroups from other containers as well. We should find the root subdir<br>
> > > >> of the container we dump and walk _this_ subtree.<br>
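A minimal sketch of walking only the container's own subtree, assuming root is that container's cgroup directory (e.g. a hypothetical /sys/fs/cgroup/cpuset/lxc/u1) rather than the controller's mount root. Every subdirectory is a cgroup, so empty ones are found too; walk_cgroups() is a hypothetical helper:<br>

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Recursively list every directory (= cgroup) under root, including
 * empty ones that no /proc/<pid>/cgroup line would reveal.
 * Returns the number of cgroups found (root included) or -1 on error.
 */
static int walk_cgroups(const char *root, int depth)
{
	DIR *d = opendir(root);
	struct dirent *de;
	int count = 1; /* root itself */

	if (!d) {
		perror(root);
		return -1;
	}
	printf("%*s%s\n", depth * 2, "", root);
	while ((de = readdir(d)) != NULL) {
		char path[4096];
		struct stat st;

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		if (snprintf(path, sizeof(path), "%s/%s", root, de->d_name) >= (int)sizeof(path))
			continue; /* skip over-long paths in this sketch */
		/* Control files (tasks, cpuset.cpus, ...) are regular files; only
		 * subdirectories are child cgroups. lstat() avoids following symlinks. */
		if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode)) {
			int sub = walk_cgroups(path, depth + 1);
			if (sub < 0) {
				closedir(d);
				return -1;
			}
			count += sub;
		}
	}
	closedir(d);
	return count;
}
```
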
> > > ><br>
> > > > I volunteer to work on a proper cgroup c/r implementation, once Tycho<br>
> > > > gets the very basics done.<br>
> > ><br>
> > > Serge, Tycho, I think I need to clarify one more thing.<br>
> > ><br>
> > > I believe that once we do a full cgroup hierarchy restore, all the<br>
> > > mkdirs would go away from the move_in_cgroup() routine. Instead,<br>
> > > we will have some code that constructs the whole cgroup subtree<br>
> > > before criu starts forking tasks. Once we have it, the<br>
> > > move_in_cgroup() would (should) never fail. Thus this patch would<br>
> > > be effectively reversed.<br>
> > ><br>
> > > Thanks,<br>
> > > Pavel<br>
> > ><br>
> > ><br>
> > > I agree. Creation of the cgroup and its subtree should be done in one place as opposed<br>
> > > to being split apart (i.e., between prepare_cgroup_sfd() and move_in_cgroup() as is done<br>
> > > currently).<br>
> > ><br>
> > > Regarding the 4 items to do for cgroups in your earlier email, I believe that we should<br>
> > > have CLI options to tell CRIU what cgroups it needs to restore (almost like the way we<br>
> > > tell it about external bind mounts).<br>
> ><br>
> > I was thinking that if we take the root task, check the cgroups it lives in, and<br>
> > dump the whole subtree starting from it, this would work properly and would<br>
> > not require any CLI hints.<br>
> ><br>
> > Do you mean, that we need to tell criu where in cgroup hierarchy to start<br>
> > recreating the subtree it dumped?<br>
> ><br>
> > > This way we can handle the empty cgroups as well as dumping and restoring on the same<br>
> > > machine versus on a different machine (i.e., migration). For migration, CRIU definitely<br>
> > > needs to be told how to handle cgroup name collisions.<br>
> ><br>
> > But if we ask criu to restore tasks in a fresh new sub-cgroup, why would this<br>
> > collision happen?<br>
> ><br>
> > > This is not something that it can handle at dump time.<br>
> > ><br>
> > > --Saied<br>
> ><br>
> ><br>
> > I am not sure if I understand what is meant by "fresh new sub-cgroup". Since the process<br>
> > has to be restored in the same cgroup, I assume you mean a new mountpoint. But if the<br>
> > cgroup already exists, giving it a private new mountpoint doesn't mean that it will set<br>
> > up a new hierarchy. Consider the following example:<br>
> ><br>
> > # cat /sys/fs/cgroup/hugetlb/notify_on_release<br>
> > # mkdir /mnt/foo<br>
> > # mount -t cgroup -o hugetlb cgroup /mnt/foo<br>
> > # cat /mnt/foo/notify_on_release<br>
> > 0<br>
> > # echo 1 > /sys/fs/cgroup/hugetlb/notify_on_release<br>
> > # cat /mnt/foo/notify_on_release<br>
> > 1<br>
> > # echo 0 > /mnt/foo/notify_on_release<br>
> > # cat /sys/fs/cgroup/hugetlb/notify_on_release<br>
> > 0<br>
> > #<br>
> ><br>
> > So I think we need a mechanism to tell CRIU whether it should expect the cgroup already existing<br>
> > (e.g., restore on the same machine) or not (e.g., restore after reboot or on a different machine).<br>
> ><br>
> > I am not a cgroups expert, but I hope it's clearer now.<br>
><br>
> Yes, thank you :) My understanding of cgroups tells me that we don't need a special option<br>
> for that. AFAIU LXC and OpenVZ don't fail if a cgroup they create already exists,<br>
> and neither should CRIU.<br>
<br>
</div></div>Right, if the taskset was under /cpuset/lxc/u1, for instance, and u1<br>
is still running (or /cpuset/lxc/u1 was not cleaned up), then criu should<br>
simply use /cpuset/lxc/u1.1, then u1.2, etc. Below that, since u1.N did<br>
not exist before, there should be no collisions (and if there are, that is cause<br>
for failing the restart, as we either have a bug or some race with<br>
another criu instance or another toolset).
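<br><br>A minimal sketch of the u1, u1.1, u1.2 naming scheme, assuming base is the full path of the original cgroup (e.g. /sys/fs/cgroup/cpuset/lxc/u1). make_unique_cgroup() and max_tries are hypothetical names, not existing criu code:<br>

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Create the first free cgroup directory among base, base.1, base.2, ...
 * On success the chosen path is written to out and 0 is returned.
 * mkdir() doubles as the existence test, so two concurrent restores
 * cannot both claim the same name.
 */
static int make_unique_cgroup(const char *base, char *out, size_t outlen, int max_tries)
{
	for (int i = 0; i <= max_tries; i++) {
		int n = i == 0 ? snprintf(out, outlen, "%s", base)
			       : snprintf(out, outlen, "%s.%d", base, i);
		if (n < 0 || (size_t)n >= outlen)
			return -1; /* path would be truncated */
		if (mkdir(out, 0755) == 0)
			return 0;  /* this cgroup is ours now */
		if (errno != EEXIST)
			return -1; /* real error, e.g. EACCES */
	}
	return -1; /* every candidate name was taken */
}
```
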
<br>
(And of course I agree that we should create and configure all cgroups<br>
before we restart any tasks.)<br>
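<br>A minimal sketch of creating the whole dumped subtree before any task is forked, assuming the dump provides the cgroup paths relative to the container's (freshly created) root. create_cgroup_subtree() is a hypothetical helper, and restoring the saved control-file values is only marked by a comment:<br>

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

static int cmp_paths(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Recreate a dumped cgroup subtree below root. rel[] holds paths relative
 * to root (e.g. "a", "a/b"); sorting them lexicographically puts every
 * parent before its children, because a parent path is a prefix of its
 * child's path. Any EEXIST is fatal: below a fresh root nothing should exist.
 */
static int create_cgroup_subtree(const char *root, const char **rel, size_t n)
{
	char path[4096];

	qsort(rel, n, sizeof(*rel), cmp_paths);
	for (size_t i = 0; i < n; i++) {
		int len = snprintf(path, sizeof(path), "%s/%s", root, rel[i]);
		if (len < 0 || (size_t)len >= sizeof(path))
			return -1;
		if (mkdir(path, 0755) != 0) {
			fprintf(stderr, "mkdir %s: %s\n", path, strerror(errno));
			return -1;
		}
		/* Hypothetical step: write the saved knob values (cpuset.cpus, ...) here. */
	}
	return 0;
}
```
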
<span class="HOEnZb"><font color="#888888"><br>
-serge<br>
</font></span></blockquote></div><br></div>