<div dir="ltr">Glad things are clearer now and we&#39;re converging. The only remaining decision is whether to restore into the same cgroup as before or not (/cpuset/lxc/u1 versus /cpuset/lxc/u1.1 in your example). I would argue that since the state of a process after restore should be the same as before the dump, it should be placed in /cpuset/lxc/u1.<div>

<br></div><div>With a CLI option we tell CRIU one of the following:</div><div><br></div><div>1. Expect the cgroup to already exist and just put the process back in it. If the cgroup doesn&#39;t exist, fail.</div><div>2. Expect the cgroup not to exist, so create it and put the process in it. If the cgroup already exists, fail.</div>
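<div><br></div><div>To make the two modes concrete, here is a rough Python sketch. The mode names and prepare_cgroup() are hypothetical (they are not existing CRIU options or functions), and a cgroup is modeled simply as a directory under the mount point of its controller hierarchy:</div>

```python
import os

# Hypothetical mode names for illustration; they are not real CRIU options.
EXPECT_EXISTING = "expect-existing"
EXPECT_ABSENT = "expect-absent"


def prepare_cgroup(mount_root, cgroup_path, mode):
    """Return the directory a restored task should be moved into.

    mount_root  -- where the controller hierarchy is mounted,
                   e.g. /sys/fs/cgroup/cpuset
    cgroup_path -- the task cgroup recorded at dump time, e.g. /lxc/u1
    mode        -- EXPECT_EXISTING or EXPECT_ABSENT
    """
    target = os.path.join(mount_root, cgroup_path.lstrip("/"))
    if mode == EXPECT_EXISTING:
        # Option 1: reuse the cgroup and fail if it is missing.
        if not os.path.isdir(target):
            raise FileNotFoundError("cgroup %s does not exist" % target)
    elif mode == EXPECT_ABSENT:
        # Option 2: create the cgroup (and any missing parents). makedirs()
        # raises FileExistsError if the leaf already exists, which is the
        # "fail" case.
        os.makedirs(target)
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    return target
```

<div>Calling it a second time for the same path in the expect-absent mode raises FileExistsError, which matches option 2; the expect-existing mode raises FileNotFoundError for a path that was never created, which matches option 1.</div>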

<div><br></div><div>Hope this makes sense.</div><div><br></div><div>--Saied</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jun 24, 2014 at 1:12 PM, Serge Hallyn <span dir="ltr">&lt;<a href="mailto:serge.hallyn@ubuntu.com" target="_blank">serge.hallyn@ubuntu.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">Quoting Pavel Emelyanov (<a href="mailto:xemul@parallels.com">xemul@parallels.com</a>):<br>


&gt; On 06/24/2014 11:34 PM, Saied Kazemi wrote:<br>

&gt; &gt;<br>

&gt; &gt; On Tue, Jun 24, 2014 at 10:05 AM, Pavel Emelyanov &lt;<a href="mailto:xemul@parallels.com">xemul@parallels.com</a>&gt; wrote:<br>

&gt; &gt;<br>

&gt; &gt;     On 06/24/2014 09:01 PM, Saied Kazemi wrote:<br>

&gt; &gt;     &gt;<br>

&gt; &gt;     &gt; On Tue, Jun 24, 2014 at 9:26 AM, Pavel Emelyanov &lt;<a href="mailto:xemul@parallels.com">xemul@parallels.com</a>&gt; wrote:<br>


&gt; &gt;     &gt;<br>

&gt; &gt;     &gt;     On 06/24/2014 06:12 PM, Serge Hallyn wrote:<br>

&gt; &gt;     &gt;<br>

&gt; &gt;     &gt;     &gt;&gt; Yes. Empty cgroups cannot be discovered through the /proc/pid/cgroup file;<br>

&gt; &gt;     &gt;     &gt;&gt; we should walk the live cgroup mount instead. But the problem is that we cannot<br>

&gt; &gt;     &gt;     &gt;&gt; just take the system /sys/fs/cgroup/ directories, since there will be<br>

&gt; &gt;     &gt;     &gt;&gt; cgroups from other containers as well. We should find the root subdir<br>

&gt; &gt;     &gt;     &gt;&gt; of the container we dump and walk _this_ subtree.<br>
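<div>A minimal sketch of that approach, assuming cgroup v1 and taking the container root to be the path the dumped root task reports in its /proc/PID/cgroup. The helper names are hypothetical, not CRIU code:</div>

```python
import os


def parse_proc_cgroup(text):
    """Map each controller to the cgroup path of a task.

    Every cgroup v1 line of /proc/PID/cgroup has the form
    "hierarchy-ID:controller-list:cgroup-path", e.g. "4:cpuset:/lxc/u1".
    """
    paths = {}
    for line in text.splitlines():
        if not line:
            continue
        _, controllers, path = line.split(":", 2)
        for ctrl in controllers.split(","):
            if ctrl:  # skip the empty controller list of a cgroup v2 line
                paths[ctrl] = path
    return paths


def walk_container_cgroups(mount_root, container_root):
    """List every cgroup below the container root, empty ones included.

    Empty cgroups appear in no /proc/PID/cgroup file, so they can only be
    found by walking the mounted hierarchy. Starting at the container root
    keeps cgroups of other containers out of the dump.
    """
    top = os.path.join(mount_root, container_root.lstrip("/"))
    found = []
    for dirpath, dirnames, _ in os.walk(top):
        dirnames.sort()  # visit children in a deterministic order
        rel = os.path.relpath(dirpath, mount_root)
        found.append("/" if rel == "." else "/" + rel)
    return found
```

<div>For example, if the root task reports "4:cpuset:/lxc/u1", walking from /lxc/u1 under the cpuset mount picks up empty children of u1 but never touches a sibling such as /lxc/u2.</div>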

&gt; &gt;     &gt;     &gt;<br>

&gt; &gt;     &gt;     &gt; I volunteer to work on a proper cgroup c/r implementation, once Tycho<br>

&gt; &gt;     &gt;     &gt; gets the very basics done.<br>

&gt; &gt;     &gt;<br>

&gt; &gt;     &gt;     Serge, Tycho, I think I need to clarify one more thing.<br>

&gt; &gt;     &gt;<br>

&gt; &gt;     &gt;     I believe that once we do a full cgroup hierarchy restore, all the<br>

&gt; &gt;     &gt;     mkdirs will go away from the move_in_cgroup() routine. Instead,<br>

&gt; &gt;     &gt;     we will have code that constructs the whole cgroup subtree<br>

&gt; &gt;     &gt;     before criu starts forking tasks. And once we have it,<br>

&gt; &gt;     &gt;     move_in_cgroup() should never fail. Thus this patch would<br>

&gt; &gt;     &gt;     effectively be reverted.<br>
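<div>A rough sketch of that split, assuming the dump records each cgroup path together with its control-file values. restore_cgroup_tree() and attach_task() are hypothetical stand-ins, not the actual CRIU routines:</div>

```python
import os


def restore_cgroup_tree(mount_root, dumped):
    """Create and configure every dumped cgroup before any task is forked.

    dumped maps a cgroup path to {control file: value}; this format is an
    assumption for the sketch. Sorting visits a parent before its children
    (a parent path is a prefix of the child path), so a parent is configured
    first; cpuset, for example, requires a child to use a subset of the
    parent cpus.
    """
    for path in sorted(dumped):
        target = os.path.join(mount_root, path.lstrip("/"))
        os.makedirs(target, exist_ok=True)
        for name, value in dumped[path].items():
            with open(os.path.join(target, name), "w") as f:
                f.write(value)


def attach_task(mount_root, path, pid):
    """Move pid into an already prepared cgroup; no mkdir happens here.

    Because the tree is built up front, a missing directory at this point
    would indicate a bug rather than a normal condition.
    """
    with open(os.path.join(mount_root, path.lstrip("/"), "tasks"), "w") as f:
        f.write(str(pid))
```

<div>The design point is that all directory creation and configuration happens in one place, and the per-task step only writes a pid.</div>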

&gt; &gt;     &gt;<br>

&gt; &gt;     &gt;     Thanks,<br>

&gt; &gt;     &gt;     Pavel<br>

&gt; &gt;     &gt;<br>

&gt; &gt;     &gt;<br>

&gt; &gt;     &gt; I agree.  Creation of the cgroup and its subtree should be done in one place as opposed<br>

&gt; &gt;     &gt; to being split apart (i.e., between prepare_cgroup_sfd() and move_in_cgroup() as is done<br>

&gt; &gt;     &gt; currently).<br>

&gt; &gt;     &gt;<br>

&gt; &gt;     &gt; Regarding the 4 items to do for cgroups in your earlier email, I believe that we should<br>

&gt; &gt;     &gt; have CLI options to tell CRIU which cgroups it needs to restore (almost like the way we<br>

&gt; &gt;     &gt; tell it about external bind mounts).<br>

&gt; &gt;<br>

&gt; &gt;     I was thinking that if we take the root task, check the cgroups it lives in and<br>

&gt; &gt;     dump the whole subtree starting from it, this would work properly and would<br>

&gt; &gt;     not require any CLI hints.<br>

&gt; &gt;<br>

&gt; &gt;     Do you mean that we need to tell criu where in the cgroup hierarchy to start<br>

&gt; &gt;     recreating the subtree it dumped?<br>

&gt; &gt;<br>

&gt; &gt;     &gt; This way we can handle the empty cgroups as well as dumping and restoring on the same<br>

&gt; &gt;     &gt; machine versus on a different machine (i.e., migration).  For migration, CRIU definitely<br>

&gt; &gt;     &gt; needs to be told how to handle cgroup name collisions.<br>

&gt; &gt;<br>

&gt; &gt;     But if we ask criu to restore tasks in a fresh new sub-cgroup, why would this<br>

&gt; &gt;     collision happen?<br>

&gt; &gt;<br>

&gt; &gt;     &gt; This is not something that it can handle at dump time.<br>

&gt; &gt;     &gt;<br>

&gt; &gt;     &gt; --Saied<br>

&gt; &gt;<br>

&gt; &gt;<br>

&gt; &gt; I am not sure if I understand what is meant by &quot;fresh new sub-cgroup&quot;.  Since the process<br>

&gt; &gt; has to be restored in the same cgroup, I assume you mean a new mountpoint.  But if the<br>

&gt; &gt; cgroup already exists, mounting it at a new private mountpoint doesn&#39;t set<br>

&gt; &gt; up a new hierarchy; both mounts expose the same state.  Consider the following example:<br>

&gt; &gt;<br>

&gt; &gt; # cat /sys/fs/cgroup/hugetlb/notify_on_release<br>

&gt; &gt; # mkdir /mnt/foo<br>

&gt; &gt; # mount -t cgroup -o hugetlb cgroup /mnt/foo<br>

&gt; &gt; # cat /mnt/foo/notify_on_release<br>

&gt; &gt; 0<br>

&gt; &gt; # echo 1 &gt; /sys/fs/cgroup/hugetlb/notify_on_release<br>

&gt; &gt; # cat /mnt/foo/notify_on_release<br>

&gt; &gt; 1<br>

&gt; &gt; # echo 0 &gt; /mnt/foo/notify_on_release<br>

&gt; &gt; # cat /sys/fs/cgroup/hugetlb/notify_on_release<br>

&gt; &gt; 0<br>

&gt; &gt; #<br>

&gt; &gt;<br>

&gt; &gt; So I think we need a mechanism to tell CRIU whether it should expect the cgroup to already exist<br>

&gt; &gt; (e.g., restore on the same machine) or not (e.g., restore after reboot or on a different machine).<br>

&gt; &gt;<br>

&gt; &gt; I am not a cgroups expert, but I hope it&#39;s clearer now.<br>

&gt;<br>

&gt; Yes, thank you :) My understanding of cgroups tells me that we don&#39;t need a special option<br>

&gt; for that. AFAIU, LXC and OpenVZ don&#39;t fail if they create a cgroup that already exists,<br>

&gt; and neither should CRIU.<br>

<br>

</div></div>Right. If the taskset was under /cpuset/lxc/u1, for instance, and u1<br>

is still running (or /cpuset/lxc/u1 was not cleaned up), then criu should<br>

simply use /cpuset/lxc/u1.1, then u1.2, etc.  Under that, since u1.N did<br>

not exist, there should be no collisions (and if there are, that is cause<br>

for failing the restart, as we either have a bug or a race with<br>

another criu instance or another toolset).<br>
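<div>The naming scheme described above could look roughly like this. It is only a sketch: pick_restore_cgroup() and create_subtree() are hypothetical helpers, and mkdir() is used as the atomic &quot;claim&quot; step so that two concurrent restores cannot both pick the same u1.N:</div>

```python
import os


def pick_restore_cgroup(mount_root, wanted):
    """Claim wanted (e.g. /lxc/u1), or else the first free wanted.N.

    mkdir() is atomic, so if another criu instance or another tool claims a
    name first we see FileExistsError and move on to the next suffix.
    """
    base = os.path.join(mount_root, wanted.lstrip("/"))
    os.makedirs(os.path.dirname(base), exist_ok=True)  # e.g. .../lxc
    candidate, n = wanted, 0
    while True:
        try:
            os.mkdir(os.path.join(mount_root, candidate.lstrip("/")))
            return candidate
        except FileExistsError:
            n += 1
            candidate = "%s.%d" % (wanted, n)


def create_subtree(mount_root, base, children):
    """Create the dumped sub-cgroups (paths relative to base) under base.

    base was created by us a moment ago, so any FileExistsError here means
    a bug or a race with another tool; it is deliberately not caught and
    fails the restore.
    """
    for rel in sorted(children):  # parents sort before their children
        os.mkdir(os.path.join(mount_root, base.lstrip("/"), rel))
```

<div>With /lxc/u1 and /lxc/u1.1 already taken, pick_restore_cgroup(root, &quot;/lxc/u1&quot;) returns &quot;/lxc/u1.2&quot;, and everything below it is then created with plain mkdir() so that any collision surfaces as an error.</div>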

<br>

(And of course I agree that we should create and configure all cgroups<br>

before we restart any tasks.)<br>

<span class="HOEnZb"><font color="#888888"><br>

-serge<br>

</font></span></blockquote></div><br></div>