<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">I haven't tried that. I was trying to use the native features within CRIU to do this. I'm not particularly interested in another workaround since I have one; I'm more interested in trying to understand the root cause.<br class=""><div class="">
<br class="Apple-interchange-newline"><span style="color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; display: inline !important; float: none;" class="">Mark</span>
</div>
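<div class="">A minimal sketch of the stdio buffering behind Pavel's remark below that the close fails because the write fails. With a FILE *, fprintf() normally just copies a short value into a buffer, and the write(2) that the kernel can reject only runs when fclose() (or fflush()) flushes that buffer. stdio_write_errno() and write_prop() are hypothetical helpers, not CRIU code, and /dev/full, where every write fails with ENOSPC, is an assumed stand-in for a cgroup file such as cpuset.cpus that rejects a value with EINVAL.</div>

```c
/* Sketch only, not CRIU code. /dev/full (every write fails with ENOSPC)
 * stands in for a cgroup file that rejects the written value. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes value through stdio, in the same buffered
 * FILE * style as restore_cgroup_prop(). Stores fprintf()'s return value
 * in *printed and returns the errno reported by fclose(), or 0. */
static int stdio_write_errno(const char *path, const char *value, int *printed)
{
	FILE *f = fopen(path, "w");
	if (f == NULL)
		return errno;

	/* For a short value on a fully buffered stream this only fills the
	 * stdio buffer, so it succeeds even if the kernel will refuse it. */
	*printed = fprintf(f, "%s", value);

	/* fclose() flushes the buffer: this is where write(2) actually runs
	 * and where the kernel's error finally becomes visible. */
	if (fclose(f) != 0)
		return errno;
	return 0;
}

/* Hypothetical alternative: a single write(2), so the kernel's verdict is
 * attributed to the write itself. Returns 0 on success or -errno. */
static int write_prop(const char *path, const char *value)
{
	size_t len = strlen(value);
	int fd = open(path, O_WRONLY);
	if (fd == -1)
		return -errno;

	ssize_t n = write(fd, value, len);
	int err = (n == -1) ? errno : 0;   /* save errno before close() */
	if (err == 0 && (size_t)n != len)
		err = EIO;                     /* short write: treat as failure here */
	close(fd);
	return err ? -err : 0;
}
```

<div class="">On Linux, stdio_write_errno("/dev/full", "0-3", &amp;n) sets n to 3 yet returns ENOSPC from fclose(), while write_prop("/dev/full", "0-3") returns -ENOSPC from the write itself, which makes it clear which operation the kernel refused.</div>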
<br class=""><div><blockquote type="cite" class=""><div class="">On Aug 25, 2015, at 10:42 AM, Hui Kang <<a href="mailto:hkang.sunysb@gmail.com" class="">hkang.sunysb@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div class="">On Tue, Aug 25, 2015 at 11:40 AM, Mark <<a href="mailto:fl0yd@me.com" class="">fl0yd@me.com</a>> wrote:<br class=""><blockquote type="cite" class="">I'm using Docker 1.9 experimental, the one that has your network changes in<br class="">it from Boucher's branch.<br class=""><br class="">I've tried CG_MODE_SOFT and FULL and in both cases the ffclose(f) still<br class="">returns Invalid Argument.<br class=""></blockquote><br class="">I remember I used full mode and then manually run "echo 0 ><br class="">/sys/fs/cgroup/cpuset/docker/cpuset.cpus" and "echo 0 ><br class="">/sys/fs/cgroup/cpuset/docker/cpuset.mems"<br class=""><br class="">Then the restore will success. Have you tried this?<br class=""><br class="">- Hui<br class=""><br class=""><blockquote type="cite" class=""><br class="">Mark<br class=""><br class="">On Aug 25, 2015, at 10:36 AM, Hui Kang <<a href="mailto:hkang.sunysb@gmail.com" class="">hkang.sunysb@gmail.com</a>> wrote:<br class=""><br class="">Hi, Mark<br class="">I think the failure is caused by criu restoring the root directory of group.<br class=""><br class="">Which branch of docker you used to restore a container? You probably<br class="">need to set manage-cgroup=full.<br class=""><br class="">- Hui<br class=""><br class="">On Tue, Aug 25, 2015 at 5:44 AM, Pavel Emelyanov <<a href="mailto:xemul@parallels.com" class="">xemul@parallels.com</a>><br class="">wrote:<br class=""><br class="">On 08/22/2015 01:39 AM, Mark wrote:<br class=""><br class="">Hi,<br class=""><br class="">We're seeing some issues doing docker-based restores on AWS machines. 
On the first try the restore.log shows the following output:<br class=""> (00.000551) cg: Preparing cgroups yard (cgroups restore mode 0x4)<br class=""> (00.000607) cg: Opening .criu.cgyard.aRmYI0 as cg yard<br class=""> (00.000617) cg: Making controller dir .criu.cgyard.aRmYI0/cpuset (cpuset)<br class=""> (00.000691) cg: Created cgroup dir cpuset/system.slice/docker-f00d0fe34bcc352377f4750f99fc4a649bd14db65fc15639df35043c62f7733a.scope<br class=""> (00.000733) Error (cgroup.c:978): cg: Failed closing cpuset/system.slice/docker-f00d0fe34bcc352377f4750f99fc4a649bd14db65fc15639df35043c62f7733a.scope/cpuset.cpus: Invalid argument<br class=""> (00.000737) Error (cgroup.c:1083): cg: Restoring special cpuset props failed!<br class=""><br class=""><br class="">Failure to close the file is actually because fprintf fails.<br class=""><br class="">On the 2nd try the restore works because it skips the attempt:<br class=""><br class=""> (00.000785) cg: Preparing cgroups yard (cgroups restore mode 0x4)<br class=""> (00.000840) cg: Opening .criu.cgyard.BQ2bKQ as cg yard<br class=""> (00.000850) cg: Making controller dir .criu.cgyard.BQ2bKQ/cpuset (cpuset)<br class=""> (00.000877) cg: Determined cgroup dir cpuset/system.slice/docker-404a13eab68e35753ee2c66f636aa727aa2c9a7723671d25cc9ffb0ede574178.scope already exist<br class=""> (00.000880) cg: Skip restoring properties on cgroup dir cpuset/system.slice/docker-404a13eab68e35753ee2c66f636aa727aa2c9a7723671d25cc9ffb0ede574178.scope<br class=""><br class=""><br class="">Well, yes, this is because the directory was created on the first restore.<br class=""><br class="">It appears to be a timing issue on the fclose(f) call in cgroup.c. 
I've tried using CG_MODE_SOFT and CG_MODE_FULL and neither has an effect; the<br class="">1st attempt fails and the 2nd succeeds.<br class=""><br class="">To work around the issue, we've created a fork with these changes and the<br class="">issue hasn't recurred. In fact there hasn't even been a single "Failed to<br class="">flush..." message printed in the logs, so it seems to be a matter of split-second<br class="">timing: the for loop allows enough time for the handle to flush.<br class=""><br class="">diff --git a/cgroup.c b/cgroup.c<br class="">index a4e0146..9495206 100644<br class="">--- a/cgroup.c<br class="">+++ b/cgroup.c<br class="">@@ -950,6 +950,8 @@ static int restore_cgroup_prop(const CgroupPropEntry *cg_prop_entry_p,<br class="">{<br class=""> FILE *f;<br class=""> int cg;<br class="">+ int flushcounter=0;<br class="">+ int maxtries=500;<br class=""><br class=""> if (!cg_prop_entry_p->value) {<br class=""> pr_err("cg_prop_entry->value was empty when should have had a value");<br class="">@@ -974,9 +976,26 @@ static int restore_cgroup_prop(const CgroupPropEntry *cg_prop_entry_p,<br class=""> return -1;<br class=""> }<br class=""><br class="">+ /* The fclose() below was failing intermittently with EINVAL at AWS*/<br class="">+ /* So we try fflush() in a loop until it succeeds or we've */<br class="">+ /* tried it a bunch. */<br class="">+ for (;;) {<br class="">+ flushcounter++;<br class="">+ if (fflush(f) == 0) {<br class="">+ break;<br class="">+ }<br class="">+ if (flushcounter > maxtries) {<br class="">+ pr_perror("Max fflush() tries %d exceeded. Moving along anyway.\n",maxtries);<br class="">+ break;<br class="">+ }<br class="">+ if (fflush(f) != 0) {<br class="">+ pr_perror("Failed to flush %s [%d/%d]\n", path, flushcounter,maxtries);<br class="">+ }<br class="">+ }<br class="">+<br class=""><br class=""><br class="">Does this help?!<br class=""><br class=""> if (fclose(f) != 0) {<br class="">- pr_perror("Failed closing %s", path);<br class="">- return -1;<br class="">+ pr_perror("Failed closing %s\n",path);<br class="">+ return -1;<br class=""> }<br class=""><br class="">Can anyone reproduce the issue or offer a suggestion on how we should<br class="">proceed?<br class=""><br class=""><br class="">Hui (in Cc) sees something similar in his experiments.<br class=""><br class="">-- Pavel<br class=""><br class=""><br class=""></blockquote></div></div></blockquote></div><br class=""></body></html>