<div dir="ltr"><div>These are the things we considered when building CMT:</div><div><br></div><div><a href="https://github.com/marcosnils/cmt#what-kind-of-validations-does-cmt-do">https://github.com/marcosnils/cmt#what-kind-of-validations-does-cmt-do</a><br></div><div><br></div><div>Marcos.</div><div><br></div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Jul 10, 2016 at 12:20 PM, Pavel Emelyanov <span dir="ltr"><<a href="mailto:xemul@virtuozzo.com" target="_blank">xemul@virtuozzo.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">On 07/08/2016 05:46 PM, Adrian Reber wrote:<br>
><br>
> Sometimes migration of a process is not possible and it is not really<br>
> CRIU's fault that it doesn't work.<br>
<br>
</span>Yup :)<br>
<span class=""><br>
> The simplest reasons are that the binary is not<br>
> available or has a different size, libraries are missing, files are missing or<br>
<br>
</span>The above are "FS mismatch errors". BTW, we don't C/R the filesystem in CRIU, and<br>
this seems to cause certain problems for people. I planned to discuss this at<br>
the C/R miniconf at Plumbers.<br>
<span class=""><br>
> cgroup structures are not available.<br>
<br>
</span>And there are also CPU mismatches and missing kernel modules (filesystems/networking).<br>
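A rough sketch of the kernel-side part of such a check, in Python since that is what p.haul is written in. It assumes the source host has recorded the filesystem types the task needs; the function names are hypothetical and this is not CRIU or p.haul code. It compares those types against /proc/filesystems on the destination. Caveat: a filesystem built as a module that has not been loaded yet does not appear in /proc/filesystems, so it is reported as missing even if the kernel could load it on demand.

```python
def parse_proc_filesystems(text: str) -> set[str]:
    """Return the filesystem types listed in /proc/filesystems-formatted text.

    Each line is either "nodev\t<type>" or "\t<type>"; the type is the last field.
    """
    types = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            types.add(fields[-1])
    return types


def missing_filesystems(required: set[str], proc_text: str) -> set[str]:
    """Filesystem types the checkpoint needs but this kernel does not list."""
    return required - parse_proc_filesystems(proc_text)
```

On the destination one would call missing_filesystems({"ext4", "overlay"}, open("/proc/filesystems").read()) and treat a non-empty result as a permanent failure (modulo the module-loading caveat above).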
<span class=""><br>
> I was thinking of adding something to CRIU which checks all these<br>
> obvious reasons for restoration failures.<br>
<br>
</span>Maybe add it to p.haul? We have code in p.haul that checks that the CPUs are compatible<br>
(with the help of criu, of course).<br>
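For illustration only, a naive CPU compatibility check, assuming the x86 /proc/cpuinfo text of both hosts is available. CRIU itself provides cpuinfo dump and cpuinfo check actions for this, and they are more precise. This sketch simply requires every feature flag of the source CPU to be present on the destination, which is conservative: the task may not use all of those features.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Feature flags from the first "flags" line of /proc/cpuinfo (x86 format)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def cpu_compatible(src_cpuinfo: str, dst_cpuinfo: str) -> tuple[bool, set[str]]:
    """The destination must offer every feature the source CPU advertised.

    Returns (ok, flags present on the source but missing on the destination).
    """
    missing = cpu_flags(src_cpuinfo) - cpu_flags(dst_cpuinfo)
    return (not missing, missing)
```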
<span class=""><br>
> My first idea was to add an option to make a light-dump and a light-restore.<br>
> It would be a checkpoint without terminating the process and without dumping<br>
> the memory. This light-checkpoint could then be transferred to the<br>
> destination system to make sure all the simple and obvious errors can be<br>
> excluded and that the chances that the process will be restored are much<br>
> higher.<br>
<br>
</span>Hm... As a bug catcher this would also work well.<br>
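One way to picture the light-dump, as a hypothetical sketch rather than anything CRIU does today: collect the file-backed mappings of the task from /proc/&lt;pid&gt;/maps, which covers the binary and its shared libraries, and ship that list to the destination. Anonymous mappings (no pathname) and pseudo-entries such as [heap] or [vdso] are skipped.

```python
def mapped_files(maps_text: str) -> set[str]:
    """Paths of file-backed mappings in /proc/<pid>/maps-formatted text.

    Each line is: address perms offset dev inode [pathname]. maxsplit=5
    keeps pathnames that contain spaces in one piece.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5]
        if path.startswith("["):
            continue  # [heap], [stack], [vdso], ...
        paths.add(path)
    return paths
```

Usage on the source host would be something like mapped_files(open(f"/proc/{pid}/maps").read()).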
<span class=""><br>
> Checking that the binary and the libraries exist on the destination system,<br>
> that they have the same size and MD5/SHA1/SHAsomething, that all<br>
> required files are available and that enough free memory is available<br>
> can easily be done by external tools, but as CRIU already knows how to collect<br>
> this information, it seems to make sense to include this in<br>
> CRIU.<br>
<br>
</span>Yes, but the FS checks will only cover files that were opened/mapped at<br>
the time of the light-dump, and these may no longer be the same at real dump time.<br>
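A minimal sketch of the size/checksum and free-memory checks Adrian describes, assuming the list of paths comes from the light-dump (for example the task's mapped files). SHA-256 stands in for "SHAsomething"; all names are hypothetical. As noted above, a clean result only covers the files the task had open at light-dump time.

```python
import hashlib
import os


def fingerprint(path: str) -> tuple[int, str]:
    """Size in bytes and SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
            size += len(chunk)
    return size, h.hexdigest()


def build_manifest(paths) -> dict[str, tuple[int, str]]:
    """Run on the source host: record a fingerprint for every file."""
    return {p: fingerprint(p) for p in paths}


def check_manifest(manifest: dict[str, tuple[int, str]]) -> dict[str, str]:
    """Run on the destination host: report files that are missing or differ."""
    problems = {}
    for path, expected in manifest.items():
        if not os.path.isfile(path):
            problems[path] = "missing"
        elif fingerprint(path) != expected:
            problems[path] = "size or checksum mismatch"
    return problems


def available_memory_bytes() -> int:
    """Free physical pages times page size (Linux/glibc sysconf).

    Counts only truly free pages; reclaimable page cache is not included,
    so this underestimates what a restore could actually use.
    """
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
```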
<br>
Also, there are situations that CRIU cannot C/R (data in netlink sockets or<br>
an in-flight TCP handshake), but these tend to disappear by themselves.<br>
<span class=""><br>
> Does this sound useful or maybe even completely wrong?<br>
<br>
</span>If we can somehow distinguish permanent impossibilities (CPU mismatch,<br>
missing kernel modules, what else?) from temporary ones, that would be<br>
really helpful.<br>
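A sketch of how a migration tool could act on that distinction, using only the examples from this thread: CPU mismatch and missing kernel modules as permanent, netlink socket data and an in-flight TCP handshake as transient. The reason strings are made up for illustration; CRIU does not report failures in this form. Unknown reasons are treated as permanent so the tool never retries forever.

```python
from enum import Enum


class Severity(Enum):
    PERMANENT = "permanent"  # retrying against this destination will not help
    TRANSIENT = "transient"  # may go away by itself; retry the dump later


# Hypothetical failure reasons, classified as discussed in this thread.
CLASSIFICATION = {
    "cpu_mismatch": Severity.PERMANENT,
    "kernel_module_missing": Severity.PERMANENT,
    "netlink_socket_data": Severity.TRANSIENT,
    "tcp_handshake_in_flight": Severity.TRANSIENT,
}


def should_retry(failures: list[str]) -> bool:
    """Retry only if there were failures and every one of them is transient.

    Reasons missing from CLASSIFICATION count as permanent.
    """
    return bool(failures) and all(
        CLASSIFICATION.get(f, Severity.PERMANENT) is Severity.TRANSIENT
        for f in failures
    )
```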
<span class=""><font color="#888888"><br>
-- Pavel<br>
</font></span><div class=""><div class="h5"><br>
_______________________________________________<br>
CRIU mailing list<br>
<a href="mailto:CRIU@openvz.org">CRIU@openvz.org</a><br>
<a href="https://lists.openvz.org/mailman/listinfo/criu" rel="noreferrer" target="_blank">https://lists.openvz.org/mailman/listinfo/criu</a><br>
</div></div></blockquote></div><br></div></div>