<div dir="ltr"><div><div><div>Just as a follow-up, I tried reproducing the -404-related errors this morning without success, so it was most likely a temporary (albeit frustrating) hiccup in the NFS connection that was causing issues with reading the image files. <br></div><div><br></div><div>The aforementioned ELF bug is, of course, still present in both master and criu-dev, but as you mentioned, it is not present prior to 3.7, so I'll be using v3.6 as my base going forward for P.Haul development until this bug is resolved.<br><br></div>Thank you all for the assistance!<br><br></div>Best,<br></div>John Goen<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jan 23, 2018 at 6:38 AM, Mike Rapoport <span dir="ltr"><<a href="mailto:rppt@linux.vnet.ibm.com" target="_blank">rppt@linux.vnet.ibm.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Mon, Jan 22, 2018 at 09:01:25PM -0700, John Goen wrote:<br>
> Hello,<br>
><br>
> Since this is a development-related issue, I wasn't sure whether to post<br>
> this here or in the GitHub issues, so I opted to post here. Please let me<br>
> know if this fits better elsewhere.<br>
><br>
> I've been working on getting my development environment set up to best<br>
> implement lazy migration support for P.Haul, but have run into a few snags<br>
> related to CRIU's lazy migration feature: namely that it fails in different<br>
> ways both on master and criu-dev branches, which I will outline here.<br>
><br>
> Environment Setup:<br>
> I am using Vagrant for all of my VM configuration and 'spin-up'. The<br>
> general configurations are as follows:<br>
> OS: Ubuntu Xenial 64-bit<br>
> Kernel: 4.13.0-26-generic (upgraded via apt's packages from 4.4)<br>
> RAM: 4 GB<br>
><br>
> I run 2 of these VMs with the following setup:<br>
> criuSrc:<br>
> - has an NFS share set up at /mnt/<br>
> criuDest:<br>
> - mounts the /mnt/ directory of criuSrc via NFS<br>
><br>
> I have the following packages installed on each VM as per the<br>
> suggestions on the wiki:<br>
> libprotobuf-dev libprotobuf-c0-dev protobuf-c-compiler<br>
> protobuf-compiler python-protobuf \<br>
> pkg-config python-ipaddr iproute2 libcap-dev libnl-3-dev libnet-dev<br>
> libaio-dev python-yaml asciidoc xmlto --no-install-recommends<br>
><br>
> With this setup (setup files are here, if you're comfortable with Vagrant:<br>
> <a href="https://github.com/jtgoen/vagrant/tree/master/criu" rel="noreferrer" target="_blank">https://github.com/jtgoen/<wbr>vagrant/tree/master/criu</a>), I am able to run<br>
> simple single-dump and iterative migrations of a simple looping test<br>
> program similar to the one found in this example video for lazy migration<br>
> without issue:<br>
> <a href="https://asciinema.org/a/146427" rel="noreferrer" target="_blank">https://asciinema.org/a/146427</a><br>
> (the github link above has my commands for simple/iterative/lazy<br>
> migration outlined in simple-proc-live-migration.<wbr>txt)<br>
><br>
> However, when performing the lazy migration steps nearly verbatim from the<br>
> tutorial video, the lazy migration restore command fails when using the<br>
> build of CRIU from master, with the following error:<br>
> $ pie: &lt;procnum&gt;: Error (criu/pie/util-vdso.c:97): vdso: ELF header magic mismatch<br>
<br>
</div></div>This seems to be a real bug introduced somewhere between v3.6 and v3.7.<br>
With 3.7 I get the same error as you, and with 3.6 migration of a simple<br>
process works fine.<br>
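For context, an ELF image (the vDSO included) begins with the identification bytes 0x7f 'E' 'L' 'F', and this error means the header bytes found at restore time did not match what was expected. Below is a minimal Python sketch of such a check. It assumes only the four magic bytes matter; the real check in criu/pie/util-vdso.c may compare more of the ELF identification, and this is not CRIU's code. The names has_elf_magic and check_file are hypothetical.<br>

```python
import sys

# The four ELF magic bytes: 0x7f followed by ASCII "ELF".
ELF_MAGIC = b"\x7fELF"


def has_elf_magic(header: bytes) -> bool:
    """Return True if `header` starts with the ELF magic bytes.

    Simplified sketch: only the 4-byte magic is compared, not the
    rest of e_ident (class, endianness, version).
    """
    return header[:4] == ELF_MAGIC


def check_file(path: str) -> bool:
    """Read the first four bytes of a file and test them for ELF magic."""
    with open(path, "rb") as f:
        return has_elf_magic(f.read(4))


if __name__ == "__main__":
    # Example: python3 elfcheck.py /bin/ls
    for p in sys.argv[1:]:
        print(p, "ELF" if check_file(p) else "ELF header magic mismatch")
```

Running it against any ELF binary on the VM should print "ELF"; a mismatch like the one above would instead mean the bytes being inspected are not an ELF header at all.<br>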
<span class=""><br>
> When running the same migration steps with the criu-dev branch, I encountered<br>
> the following different errors when attempting to start the lazy-pages<br>
> daemon (never reaching the actual restore step):<br>
> $ Error (criu/util.c:703): Can't read link of fd -404: No such file or directory<br>
> $ Error (criu/protobuf.c:75): Unexpected EOF on (null)<br>
<br>
</span>These errors are reported when CRIU cannot find the image files, so the<br>
problem is probably related to the NFS setup.<br>
When you see the -404 error, can you verify that the images are present at<br>
/mnt/dump on the destination and are readable with crit?<br>
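A rough first pass before decoding with crit could be the Python sketch below. It assumes the dump directory is /mnt/dump and only checks that every *.img file there is present, non-empty and readable from the destination host; it does not parse the image format the way crit does. The function name check_images is hypothetical.<br>

```python
import os
import sys
from typing import List


def check_images(dump_dir: str) -> List[str]:
    """Return a list of problems found with the *.img files in dump_dir.

    Sketch only: flags a missing directory, no image files, empty
    files, and files that cannot be opened (e.g. due to NFS issues).
    """
    if not os.path.isdir(dump_dir):
        return [f"{dump_dir}: directory not found"]
    problems = []
    images = sorted(n for n in os.listdir(dump_dir) if n.endswith(".img"))
    if not images:
        problems.append(f"{dump_dir}: no .img files")
    for name in images:
        path = os.path.join(dump_dir, name)
        try:
            with open(path, "rb") as f:
                # An empty image file is treated as suspicious.
                if not f.read(1):
                    problems.append(f"{path}: empty")
        except OSError as e:
            problems.append(f"{path}: {e.strerror}")
    return problems


if __name__ == "__main__":
    dump_dir = sys.argv[1] if len(sys.argv) > 1 else "/mnt/dump"
    issues = check_images(dump_dir)
    for issue in issues:
        print(issue)
    print("OK" if not issues else f"{len(issues)} problem(s) found")
```

Running it on criuDest right after a failed lazy-pages start would help separate an NFS visibility problem from an actual CRIU bug.<br>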
<span class=""><br>
> After running the daemon through GDB, I believe the -404 file<br>
> descriptor is being produced erroneously in image.c because the lazy<br>
> flags are read incorrectly (or something of that sort): the lazy variable<br>
> set here<br>
> <a href="https://github.com/checkpoint-restore/criu/blob/51c4dc7c25b8687b455675dc33d45fbfd7a99689/criu/image.c#L254" rel="noreferrer" target="_blank">https://github.com/checkpoint-<wbr>restore/criu/blob/<wbr>51c4dc7c25b8687b455675dc33d45f<wbr>bfd7a99689/criu/image.c#L254</a><br>
> is never set to true.<br>
><br>
> I'm unsure whether this is an issue with my command setup (which was only<br>
> slightly modified from the example commands), something wrong with my<br>
> environment, or a current bug, but if any assistance could be provided on<br>
> any of these possibilities, it would be much appreciated.<br>
><br>
> Also, if there are any suggestions on available dev environment setups for<br>
> working on CRIU/P.Haul, they are also welcome.<br>
<br>
</span>The environment you described looks fine, at least for core CRIU<br>
development. I'm not enough of an expert in Python and Go to recommend<br>
anything for P.Haul.<br>
<br>
I'd suggest retrying lazy migration of your example with CRIU v3.6. If<br>
it works, you'll be able to start working on P.Haul while the vdso bug<br>
is being fixed in criu-dev and master.<br>
<br>
> Best,<br>
> John Goen<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
Sincerely yours,<br>
Mike.<br>
<br>
</font></span></blockquote></div><br></div>