[CRIU] Lazy Migration Failure and Environment Questions

John Goen jtgoen at gmail.com
Wed Jan 24 02:35:09 MSK 2018


Just as a follow-up, I tried reproducing the -404 related errors this
morning with no luck, so it must have been a temporary (albeit frustrating)
hiccup in the NFS connection that was causing issues with reading the image
files.
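
For anyone who hits the same -404 symptom, a quick check of the image
directory on the destination may save some time. The following is only a
minimal sketch: the function name check_image_dir is hypothetical, and it
assumes nothing about CRIU beyond the images being *.img files in the dump
directory (/mnt/dump in this thread). It does not parse the images, so it
complements rather than replaces inspecting them with crit.

```python
import os


def check_image_dir(path):
    """Return a list of human-readable problems found in an image directory.

    Hypothetical helper: it only checks that *.img files exist, can be
    opened, and are non-empty. It does not validate their contents.
    """
    if not os.path.isdir(path):
        return ["%s is not a directory (is the NFS share mounted?)" % path]

    problems = []
    names = sorted(n for n in os.listdir(path) if n.endswith(".img"))
    if not names:
        problems.append("no *.img files in %s" % path)

    for name in names:
        full = os.path.join(path, name)
        try:
            # Reading a single byte is enough to catch a stale or broken
            # mount, and also detects zero-length files.
            with open(full, "rb") as f:
                if not f.read(1):
                    problems.append("%s is empty" % name)
        except OSError as e:
            problems.append("%s is unreadable: %s" % (name, e))
    return problems
```

Calling check_image_dir("/mnt/dump") on the destination before starting the
lazy-pages daemon returns an empty list when nothing obvious is wrong.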

The aforementioned ELF bug of course is still present in both master and
criu-dev, but as you mentioned is not present prior to 3.7, so I'll be
using that as my base going forward with P.Haul development until this bug
is resolved.
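
A guard for this on the P.Haul side could look roughly like the sketch
below. It is an illustration under stated assumptions, not existing P.Haul
code: the function names are hypothetical, and it assumes the text passed in
(for example the output of criu --version) contains a line of the form
"Version: 3.6". The 3.7 cutoff is taken from this thread only.

```python
import re


def parse_criu_version(text):
    """Extract (major, minor) from text such as 'Version: 3.6'.

    Returns None when no such line is found. The 'Version: X.Y' line
    format is an assumption about the tool's output.
    """
    m = re.search(r"^Version:\s*(\d+)\.(\d+)", text, re.MULTILINE)
    return (int(m.group(1)), int(m.group(2))) if m else None


def predates_vdso_bug(text):
    """True if the reported version is older than 3.7.

    Per this thread, the vdso ELF header bug shows up in 3.7 and is not
    present before it. Unknown versions are treated as not safe.
    """
    version = parse_criu_version(text)
    return version is not None and version < (3, 7)
```

Comparing (major, minor) tuples rather than strings keeps a version such as
3.10 from being sorted before 3.7.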

Thank you all for the assistance!

Best,
John Goen

On Tue, Jan 23, 2018 at 6:38 AM, Mike Rapoport <rppt at linux.vnet.ibm.com>
wrote:

> On Mon, Jan 22, 2018 at 09:01:25PM -0700, John Goen wrote:
> > Hello,
> >
> > Since this is a development related issue, I wasn't sure if I should post
> > this here or in the GitHub issues, so I opted to post here. Please let me
> > know if this fits better elsewhere.
> >
> > I've been working on getting my development environment set up to best
> > implement lazy migration support for P.Haul, but have run into a few
> > snags related to CRIU's lazy migration feature: namely that it fails in
> > different ways both on master and criu-dev branches, which I will
> > outline here.
> >
> > Environment Setup:
> >     I am using vagrant for all of my VM configuration and 'spin-up'. The
> > general configurations are as follows:
> >         OS: Ubuntu Xenial 64-bit
> >         Kernel: 4.13.0-26-generic (upgraded via apt's packages from 4.4)
> >         RAM: 4 GB
> >
> >     I run 2 of these VMs with the following setup:
> >         criuSrc:
> >             -has NFS mount setup at /mnt/
> >         criuDest:
> >             -has mapped the /mnt/ directory of criuSrc via NFS
> >
> >     I have the following packages installed on each VM as per the
> > suggestions on the wiki:
> >         libprotobuf-dev libprotobuf-c0-dev protobuf-c-compiler \
> >         protobuf-compiler python-protobuf pkg-config python-ipaddr \
> >         iproute2 libcap-dev libnl-3-dev libnet-dev libaio-dev \
> >         python-yaml asciidoc xmlto --no-install-recommends
> >
> > With this setup (setup files here if comfortable with vagrant:
> > https://github.com/jtgoen/vagrant/tree/master/criu), I am able to run
> > simple single-dump and iterative migrations of a simple looping test
> > program similar to the one found in this example video for lazy migration
> > without issue:
> >     https://asciinema.org/a/146427
> >     (the github link above has my commands for simple/iterative/lazy
> > migration outlined in simple-proc-live-migration.txt)
> >
> > However, when performing the lazy migration steps nearly verbatim from
> > the tutorial video, the lazy migration restore command fails when using
> > the build of CRIU from master, with the following error:
> >     $ pie: <procnum>: Error (criu/pie/util-vdso.c:97): vdso: ELF header magic mismatch
>
> This seems to be a real bug introduced somewhere between v3.6 and v3.7.
> With 3.7 I get the same error as you and with 3.6 migration of a simple
> process works fine.
>
> > When running the same migration steps in the criu-dev branch, I
> > encountered the following different errors when attempting to start the
> > lazy-pages daemon (never being able to reach the proper restore step):
> >     $ Error (criu/util.c:703): Can't read link of fd -404: No such file or directory
> >     $ Error (criu/protobuf.c:75): Unexpected EOF on (null)
>
> And these are reported when CRIU cannot find the image files, so it is
> probably related to NFS setup or something like that.
> Can you verify that when you see -404 error the images are present at
> /mnt/dump on the destination and are readable with crit?
>
> > After running the daemon through GDB, I've determined that the -404 file
> > descriptor is being produced erroneously via image.c by incorrectly
> > reading the lazy flags (or something of that sort, since the lazy
> > variable, set here
> > https://github.com/checkpoint-restore/criu/blob/51c4dc7c25b8687b455675dc33d45fbfd7a99689/criu/image.c#L254
> > is never made true).
> >
> > I'm unsure if this is an issue with my command setup (which was only
> > slightly modified from the example commands), something wrong with my
> > environment, or a current bug, but if any assistance could be provided on
> > any of these possibilities, it would be much appreciated.
> >
> > Also, if there are any suggestions on available dev environment setups
> > for working on CRIU/P.Haul, they are also welcome.
>
> The environment you described looks fine, at least for core CRIU
> development. I'm not really an expert in Python or golang, so I can't
> recommend anything for P.Haul.
>
> I'd suggest you retry lazy migration of your example with CRIU v3.6. If
> it works, you'll be able to start working on P.Haul until the vdso bug
> is fixed in criu-dev and master.
>
> > Best,
> > John Goen
>
> --
> Sincerely yours,
> Mike.
>
>