[CRIU] Lazy Migration Failure and Environment Questions

Mike Rapoport rppt at linux.vnet.ibm.com
Tue Jan 23 16:38:34 MSK 2018


On Mon, Jan 22, 2018 at 09:01:25PM -0700, John Goen wrote:
> Hello,
> 
> Since this is a development related issue, I wasn't sure if I should post
> this here or in the GitHub issues, so I opted to post here. Please let me
> know if this fits better elsewhere.
> 
> I've been working on getting my development environment set up to best
> implement lazy migration support for P.Haul, but have run into a few snags
> related to CRIU's lazy migration feature: namely, that it fails in different
> ways on both the master and criu-dev branches, which I outline here.
> 
> Environment Setup:
>     I am using vagrant for all of my VM configuration and 'spin-up'. The
> general configurations are as follows:
>         OS: Ubuntu Xenial 64-bit
>         Kernel: 4.13.0-26-generic (upgraded via apt's packages from 4.4)
>         RAM: 4 GB
> 
>     I run 2 of these VMs with the following setup:
>         criuSrc:
>             -has NFS mount setup at /mnt/
>         criuDest:
>             -has mapped the /mnt/ directory of criuSrc via NFS
> 
>     I have the following packages installed on each VM, as suggested
> on the wiki:
>         libprotobuf-dev libprotobuf-c0-dev protobuf-c-compiler \
>         protobuf-compiler python-protobuf \
>         pkg-config python-ipaddr iproute2 libcap-dev libnl-3-dev libnet-dev \
>         libaio-dev python-yaml asciidoc xmlto --no-install-recommends
> 
> With this setup (setup files here if comfortable with vagrant:
> https://github.com/jtgoen/vagrant/tree/master/criu), I am able to run
> simple single-dump and iterative migrations of a simple looping test
> program similar to the one found in this example video for lazy migration
> without issue:
>     https://asciinema.org/a/146427
>     (the github link above has my commands for simple/iterative/lazy
> migration outlined in simple-proc-live-migration.txt)
> 
> However, when performing the lazy migration steps nearly verbatim from the
> tutorial video, the lazy migration restore command fails when using the
> build of CRIU from master, with the following error:
>     $ pie: <procnum>: Error (criu/pie/util-vdso.c:97): vdso: ELF header magic mismatch

This seems to be a real bug introduced somewhere between v3.6 and v3.7.
With 3.7 I get the same error as you and with 3.6 migration of a simple
process works fine.
 
> When running the same migration steps on the criu-dev branch, I encountered
> the following different errors when attempting to start the lazy-pages
> daemon (never reaching the proper restore step):
>     $ Error (criu/util.c:703): Can't read link of fd -404: No such file or directory
>     $ Error (criu/protobuf.c:75): Unexpected EOF on (null)

These errors are reported when CRIU cannot find the image files, so the
problem is probably related to the NFS setup or something similar.
Can you verify that, when you see the -404 error, the images are present in
/mnt/dump on the destination and are readable with crit?
 
> After running the daemon through GDB, I've determined that the -404 file
> descriptor is produced erroneously in image.c by incorrectly reading the
> lazy flags (or something of that sort, since the lazy variable is never
> set to true; it is set here:
> https://github.com/checkpoint-restore/criu/blob/51c4dc7c25b8687b455675dc33d45fbfd7a99689/criu/image.c#L254
> ).
> 
> I'm unsure whether this is an issue with my command setup (which was only
> slightly modified from the example commands), something wrong with my
> environment, or a current bug, but if any assistance could be provided on
> any of these possibilities, it would be much appreciated.
> 
> Also, if there are any suggestions on available dev environment setups for
> working on CRIU/P.Haul, they are also welcome.

The environment you described looks fine, at least for core CRIU
development. I'm not enough of an expert in Python or Go to recommend
anything for P.Haul.

I'd suggest re-trying lazy migration of your example with CRIU v3.6. If
it works, you'll be able to start working on P.Haul until the vdso bug
is fixed in criu-dev and master.
 
> Best,
> John Goen

-- 
Sincerely yours,
Mike.


