[CRIU] Lazy Migration Failure and Environment Questions

Dmitry Safonov 0x7f454c46 at gmail.com
Tue Jan 23 15:47:28 MSK 2018


2018-01-23 4:01 GMT+00:00 John Goen <jtgoen at gmail.com>:
> Hello,
>
> Since this is a development related issue, I wasn't sure if I should post
> this here or in the GitHub issues, so I opted to post here. Please let me
> know if this fits better elsewhere.
>
> I've been working on getting my development environment set up to best
> implement lazy migration support for P.Haul, but have run into a few snags
> related to CRIU's lazy migration feature: namely that it fails in different
> ways both on master and criu-dev branches, which I will outline here.
>
> Environment Setup:
>     I am using vagrant for all of my VM configuration and 'spin-up'. The
> general configurations are as follows:
>         OS: Ubuntu Xenial 64-bit
>         Kernel: 4.13.0-26-generic (upgraded via apt's packages from 4.4)
>         RAM: 4 GB
>
>     I run 2 of these VMs with the following setup:
>         criuSrc:
>             -has NFS mount setup at /mnt/
>         criuDest:
>             -has mapped the /mnt/ directory of criuSrc via NFS
>
>     I have the following packages installed on each VM as per the
> suggestions on the wiki:
>         libprotobuf-dev libprotobuf-c0-dev protobuf-c-compiler
> protobuf-compiler python-protobuf \
>         pkg-config python-ipaddr iproute2 libcap-dev libnl-3-dev libnet-dev
> libaio-dev python-yaml asciidoc xmlto --no-install-recommends
>
> With this setup (setup files here if comfortable with vagrant:
> https://github.com/jtgoen/vagrant/tree/master/criu), I am able to run simple
> single-dump and iterative migrations of a simple looping test program
> similar to the one found in this example video for lazy migration without
> issue:
>     https://asciinema.org/a/146427
>     (the github link above has my commands for simple/iterative/lazy
> migration outlined in simple-proc-live-migration.txt)
>
> However, when performing the lazy migration steps nearly verbatim from the
> tutorial video, the lazy migration restore command fails when using the
> build of CRIU from master, with the following error:
>     $ pie: <procnum>: Error (criu/pie/util-vdso.c:97): vdso: ELF header
> magic mismatch

Ugh, that's probably my bug somewhere.
Are you trying to migrate an app from a node with a vdso vma to a node
whose kernel was booted with the vdso disabled (vdso=0 on the cmdline)?
IOW, is the vdso mapped on the dest node (does `cat /proc/self/maps`
show a [vdso] entry)?
Could you provide the full restore.log, and any particular steps needed
to reproduce it?
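
A minimal sketch of the /proc/self/maps check asked about above, in case it
is easier to script than to eyeball on both nodes. The helper names
(find_vdso, current_vdso) are made up for this illustration and are not part
of CRIU; the only assumption is the usual /proc/<pid>/maps line layout
(range, perms, offset, dev, inode, optional pathname).

```python
from typing import Optional, Tuple


def find_vdso(maps_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the [vdso] mapping, or None if there is none."""
    for line in maps_text.splitlines():
        fields = line.split()
        # Layout: "start-end perms offset dev inode [pathname]"
        if len(fields) >= 6 and fields[5] == "[vdso]":
            start, end = fields[0].split("-")
            return int(start, 16), int(end, 16)
    return None


def current_vdso(path: str = "/proc/self/maps") -> Optional[Tuple[int, int]]:
    """Check the calling process's own mappings (hypothetical helper)."""
    with open(path) as maps_file:
        return find_vdso(maps_file.read())


if __name__ == "__main__":
    vdso_range = current_vdso()
    if vdso_range is None:
        print("no [vdso] mapping found")
    else:
        print("vdso mapped at %x-%x" % vdso_range)
```

Running it on both criuSrc and criuDest should show whether the two nodes
agree on having a vdso mapped at all.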

> When running the same migration steps in the criu-dev branch, I encountered
> the following different errors when attempting to start the lazy-pages
> daemon (never being able to reach the proper restore step):
>     $ Error (criu/util.c:703): Can't read link of fd -404: No such file or
> directory
>     $ Error (criu/protobuf.c:75): Unexpected EOF on (null)
>
> After running the daemon through GDB, I've determined that the -404 file
> descriptor is being produced erroneously via image.c by incorrectly reading
> the lazy flags (or something of that sort, since the lazy variable, set here
> https://github.com/checkpoint-restore/criu/blob/51c4dc7c25b8687b455675dc33d45fbfd7a99689/criu/image.c#L254
> is never made true).

+CC Mike, he has a lazy-migration kung-fu ribbon

> I'm unsure if this is an issue with my command setup (which was only
> slightly modified from the example commands), something wrong with my
> environment, or a current bug, but if any assistance could be provided on
> any of these possibilities, it would be much appreciated.
>
> Also, if there are any suggestions on available dev environment setups for
> working on CRIU/P.Haul, they are also welcome.


-- 
             Dmitry

