[CRIU] Lazy Migration Failure and Environment Questions

John Goen jtgoen at gmail.com
Tue Jan 23 07:01:25 MSK 2018


Hello,

Since this is a development-related issue, I wasn't sure whether to post it
here or in the GitHub issues, so I opted for the list. Please let me know if
it fits better elsewhere.

I've been setting up my development environment to implement lazy migration
support for P.Haul, but have run into a few snags with CRIU's lazy migration
feature: it fails in different ways on the master and criu-dev branches, as
outlined below.

Environment Setup:
    I am using vagrant for all of my VM configuration and 'spin-up'. The
general configurations are as follows:
        OS: Ubuntu Xenial 64-bit
        Kernel: 4.13.0-26-generic (upgraded via apt's packages from 4.4)
        RAM: 4 GB

    I run 2 of these VMs with the following setup:
        criuSrc:
            -has NFS mount setup at /mnt/
        criuDest:
            -has mapped the /mnt/ directory of criuSrc via NFS

    I have the following packages installed on each VM, as per the
suggestions on the wiki (installed with --no-install-recommends):
        libprotobuf-dev libprotobuf-c0-dev protobuf-c-compiler
        protobuf-compiler python-protobuf pkg-config python-ipaddr
        iproute2 libcap-dev libnl-3-dev libnet-dev libaio-dev
        python-yaml asciidoc xmlto
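
As a quick sanity check on the kernel listed above, here is a minimal
sketch that compares a release string against a minimum version. The 4.11
threshold is my assumption (as far as I know, that is when the userfaultfd
non-cooperative events that lazy pages depend on were merged), and the
helper names are purely illustrative:

```python
import re

# Assumption: 4.11 is taken as the minimum kernel for lazy migration.
MIN_KERNEL = (4, 11)


def kernel_tuple(release):
    """Turn a release string such as '4.13.0-26-generic' into (4, 13)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_new_enough(release, minimum=MIN_KERNEL):
    """Return True if the release is at or above the assumed minimum."""
    return kernel_tuple(release) >= minimum


# The release string of my VMs, as listed in the environment above.
print(kernel_new_enough("4.13.0-26-generic"))
```

On a running VM the release string can be taken from platform.release()
instead of being hard-coded.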

With this setup (the Vagrant files are at
https://github.com/jtgoen/vagrant/tree/master/criu), I can run both
single-dump and iterative migrations of a simple looping test program
without issue. The program is similar to the one in this example video for
lazy migration:
    https://asciinema.org/a/146427
    (simple-proc-live-migration.txt in the GitHub repository above lists my
commands for the simple, iterative, and lazy migrations)

However, when performing the lazy migration steps nearly verbatim from the
tutorial video, the lazy migration restore command fails when using the
build of CRIU from master, with the following error:
    $ pie: <procnum>: Error (criu/pie/util-vdso.c:97): vdso: ELF header magic mismatch
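
For reference, my reading of this message is that the bytes being parsed
as the vDSO did not start with the 4-byte ELF identification. The sketch
below shows that kind of check in isolation; it is my own illustration with
a hypothetical helper name, not the code in util-vdso.c:

```python
# Every ELF object begins with these four bytes (0x7f followed by "ELF").
ELF_MAGIC = b"\x7fELF"


def has_elf_magic(blob):
    """Return True if the buffer starts with the ELF identification bytes."""
    return blob[:4] == ELF_MAGIC


# A well-formed header passes; a zeroed or shifted buffer does not.
print(has_elf_magic(b"\x7fELF\x02\x01\x01"))
print(has_elf_magic(b"\x00" * 16))
```

If the restorer looks at the wrong address for the vDSO, a check of this
kind would fail even though the vDSO itself is intact.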

When running the same migration steps on the criu-dev branch, I hit
different errors when starting the lazy-pages daemon, so I never reach the
restore step:
    $ Error (criu/util.c:703): Can't read link of fd -404: No such file or directory
    $ Error (criu/protobuf.c:75): Unexpected EOF on (null)

After running the daemon under GDB, I've determined that the -404 file
descriptor is produced erroneously in image.c, apparently because the lazy
flags are read incorrectly (or something of that sort): the lazy variable,
set here
https://github.com/checkpoint-restore/criu/blob/51c4dc7c25b8687b455675dc33d45fbfd7a99689/criu/image.c#L254
never becomes true.
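
To show why a value like -404 produces the first error, here is a small
sketch that resolves /proc/self/fd/<fd> the way a readlink-based helper
would. I am assuming -404 is a placeholder value rather than a descriptor
the kernel ever handed out; the function name is hypothetical:

```python
import os

# Assumption: -404 is a placeholder, not a real open descriptor.
PLACEHOLDER_FD = -404


def read_fd_link(fd):
    """Resolve /proc/self/fd/<fd>; return (path, None) or (None, error text)."""
    try:
        return os.readlink("/proc/self/fd/%d" % fd), None
    except OSError as err:
        return None, os.strerror(err.errno)


path, error = read_fd_link(PLACEHOLDER_FD)
if path is None:
    print("Can't read link of fd %d: %s" % (PLACEHOLDER_FD, error))
```

Since no such entry exists under /proc/self/fd, the lookup fails before any
image data is read, which would also fit the "Unexpected EOF" that follows.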

I'm unsure whether this is an issue with my command setup (which was only
slightly modified from the example commands), something wrong with my
environment, or a current bug. Any assistance with any of these
possibilities would be much appreciated.

Also, any suggestions on dev environment setups for working on CRIU/P.Haul
are welcome.

Best,
John Goen