[CRIU] Debugging checkpoint failure

Saied Kazemi saied at google.com
Fri Apr 3 15:13:06 PDT 2015


I haven't scoped it out, so I don't know either.  If you're lucky, it could be
quick.  From your earlier email, I'd say the starting point would be
to see how the AUFS VMA paths look inside the container.  I tried to
provide comments both in the commit logs as well as in the code as to what
the issue is and how we're compensating for it.  Hope the comments will be
of help :)
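
A minimal sketch of that first step, assuming a Linux /proc and a shell
inside the container where the dump is attempted. The helper names
vma_paths and map_file_targets are made up for illustration and are not
part of CRIU or nsinit. The first lists the file-backed paths from a
maps-format file, the second shows where the map_files symlinks point;
comparing the two shows whether the links carry AUFS branch paths.

```shell
#!/bin/sh
# Hypothetical helper: print the unique file-backed paths from a
# maps-format file (normally /proc/<pid>/maps). Field 6 is the pathname;
# anonymous mappings and pseudo-entries such as [heap] do not start with "/".
# Paths containing spaces would be truncated by this simple field split.
vma_paths() {
    awk '$6 ~ /^\// { print $6 }' "$1" | sort -u
}

# Hypothetical helper: print the target of each /proc/<pid>/map_files
# symlink. Reading these links typically needs root.
map_file_targets() {
    for link in /proc/"$1"/map_files/*; do
        if [ -L "$link" ]; then
            printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
        fi
    done
}
```

For example, with 51 being the PID from the dump command quoted below,
run "vma_paths /proc/51/maps" and "map_file_targets 51" and compare the
paths each one reports.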

--Saied


On Fri, Apr 3, 2015 at 1:55 PM, Ross Boucher <rboucher at gmail.com> wrote:

> I'm starting to look into it, but out of curiosity do you have any idea of
> the scope of making this work or what the main difficulties will be? I'm
> pretty much entirely unfamiliar with all the parts here so I'm learning as
> I go.
>
> On Fri, Apr 3, 2015 at 1:52 PM, Saied Kazemi <saied at google.com> wrote:
>
>> Thanks for confirming that you can successfully C/R in the global
>> namespace.
>>
>> We haven't tried C/R inside a container, and I'm afraid that it'd be some
>> time before we get to it.  If interested, you're welcome to work on it in
>> the meantime.  Your help would definitely be appreciated.
>>
>> --Saied
>>
>>
>> On Fri, Apr 3, 2015 at 1:38 PM, Ross Boucher <rboucher at gmail.com> wrote:
>>
>>> Just wanted to confirm that I was able to get everything running
>>> directly on the host and run the same test you posted. So the issue
>>> is running inside of a Docker container.
>>>
>>> On Fri, Apr 3, 2015 at 8:57 AM, Ross Boucher <rboucher at gmail.com> wrote:
>>>
>>>> I'm running criu through the nsinit checkpoint command, but I'm running
>>>> both inside of a Docker container -- that is, I've used the Dockerfile in
>>>> libcontainer to build an image that has criu and nsinit installed, and then
>>>> I launch a container with that image, and inside of that container I then
>>>> ran the nsinit commands to create another container and try to checkpoint
>>>> it.
>>>>
>>>> The problem, I think, is at the Docker layer, based on the aufs items
>>>> in the dump.log?
>>>>
>>>> On Thu, Apr 2, 2015 at 8:45 PM, Saied Kazemi <saied at google.com> wrote:
>>>>
>>>>> Hi Ross,
>>>>>
>>>>> I just tried to reproduce the problem that you see by creating my
>>>>> container's root filesystem at / but both checkpoint and restore succeeded
>>>>> for me (see below).  How exactly are you setting up your container?  Are
>>>>> you running CRIU directly or are you using nsinit to checkpoint your
>>>>> container?
>>>>>
>>>>> Please note that due to a host of other activities, unfortunately I
>>>>> won't be able to respond quickly.
>>>>>
>>>>> --Saied
>>>>>
>>>>> [Terminal A]
>>>>> # cd /
>>>>> # curl -sSL 'https://github.com/jpetazzo/docker-busybox/raw/buildroot-2014.02/rootfs.tar' > /tmp/rootfs.tar
>>>>> # cat /tmp/rootfs.tar | tar -xC "$container_dir" 2> /dev/null
>>>>> # cd /busybox
>>>>> # nsinit exec -- sh -i
>>>>> sh: can't access tty; job control turned off
>>>>> / #
>>>>>
>>>>> [Terminal B]
>>>>> # cd /busybox
>>>>> # nsinit -criu /usr/local/bin/criu checkpoint
>>>>> #
>>>>>
>>>>> [Terminal A]
>>>>> # mount --rbind /busybox /busybox
>>>>> # cd /busybox
>>>>> # nsinit -criu /usr/local/bin/criu restore
>>>>> / #
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Apr 2, 2015 at 4:51 PM, Ross Boucher <rboucher at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'm running into a failure when trying to checkpoint (with the as yet
>>>>>> unmerged support being worked on in libcontainer). The command I'm running
>>>>>> is:
>>>>>>
>>>>>> criu dump -v4 -D /busybox/nsinit/checkpoint -o dump.log --root /busybox --manage-cgroups --evasive-devices -t 51
>>>>>>
>>>>>> Which eventually ends up with this error:
>>>>>>
>>>>>> (00.017527) Replacing /var/lib/docker/aufs/diff/df29663e06373227d372bdd13867dbdce00bf02dac9627679ee3d4985021beb0/busybox/bin/busybox with /busybox/bin/busybox
>>>>>> (00.017537) Saved AUFS paths ./busybox/bin/busybox and /busybox/busybox/bin/busybox
>>>>>> (00.017575) Error (sysfs_parse.c:304): Failed stat on map 400000 (/busybox/busybox/bin/busybox): No such file or directory
>>>>>> (00.017583) Error (proc_parse.c:589): Can't open 51's mapfile link 400000: No such file or directory
>>>>>>
>>>>>> (full dump: https://gist.github.com/boucher/d2539bcaa1311c5d5f5c)
>>>>>>
>>>>>> I can see in the criu source that this is part of potentially trying
>>>>>> to work around a kernel bug (according to the comments around
>>>>>> fixup_aufs_vma_fd). I'm not entirely sure what the expected behavior is, or
>>>>>> why /busybox/bin/busybox seems to get turned into the relative version
>>>>>> ./busybox/bin/busybox.
>>>>>>
>>>>>> Any ideas?
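
For reference, the path strings in the log excerpt above can be
reproduced with plain string operations. This is only a sketch of the
arithmetic, not CRIU's code: strip_branch is a made-up helper, and the
assumption that the failing path comes from prefixing the --root value
onto the already-rewritten path is inferred from the log lines alone.

```shell
#!/bin/sh
# Hypothetical helper: remove a leading prefix ($2) from a path ($1).
strip_branch() {
    printf '%s\n' "${1#"$2"}"
}

# Values copied from the dump command and log excerpt in this thread.
branch=/var/lib/docker/aufs/diff/df29663e06373227d372bdd13867dbdce00bf02dac9627679ee3d4985021beb0
mapped="$branch/busybox/bin/busybox"
root=/busybox                      # value passed to --root

# Path after the AUFS branch prefix is replaced ("Replacing ... with ...").
in_mount=$(strip_branch "$mapped" "$branch")

# Assumption: prefixing --root again yields the path that fails stat.
joined="$root$in_mount"

# For comparison: the same path taken relative to --root, then re-joined.
relative=$(strip_branch "$in_mount" "$root")
rejoined="$root$relative"

printf 'in_mount: %s\n' "$in_mount"
printf 'joined:   %s\n' "$joined"
printf 'rejoined: %s\n' "$rejoined"
```

The joined value matches the path in the "Failed stat" line, which
contains /busybox twice, while the rejoined value matches the path shown
in the "Replacing" line.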