[CRIU] checkpointing a docker container and restoring the process to a new container

Ross Boucher rboucher at gmail.com
Mon Apr 27 15:55:02 PDT 2015


Just wanted to follow up here. The issue turned out to be that I was
providing the wrong pipe id to inherit_fd (resolved here:
https://github.com/docker/libcontainer/pull/557)
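
For anyone hitting the same thing, here is a minimal sketch of how a pipe's id can be looked up on Linux. It assumes that the "pipe id" is the kernel's `pipe:[inode]` name for the descriptor, which is what /proc reports for a pipe. `pipe_id` is a hypothetical helper name used only for illustration; it is not part of criu or libcontainer, and this is not the code from the pull request.

```python
import os


def pipe_id(fd: int) -> str:
    """Return the kernel's name for an open descriptor in this process.

    For a pipe this has the form 'pipe:[<inode>]' (Linux only, needs /proc).
    """
    return os.readlink(f"/proc/self/fd/{fd}")


if __name__ == "__main__":
    # Both ends of one pipe share the same inode, so they report the same id.
    r, w = os.pipe()
    print("read end: ", pipe_id(r))
    print("write end:", pipe_id(w))
    os.close(r)
    os.close(w)
```

The id has to be read from the pipe the checkpointed process was actually attached to; a pipe created later for the new container has a different inode, so its id will not match the one recorded in the image.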

On Fri, Apr 24, 2015 at 8:43 AM, Ross Boucher <rboucher at gmail.com> wrote:

> Using a checkpointed file system doesn't seem to make a difference.
>
> On Fri, Apr 24, 2015 at 8:26 AM, Ross Boucher <rboucher at gmail.com> wrote:
>
>> The containers are started from the same image and don't write to the
>> filesystem (though I suppose something somewhere could be writing without
>> my knowledge).
>>
>> My next step was to use docker commit to checkpoint the filesystem as
>> well, and then create the new container based on that image. I'll try that
>> and see if it changes anything, even though I don't expect it to.
>>
>> On Fri, Apr 24, 2015 at 8:23 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
>>
>>> On 04/24/2015 06:12 PM, Ross Boucher wrote:
>>> > Yeah, but I think there are other problems as well. I'm trying the same restore process with
>>> > a more complex program and seeing odd behavior: the process gets restored, but it seems to be
>>> > hung. I have a thread in this program that just prints in a loop every second and it never
>>> > prints after being restored (again, this works fine if I restore into the same container).
>>>
>>> Hm... How do you make sure the filesystem of the container you restore into equals
>>> the filesystem of the container you dumped from?
>>>
>>> The thing is -- if at least one byte in some library changes, criu doesn't notice it
>>> (as it doesn't mess with filesystems) and maps the files back into the processes. They _can_
>>> break due to this. E.g. if you have prelink running in the container, it can do very
>>> nasty stuff :)
>>>
>>> -- Pavel
>>>
>>> > On Fri, Apr 24, 2015 at 6:59 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
>>> >
>>> >     On 04/24/2015 04:47 PM, Ross Boucher wrote:
>>> >     > inherit_fd is being used -- this example works fine if I restore to the same container,
>>> >     > it's only breaking now that I'm attempting to restore into a completely different container.
>>> >
>>> >     So the pipe doesn't get inherited when you restore into a different container?
>>> >
>>> >     > On Fri, Apr 24, 2015 at 4:50 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
>>> >     >
>>> >     >     On 04/24/2015 12:11 AM, Ross Boucher wrote:
>>> >     >     > Another update: I was intrigued by the exit code (which implies SIGPIPE?), since the docker process
>>> >     >     > I was running was indeed piping:
>>> >     >     >
>>> >     >     >     /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 3; done'
>>> >     >     >
>>> >     >     > I tried the same process of checkpointing in one container and restoring to another by writing to a file instead:
>>> >     >     >
>>> >     >     >     /bin/sh -c 'i=0; while true; do echo $i > /tmp/foo; i=$(expr $i + 1); sleep 3; done'
>>> >     >     >
>>> >     >     > And this worked correctly! So I've narrowed it down some more, and I'll continue to look into it.
>>> >     >
>>> >     >     If these are pipes indeed (docker terminals?) then the --inherit-fd option should be used.
>>> >     >     Saied (from Google) did some work doing this for docker+criu, he can shed more light, but
>>> >     >     he's on vacation right now :)
>>> >     >
>>> >     >     -- Pavel
>>> >     >
>>> >     >
>>> >
>>> >
>>>
>>>
>>
>