[CRIU] checkpointing a docker container and restoring the process to a new container
Saied Kazemi
saied at google.com
Tue May 12 03:55:46 PDT 2015
Hi Ross,
As Pavel mentioned, I was out on vacation and have just started
catching up with a ton of email...
I assume that you have fixed the issue by now (I haven't looked at
GitHub yet). FWIW, however, the new container exits because its
standard descriptors (pipes) are not properly set up. This is because
--inherit-fd replaces the "old fd" with the new one to be inherited.
Since you are restoring to a brand new container, there is no "old fd"
and, therefore, --inherit-fd doesn't do anything. That leaves the
process's standard file descriptors improperly set up, hence the
SIGPIPE.
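
To illustrate the SIGPIPE part, here is a minimal Python sketch. It is not
docker or CRIU code, and the helper names run_with_readerless_stdout and
shell_exit_status are made up for this example. It runs a shell command
whose stdout is a pipe with no reader, which is roughly the situation of a
restored process whose stdout pipe was never re-attached. A shell would
report the resulting death by SIGPIPE as exit status 128 + 13 = 141.

```python
import os
import signal
import subprocess


def run_with_readerless_stdout(command):
    """Run `command` under /bin/sh with stdout on a pipe nobody reads.

    Returns the Popen return code: a negative number -N means the
    child was killed by signal N.
    """
    read_end, write_end = os.pipe()
    # Close the only read end before the child starts, so its first
    # write to stdout hits a pipe with no reader.
    os.close(read_end)
    try:
        proc = subprocess.Popen(["/bin/sh", "-c", command], stdout=write_end)
    finally:
        os.close(write_end)
    return proc.wait()


def shell_exit_status(returncode):
    """Convert a Popen return code into the status a shell would show."""
    return 128 - returncode if returncode < 0 else returncode


if __name__ == "__main__":
    rc = run_with_readerless_stdout("echo 0")
    print("killed by SIGPIPE:", rc == -signal.SIGPIPE)
    print("status as a shell reports it:", shell_exit_status(rc))
```

Writing to a file instead, as in the /tmp/foo variant quoted below, never
touches the broken pipe, so the same command exits normally.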
Sorry for the rant if you've already resolved the issue.
--Saied
On Mon, Apr 27, 2015 at 3:55 PM, Ross Boucher <rboucher at gmail.com> wrote:
> Just wanted to follow up here. The issue turned out to be that I was
> providing the wrong pipe id to inherit_fd (resolved here:
> https://github.com/docker/libcontainer/pull/557)
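
For reference, a pipe id of the kind mentioned above can be read from
/proc. The Python sketch below is only an illustration under assumptions:
pipe_id and inherit_fd_arg are hypothetical helpers, not libcontainer
functions, and it assumes Linux /proc plus the fd[N]:RES form that criu
documents for --inherit-fd. Which pipe id is the right one for a given
restore is exactly what went wrong here, so treat the values as
placeholders.

```python
import os
import re

# On Linux, /proc/<pid>/fd/<n> for a pipe is a symlink named "pipe:[<inode>]".
PIPE_LINK = re.compile(r"^pipe:\[(\d+)\]$")


def pipe_id(fd, pid="self"):
    """Return the "pipe:[inode]" name behind fd, or None if it is not a pipe."""
    target = os.readlink(f"/proc/{pid}/fd/{fd}")
    return target if PIPE_LINK.match(target) else None


def inherit_fd_arg(new_fd, resource):
    """Format a value for --inherit-fd (assumed form: fd[N]:RES).

    new_fd is the descriptor number in the process that runs the restore;
    resource is the name the checkpointed process had for that file.
    """
    return f"fd[{new_fd}]:{resource}"


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    # Both ends of one pipe share the same inode, hence the same id.
    print(pipe_id(read_end), pipe_id(write_end))
    # "pipe:[12345]" is a made-up id, used only to show the format.
    print(inherit_fd_arg(write_end, "pipe:[12345]"))
```

The two ends of a pipe print the same id, which makes it easy to pick up
an id from the wrong place when several pipes are in play.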
>
> On Fri, Apr 24, 2015 at 8:43 AM, Ross Boucher <rboucher at gmail.com> wrote:
>>
>> Using a checkpointed file system doesn't seem to make a difference.
>>
>> On Fri, Apr 24, 2015 at 8:26 AM, Ross Boucher <rboucher at gmail.com> wrote:
>>>
>>> The containers are started from the same image and don't write to the
>>> filesystem (though I suppose something somewhere could be writing without my
>>> knowledge).
>>>
>>> My next step was to use docker commit to checkpoint the filesystem as
>>> well, and then create the new container based on that image. I'll try that
>>> and see if it changes anything, even though I don't expect it to.
>>>
>>> On Fri, Apr 24, 2015 at 8:23 AM, Pavel Emelyanov <xemul at parallels.com>
>>> wrote:
>>>>
>>>> On 04/24/2015 06:12 PM, Ross Boucher wrote:
>>>> > Yeah, but I think there are other problems as well. I'm trying the
>>>> > same restore process with a more complex program and seeing odd
>>>> > behavior: the process gets restored, but it seems to be hung. I have
>>>> > a thread in this program that just prints in a loop every second and
>>>> > it never prints after being restored (again, this works fine if I
>>>> > restore into the same container).
>>>>
>>>> Hm... How do you make sure the filesystem of the container you restore
>>>> into equals the filesystem of the container you dumped from?
>>>>
>>>> The thing is -- if even one byte in some library changes, criu
>>>> doesn't notice it (as it doesn't mess with filesystems) and maps the
>>>> libraries back into processes. They _can_ break due to this. E.g. if
>>>> you have prelink running in the container, it can do very nasty
>>>> stuff :)
>>>>
>>>> -- Pavel
>>>>
>>>> > On Fri, Apr 24, 2015 at 6:59 AM, Pavel Emelyanov
>>>> > <xemul at parallels.com> wrote:
>>>> >
>>>> > On 04/24/2015 04:47 PM, Ross Boucher wrote:
>>>> > > inherit_fd is being used -- this example works fine if I restore
>>>> > > to the same container, it's only breaking now that I'm attempting
>>>> > > to restore into a completely different container.
>>>> >
>>>> > So the pipe doesn't get inherited when you restore into a different
>>>> > container?
>>>> >
>>>> > > On Fri, Apr 24, 2015 at 4:50 AM, Pavel Emelyanov
>>>> > > <xemul at parallels.com> wrote:
>>>> > >
>>>> > > On 04/24/2015 12:11 AM, Ross Boucher wrote:
>>>> > > > Another update: I was intrigued by the exit code (which implies
>>>> > > > SIGPIPE?), since the docker process I was running was indeed
>>>> > > > piping:
>>>> > > >
>>>> > > > /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 3; done'
>>>> > > >
>>>> > > > I tried the same process of checkpointing in one container and
>>>> > > > restoring to another by writing to a file instead:
>>>> > > >
>>>> > > > /bin/sh -c 'i=0; while true; do echo $i > /tmp/foo; i=$(expr $i + 1); sleep 3; done'
>>>> > > >
>>>> > > > And this worked correctly! So I've narrowed it down some more, and
>>>> > > > I'll continue to look into it.
>>>> > >
>>>> > > If these are pipes indeed (docker terminals?) then the --inherit-fd
>>>> > > option should be used. Saied (from Google) did some work doing this
>>>> > > for docker+criu, he can shed more light, but he's on vacation right
>>>> > > now :)
>>>> > >
>>>> > > -- Pavel
>>>> > >
>>>> > >
>>>> >
>>>> >
>>>>
>>>
>>
>
>
> _______________________________________________
> CRIU mailing list
> CRIU at openvz.org
> https://lists.openvz.org/mailman/listinfo/criu
>