[CRIU] checkpointing a docker container and restoring the process to a new container

Ross Boucher rboucher at gmail.com
Tue May 12 09:50:34 PDT 2015


Yes, thanks, that was the issue and I pushed a fix which is in the
libcontainer criu branch. I have my images restoring into completely new
containers reliably now.
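
For anyone reproducing the failure described in the quoted thread below,
here is a minimal sketch in Python of why a process whose stdout is a pipe
with no reader dies of SIGPIPE. It reuses the shell loop from the thread,
with the sleep shortened to 0.1s so it finishes quickly. The function name
run_until_reader_goes_away is made up for this illustration and is not part
of docker, libcontainer or CRIU. No checkpoint/restore is involved; it only
shows the signal.

```python
import signal
import subprocess

# The loop from the thread, with a shorter sleep (assumption: 0.1s instead of 3s).
LOOP = 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 0.1; done'


def run_until_reader_goes_away():
    """Run the loop with stdout on a pipe, read one line, then drop the reader."""
    proc = subprocess.Popen(['/bin/sh', '-c', LOOP], stdout=subprocess.PIPE)
    first_line = proc.stdout.readline()  # blocks until the first echo arrives
    proc.stdout.close()                  # the pipe now has no read end
    # The next echo writes into a pipe nobody reads, so the kernel delivers
    # SIGPIPE and the shell is killed by it.
    returncode = proc.wait(timeout=10)
    return first_line, returncode


if __name__ == '__main__':
    first_line, returncode = run_until_reader_goes_away()
    print('first line:', first_line)
    print('killed by SIGPIPE:', returncode == -signal.SIGPIPE)
```

Python reports death by signal as a negative return code; a shell would
report the same thing as exit status 128 + 13 = 141.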

On Tue, May 12, 2015 at 3:55 AM, Saied Kazemi <saied at google.com> wrote:

> Hi Ross,
>
> As Pavel mentioned, I was out on vacation and have just started
> catching up with a ton of email...
>
> I assume that you have fixed the issue by now (haven't looked at
> Github yet).  FWIW, however, the new container exits because its
> standard descriptors (pipes) are not properly set up.  This is because
> --inherit-fd replaces the "old fd" with the new one to be inherited.
> Since you are restoring to a brand new container, there is no "old fd"
> and, therefore, --inherit-fd doesn't do anything which means the
> process's standard file descriptors are not properly set up, hence the
> SIGPIPE.
>
> Sorry for the rant if you've already resolved the issue.
>
> --Saied
>
>
> On Mon, Apr 27, 2015 at 3:55 PM, Ross Boucher <rboucher at gmail.com> wrote:
> > Just wanted to follow up here. The issue turned out to be that I was
> > providing the wrong pipe id to inherit_fd (resolved here:
> > https://github.com/docker/libcontainer/pull/557)
> >
> > On Fri, Apr 24, 2015 at 8:43 AM, Ross Boucher <rboucher at gmail.com> wrote:
> >>
> >> Using a checkpointed file system doesn't seem to make a difference.
> >>
> >> On Fri, Apr 24, 2015 at 8:26 AM, Ross Boucher <rboucher at gmail.com> wrote:
> >>>
> >>> The containers are started from the same image and don't write to the
> >>> filesystem (though I suppose something somewhere could be writing
> >>> without my knowledge).
> >>>
> >>> My next step was to use docker commit to checkpoint the filesystem as
> >>> well, and then create the new container based on that image. I'll try
> >>> that and see if it changes anything, even though I don't expect it to.
> >>>
> >>> On Fri, Apr 24, 2015 at 8:23 AM, Pavel Emelyanov <xemul at parallels.com>
> >>> wrote:
> >>>>
> >>>> On 04/24/2015 06:12 PM, Ross Boucher wrote:
> >>>> > Yeah, but I think there are other problems as well. I'm trying the
> >>>> > same restore process with a more complex program and seeing odd
> >>>> > behavior: the process gets restored, but it seems to be hung. I have
> >>>> > a thread in this program that just prints in a loop every second and
> >>>> > it never prints after being restored (again, this works fine if I
> >>>> > restore into the same container).
> >>>>
> >>>> Hm... How do you make sure the filesystem of the container you restore
> >>>> into equals the filesystem of the container you dumped from?
> >>>>
> >>>> The thing is -- if at least one byte in some library changes, criu
> >>>> doesn't notice it (as it doesn't mess with filesystems) and maps the
> >>>> libraries back into processes. They _can_ break due to this. E.g. if
> >>>> you have prelink running in the container, it can do very nasty
> >>>> stuff :)
> >>>>
> >>>> -- Pavel
> >>>>
> >>>> > On Fri, Apr 24, 2015 at 6:59 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> >>>> >
> >>>> >     On 04/24/2015 04:47 PM, Ross Boucher wrote:
> >>>> >     > inherit_fd is being used -- this example works fine if I restore
> >>>> >     > to the same container, it's only breaking now that I'm attempting
> >>>> >     > to restore into a completely different container.
> >>>> >
> >>>> >     So the pipe doesn't get inherited when you restore into a different
> >>>> >     container?
> >>>> >
> >>>> >     > On Fri, Apr 24, 2015 at 4:50 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> >>>> >     >
> >>>> >     >     On 04/24/2015 12:11 AM, Ross Boucher wrote:
> >>>> >     >     > Another update: I was intrigued by the exit code (which implies
> >>>> >     >     > SIGPIPE?), since the docker process I was running was indeed piping:
> >>>> >     >     >
> >>>> >     >     >     /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 3; done'
> >>>> >     >     >
> >>>> >     >     > I tried the same process of checkpointing in one container and
> >>>> >     >     > restoring to another by writing to a file instead:
> >>>> >     >     >
> >>>> >     >     >     /bin/sh -c 'i=0; while true; do echo $i > /tmp/foo; i=$(expr $i + 1); sleep 3; done'
> >>>> >     >     >
> >>>> >     >     > And this worked correctly! So I've narrowed it down some more, and
> >>>> >     >     > I'll continue to look into it.
> >>>> >     >
> >>>> >     >     If these are pipes indeed (docker terminals?) then the
> >>>> >     >     --inherit-fd option should be used. Saied (from Google) did some
> >>>> >     >     work doing this for docker+criu, he can shed more light, but
> >>>> >     >     he's on vacation right now :)
> >>>> >     >
> >>>> >     >     -- Pavel
> >>>> >     >
> >>>> >     >
> >>>> >
> >>>> >
> >>>>
> >>>
> >>
> >
> >
> > _______________________________________________
> > CRIU mailing list
> > CRIU at openvz.org
> > https://lists.openvz.org/mailman/listinfo/criu
> >
>