[CRIU] docker restore from checkpoint - cgroup and mountpoints error
Andrew Vagin
avagin at virtuozzo.com
Fri Jul 22 14:01:30 PDT 2016
On Sat, Jul 16, 2016 at 10:15:06PM -0400, vikram kaul wrote:
> Andrew, Ross, others
>
> I took the hint of 'container init scripts' and added the apache startup
> scripts to the init (CMD in dockerfile) so that it starts the service instead
> of my having to docker exec it. With that change, checkpoint and restore worked
> as I wanted.
>
> So, this led me to experiment with a simple setup where I instantiate the source
> container and then later exec a sample program (tcpdump in the background)
> as follows:
>
> docker exec -d test-xenial-apache tcpdump -i any -nn -s 0 -w /tmp/f.pcap
>
> Now, I try to just checkpoint it. The checkpoint succeeds, but I don't see this
> process being checkpointed. What I have on the host is:
>
> root 10908 10884 0 21:32 ? 00:00:00 bash start.sh
> root 10978 10908 0 21:32 ? 00:00:00 /usr/sbin/apache2 -k start
> root 11053 10908 0 21:32 ? 00:00:00 tail -f /dev/null
> root 12499 12482 0 21:36 ? 00:00:00 tcpdump -i eth0 -nn -s 0 -w /tmp/f.pcap
>
> And what I have in the container is:
>
> root 1 0 0 01:32 ? 00:00:00 bash start.sh
> root 29 1 0 01:32 ? 00:00:00 /usr/sbin/apache2 -k start
> www-data 32 29 0 01:32 ? 00:00:00 /usr/sbin/apache2 -k start
> www-data 33 29 0 01:32 ? 00:00:00 /usr/sbin/apache2 -k start
> root 90 1 0 01:32 ? 00:00:00 tail -f /dev/null
> root 91 0 0 01:36 ? 00:00:00 tcpdump -i eth0 -nn -s 0 -w /tmp
Here you can see that the parent PID for tcpdump is 0, which means that
this process was started from outside the container (by using docker exec).
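
The check described above can be sketched in a few lines of Python. This is only an illustration, assuming `ps -ef`-style output captured inside the container; the function name `find_exec_injected` is hypothetical and is not part of docker or CRIU.

```python
def find_exec_injected(ps_output):
    """Return (pid, command) pairs for processes whose parent PID is 0 but
    which are not PID 1, i.e. processes that entered the container's PID
    namespace from outside (for example through docker exec)."""
    injected = []
    for line in ps_output.strip().splitlines():
        # Assumed ps -ef column order: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # skip a header row or a malformed line
        pid, ppid = int(fields[1]), int(fields[2])
        if ppid == 0 and pid != 1:
            injected.append((pid, fields[7]))
    return injected


# Sample rows taken from the container-side listing quoted in this thread.
sample = """\
root 1 0 0 01:32 ? 00:00:00 bash start.sh
root 29 1 0 01:32 ? 00:00:00 /usr/sbin/apache2 -k start
root 90 1 0 01:32 ? 00:00:00 tail -f /dev/null
root 91 0 0 01:36 ? 00:00:00 tcpdump -i eth0 -nn -s 0 -w /tmp
"""
print(find_exec_injected(sample))
```

With the listing from this thread, only PID 91 (tcpdump) is reported; PID 1 also has parent 0 but is the container init itself.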
>
> But the only 'Dumping' entries in
> /var/lib/docker/containers/<CID>/checkpoints/<CHECKPT>/criu.work/dump.log are
>
> (00.000097) Dumping processes (pid: 10908)
> (00.011147) Dumping path for -3 fd via self 9 [/bin/bash]
> (00.204961) Dumping path for -3 fd via self 9 [/usr/sbin/apache2]
> (00.250838) Dumping path for -3 fd via self 12 [/usr/bin/tail]
>
>
> So, why is tcpdump not getting dumped? Is this by design? Why is host PID
> 12499 (container PID 91) not being dumped? I am using the latest xemul/criu
> from github.com (ver 2.4).
Unfortunately, we don't support externally executed processes in a
container. CRIU dumps only descendants of the container init process.
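
To illustrate why tcpdump falls outside the dump, here is a rough Python model of collecting a process tree from its root. It is a simplified sketch, not CRIU's actual implementation; the helper name `process_tree` is hypothetical, and the PID/PPID pairs are the host-side values from the listing quoted above.

```python
from collections import defaultdict


def process_tree(root_pid, parent_of):
    """Return the set containing root_pid and all of its descendants.

    parent_of maps each PID to its parent PID, as seen on the host.
    """
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)

    tree, stack = set(), [root_pid]
    while stack:
        pid = stack.pop()
        if pid in tree:
            continue  # guard against malformed input with cycles
        tree.add(pid)
        stack.extend(children[pid])
    return tree


# Host-side PID -> PPID pairs from the ps listing in this thread.
host_view = {10908: 10884, 10978: 10908, 11053: 10908, 12499: 12482}

dumped = process_tree(10908, host_view)  # 10908 is the container init
print(sorted(dumped))    # bash, apache2 and tail
print(12499 in dumped)   # tcpdump hangs off 12482, not off the init
```

Since tcpdump's host-side parent is 12482 rather than anything under 10908, a walk that starts at the container init never reaches it.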
>
> What information can I provide to help you give me some pointers?
> I can send you the entire dump.log, stats-dump and the dockerfile. From what I
> can see, docker is invoking criu with the following params:
>
> persist open tcp connections = true (default in docker)
> persist unix sockets = true (default in docker)
> exit the container after checkpoint complete = false (because I use
> --leave-running in docker checkpoint)
> checkpoint shell jobs = false (default)
> directory = /var/lib/containers/<CID>/checkpoints/<CHECKPT>/criu.work
> create a namespace,.. = "network"
>
> Could this be related to "checkpoint shell jobs" being set to false by docker?
>
> I could create a new topic for this specific query, if needed for clarity.
>
> Thanks
>
> vikram
>
> On Tue, Jul 12, 2016 at 11:19 PM, vikram kaul <kaul.vikram.kaul at gmail.com>
> wrote:
>
> I am trying to do a C/R on a docker container. In the past I have been
> working with lightweight containers derived from alpine. However, I now
> have to use Ubuntu xenial containers. I have created a Stack Overflow
> question for this (link given), but I will provide a summary so that you
> can get some context.
>
> http://stackoverflow.com/questions/38341520/docker-restore-from-checkpoint-cgroup-and-mountpoints-error
>
> So, I am getting
>
> Error (mount.c:2555): mnt: Unable to statfs ./HOME: No such file or directory
>
> and
>
>
> Error (cgroup.c:1152): cg: No set 1 found
>
> errors when I try to create a docker container from a checkpoint of a
> currently running container. When creating the checkpoint, I keep the
> source container running. Note that if I checkpoint (and shut down the
> source container) and then restore the same container, it works.
>
> I upgraded to the latest criu/crit from source (ver 2.4), seeing that
> there are a bunch of changes to cgroup handling, but that did not help.
>
> I presume that since I don't have any trouble restoring alpine-derived
> containers to new ones while the source is still running, it must be
> something related to Xenial-derived containers. But I really don't know
> where to look.
>
> Any help will be appreciated
> Thanks
>
>