[CRIU] Restoring lxc-1.1.5 centos 7 container with httpd fails

Adrian Reber adrian at lisas.de
Mon May 9 03:46:58 PDT 2016


I uploaded a KVM image (details in a separate mail). I used the
following packages:

https://copr-be.cloud.fedoraproject.org/results/adrian/lxc-stable-for-epel/epel-7-x86_64/00189839-lxc/lxc-2.0.0-1.el7.centos.x86_64.rpm
https://copr-be.cloud.fedoraproject.org/results/adrian/lxc-stable-for-epel/epel-7-x86_64/00189839-lxc/lua-lxc-2.0.0-1.el7.centos.x86_64.rpm
https://copr-be.cloud.fedoraproject.org/results/adrian/lxc-stable-for-epel/epel-7-x86_64/00189839-lxc/lxc-libs-2.0.0-1.el7.centos.x86_64.rpm
https://copr-be.cloud.fedoraproject.org/results/adrian/lxc-stable-for-epel/epel-7-x86_64/00189839-lxc/lxc-templates-2.0.0-1.el7.centos.x86_64.rpm

and then the following to reproduce the error:

# lxc-create -t centos -n c7 -- -R 7
# lxc-start -n c7
# lxc-info -n c7 -H -i
10.0.3.218
# ssh root@10.0.3.218

Now, inside the container:

# yum install httpd
# systemctl enable httpd
# systemctl start httpd

Outside of the container:

# curl 10.0.3.218
# export PATH=/root/criu/criu:$PATH
# lxc-checkpoint -n c7 -D /tmp/cp -vvvv -s
checkpoint: criu.c: do_restore: 711 criu process exited 1, output:


lxc-checkpoint: criu.c: __criu_restore: 966 restore process died
Restoring c7 failed.

The CRIU log then shows a mount error:

(00.061574)      1: Error (mount.c:2478): mnt: Can't mount at .criu.mntns.al8aJz/13/sys/fs/cgroup/systemd: No such file or directory
(00.061596)      1: Error (mount.c:2383): mnt: Unable to remove /tmp/cr-tmpfs.aR2NjF: Device or resource busy
(00.061711) Error (cr-restore.c:1404): 28087 exited, status=1
(00.061758) Switching to new ns to clean ghosts
(00.062337) Error (cr-restore.c:2251): Restoring FAILED.
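
With -vvvv the log gets long, so it can help to filter out only the
error lines. A minimal sketch, assuming errors are always tagged
"Error (<file>.c:<line>)" as in the excerpt above; the function name
criu_errors and the log file name in the usage comment are my own
assumptions, not something lxc or criu provides:

```shell
#!/bin/sh
# Print only the error lines of a CRIU log, keeping the timestamp prefix.
# Assumption: errors look like "Error (mount.c:2478): ..." as seen above.
criu_errors() {
    grep -E 'Error \([a-z_-]+\.c:[0-9]+\)' "$1"
}

# Hypothetical usage; the log path is an assumed location:
# criu_errors /tmp/cp/restore.log
```

The first line it prints is usually the interesting one; the later
errors tend to be follow-up failures during cleanup.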

		Adrian

On Fri, May 06, 2016 at 04:01:16PM -0700, Andrey Vagin wrote:
> Hi Adrian,
> 
> Can you create a kvm VM where I will be able to reproduce the problem?
> 
> I tried to reproduce it by myself, but it works for me.
> 
> 18435 pts/0    S      0:00 [lxc monitor] /var/lib/lxc centos
> 18441 ?        Ss     0:00  \_ /sbin/init
> 18475 ?        Ss     0:00      \_ /usr/lib/systemd/systemd-journald
> 18476 ?        Ss     0:00      \_ /usr/lib/systemd/systemd-udevd
> 18477 ?        Ss     0:00      \_ /usr/lib/systemd/systemd-logind
> 18478 ?        Ss     0:00      \_ /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
> 18479 ?        Ssl    0:00      \_ /usr/sbin/rsyslogd -n
> 18480 ?        Ss     0:00      \_ /usr/sbin/sshd -D
> 18481 ?        Ss     0:00      \_ /usr/sbin/httpd -DFOREGROUND
> 18482 ?        S      0:00          \_ /usr/sbin/httpd -DFOREGROUND
> 18483 ?        S      0:00          \_ /usr/sbin/httpd -DFOREGROUND
> 18484 ?        S      0:00          \_ /usr/sbin/httpd -DFOREGROUND
> 18485 ?        S      0:00          \_ /usr/sbin/httpd -DFOREGROUND
> 18486 ?        S      0:00          \_ /usr/sbin/httpd -DFOREGROUND
> 19812 ?        Ss     0:00 /usr/sbin/crond -n
> [root@localhost lxc]# LD_LIBRARY_PATH=/usr/local/lib64/ ./src/lxc/lxc-checkpoint -s -n centos -D /root/images -v
> [root@localhost lxc]# LD_LIBRARY_PATH=/usr/local/lib64/ ./src/lxc/lxc-checkpoint -r -n centos -D /root/images -v
> [root@localhost lxc]# echo $?
> 0
> [root@localhost lxc]# git describe HEAD
> lxc-2.0.0
> 
> 
> On Fri, May 06, 2016 at 10:25:12PM +0200, Adrian Reber wrote:
> > > > > We've discussed this with Adrian on IRC and he promised to give
> > > > > more info about this issue.
> > > > > 
> > > > > To investigate this sort of bug I add sleep(1000) after pr_err() to
> > > > > freeze the processes at the moment of the error and then try to find
> > > > > what is wrong via /proc/PID/root.
> > > > 
> > > > With the sleep after the last pr_err() I see two criu processes:
> > > > 
> > > > # ls -la /proc/10183/root
> > > > lrwxrwxrwx. 1 root root 0 May  6 08:47 /proc/10183/root -> /
> > > > # ls -la /proc/10188/root
> > > > lrwxrwxrwx. 1 root root 0 May  6 08:47 /proc/10188/root -> /
> > > 
> > > I mean that you need to try to resolve the source and target arguments
> > > of the mount syscall that returns an error.
> > 
> > I am not really sure what to do. The current restore fails with:
> > 
> > (00.080566)      1: mnt: 	Bind /tmp/cr-tmpfs.KD4sxa/hugetlb to .criu.mntns.07hSc4/13/sys/fs/cgroup/hugetlb
> > (00.080585)      1: Error (mount.c:2479): mnt: Can't mount at .criu.mntns.07hSc4/13/sys/fs/cgroup/hugetlb: No such file or directory
> > 
> > The directory /proc/4745/root/var/lib/lxc/c7/rootfs/.criu.mntns.07hSc4
> > is empty. So that seems to explain why the mount is not working.
> > 
> > The other directory exists:
> > 
> > # ls  /proc/4745/root/tmp/cr-tmpfs.KD4sxa/hugetlb/
> > lxc
> > 
> > So is the first empty directory the problem? As the path contains
> > 'mntns': does this need some special mount namespace support? This is
> > running on the CentOS 7 kernel, which might be missing some features.
> > 
> > 
> > 		Adrian
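
For reference, the /proc/PID/root check quoted above can be scripted
roughly like this. It is only a sketch: the function names are
hypothetical, and resolving relative paths through /proc/PID/cwd is my
assumption (the thread only mentions /proc/PID/root). PID is the frozen
criu process, source and target are the paths from the "Bind" log line.

```shell
#!/bin/sh
# Map a path as seen by process PID to a path usable from the host:
# absolute paths go through /proc/PID/root, relative ones through
# /proc/PID/cwd (assumption: the mount target is relative to the cwd).
resolve_in_proc() {
    pid=$1
    path=$2
    case $path in
        /*) printf '/proc/%s/root%s\n' "$pid" "$path" ;;
        *)  printf '/proc/%s/cwd/%s\n' "$pid" "$path" ;;
    esac
}

# Report whether the mount source ($2) and target ($3) exist for PID ($1).
check_mount_args() {
    for p in "$2" "$3"; do
        full=$(resolve_in_proc "$1" "$p")
        if [ -e "$full" ]; then
            echo "exists:  $full"
        else
            echo "missing: $full"
        fi
    done
}

# Example with the values from the failing restore quoted above:
# check_mount_args 4745 /tmp/cr-tmpfs.KD4sxa/hugetlb \
#     .criu.mntns.07hSc4/13/sys/fs/cgroup/hugetlb
```

A "missing" line for the target would match the "No such file or
directory" error from mount.c.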

