[CRIU] Restoring lxc-1.1.5 centos 7 container with httpd fails

Pavel Emelyanov xemul at virtuozzo.com
Thu May 12 07:45:05 PDT 2016


On 05/12/2016 09:12 AM, Adrian Reber wrote:
> On Wed, May 11, 2016 at 02:32:23PM -0700, Andrey Vagin wrote:
>> On Wed, May 11, 2016 at 09:35:54AM +0200, Adrian Reber wrote:
>>> Hello Andrey,
>>>
>>> I applied your last three patches from the CRIU ML:
>>>
>>> mount: create a clean mount only if a sub directory is bind-mounted
>>> mount: don't overmount a mount if it should be bind-mounted somewhere
>>> mount: dump a file system only if a mount point isn't overmounted
>>>
>>> and I can checkpoint and restore a lxc container with httpd, mongodb and
>>> postgresql running in it. I haven't yet checked if all patches are
>>> necessary for my problem, but I can look further into it if you want.
>>>
>>> If I start mariadb I still get
>>
>> https://gist.github.com/avagin/5972076d9ae5aac5d3053a646c01bfe9
>>
>> There is something wrong with aio. Kirill, could you help us with this
>> error? I know that you met the same problem in VZ.
>>
>> pie: Error (pie/restorer.c:582): wrong aio parametrs: tail=0x0 head=0x0 nr=0xbe len=0x3000
> 
> I should have looked closer at the log file. This is a RHEL/CentOS
> problem. I have to carry the following patch in the CRIU rpm package to
> deal with AIO:

Would you submit this patch with a proper commit message? :)

> --- a/criu/aio.c        2015-07-01 11:02:50.360004543 -0400
> +++ b/criu/aio.c        2015-07-01 11:03:33.099757812 -0400
> @@ -61,7 +61,7 @@
>          * up back to the k_max_reqs.
>          */
>  
> -       return (k_max_reqs - 2) / 2;
> +       return (k_max_reqs - 2);
>  }
>  
>  unsigned long aio_rings_args_size(struct vm_area_list *vmas)
> 
> https://git.centos.org/blob/rpms!criu/d3d9a8a7e792a6c594097d8d35312a9d1735afa5/SOURCES!aio-fix.patch
> 
> https://git.centos.org/blob/rpms!criu/d3d9a8a7e792a6c594097d8d35312a9d1735afa5/SPECS!criu.spec#L11
> 
> With this patch applied together with your mount patches on top of
> criu-dev, I can now also checkpoint and restart a mariadb container. In
> fact I have a container with mariadb, postgresql, mongodb, httpd and
> tomcat running and I can checkpoint and restore it without a problem.
> 
> Thanks for the help!
> 
> 		Adrian
> 
>>>
>>> (00.142574)    294: Parsed 7f8d95799000-7f8d9579a000 vma
>>> (00.169810) Error (cr-restore.c:1407): 6077 killed by signal 9: Killed
>>> (00.169972) Switching to new ns to clean ghosts
>>> (00.170019) Error (files-reg.c:515):  `- XFail [.criu.mntns.3xE0jR/15var/tmp/ib8i8FJW.cr.1.ghost] ghost: No such file or directory
>>> (00.170024) Error (files-reg.c:515):  `- XFail [.criu.mntns.3xE0jR/15var/tmp/ibHo1ZB4.cr.2.ghost] ghost: No such file or directory
>>> (00.170027) Error (files-reg.c:515):  `- XFail [.criu.mntns.3xE0jR/15var/tmp/ib0eakuc.cr.3.ghost] ghost: No such file or directory
>>> (00.170029) Error (files-reg.c:515):  `- XFail [.criu.mntns.3xE0jR/15var/tmp/ibtHE6fs.cr.4.ghost] ghost: No such file or directory
>>> (00.170031) Error (files-reg.c:515):  `- XFail [.criu.mntns.3xE0jR/15var/tmp/ibmiNsaA.cr.5.ghost] ghost: No such file or directory
>>> (00.170827) Error (cr-restore.c:2251): Restoring FAILED.
>>>
>>> Thanks so far.
>>>
>>> 		Adrian
>>>
>>>
>>> On Fri, May 06, 2016 at 04:01:16PM -0700, Andrey Vagin wrote:
>>>> Hi Adrian,
>>>>
>>>> Can you create a kvm VM where I will be able to reproduce the problem?
>>>>
>>>> I tried to reproduce it by myself, but it works for me.
>>>>
>>>> 18435 pts/0    S      0:00 [lxc monitor] /var/lib/lxc centos
>>>> 18441 ?        Ss     0:00  \_ /sbin/init
>>>> 18475 ?        Ss     0:00      \_ /usr/lib/systemd/systemd-journald
>>>> 18476 ?        Ss     0:00      \_ /usr/lib/systemd/systemd-udevd
>>>> 18477 ?        Ss     0:00      \_ /usr/lib/systemd/systemd-logind
>>>> 18478 ?        Ss     0:00      \_ /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
>>>> 18479 ?        Ssl    0:00      \_ /usr/sbin/rsyslogd -n
>>>> 18480 ?        Ss     0:00      \_ /usr/sbin/sshd -D
>>>> 18481 ?        Ss     0:00      \_ /usr/sbin/httpd -DFOREGROUND
>>>> 18482 ?        S      0:00          \_ /usr/sbin/httpd -DFOREGROUND
>>>> 18483 ?        S      0:00          \_ /usr/sbin/httpd -DFOREGROUND
>>>> 18484 ?        S      0:00          \_ /usr/sbin/httpd -DFOREGROUND
>>>> 18485 ?        S      0:00          \_ /usr/sbin/httpd -DFOREGROUND
>>>> 18486 ?        S      0:00          \_ /usr/sbin/httpd -DFOREGROUND
>>>> 19812 ?        Ss     0:00 /usr/sbin/crond -n
>>>> [root@localhost lxc]# LD_LIBRARY_PATH=/usr/local/lib64/ ./src/lxc/lxc-checkpoint -s -n centos -D /root/images -v
>>>> [root@localhost lxc]# LD_LIBRARY_PATH=/usr/local/lib64/ ./src/lxc/lxc-checkpoint -r -n centos -D /root/images -v
>>>> [root@localhost lxc]# echo $?
>>>> 0
>>>> [root@localhost lxc]# git describe HEAD
>>>> lxc-2.0.0
>>>>
>>>>
>>>> On Fri, May 06, 2016 at 10:25:12PM +0200, Adrian Reber wrote:
>>>>>>>> I've discussed this with Adrian on IRC and he promised to give
>>>>>>>> more info about this issue.
>>>>>>>>
>>>>>>>> To investigate this sort of bug I add sleep(1000) after pr_err() to
>>>>>>>> freeze the processes at the moment of the error and then try to
>>>>>>>> find what is wrong via /proc/PID/root.
>>>>>>>
>>>>>>> With the sleep after the last pr_err() I see two criu processes:
>>>>>>>
>>>>>>> # ls -la /proc/10183/root
>>>>>>> lrwxrwxrwx. 1 root root 0 May  6 08:47 /proc/10183/root -> /
>>>>>>> # ls -la /proc/10188/root
>>>>>>> lrwxrwxrwx. 1 root root 0 May  6 08:47 /proc/10188/root -> /
>>>>>>
>>>>>> I mean that you need to try to resolve the source and target
>>>>>> arguments of the mount syscall which returns an error.
>>>>>
>>>>> I am not really sure what to do. The current restore fails with:
>>>>>
>>>>> (00.080566)      1: mnt: 	Bind /tmp/cr-tmpfs.KD4sxa/hugetlb to .criu.mntns.07hSc4/13/sys/fs/cgroup/hugetlb
>>>>> (00.080585)      1: Error (mount.c:2479): mnt: Can't mount at .criu.mntns.07hSc4/13/sys/fs/cgroup/hugetlb: No such file or directory
>>>>>
>>>>> The directory /proc/4745/root/var/lib/lxc/c7/rootfs/.criu.mntns.07hSc4
>>>>> is empty. So that seems to explain why the mount is not working.
>>>>>
>>>>> The other directory exists:
>>>>>
>>>>> # ls  /proc/4745/root/tmp/cr-tmpfs.KD4sxa/hugetlb/
>>>>> lxc
>>>>>
>>>>> So is the first empty directory the problem? As the path contains
>>>>> 'mntns'... Does this need some special mount namespace support? This
>>>>> is running on the CentOS 7 kernel, which might be missing some features.
>>>>>
>>>>>
>>>>> 		Adrian
> .
> 
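For readers who want to see what the one-line aio change quoted above does numerically, here is a minimal sketch that puts the two return expressions from the diff side by side. Only the two expressions come from the patch; the function names and the printing helper are hypothetical and are not taken from criu/aio.c. Why the RHEL/CentOS kernel needs the variant without the division is not explained in this thread, so the sketch makes no claim about that.

```c
#include <stdio.h>

/*
 * Hypothetical name. The expression is the line removed by the patch:
 * subtract 2, then halve.
 */
static unsigned int estimate_nr_reqs_upstream(unsigned int k_max_reqs)
{
	return (k_max_reqs - 2) / 2;
}

/*
 * Hypothetical name. The expression is the line added by the patch:
 * subtract 2, no halving.
 */
static unsigned int estimate_nr_reqs_patched(unsigned int k_max_reqs)
{
	return k_max_reqs - 2;
}

/*
 * Print both estimates for one k_max_reqs value.
 * Callers should pass k_max_reqs >= 2; smaller values wrap around
 * because the arithmetic is unsigned.
 */
static void show_estimates(unsigned int k_max_reqs)
{
	printf("k_max_reqs=%u upstream=%u patched=%u\n",
	       k_max_reqs,
	       estimate_nr_reqs_upstream(k_max_reqs),
	       estimate_nr_reqs_patched(k_max_reqs));
}
```

Calling show_estimates(0xbe), with 0xbe borrowed from the nr= field of the restorer error purely as an illustrative input, gives 94 for the unpatched expression and 188 for the patched one, so the two variants differ by a factor of two in the number of requests asked for at restore time.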


