<div dir="ltr">Thanks for the quick feedback. I am using CRIU 1.3-rc2 (at commit e1b56c8fa) with Docker version 1.1.0 on Ubuntu 14.04 which does not provide mnt_id in /proc/pid/fdinfo/fd files.<div><br></div><div>I will look into Docker source today. Assuming that it does open /dev/null before moving into the namespaces, can CRIU handle it?</div>
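<div>For reference, a minimal Python sketch of how one might check whether a kernel reports mnt_id for a given descriptor. The helper names parse_fdinfo and fd_mnt_id are made up for illustration, and the sketch only assumes the usual "key: value" line layout of /proc/pid/fdinfo/fd files.</div>

```python
import os

def parse_fdinfo(text):
    """Parse the 'key: value' lines of an fdinfo file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields

def fd_mnt_id(pid="self", fd=0):
    """Return the mnt_id of an open descriptor, or None if the kernel omits it."""
    path = os.path.join("/proc", str(pid), "fdinfo", str(fd))
    with open(path) as f:
        fields = parse_fdinfo(f.read())
    value = fields.get("mnt_id")
    return int(value) if value is not None else None
```

<div>On a kernel without the field, fd_mnt_id() would return None for every descriptor.</div>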
<div><br></div><div>--Saied</div><div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Jul 15, 2014 at 5:24 AM, Pavel Emelyanov <span dir="ltr"><<a href="mailto:xemul@parallels.com" target="_blank">xemul@parallels.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On 07/15/2014 03:51 PM, Pavel Emelyanov wrote:<br>
> On 07/15/2014 10:30 AM, Saied Kazemi wrote:<br>
>> Hi Pavel,<br>
>><br>
>> There seems to be a problem in or below parasite_drain_fds_seized() when seizing a process's open file descriptors. Here is the problem I ran into:<br>
>><br>
>> When a Docker container is started in the detached mode (-d flag), its stdin inside its own mount<br>
>> namespace is set to its /dev/null as you can see below:<br>
<br>
</div>Actually we do this regularly in our zdtm tests. If you start the ns/static/env00 one you'd see<br>
<br>
# ps<br>
2843 ? Ss 0:00 ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST<br>
2846 ? Ss 0:00 \_ ./env00 --pidfile=env00.pid --outfile=env00.out --envname=ENV_00_TEST<br>
<br>
These are the container's init 2843 and the test itself 2846.<br>
If we compare the namespaces<br>
<br>
[root@localhost test]# ls -l /proc/self/ns/mnt<br>
lrwxrwxrwx 1 root root 0 Jul 15 16:21 /proc/self/ns/mnt -> mnt:[4026531840]<br>
[root@localhost test]# ls -l /proc/2846/ns/mnt<br>
lrwxrwxrwx 1 root root 0 Jul 15 16:21 /proc/2846/ns/mnt -> mnt:[4026532201]<br>
<br>
we see they live in different ones. And the test does open /dev/null<br>
<br>
[root@localhost test]# ls -l /proc/2846/fd<br>
total 0<br>
lrwx------ 1 root root 64 Jul 15 16:21 0 -> /dev/null<br>
l-wx------ 1 root root 64 Jul 15 16:21 1 -> /zdtm/live/static/env00.out.inprogress<br>
l-wx------ 1 root root 64 Jul 15 16:21 2 -> /zdtm/live/static/env00.out.inprogress<br>
<br>
which is<br>
<br>
[root@localhost test]# stat -L /proc/2846/fd/0<br>
File: ‘/proc/2846/fd/0’<br>
 Size: 0 Blocks: 0 IO Block: 4096 character special file<br>
Device: fd01h/64769d Inode: 40940 Links: 1 Device type: 1,3<br>
...<br>
<br>
And the host's /dev/null is<br>
<br>
[root@localhost test]# stat /dev/null<br>
 File: ‘/dev/null’<br>
Size: 0 Blocks: 0 IO Block: 4096 character special file<br>
Device: 5h/5d Inode: 6073 Links: 1 Device type: 1,3<br>
...<br>
<br>
And this test gets dumped successfully. It looks like Docker opens /dev/null<br>
from the host before diving into the namespaces.<br>
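<br>
For illustration, the stat comparison above can be sketched in Python: a descriptor and a path name the same file only if their device and inode numbers both match. The names file_id, same_file and check_fd are hypothetical, and resolving the path under /proc/pid/root is an assumption of this sketch, not a description of what CRIU does internally.<br>

```python
import os

def file_id(st):
    """Return the (device, inode) pair that identifies a file."""
    return (st.st_dev, st.st_ino)

def same_file(fd_stat, path_stat):
    """True if the descriptor and the path refer to the same file."""
    return file_id(fd_stat) == file_id(path_stat)

def check_fd(pid, fd, root="/"):
    """Compare /proc/<pid>/fd/<fd> with the file its link text names under root.

    Hypothetical helper: root="/proc/<pid>/root" resolves the path inside
    the task's root instead of the caller's. Only meaningful for
    descriptors whose link text is a real path (not pipes or sockets).
    """
    link = "/proc/{}/fd/{}".format(pid, fd)
    target = os.readlink(link)            # e.g. "/dev/null"
    fd_stat = os.stat(link)               # follows the link to the open file
    path_stat = os.stat(os.path.join(root, target.lstrip("/")))
    return same_file(fd_stat, path_stat), file_id(path_stat), file_id(fd_stat)
```

A False first element would correspond to the situation in the reported error, where the path and the descriptor resolve to different device:inode pairs.<br>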
<div class="HOEnZb"><div class="h5"><br>
>> $ docker run -d ubuntu:latest /bin/sh -c 'ls -l /proc/self/fd >> /LOG; stat /dev/null >> /LOG; sleep 3000'<br>
>> 64bb55e56db391c11d3d8442fdb2f960252ce4c8edc6349d59d73b692d1b0b6c<br>
>> $<br>
>><br>
>> $ sudo cat /var/lib/docker/vfs/dir/64bb55e56db391c11d3d8442fdb2f960252ce4c8edc6349d59d73b692d1b0b6c/LOG<br>
>> total 0<br>
>> lr-x------ 1 root root 64 Jul 15 05:59 0 -> /dev/null<br>
>> l-wx------ 1 root root 64 Jul 15 05:59 1 -> /LOG<br>
>> l-wx------ 1 root root 64 Jul 15 05:59 2 -> pipe:[47269]<br>
>> lr-x------ 1 root root 64 Jul 15 05:59 3 -> /proc/9/fd<br>
>> File: '/dev/null'<br>
>> Size: 0 Blocks: 0 IO Block: 4096 character special file<br>
>> Device: 2ah/42d Inode: 47496 Links: 1 Device type: 1,3<br>
>> Access: (0666/crw-rw-rw-) Uid: ( 0/ root) Gid: ( 0/ root)<br>
>> Access: 2014-07-15 05:59:48.235291004 +0000<br>
>> Modify: 2014-07-15 05:59:48.235291004 +0000<br>
>> Change: 2014-07-15 05:59:48.235291004 +0000<br>
>> Birth: -<br>
>> $<br>
>><br>
>> Apparently, what is recorded as the open file descriptor 0 during dump is the system's /dev/null in the global mount namespace, not the /dev/null in the container's mount namespace. As a result, we get the following error in check_map_remap():<br>
>><br>
>> (00.061198) Error (files-reg.c:605): Unaccessible path ./dev/null opened 42:47496, need 5:5294<br>
><br>
> OK, so this means that the path refers to file 42:47496 while the descriptor refers to 5:5294. What version of criu do you use?<br>
> Does your kernel expose the mnt_id in /proc/pid/fdinfo/fd files?<br>
><br>
>> Notice that 5:5294 is system's /dev/null in the global mount namespace (see the stat command below) whereas 42:47496 is the container's /dev/null.<br>
>><br>
>> $ stat /dev/null<br>
>> File: ‘/dev/null’<br>
>> Size: 0 Blocks: 0 IO Block: 4096 character special file<br>
>> Device: 5h/5d Inode: 5294 Links: 1 Device type: 1,3<br>
>> Access: (0666/crw-rw-rw-) Uid: ( 0/ root) Gid: ( 0/ root)<br>
>> Access: 2014-07-14 11:20:13.847273000 -0700<br>
>> Modify: 2014-07-14 11:20:13.847273000 -0700<br>
>> Change: 2014-07-14 11:20:13.847273000 -0700<br>
>> Birth: -<br>
>> $<br>
>><br>
>> Attached is dump.log. Does this analysis make sense or am I missing something?<br>
>><br>
>> --Saied<br>
><br>
<br>
</div></div></blockquote></div><br></div>