[CRIU] link_remap_ok

Pavel Emelyanov xemul at parallels.com
Mon Mar 2 01:07:03 PST 2015


On 02/28/2015 12:50 AM, beproject criu wrote:
> I have edited mount.c. Patch is attached with the mail.
> 
> 1. Where really is your container's root? Is it node's / or is it /usr/local/lib/lxc/rootfs/L
>    or w/o "L" at the end of previous path? Or where? And how does it get there -- with the
>    chroot() call or with the pivot_root() one?
> 
> ----- host root: "/" container's root: "/sdcard/L/rootfs"   -> for pivot root it is mounted at "/usr/local/lib/lxc/rootfs"

OK, I see what's going on there. The container lives in another mount
namespace, but its root is set with the chroot() system call, not with
pivot_root() as it should be.
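
A minimal sketch of how this can be checked from the host, assuming (as the
rest of this thread does) that a task whose root is the root of its own mount
namespace shows "/" behind /proc/<pid>/root. The names read_task_root() and
root_is_mntns_root() are made up for illustration, they are not CRIU functions:

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read where /proc/<pid>/root points.
 * Returns the length of the link target, or -1 on error. */
ssize_t read_task_root(pid_t pid, char *buf, size_t size)
{
	char link[64];
	ssize_t ret;

	if (size == 0)
		return -1;

	snprintf(link, sizeof(link), "/proc/%d/root", (int)pid);

	/* readlink() does not NUL-terminate, so keep one byte spare. */
	ret = readlink(link, buf, size - 1);
	if (ret < 0)
		return -1;

	buf[ret] = '\0';
	return ret;
}

/* Hypothetical helper: true only when the link reads exactly "/". */
int root_is_mntns_root(const char *path, ssize_t len)
{
	return len == 1 && path[0] == '/';
}
```

Feeding the result of read_task_root() for the container's root task into
root_is_mntns_root() should say whether the root looks like "/" or like some
longer path.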

And this hunk of yours

@@ -2022,8 +2028,9 @@ int __mntns_get_root_fd(pid_t pid)

        path[ret] = '\0';

-       if (ret != 1 || path[0] != '/') {
+       if ((ret != 1 || path[0] != '/') && (opts.root == NULL || strcmp(opts.root, path))) {
                pr_err("The root task has another root than mntns: %s\n", path);
+               snprintf(path, sizeof(path), "/proc/%d/root", pid);
                close_pid_proc();
                return -1;
        }

proves that I'm right :) If you had told us this from the very beginning, the
investigation would have been MUCH easier :\  Next time, please remember to
mention that you're using modified CRIU sources.

Regarding the issue: CRIU is known not to handle such cases properly. The
problem is a) in the code from the mentioned hunk, which forces the task's root
to be "/" in another mount namespace, b) in the path resolution code in
files-reg.c, which expects the root to point to "/", not to some other path,
and c) in the mount.c code, which tries to handle _all_ the mount points it
meets, while those sitting above the task's root can (and should) simply be
ignored.
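
To illustrate point c), here is a rough sketch of a check that decides whether
a mount point lies at or below a given root. The function name
mount_is_under_root() is hypothetical and this is not how mount.c is written,
it only shows the kind of filtering meant here. It assumes both paths are
absolute and that the root has no trailing slash:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if mountpoint is root itself or lies
 * below it, 0 otherwise. */
int mount_is_under_root(const char *mountpoint, const char *root)
{
	size_t len = strlen(root);

	/* Every absolute path lives under "/". */
	if (len == 1 && root[0] == '/')
		return mountpoint[0] == '/';

	if (strncmp(mountpoint, root, len) != 0)
		return 0;

	/* Require a path component boundary, so that ".../rootX"
	 * is not taken for something under ".../root". */
	return mountpoint[len] == '\0' || mountpoint[len] == '/';
}
```

With the root taken as /usr/local/lib/lxc/rootfs/root, entries such as /sdcard
or /usr/local/lib/lxc/rootfs would be skipped, while
/usr/local/lib/lxc/rootfs/root/proc would be kept.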


Since you've already started investigating the issue :) the next step should be
fixing the files-reg.c code to resolve file paths properly. The hunk in
fill_fdlink() from your patch looks like a step in the right direction.
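
Just to show the idea (this is not the actual files-reg.c code, and
path_relative_to_root() is a name invented for this example): a path reported
for a file can be rewritten relative to the task's root before it is looked up
via the root descriptor. A minimal sketch, assuming both paths are absolute and
the root has no trailing slash:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn an absolute path into a "./..." path
 * relative to root. Returns 0 on success, -1 if path is not under
 * root or the result does not fit into out. */
int path_relative_to_root(const char *path, const char *root,
			  char *out, size_t size)
{
	size_t len = strlen(root);
	int n;

	if (path[0] != '/')
		return -1;

	if (len == 1 && root[0] == '/')
		len = 0;	/* root is "/": keep the whole path */
	else if (strncmp(path, root, len) != 0 ||
		 (path[len] != '\0' && path[len] != '/'))
		return -1;	/* path is outside of root */

	/* path + len is either "" or "/sub/path" */
	n = snprintf(out, size, ".%s", path + len);
	if (n < 0 || (size_t)n >= size)
		return -1;

	return 0;
}
```

The resulting string is in the same "./..." form as the rpath in your logs, so
it could then be given to fstatat() together with the root descriptor.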

Thanks,
Pavel

> 2. How did you manage to get several rootfs mountpoints inside container? Do you use
>    somehow patched kernel?
> Not sure. Maybe the host's mountpoints are visible inside the container? There was a discussion about this a while back on CRIU
> https://lists.linuxcontainers.org/pipermail/lxc-devel/2014-October/010571.html
> 
> 
> /sdcard # ls -ld /proc/self/root/
> drwxrwxr-x   15 1001     1001           340 Feb 27 13:15 /proc/self/root/
> 
> /sdcard # lxc-attach -n L -- ls -ld /proc/self/root/
> drwxrwxr-x   15 1001     1001           340 Feb 27 13:15 /proc/self/root/
> 
> On Sat, Feb 28, 2015 at 2:13 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> 
>     On 02/27/2015 11:09 PM, beproject criu wrote:
>     > Is this ok?
> 
>     Yes, that's OK and this ... is even more strange than I expected %)
>     But anyway, let's try to debug that.
> 
>     > /sdcard # lxc-attach -n L -- cat /proc/self/mountinfo
>     > 16 16 0:2 / / rw - rootfs rootfs rw,size=372900k,nr_inodes=93225
>     > 17 16 0:4 / /proc rw,relatime - proc proc rw
>     > 18 16 0:11 / /sys rw,relatime - sysfs sysfs rw
>     > 19 18 0:12 / /sys/fs/cgroup rw,relatime - cgroup none rw,cpuset,debug,cpu,cpuacct,memory,devices,freezer,blkio,perf_event,clone_children
>     > 20 18 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
>     > 21 16 0:10 / /dev/pts rw,relatime - devpts devpts rw,mode=600,ptmxmode=000
>     > 22 16 179:0 / /sdcard rw,relatime - ext2 /dev/mmcblk0 rw
>     > 24 16 0:2 /usr/local/lib/lxc/rootfs /usr/local/lib/lxc/rootfs rw - rootfs rootfs rw,size=372900k,nr_inodes=93225
>     > 25 24 0:14 / /usr/local/lib/lxc/rootfs rw,relatime - tmpfs none rw,size=12k,mode=755
>     > 35 25 179:0 /L/rootfs /usr/local/lib/lxc/rootfs/root rw,relatime shared:1 - ext2 /dev/mmcblk0 rw
>     > 36 35 0:13 / /usr/local/lib/lxc/rootfs/root/proc rw,relatime shared:2 - proc none rw
>     > 37 35 0:15 / /usr/local/lib/lxc/rootfs/root/sys rw,relatime shared:3 - sysfs none rw
>     > 38 35 0:2 /dev /usr/local/lib/lxc/rootfs/root/dev rw,relatime shared:6 - rootfs rootfs rw,size=372900k,nr_inodes=93225
>     > 39 38 0:10 /1 /usr/local/lib/lxc/rootfs/root/dev/console rw,relatime shared:7 - devpts devpts rw,mode=600,ptmxmode=000
>     > 26 37 0:6 / /usr/local/lib/lxc/rootfs/root/sys/kernel/debug rw,relatime shared:4 - debugfs debugfs rw
>     > 27 37 0:16 / /usr/local/lib/lxc/rootfs/root/sys/fs/cgroup rw,relatime shared:5 - tmpfs none rw,mode=750,gid=1000
>     > 28 35 0:17 / /usr/local/lib/lxc/rootfs/root/mnt/asec rw,relatime shared:8 - tmpfs tmpfs rw,mode=755,gid=1000
>     > 29 35 0:18 / /usr/local/lib/lxc/rootfs/root/mnt/obb rw,relatime shared:9 - tmpfs tmpfs rw,mode=755,gid=1000
>     > 30 35 31:0 / /usr/local/lib/lxc/rootfs/root/system ro,relatime shared:10 - ext4 /dev/block/mtdblock0 ro,data=ordered
>     > 31 35 31:1 / /usr/local/lib/lxc/rootfs/root/data rw,nosuid,nodev,noatime shared:11 - ext4 /dev/block/mtdblock1 rw,data=ordered
>     > 32 35 31:2 / /usr/local/lib/lxc/rootfs/root/cache rw,nosuid,nodev,noatime shared:12 - ext4 /dev/block/mtdblock2 rw,data=ordered
> 
>     OK, so if we put all bits together I see that some process inside container has a file
>     named /usr/local/lib/lxc/rootfs/root/dev/__properties__ opened and this file is really
>     present inside container. According to the mountpoints this path should exist.
> 
>     Next, CRIU tries to stat this file relative to what it thinks is your container's root
>     (the 1017-th descriptor) and gets the ENOENT error, which in turn means that
>     descriptor 1017 does not point to the container's root.
> 
>     We need to find out why. The things that I don't understand and need your help with
>     (since I don't have your box's console at hand) are:
> 
>     1. Where really is your container's root? Is it node's / or is it /usr/local/lib/lxc/rootfs/L
>        or w/o "L" at the end of previous path? Or where? And how does it get there -- with the
>        chroot() call or with the pivot_root() one?
> 
>     2. How did you manage to get several rootfs mountpoints inside container? Do you use
>        somehow patched kernel?
> 
>     3. In theory this fd should point to container's namespace's root, which _should_
>        be invisible in criu's namespace and, thus, have the path "/". But it's not such.
>        What if you find your container's root task as seen from host and do this:
> 
>        ls -ld /proc/$the-pid-in-question/root
> 
>        ? What path would you see?
> 
>     Thanks,
>     Pavel
> 
>     > /sdcard # cat /proc/self/mountinfo
>     > 1 1 0:2 / / rw - rootfs rootfs rw,size=372900k,nr_inodes=93225
>     > 10 1 0:4 / /proc rw,relatime - proc proc rw
>     > 11 1 0:11 / /sys rw,relatime - sysfs sysfs rw
>     > 12 11 0:12 / /sys/fs/cgroup rw,relatime - cgroup none rw,cpuset,debug,cpu,cpuacct,memory,devices,freezer,blkio,perf_event,clone_children
>     > 13 11 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
>     > 14 1 0:10 / /dev/pts rw,relatime - devpts devpts rw,mode=600,ptmxmode=000
>     > 15 1 179:0 / /sdcard rw,relatime - ext2 /dev/mmcblk0 rw
>     > /sdcard #
>     >
>     >
>     > I have this line in my lxc.conf
>     > lxc.mount.entry=/dev /sdcard/L/rootfs/dev none defaults,bind 0 0
>     >
>     >
>     > Thanks.
>     >
>     > On Sat, Feb 28, 2015 at 1:20 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
>     >
>     >     On 02/27/2015 08:05 PM, beproject criu wrote:
>     >     > path [/usr/local/lib/lxc/rootfs/root] read_fd_link() return : 30
>     >     > I have mounted the /dev of host on /dev inside my container.
>     >
>     >     That's interesting. How did you do it? Can you show me the mount namespaces
>     >     layout on your node and inside the container? It's just /proc/$pid/mountinfo
>     >     file contents on some process on host and some process in CT.
>     >
>     >     > (03.347637) Dumping path for 4 fd via self 43 [/usr/local/lib/lxc/rootfs/root/dev/__properties__]
>     >
>     >     OK, this is quite strange. The /usr/local/lib/lxc/rootfs is the path on HOST.
>     >     How did this path become visible inside container?
>     >
>     >     > (03.347847) [nyc_fd] The required root is already opened. get_service_fd() returns : 1017
>     >     > (03.348197) [nyc_fd] Going into fstatat()-> [mntns_root : 1017],[rpath : ./usr/local/lib/lxc/rootfs/root/dev/__properties__]
>     >     > (03.348393) [nyc_fd] Out of fstatat()-> [ret : -1],[rpath : ./usr/local/lib/lxc/rootfs/root/dev/__properties__]
>     >     > (03.348583) [nyc_fd] Going into dump_linked_remap()
>     >     > (03.349177) [nyc_fd] The required root is already opened. get_service_fd() returns : 1017
>     >     > (03.349432) [nyc_fd] Doing linkat() [mntns_root : 1017],[link_name : ./usr/local/lib/lxc/rootfs/root/dev/link_remap.4]
>     >     >
>     >     > (03.349756) [nyc_fd] bad_path [/usr/local/lib/lxc/rootfs/root] ,read_fd_link() return : 30
>     >     >
>     >     > (03.350027) Error (files-reg.c:515): Can't link remap to /usr/local/lib/lxc/rootfs/root/dev/__properties__: No such file or directory
>     >     > (03.351097) [nyc_fd] Error in dump_one_reg_file->check_path_remap()
>     >     > (03.351266) [nyc_fd] Going into dump_task_files_seized() : dump_one_file()
>     >     >
>     >     >
>     >     > On Fri, Feb 27, 2015 at 8:10 PM, Pavel Emelyanov <xemul at parallels.com> wrote:
>     >     >
>     >     >     On 02/27/2015 05:25 PM, beproject criu wrote:
>     >     >     > This is the flow before error,could you get what's wrong :
>     >     >     >
>     >     >     > (02.352922) Dumping path for 4 fd via self 43 [/usr/local/lib/lxc/rootfs/root/dev/__properties__]
>     >     >     > (02.353079) [nyc_fd] The required root is already opened. get_service_fd() returns : 1017
>     >     >     > (02.353216) [nyc_fd] Going into fstatat()-> [mntns_root : 1017],[rpath : ./usr/local/lib/lxc/rootfs/root/dev/__properties__]
>     >     >     > (02.353398) [nyc_fd] Out of fstatat()-> [ret : -1],[rpath : ./usr/local/lib/lxc/rootfs/root/dev/__properties__]
>     >     >     > (02.353574) [nyc_fd] Going into dump_linked_remap()
>     >     >     > (02.354191) [nyc_fd] The required root is already opened. get_service_fd() returns : 1017
>     >     >     > (02.354431) [nyc_fd] Doing linkat() [mntns_root : 1017],[link_name : ./usr/local/lib/lxc/rootfs/root/dev/link_remap.4]
>     >     >     > (02.354747) Error (files-reg.c:510): Can't link remap to /usr/local/lib/lxc/rootfs/root/dev/__properties__: No such file or directory
>     >     >     > (02.355915) [nyc_fd] Error in dump_one_reg_file->check_path_remap()
>     >     >     > (02.356066) [nyc_fd] Going into dump_task_files_seized() : dump_one_file()
>     >     >
>     >     >     OK, so in both cases we access the file at fd 1017 and sub-path "./usr/local/lib/lxc/rootfs/root/dev/__properties__"
>     >     >     and you say that this file actually exists in the container, right?
>     >     >
>     >     >     The failing fstatat then means that the 1017 descriptor points to some bad path. Can you check where? We
>     >     >     have a helper called read_fd_link() for that, see fsnotify.c line 455 for a code example.
>     >     >
>     >     >     And one more question -- does your container live in another mount namespace, or shared one with host?
>     >     >
>     >     >     Thanks,
>     >     >     Pavel
>     >     >
>     >     >
>     >
>     >
> 
> 


