[CRIU] --ext-mount-map auto likes MS_BIND too much

Tycho Andersen tycho.andersen at canonical.com
Tue Apr 14 08:44:23 PDT 2015


Hi Oleg,

On Tue, Apr 14, 2015 at 05:09:08PM +0200, Oleg Nesterov wrote:
> Sorry for delay.

No problem, thanks for investigating.

> On 04/13, Oleg Nesterov wrote:
> >
> > So I hit the new problems with criu. I'll write another email,
> > I believe the recent --ext-mount-map auto changes were not 100% correct.
> 
> Or I simply do not understand what it should do.
> 
> Let's start with the simplified "test case":
> 
> 	# cat /proc/self/mountinfo
> 	17 38 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> 	18 38 0:16 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> 	19 38 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=16374292k,nr_inodes=4093573,mode=755
> 	21 19 0:17 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
> 	22 19 0:11 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> 	23 38 0:18 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> 	24 18 0:19 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:8 - tmpfs tmpfs rw,seclabel,mode=755
> 	25 24 0:20 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 	38 1 253:1 / / rw,relatime shared:1 - xfs /dev/mapper/rhel_ibm--x3650m4--02--vm--02-root rw,seclabel,attr2,inode64,noquota
> 
> 	# unshare -m
> 	26 20 253:1 / / rw,relatime shared:1 - xfs /dev/mapper/rhel_ibm--x3650m4--02--vm--02-root rw,seclabel,attr2,inode64,noquota
> 	27 26 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=16374292k,nr_inodes=4093573,mode=755
> 	28 27 0:17 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
> 	29 27 0:11 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> 	30 26 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> 	31 26 0:16 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> 	32 31 0:19 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:8 - tmpfs tmpfs rw,seclabel,mode=755
> 	33 32 0:20 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> 	34 26 0:18 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> 
> 	# perl -e 'close STDIN; close STDOUT; close STDERR; sleep'
> 
> Now, on another console:
> 
> 	# criu dump -D D/ -j -t `pidof perl`
> 	# criu restore -D D/ -j
> 
> Works.
> 
> But what if I pass "--ext-mount-map auto"? It should not do any harm, yes?
> 
> 	# criu dump -D D/ -j -t `pidof perl` --ext-mount-map auto --enable-external-sharing --enable-external-masters
> 
> yes, this works. But!
> 
> 	# criu restore -D D/ -j --ext-mount-map auto --enable-external-sharing --enable-external-masters
> 
> fails:
> 
> 	Error (mount.c:1844): Can't mount at ./dev/shm: No such file or directory
> 
> 
> The reason looks clear. Let's look at resolve_external_mounts(), it calls
> find_best_external_match() unconditionally, and it always finds the match
> from the "root" ns_id (which has ->pid == getpid(), I do not know the right
> term).
> 
> And this basically means that "autodetected external mount" applies to every
> mountpoint except "/". The relevant part of "dump -vvvvv" is:
> 
> 	autodetected external mount /run/ for ./run
> 	autodetected external mount /sys/fs/cgroup/systemd/ for ./sys/fs/cgroup/systemd
> 	autodetected external mount /sys/fs/cgroup/ for ./sys/fs/cgroup
> 	autodetected external mount /sys/ for ./sys
> 	autodetected external mount /proc/ for ./proc
> 	autodetected external mount /dev/pts/ for ./dev/pts
> 	autodetected external mount /dev/shm/ for ./dev/shm
> 	autodetected external mount /dev/ for ./dev
> 
> And of course, "restore" can't work, -vvvv makes it clear:
> 
> 	Start with 26:./
> 		Mounting unsupported @./ (0)
> 	26:./ private 0 shared 1 slave 0
> 		Mounting devtmpfs @./dev (0)
> 		Bind /dev/ to ./dev
> 	27:./dev private 1 shared 1 slave 0
> 		Mounting tmpfs @./dev/shm (0)
> 		Bind /dev/shm/ to ./dev/shm
> 	Error (mount.c:1844): Can't mount at ./dev/shm: No such file or directory
> 
> Surely, /dev/ was not remounted correctly.
> 
> Perhaps resolve_external_mounts() should skip the fsroot_mounted() mount
> points at least? Although afaics this is not enough either.

I think we can't do an fsroot_mounted() check as we discussed here:

http://lists.openvz.org/pipermail/criu/2015-April/019744.html

but yes, something does look wrong :). In this case, the mounts are
actually the same mounts, just in different namespaces. Perhaps
mounts_equal() (or at least the condition we check in
resolve_external_mounts()) should compare mount ids as well to catch
this case? If the mount ids match, we should not bind mount, because
the two entries are the same mount, just on different sides of an
unshare or clone call.
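
Roughly what I have in mind, as a minimal sketch rather than a patch. The
struct and all three helpers (mnt_sketch, parse_mountinfo_ids,
same_mount_id, want_external_bind) are hypothetical names made up for
illustration; none of this is the actual mount.c code, and it only reads
the first three fields of a mountinfo line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for CRIU's per-mount bookkeeping. */
struct mnt_sketch {
	int mnt_id;              /* field 1 of a mountinfo line */
	int parent_id;           /* field 2 */
	unsigned int dev_major;  /* field 3, before the colon */
	unsigned int dev_minor;  /* field 3, after the colon */
};

/* Parse "<id> <parent> <major>:<minor> ..." from one mountinfo line.
 * Returns 0 on success, -1 if the line does not start with those fields. */
static int parse_mountinfo_ids(const char *line, struct mnt_sketch *m)
{
	assert(line && m);
	if (sscanf(line, "%d %d %u:%u", &m->mnt_id, &m->parent_id,
		   &m->dev_major, &m->dev_minor) != 4)
		return -1;
	return 0;
}

/* The extra condition proposed above: identical ids mean "same mount". */
static bool same_mount_id(const struct mnt_sketch *a,
			  const struct mnt_sketch *b)
{
	return a->mnt_id == b->mnt_id;
}

/* Assumption: a candidate found by the auto-detection is only treated as
 * an external bind source when it is NOT the very same mount. */
static bool want_external_bind(const struct mnt_sketch *m,
			       const struct mnt_sketch *match)
{
	return !same_mount_id(m, match);
}
```

Note that in the listing you pasted the two /proc entries have ids 17 and
30, so whatever id the real check compares would have to be one that
survives the unshare; treat the field choice here as an assumption.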

Tycho

> Plus, if we skip something in resolve_external_mounts(), then I am not
> sure that other m->external checks (say, in collect_shared()) will be
> correct...
> 
> 
> So. This looks "obviously wrong" to me. Or I simply do not understand
> what's going on?
> 
> Help!
> 
> The change below helps in this particular case, but as I said it is not
> correct/enough.
> 
> Oleg.
> 
> --- x/./mount.c
> +++ x/./mount.c
> @@ -718,6 +718,9 @@ static int resolve_external_mounts(struc
>  		if (m->parent == NULL || m->is_ns_root)
>  			continue;
>  
> +		if (fsroot_mounted(m))
> +			continue;
> +
>  		ret = try_resolve_ext_mount(m);
>  		if (ret < 0 && ret != -ENOTSUP) {
>  			return -1;
> 

