[CRIU] --ext-mount-map auto likes MS_BIND too much

Tycho Andersen tycho.andersen at canonical.com
Tue Apr 14 09:05:00 PDT 2015


On Tue, Apr 14, 2015 at 09:44:23AM -0600, Tycho Andersen wrote:
> Hi Oleg,
> 
> On Tue, Apr 14, 2015 at 05:09:08PM +0200, Oleg Nesterov wrote:
> > Sorry for the delay.
> 
> No problem, thanks for investigating.
> 
> > On 04/13, Oleg Nesterov wrote:
> > >
> > > So I hit new problems with criu. I'll write another email,
> > > I believe the recent --ext-mount-map auto changes were not 100% correct.
> > 
> > Or I simply do not understand what it should do.
> > 
> > Let's start with the simplified "test case":
> > 
> > 	# cat /proc/self/mountinfo
> > 	17 38 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> > 	18 38 0:16 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> > 	19 38 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=16374292k,nr_inodes=4093573,mode=755
> > 	21 19 0:17 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
> > 	22 19 0:11 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> > 	23 38 0:18 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> > 	24 18 0:19 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:8 - tmpfs tmpfs rw,seclabel,mode=755
> > 	25 24 0:20 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> > 	38 1 253:1 / / rw,relatime shared:1 - xfs /dev/mapper/rhel_ibm--x3650m4--02--vm--02-root rw,seclabel,attr2,inode64,noquota
> > 
> > 	# unshare -m
> > 	26 20 253:1 / / rw,relatime shared:1 - xfs /dev/mapper/rhel_ibm--x3650m4--02--vm--02-root rw,seclabel,attr2,inode64,noquota
> > 	27 26 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=16374292k,nr_inodes=4093573,mode=755
> > 	28 27 0:17 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
> > 	29 27 0:11 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> > 	30 26 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> > 	31 26 0:16 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> > 	32 31 0:19 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:8 - tmpfs tmpfs rw,seclabel,mode=755
> > 	33 32 0:20 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> > 	34 26 0:18 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> > 
> > 	# perl -e 'close STDIN; close STDOUT; close STDERR; sleep'
> > 
> > Now, on another console:
> > 
> > 	# criu dump -D D/ -j -t `pidof perl`
> > 	# criu restore -D D/ -j
> > 
> > Works.
> > 
> > But what if I pass "--ext-mount-map auto"? It should not do any harm, yes?
> > 
> > 	# criu dump -D D/ -j -t `pidof perl` --ext-mount-map auto --enable-external-sharing --enable-external-masters
> > 
> > yes, this works. But!
> > 
> > 	# criu restore -D D/ -j --ext-mount-map auto --enable-external-sharing --enable-external-masters
> > 
> > fails:
> > 
> > 	Error (mount.c:1844): Can't mount at ./dev/shm: No such file or directory
> > 
> > 
> > The reason looks clear. Let's look at resolve_external_mounts(), it calls
> > find_best_external_match() unconditionally, and it always finds the match
> > from the "root" ns_id (which has ->pid == getpid(), I do not know the right
> > term).
> > 
> > And this basically means that "autodetected external mount" applies to every
> > mountpoint except "/". The relevant part of "dump -vvvvv" is:
> > 
> > 	autodetected external mount /run/ for ./run
> > 	autodetected external mount /sys/fs/cgroup/systemd/ for ./sys/fs/cgroup/systemd
> > 	autodetected external mount /sys/fs/cgroup/ for ./sys/fs/cgroup
> > 	autodetected external mount /sys/ for ./sys
> > 	autodetected external mount /proc/ for ./proc
> > 	autodetected external mount /dev/pts/ for ./dev/pts
> > 	autodetected external mount /dev/shm/ for ./dev/shm
> > 	autodetected external mount /dev/ for ./dev
> > 
> > And of course, "restore" can't work, -vvvv makes it clear:
> > 
> > 	Start with 26:./
> > 		Mounting unsupported @./ (0)
> > 	26:./ private 0 shared 1 slave 0
> > 		Mounting devtmpfs @./dev (0)
> > 		Bind /dev/ to ./dev
> > 	27:./dev private 1 shared 1 slave 0
> > 		Mounting tmpfs @./dev/shm (0)
> > 		Bind /dev/shm/ to ./dev/shm
> > 	Error (mount.c:1844): Can't mount at ./dev/shm: No such file or directory	
> > 
> > Surely, /dev/ was not remounted correctly.
> > 
> > Perhaps resolve_external_mounts() should at least skip the fsroot_mounted()
> > mount points? Although afaics this is not enough either.
> 
> I think we can't do an fsroot_mounted() check as we discussed here:
> 
> http://lists.openvz.org/pipermail/criu/2015-April/019744.html
> 
> but yes, something does look wrong :). In this case, the mounts are
> actually the same mounts, just in different namespaces. Perhaps
> mounts_equal() (or at least, the condition we check in
> resolve_external_mounts) should compare mount ids as well to check
> this case? If the mount ids match, then we should not bind mount,
> because the mounts are the same mount just on different sides of an
> unshare or clone call.

Oh, whoops, this won't work either, because the mounts get new ids on
the other side of the unshare call. Hmm.
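
To illustrate with the mountinfo output quoted above, here is a minimal
sketch (not CRIU code) that parses two mountinfo lines and compares them.
The struct and both function names are made up for this example, and the
comparison is only an assumption about what "the same mount" could mean;
it is not what mounts_equal() does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical struct for illustration; not CRIU's struct mount_info. */
struct mnt_line {
	int mnt_id;
	int parent_id;
	unsigned int major, minor;
	char root[256];
	char mountpoint[256];
	int shared_id;	/* peer group from "shared:N", 0 if absent */
};

/* Parse the leading fields of one mountinfo line. Returns 0 on success. */
static int parse_mountinfo_line(const char *line, struct mnt_line *m)
{
	const char *sep, *p;

	memset(m, 0, sizeof(*m));
	if (sscanf(line, "%d %d %u:%u %255s %255s",
		   &m->mnt_id, &m->parent_id, &m->major, &m->minor,
		   m->root, m->mountpoint) != 6)
		return -1;

	/* Optional fields such as "shared:N" come before the " - " separator. */
	sep = strstr(line, " - ");
	p = strstr(line, " shared:");
	if (p && (!sep || p < sep) && sscanf(p, " shared:%d", &m->shared_id) != 1)
		return -1;
	return 0;
}

/* Same device, same root and same peer group; mount ids are ignored. */
static int same_fs_view(const struct mnt_line *a, const struct mnt_line *b)
{
	return a->major == b->major && a->minor == b->minor &&
	       strcmp(a->root, b->root) == 0 &&
	       a->shared_id == b->shared_id;
}
```

Fed the two /dev lines from the output above ("19 38 0:5 / /dev ..." and
"27 26 0:5 / /dev ..."), this gives mount ids 19 and 27, while device,
root and peer group (shared:2) are identical. So the mount id cannot
distinguish "same mount across unshare" from a real external mount.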

Tycho

> Tycho
> 
> > Plus, if we skip something in resolve_external_mounts(), then I am not
> > sure that other m->external checks (say, in collect_shared()) will be
> > correct...
> > 
> > 
> > So. This looks "obviously wrong" to me. Or I simply do not understand
> > what's going on?
> > 
> > Help!
> > 
> > The change below helps in this particular case, but as I said it is not
> > correct/enough.
> > 
> > Oleg.
> > 
> > --- x/./mount.c
> > +++ x/./mount.c
> > @@ -718,6 +718,9 @@ static int resolve_external_mounts(struc
> >  		if (m->parent == NULL || m->is_ns_root)
> >  			continue;
> >  
> > +		if (fsroot_mounted(m))
> > +			continue;
> > +
> >  		ret = try_resolve_ext_mount(m);
> >  		if (ret < 0 && ret != -ENOTSUP) {
> >  			return -1;
> > 

