[CRIU] --ext-mount-map auto likes MS_BIND too much
Tycho Andersen
tycho.andersen at canonical.com
Tue Apr 14 09:05:00 PDT 2015
On Tue, Apr 14, 2015 at 09:44:23AM -0600, Tycho Andersen wrote:
> Hi Oleg,
>
> On Tue, Apr 14, 2015 at 05:09:08PM +0200, Oleg Nesterov wrote:
> > Sorry for the delay.
>
> No problem, thanks for investigating.
>
> > On 04/13, Oleg Nesterov wrote:
> > >
> > > So I hit new problems with criu. I'll write another email;
> > > I believe the recent --ext-mount-map auto changes were not 100% correct.
> >
> > Or I simply do not understand what it should do.
> >
> > Let's start with a simplified "test case":
> >
> > # cat /proc/self/mountinfo
> > 17 38 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> > 18 38 0:16 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> > 19 38 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=16374292k,nr_inodes=4093573,mode=755
> > 21 19 0:17 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
> > 22 19 0:11 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> > 23 38 0:18 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> > 24 18 0:19 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:8 - tmpfs tmpfs rw,seclabel,mode=755
> > 25 24 0:20 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> > 38 1 253:1 / / rw,relatime shared:1 - xfs /dev/mapper/rhel_ibm--x3650m4--02--vm--02-root rw,seclabel,attr2,inode64,noquota
> >
> > # unshare -m
> > 26 20 253:1 / / rw,relatime shared:1 - xfs /dev/mapper/rhel_ibm--x3650m4--02--vm--02-root rw,seclabel,attr2,inode64,noquota
> > 27 26 0:5 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw,seclabel,size=16374292k,nr_inodes=4093573,mode=755
> > 28 27 0:17 / /dev/shm rw,nosuid,nodev shared:3 - tmpfs tmpfs rw,seclabel
> > 29 27 0:11 / /dev/pts rw,nosuid,noexec,relatime shared:4 - devpts devpts rw,seclabel,gid=5,mode=620,ptmxmode=000
> > 30 26 0:3 / /proc rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> > 31 26 0:16 / /sys rw,nosuid,nodev,noexec,relatime shared:6 - sysfs sysfs rw,seclabel
> > 32 31 0:19 / /sys/fs/cgroup rw,nosuid,nodev,noexec shared:8 - tmpfs tmpfs rw,seclabel,mode=755
> > 33 32 0:20 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:9 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
> > 34 26 0:18 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> >
> > # perl -e 'close STDIN; close STDOUT; close STDERR; sleep'
> >
> > Now, on another console:
> >
> > # criu dump -D D/ -j -t `pidof perl`
> > # criu restore -D D/ -j
> >
> > Works.
> >
> > But what if I pass "--ext-mount-map auto"? It should not do any harm, yes?
> >
> > # criu dump -D D/ -j -t `pidof perl` --ext-mount-map auto --enable-external-sharing --enable-external-masters
> >
> > yes, this works. But!
> >
> > # criu restore -D D/ -j --ext-mount-map auto --enable-external-sharing --enable-external-masters
> >
> > fails:
> >
> > Error (mount.c:1844): Can't mount at ./dev/shm: No such file or directory
> >
> >
> > The reason looks clear. Let's look at resolve_external_mounts(): it calls
> > find_best_external_match() unconditionally, and it always finds the match
> > from the "root" ns_id (the one which has ->pid == getpid(); I do not know
> > the right term).
> >
> > And this basically means that "autodetected external mount" applies to every
> > mountpoint except "/". The relevant part of "dump -vvvvv" is:
> >
> > autodetected external mount /run/ for ./run
> > autodetected external mount /sys/fs/cgroup/systemd/ for ./sys/fs/cgroup/systemd
> > autodetected external mount /sys/fs/cgroup/ for ./sys/fs/cgroup
> > autodetected external mount /sys/ for ./sys
> > autodetected external mount /proc/ for ./proc
> > autodetected external mount /dev/pts/ for ./dev/pts
> > autodetected external mount /dev/shm/ for ./dev/shm
> > autodetected external mount /dev/ for ./dev
> >
> > And of course, "restore" can't work, -vvvv makes it clear:
> >
> > Start with 26:./
> > Mounting unsupported @./ (0)
> > 26:./ private 0 shared 1 slave 0
> > Mounting devtmpfs @./dev (0)
> > Bind /dev/ to ./dev
> > 27:./dev private 1 shared 1 slave 0
> > Mounting tmpfs @./dev/shm (0)
> > Bind /dev/shm/ to ./dev/shm
> > Error (mount.c:1844): Can't mount at ./dev/shm: No such file or directory
> >
> > Surely, /dev/ was not remounted correctly.
> >
> > Perhaps resolve_external_mounts() should at least skip the fsroot_mounted()
> > mount points? Although afaics this is not enough either.
>
> I think we can't do an fsroot_mounted() check as we discussed here:
>
> http://lists.openvz.org/pipermail/criu/2015-April/019744.html
>
> but yes, something does look wrong :). In this case, the mounts are
> actually the same mounts, just in different namespaces. Perhaps
> mounts_equal() (or at least, the condition we check in
> resolve_external_mounts) should compare mount ids as well to check
> this case? If the mount ids match, then we should not bind mount,
> because the mounts are the same mount just on different sides of an
> unshare or clone call.
Oh, whoops, this won't work either, because the mounts get new ids on the
other side of the unshare call. Hmm.
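
To make that concrete, here is a rough sketch using the two /dev/shm lines
from Oleg's output (mount id 21 before the unshare, 28 after it). This is not
CRIU code: struct mnt_line, parse_mnt_line() and looks_like_same_mount() are
names made up for illustration, and the parser only handles the fields needed
here. The point is that device, root, mountpoint and peer group all agree
while the mount ids differ, so an id comparison cannot tell us that these are
"the same mount".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of one /proc/<pid>/mountinfo line. */
struct mnt_line {
	int mnt_id;
	int parent_id;
	unsigned int major, minor;
	char root[256];
	char mountpoint[256];
	int shared_id;	/* peer group from "shared:N", 0 if not shared */
};

/* Parse the fields of interest; returns 0 on success, -1 on failure. */
static int parse_mnt_line(const char *line, struct mnt_line *m)
{
	const char *p, *sep;

	memset(m, 0, sizeof(*m));
	if (sscanf(line, "%d %d %u:%u %255s %255s",
		   &m->mnt_id, &m->parent_id, &m->major, &m->minor,
		   m->root, m->mountpoint) != 6)
		return -1;

	/* Optional fields sit between the mount options and " - ". */
	sep = strstr(line, " - ");
	p = strstr(line, " shared:");
	if (p && sep && p < sep)
		sscanf(p, " shared:%d", &m->shared_id);
	return 0;
}

/* Compare everything except the mount id, which is per-namespace. */
static int looks_like_same_mount(const struct mnt_line *a,
				 const struct mnt_line *b)
{
	return a->major == b->major && a->minor == b->minor &&
	       a->shared_id == b->shared_id &&
	       !strcmp(a->root, b->root) &&
	       !strcmp(a->mountpoint, b->mountpoint);
}
```

Feeding it the "21 19 0:17 / /dev/shm ..." and "28 27 0:17 / /dev/shm ..."
lines gives two entries for which looks_like_same_mount() is true even though
mnt_id differs. Whether a comparison along these lines is actually enough for
resolve_external_mounts() is a separate question.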
Tycho
> Tycho
>
> > Plus, if we skip something in resolve_external_mounts(), then I am not
> > sure that other m->external checks (say, in collect_shared()) will be
> > correct...
> >
> >
> > So. This looks "obviously wrong" to me. Or I simply do not understand
> > what's going on?
> >
> > Help!
> >
> > The change below helps in this particular case, but as I said it is not
> > correct/enough.
> >
> > Oleg.
> >
> > --- x/./mount.c
> > +++ x/./mount.c
> > @@ -718,6 +718,9 @@ static int resolve_external_mounts(struc
> >  		if (m->parent == NULL || m->is_ns_root)
> >  			continue;
> >  
> > +		if (fsroot_mounted(m))
> > +			continue;
> > +
> >  		ret = try_resolve_ext_mount(m);
> >  		if (ret < 0 && ret != -ENOTSUP) {
> >  			return -1;
> >
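
For reference, a small sketch of why the quoted fsroot_mounted() check helps
in this particular test case. It assumes that fsroot_mounted() amounts to
"the mount's root (the fourth mountinfo field) is /"; that is my reading, not
a quote of mount.c. struct mnt, root_is_fsroot() and count_auto_candidates()
are hypothetical names. Every mount in the unshare -m listing has root "/",
so with the check none of them is tried for auto-detection.

```c
#include <string.h>

/* Hypothetical stand-in: minimal per-mount data for this sketch. */
struct mnt {
	const char *root;	/* 4th mountinfo field */
	const char *mountpoint;	/* 5th mountinfo field */
	int is_ns_root;		/* set for the namespace's "/" mount */
};

/* Assumed meaning of fsroot_mounted(): the fs root itself is mounted. */
static int root_is_fsroot(const struct mnt *m)
{
	return strcmp(m->root, "/") == 0;
}

/* Count the mounts that would still be tried for auto-detection. */
static int count_auto_candidates(const struct mnt *v, int n, int skip_fsroot)
{
	int i, cnt = 0;

	for (i = 0; i < n; i++) {
		if (v[i].is_ns_root)
			continue;
		if (skip_fsroot && root_is_fsroot(&v[i]))
			continue;
		cnt++;
	}
	return cnt;
}
```

With the nine mounts from the listing, skip_fsroot == 0 leaves the eight
candidates seen in the "autodetected external mount" log, and skip_fsroot == 1
leaves none. A bind mount of a subdirectory (root other than "/") would still
be a candidate, which is part of why the check alone may not be sufficient.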