[CRIU] Dealing with other mount types
Pavel Emelyanov
xemul at parallels.com
Wed Mar 25 07:43:49 PDT 2015
On 03/25/2015 05:09 PM, Tycho Andersen wrote:
> On Wed, Mar 25, 2015 at 04:05:57PM +0300, Pavel Emelyanov wrote:
>> On 03/24/2015 09:57 PM, Tycho Andersen wrote:
>>> Hi all,
>>>
>>> [As a preface, I don't understand all the issues at play here, so any
>>> input or corrections are very much welcome.]
>>>
>>> Recent changes in Ubuntu and LXC mean that c/r of LXC containers no longer
>>> works out of the box, so I'd like to fix that. The first step is to fix some of
>>> the mount handling. When I start a container on Vivid with LXC 1.1, I get a
>>> mountinfo that looks like:
>>>
>>> 44 45 253:1 /usr/local/var/lib/lxc/u1/rootfs / rw,relatime master:1 - ext4 /dev/disk/by-uuid/6c5a78e0-95fa-49a8-aa91-a8093d295e58 rw,data=ordered
>>> 78 44 0:36 / /dev rw,relatime - tmpfs none rw,size=100k,mode=755
>>> 79 44 0:38 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
>>> 80 81 0:38 /sys/net /proc/sys/net rw,nosuid,nodev,noexec,relatime - proc proc rw
>>> 81 79 0:38 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - proc proc rw
>>> 82 79 0:38 /sysrq-trigger /proc/sysrq-trigger ro,nosuid,nodev,noexec,relatime - proc proc rw
>>> 83 44 0:39 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
>>> 84 83 0:39 / /sys ro,nosuid,nodev,noexec,relatime - sysfs sysfs rw
>>> 85 84 0:39 / /sys/devices/virtual/net rw,relatime - sysfs sysfs rw
>>> 86 85 0:39 /devices/virtual/net /sys/devices/virtual/net rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
>>> 87 84 0:34 / /sys/fs/fuse/connections rw,relatime master:23 - fusectl fusectl rw
>>> 88 84 0:7 / /sys/kernel/debug rw,relatime master:25 - debugfs debugfs rw
>>> 89 84 0:11 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime master:8 - securityfs securityfs rw
>>> 90 84 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime master:11 - pstore pstore rw
>>> 91 84 0:40 / /sys/fs/cgroup rw,relatime - tmpfs cgroup rw,size=12k,mode=755
>>> 92 91 0:21 /cgmanager /sys/fs/cgroup/cgmanager rw - tmpfs tmpfs rw,mode=755
>>> 46 78 0:41 / /dev/pts rw,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=666
>>> 47 44 0:42 / /run rw,nosuid,noexec,relatime - tmpfs none rw,size=199952k,mode=755
>>> 48 47 0:43 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=5120k
>>> 49 47 0:44 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
>>> 50 47 0:45 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=102400k,mode=755
>>>
>>> First, several things (the rootfs, fuse, pstore, etc.) are mounted as slaves.
>>> My understanding is that this happens because systemd remounts / as MS_SHARED
>>> instead of MS_PRIVATE, but it means that we need some way of handling slave
>>> mounts. One thought is to have an argument similar to --ext-mount-map which
>>> tells criu which peer group a particular mount is a slave to. For pstore
>>> above, for example, this would look like:
>>>
>>> 1. criu ... --slave-mount-map=/sys/fs/pstore:/sys/fs/pstore # source:target
>>
>> But on dump we can find this out without an option. Maybe it would be enough just
>> to enable external slave mounts with a single option? And on restore make them
>> slaves to the same paths.
>
> Yes, I was just thinking of the case when they're mounted at different
> paths on the source and target hosts (probably not likely, but
> possible). If it's not something we care about, I think we can just
> have an --enable-external-masters or something.
OK
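The master relationship is visible in the optional fields of mountinfo (the "master:N" tag before the " - " separator, as documented in proc(5)), which is why no per-mount option should be needed on dump. A rough Python sketch of that check follows. It is only an illustration and not CRIU's C code; the helper names parse_mountinfo_line() and slave_mounts() are made up for this example.

```python
# Sketch only: detect slave mounts from mountinfo text.
# Helper names are hypothetical; this is not CRIU's implementation.

SAMPLE = [
    "44 45 253:1 /usr/local/var/lib/lxc/u1/rootfs / rw,relatime master:1 - ext4 "
    "/dev/disk/by-uuid/6c5a78e0-95fa-49a8-aa91-a8093d295e58 rw,data=ordered",
    "83 44 0:39 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw",
    "90 84 0:23 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime master:11 - pstore pstore rw",
]


def parse_mountinfo_line(line):
    """Split one mountinfo line into a dict of the fields used below."""
    # Everything left of " - " is per-mount data, everything right is per-superblock.
    left, _, right = line.strip().partition(" - ")
    fields = left.split()
    tags = {}
    # Fields after the mount options are optional tags such as
    # "shared:N", "master:N", "propagate_from:N" or "unbindable".
    for opt in fields[6:]:
        key, _, value = opt.partition(":")
        tags[key] = value
    return {
        "mnt_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "dev": fields[2],
        "root": fields[3],
        "mountpoint": fields[4],
        "tags": tags,
        "fstype": right.split()[0],
    }


def slave_mounts(lines):
    """Map mountpoint -> master peer group id for every slave mount."""
    result = {}
    for line in lines:
        mnt = parse_mountinfo_line(line)
        if "master" in mnt["tags"]:
            result[mnt["mountpoint"]] = int(mnt["tags"]["master"])
    return result


if __name__ == "__main__":
    for mountpoint, group in slave_mounts(SAMPLE).items():
        print(f"{mountpoint} is a slave of peer group {group}")
```

Whether the master peer group lives outside the dumped mount namespace still has to be decided separately; the sketch only shows that the slave flag itself can be read without user input.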
>>> 2. criu walks the mount tree as usual, and when it sees something in
>>>    --slave-mount-map:
>>>    1. criu bind mounts /sys/fs/pstore into $root_yard/sys/fs/pstore
>>>    2. criu sets MS_SLAVE (by calling restore_shared_options())
>>>
>>> Second, for /proc/sys for example, the root of the mount is a path that's
>>> relative to its parent's mountpoint. I think (?) this just means that mount.c's
>>> find_fsroot_mount_for() needs to be a little smarter when it resolves things,
>>> so it should return /'s mountinfo when called for /proc/sys, instead of
>>> complaining about a proper root mount.
>>
>> Isn't it just an external bind mount that can be resolved using existing
>> --ext-mount-map?
>
> I don't think so, because it's not external. In:
>
> 79 44 0:38 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
> 80 81 0:38 /sys/net /proc/sys/net rw,nosuid,nodev,noexec,relatime - proc proc rw
> 81 79 0:38 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - proc proc rw
>
> Here /proc/sys/net is mounted in the /sys dir of its root fs, which is /proc,
> which itself is mounted at /proc in /. None of those are external bind mounts,
Ah, indeed.
> the paths given as the mount point are just relative to their parent mount
> instead of the rootfs. So I think (?) all we need to do is walk these paths
> correctly, and not ask anything else of the user.
Yes, I agree, it's not an external bind mount. But now I don't understand what
the problem is with find_fsroot_mount_for() :) Can you explain in more
detail, please?
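To make the case concrete, here is a rough Python sketch of the lookup being discussed, run on the three proc entries quoted above. It assumes the lookup means "find the mount of the same filesystem (same major:minor) whose root is /"; that is an assumption for illustration, not a statement of what mount.c actually does, and the helper names are hypothetical.

```python
# Sketch only: resolve an in-container bind mount to its "fs root" mount.
# The matching rule (same device, root == "/") is an assumption.

LINES = [
    "79 44 0:38 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw",
    "80 81 0:38 /sys/net /proc/sys/net rw,nosuid,nodev,noexec,relatime - proc proc rw",
    "81 79 0:38 /sys /proc/sys ro,nosuid,nodev,noexec,relatime - proc proc rw",
]


def parse(line):
    """Parse the mountinfo fields needed for the lookup."""
    left, _, right = line.strip().partition(" - ")
    fields = left.split()
    return {
        "mnt_id": int(fields[0]),
        "parent_id": int(fields[1]),
        "dev": fields[2],         # major:minor of the superblock
        "root": fields[3],        # path inside the filesystem that is mounted
        "mountpoint": fields[4],  # where it is mounted in the namespace
        "fstype": right.split()[0],
    }


def fsroot_mount_for(mounts, target):
    """Return the mount of the same filesystem whose root is '/', or None."""
    for mnt in mounts:
        if mnt["dev"] == target["dev"] and mnt["root"] == "/":
            return mnt
    return None


def bind_source_path(fsroot, target):
    """Path in the namespace that corresponds to target's root in its filesystem."""
    return fsroot["mountpoint"].rstrip("/") + target["root"]


if __name__ == "__main__":
    mounts = [parse(line) for line in LINES]
    for mnt in mounts:
        if mnt["root"] == "/":
            continue  # already a full-filesystem mount
        fsroot = fsroot_mount_for(mounts, mnt)
        if fsroot is None:
            print(f"{mnt['mountpoint']}: no fs root mount found")
        else:
            print(f"{mnt['mountpoint']}: bind of {bind_source_path(fsroot, mnt)} "
                  f"(fs root is mount {fsroot['mnt_id']})")
```

Under this rule both /proc/sys and /proc/sys/net resolve to mount 79 (/proc) rather than to /, because the root field is relative to the filesystem and not to the parent mountpoint.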
-- Pavel