[CRIU] overmount confusion

Pavel Emelyanov xemul at parallels.com
Wed Apr 1 00:16:01 PDT 2015


>>> In particular the lines:
>>>
>>> 122 108 0:41 / /sys/fs/cgroup rw,relatime - tmpfs cgroup rw,size=12k,mode=755
>>> 128 122 0:42 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
>>>
>>> 123 122 0:18 /cgmanager /sys/fs/cgroup/cgmanager rw,relatime - tmpfs none rw,size=4k,mode=755
>>> 129 128 0:18 /cgmanager /sys/fs/cgroup/cgmanager rw,relatime - tmpfs none rw,size=4k,mode=755
>>>
>>> are interesting. If I try to dump this container, criu tells me:
>>>
>>> (00.003931) Error (mount.c:636): 123:./sys/fs/cgroup/cgmanager is overmounted
>>
>> Hm... Looks like it is. It's overmounted by /sys/fs/cgroup itself, isn't it?
> 
> I /think/ so :)

:D

>>> I patched it to allow overmounts (i.e. skip this check if
>>> a flag is passed), but then it fails to open mount 122 with:
>>>
>>> (00.139107) Error (mount.c:762): The file system 0x29 (0x2a) tmpfs ./sys/fs/cgroup is inaccessible
>>>
>>> so it seems that the current overmount detection code is not
>>> aggressive enough, since it only checks the sibling mounts instead of
>>> the whole mount tree.
>>
>> I think the code is correct. We look for overmounts only within m's
>> parent (i.e. among m's siblings) because if m is overmounted by
>> something higher up, then m's parent will be overmounted too, and CRIU
>> will detect this when checking m's parent itself.
> 
> But shouldn't it detect the /sys/fs/cgroup case above?

Well, I believe it does. You get the "is overmounted" message with unmodified
CRIU sources, don't you?

>>> Two questions:
>>>
>>> 1. Should the overmount code actually check the whole tree? If so, I
>>>    can send a patch.
>>> 2. What can we do in the overmount case? As I understand it, the only
>>>    answer is "add an --allow-overmounts flag and trust the user to
>>>    know what she's doing". Is there something better?
>>
>> I'm not sure that trusting the user to know what they're doing would
>> work. On restore we will have to do something with overmounted mounts
>> anyway, and ignoring them is probably not an option.
> 
> I guess it works for my case (because I know that the stuff was
> mounted before the container started, so there won't be any open
> files in the undermount), but yeah, that doesn't work in general.
> 
>> We actually have a big issue with overmounts. What is overmounted can
>> also be a file some task has open, and we don't dump this case either.
>> Mapped files, cwd-s and unix sockets also complicate things, especially
>> the sockets. In a perfect world we would have some way to look up a
>> directory by a path with the ability to "dive" under certain mount
>> points along this path. Then we could open() such things and mount()
>> onto them. But this is not an API the kernel developers would allow us
>> to have :) and I understand why. When thinking about how to overcome
>> this with existing kernel APIs, I found only two ways to deal with
>> overmounts.
>>
>> The first is to temporarily move the required mountpoints out of the
>> way, do what we need (open/mount/bind/chdir), and then move them back.
>> The second is to open() all the mountpoints we create at the time we
>> create them, and then fix all the path resolution code to use openat()
>> instead of open() (and mountat() instead of mount()).
> 
> Sorry, I didn't understand the second way; that is a strategy on
> restore, but what happens on dump? How do you get at the underlying
> mountpoint?

Yes, that's the strategy for restore. As far as the dump is concerned, we
only need to "get" at a mountpoint in the sense of reading its information
from /proc. The one exception to this rule is tmpfs, whose contents we need
to tar. For such cases we can use the first technique: temporarily move the
mountpoints that are in the way aside.

-- Pavel