[CRIU] overmount confusion

Pavel Emelyanov xemul at parallels.com
Wed Apr 1 08:00:33 PDT 2015


>>>>> I patched it to allow overmounts (i.e. skip this warning if
>>>>> a flag is passed), but then it fails to open mount 122 with:
>>>>>
>>>>> (00.139107) Error (mount.c:762): The file system 0x29 (0x2a) tmpfs ./sys/fs/cgroup is inaccessible
>>>>>
>>>>> so it seems that the current overmount detection code is not
>>>>> aggressive enough, since it only checks the sibling mounts instead of
>>>>> the whole mount tree.
>>>>
>>>> I think the code is correct. We look for overmounts on m's parent only
>>>> because if m is overmounted by something higher, then m's parent
>>>> will be overmounted too, and CRIU will detect this when checking m's
>>>> parent itself.
>>>
>>> But shouldn't it detect the /sys/fs/cgroup case above?
>>
>> Well, I believe it does. You get the "is overmounted" message on unmodified
>> CRIU sources, don't you?
> 
> Only for /sys/fs/cgroup/cgmanager (and I set a flag there to avoid
> trying to do any work when dumping it later). For this one it doesn't
> detect that it is overmounted, so that flag isn't set and my code
> doesn't mount it.

Hm... The 22:/sys/fs/cgroup mount is not overmounted according to mountinfo.
The directory itself has another mount on top of it (the 128th one), but
that doesn't count as an overmount.
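
To make the distinction concrete, here is a rough sketch (not the CRIU code
itself, which is C in mount.c) of the two cases: a mount hidden by a sibling
versus a mount with a child stacked on its own root. The function names and
the sample table are made up for illustration. Only the IDs 22 and 128 and
the /sys/fs/cgroup path come from this thread. Octal escapes in mountinfo
paths are ignored here.

```python
from collections import namedtuple

Mount = namedtuple("Mount", "mnt_id parent_id mountpoint")

def parse_mountinfo(text):
    """Parse /proc/<pid>/mountinfo text into a dict keyed by mount ID."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        # Field 1: mount ID, field 2: parent mount ID, field 5: mountpoint.
        m = Mount(int(fields[0]), int(fields[1]), fields[4])
        mounts[m.mnt_id] = m
    return mounts

def is_under(path, prefix):
    """True if path equals prefix or lies somewhere below it."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")

def sibling_overmounts(mounts, mnt_id):
    """IDs of siblings (same parent) whose mountpoint covers this mount."""
    m = mounts[mnt_id]
    return [s.mnt_id for s in mounts.values()
            if s.mnt_id != m.mnt_id
            and s.parent_id == m.parent_id
            and is_under(m.mountpoint, s.mountpoint)]

def stacked_children(mounts, mnt_id):
    """IDs of children mounted directly on this mount's own root."""
    m = mounts[mnt_id]
    return [c.mnt_id for c in mounts.values()
            if c.parent_id == m.mnt_id and c.mountpoint == m.mountpoint]

# Hypothetical table: 128 sits on top of 22, and 31 hides its sibling 30.
SAMPLE = """\
20 1 0:18 / / rw - ext4 /dev/sda1 rw
22 20 0:41 / /sys/fs/cgroup rw - tmpfs tmpfs rw
128 22 0:42 / /sys/fs/cgroup rw - tmpfs tmpfs rw
30 20 0:50 / /mnt/a/b rw - tmpfs tmpfs rw
31 20 0:51 / /mnt/a rw - tmpfs tmpfs rw
"""
```

With this table, mount 22 has no sibling covering it, so a sibling-only check
stays quiet, while stacked_children() reports 128 sitting on its root. Mount
30 is the case the sibling check is meant to catch, since 31 was mounted over
its path later.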

>>>>> Two questions:
>>>>>
>>>>> 1. Should the overmount code actually check the whole tree? If so, I
>>>>>    can send a patch.
>>>>> 2. What can we do in the overmount case? As I understand it, the only
>>>>>    answer is "add an --allow-overmounts flag and trust the user to
>>>>>    know what she's doing". Is there something better?
>>>>
>>>> I'm not sure that hoping the user knows what he's doing would work.
>>>> Upon restore we will have to do something with overmounted mounts
>>>> anyway and ignoring them is probably not an option.
>>>
>>> I guess it works for my case (because I know that the stuff was
>>> mounted before the container started, so there won't be any open
>>> files in the undermount), but yeah, that doesn't work generally.
>>>
>>>> We actually have a big issue with overmounts. An opened file can also
>>>> be overmounted, and we don't dump that case either. Mapped files, cwd-s
>>>> and unix sockets also complicate things, especially the sockets. In a
>>>> perfect world we would have some way to look up a directory by a path
>>>> with the ability to "dive" under certain mount points along this path.
>>>> Then we could open() such things and mount() to them. But this is not an
>>>> API the kernel guys would allow us to have :) and I understand why. When
>>>> thinking about how to overcome this with existing kernel APIs, I found
>>>> only two ways to deal with overmounts.
>>>>
>>>> The first is to temporarily move the required mountpoints out of the
>>>> way, do what we need (open/mount/bind/chdir), and then move them back.
>>>> The second way would be to open() all the mountpoints we create at the
>>>> time we create them and then fix all the path resolution code to
>>>> use openat() instead of open()-s (mountat()-s instead of mount()-s).
>>>
>>> Sorry, I didn't understand the second way; that is a strategy on
>>> restore, but what happens on dump? How do you get at the underlying
>>> mountpoint?
>>
>> Yes, that's the strategy for restore. As far as the dump is concerned -- we
>> only need to "get" one in terms of reading the information from /proc. One
>> exception to this rule would be tmpfs, which we need to tar. For such cases
>> we can use the former technique -- move the overmounted mountpoints aside
>> temporarily.
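
The second restore-side idea (keep a descriptor for each mountpoint from the
moment it is created and resolve paths relative to it) can be sketched roughly
as below. This only illustrates the openat() half using an ordinary directory.
The helper names are invented, no mounting is done, and the mountat() half is
not covered. Note that only the starting directory is pinned: lookups below it
still cross any mounts stacked further down.

```python
import os
import tempfile

def open_mountpoint(path):
    """Open a directory and keep the descriptor as a handle on that directory."""
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)

def open_under(dir_fd, relpath, flags=os.O_RDONLY):
    """Resolve relpath starting at dir_fd (openat(2)), not from an absolute path."""
    if os.path.isabs(relpath):
        raise ValueError("relpath must be relative to dir_fd")
    return os.open(relpath, flags, dir_fd=dir_fd)

def demo():
    """Create a directory with one file, then read the file via the held dir fd."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "data"), "wb") as f:
            f.write(b"hello")
        dir_fd = open_mountpoint(d)   # taken "at creation time"
        try:
            fd = open_under(dir_fd, "data")
            try:
                return os.read(fd, 16)
            finally:
                os.close(fd)
        finally:
            os.close(dir_fd)
```

The point of holding dir_fd is that it keeps referring to the directory it was
opened on, even if something is later mounted over that path.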
> 
> By reading the information from proc here you mean reading things like
> open/ghost fds as well as the actual order in which things are
> mounted, right?

Well, I mean that on dump we only need the mount tree, which can be obtained
from /proc/pid/mountinfo. We don't need to mess with the FS-s themselves. Even
if we dump a file that is opened and then overmounted, we don't need to
"dive under" the hiding mountpoint or move it aside -- we just check that
the path we think this file has (by readlink-ing the /proc/pid/fd link)
resolves to the wrong file (by stat()-ing this path), then we compare the
mount-id of this file (taken from /proc/pid/fdinfo/fd) with the mount tree
we have and see that the file is indeed overmounted. That
(should be) enough for dump. On restore we'll have to open this file "under"
the overmounting mount, and for this we would need to "dive under" or move
mounts.
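
As a rough illustration of the dump-side check (a sketch only, not what CRIU
actually does in C), the steps could look like this. The function names are
invented. It assumes the caller sees the same mount namespace and root as the
target task, and it does not handle the " (deleted)" suffix that readlink can
return for unlinked files.

```python
import os

def parse_fdinfo_mnt_id(fdinfo_text):
    """Extract the mnt_id value from /proc/<pid>/fdinfo/<fd> contents."""
    for line in fdinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "mnt_id":
            return int(value.strip())
    return None

def path_resolves_elsewhere(fd_id, path_id):
    """Compare (st_dev, st_ino) of the opened file with what its path yields now.

    path_id is None when the path no longer resolves at all.
    """
    return path_id is None or fd_id != path_id

def check_fd(pid, fd):
    """Return (path, mnt_id, hidden) for one descriptor of a task."""
    link = "/proc/%d/fd/%d" % (pid, fd)
    path = os.readlink(link)          # the path we think the file has
    st = os.stat(link)                # follows the link to the opened file itself
    fd_id = (st.st_dev, st.st_ino)
    try:
        pst = os.stat(path)           # what that path resolves to right now
        path_id = (pst.st_dev, pst.st_ino)
    except OSError:
        path_id = None
    with open("/proc/%d/fdinfo/%d" % (pid, fd)) as f:
        mnt_id = parse_fdinfo_mnt_id(f.read())
    return path, mnt_id, path_resolves_elsewhere(fd_id, path_id)
```

If hidden comes back true, mnt_id can then be looked up in the mount tree
parsed from mountinfo to confirm that the file lives on a mount that something
else now covers.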

-- Pavel


