[CRIU] race dumping fds in wily LXC containers
Pavel Emelyanov
xemul at parallels.com
Thu Jul 16 02:41:40 PDT 2015
On 07/15/2015 11:39 PM, Tycho Andersen wrote:
> On Wed, Jul 15, 2015 at 01:01:15PM +0300, Pavel Emelyanov wrote:
>>
>>> Right, this fstat() fails because the previous readlink() returned a
>>> "(deleted)", i.e. rpath in the fstatat() call in check_path_remap()
>>> has this "(deleted)".
>>
>> No, we don't check for "(deleted)" to take any decisions. Only stat/fstat
>> results comparisons. The only thing we do with it is strip one from the
>> file path if it's there :)
>
> Not to make any decisions, but the problem here is that the call in
> check_path_remap() looks like:
>
> fstatat(mntns_root, "/sys/fs/cgroup/systemd/lxc/w1-4 (deleted)", &pst, 0);
>
> which doesn't work.
Exactly! The fstatat() reports that there's no such file _name_ in the system.
Since there's the "(deleted)" suffix in the name, it means that the kernel
sees the respective dentry as unhashed. This happens when one unlink-s
the file.
But if you say that the name w1-4 is there, this means that there's _one_
_more_ file with the same name in the tree.
You can see this situation with these steps:
term-1 $ cat > x
term-2 $ ls -l /proc/$(pidof cat)/fd
...
... 1 -> /home/x
# so the cat has 1 pointing to /home/x, that's correct.
term-2 $ ln x y
term-2 $ rm -f x
term-2 $ ls -l /proc/$(pidof cat)/fd
...
... 1 -> /home/x (deleted)
# now x is deleted and criu will dump it as a link remap, since the name y
# still holds the inode and keeps the file's link count (st_nlink) at 1
term-2 $ touch x
# or you can do
term-2 $ ln y x
# now we have created another file named x, in the former case -- just
# a new file, in the latter -- another name for the same inode as old
# x used to have %)
term-2 $ ls -l /proc/$(pidof cat)/fd
...
... 1 -> /home/x (deleted)
# See? The cat's x file is still considered to be deleted as the dentry
# that is used by cat was unhashed with the rm -f command. And even if
# there's a new x file in /home, the cat's 1 descriptor should still be
# treated as unlinked.
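
The same steps can be replayed from a single process. Below is a minimal
sketch in Python, not criu's actual code: describe_fd() and run_demo() are
hypothetical helpers made up for this illustration, and it assumes Linux
with /proc mounted. It compares what fstat() says about the descriptor with
what stat() says about the name that readlink() returns, after stripping
one "(deleted)" suffix.

```python
import os
import tempfile

DELETED_SUFFIX = " (deleted)"


def describe_fd(fd):
    """Hypothetical helper: return (marked_deleted, nlink, path_state)."""
    fd_st = os.fstat(fd)                       # asks the inode, not the name
    link = os.readlink(f"/proc/self/fd/{fd}")  # the name as the kernel renders it
    marked_deleted = link.endswith(DELETED_SUFFIX)
    # Strip the suffix only to recover a candidate path. A file whose real
    # name ends in " (deleted)" would look identical, so the suffix alone
    # is not a safe basis for any decision.
    path = link[: -len(DELETED_SUFFIX)] if marked_deleted else link
    try:
        path_st = os.stat(path)
    except FileNotFoundError:
        path_state = "missing"
    else:
        same = (path_st.st_dev, path_st.st_ino) == (fd_st.st_dev, fd_st.st_ino)
        path_state = "same-inode" if same else "other-inode"
    return marked_deleted, fd_st.st_nlink, path_state


def run_demo():
    """Replay the cat/ln/rm/touch steps inside a scratch directory."""
    states = []
    with tempfile.TemporaryDirectory() as d:
        x, y = os.path.join(d, "x"), os.path.join(d, "y")
        fd = os.open(x, os.O_WRONLY | os.O_CREAT, 0o600)  # cat > x
        try:
            states.append(describe_fd(fd))
            os.link(x, y)                                  # ln x y
            os.unlink(x)                                   # rm -f x
            states.append(describe_fd(fd))
            # touch x: a brand new file that reuses the old name
            os.close(os.open(x, os.O_WRONLY | os.O_CREAT, 0o600))
            states.append(describe_fd(fd))
            os.unlink(x)
            os.link(y, x)                                  # ln y x
            states.append(describe_fd(fd))
        finally:
            os.close(fd)
    return states


if __name__ == "__main__":
    for state in run_demo():
        print(state)
```

After the rm step the descriptor should be marked deleted while its link
count is still non-zero and the name is gone, which is the link remap case.
After the touch (or ln y x) step the name resolves again, but the
descriptor keeps its "(deleted)" mark, so the name alone says nothing about
whether the descriptor's dentry is still linked.
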
>>>> After this the link remap will be called.
>>>>
>>>> In your case it seems to be the kernel spoofing the /sys/ file names
>>>> somehow so that criu is not able to stat() the name in the first place.
>>>
>>> I think the initial stat succeeds somehow (since we don't get an error
>>> there and it continues on), but the subsequent readlink tacks on
>>> "(deleted)" and thus the fstat of that file fails, which doesn't make
>>> much sense to me. The file definitely exists, it's like there is some
>>> problem readlink()ing it (perhaps because it is sent over a unix
>>> socket or something? not sure).
>>
>> The reason for going to link remap is that fstat() (on the file descriptor)
>> succeeded and reported a non-zero link count AND the subsequent fstatat() on
>> the file path reported ENOENT. (And an NFS special-care, but I don't think
>> it's the case).
>
> Right, but the problem is that we're stating the wrong file as above.
> What I'm not sure about is why we're getting the wrong thing from
> readlink, since the file exists.
>
> I suspect it has something to do with fuse + sending a fd over unix
> socket, but that's a hunch more than anything. I was hoping you might
> know where to look :)
>
> Tycho
> .
>