[CRIU] race dumping fds in wily LXC containers

Pavel Emelyanov xemul at parallels.com
Fri Jul 17 05:08:36 PDT 2015


On 07/16/2015 11:31 PM, Tycho Andersen wrote:
> On Thu, Jul 16, 2015 at 11:12:26PM +0300, Pavel Emelyanov wrote:
>> On 07/16/2015 05:06 PM, Tycho Andersen wrote:
>>> On Thu, Jul 16, 2015 at 12:41:40PM +0300, Pavel Emelyanov wrote:
>>>> On 07/15/2015 11:39 PM, Tycho Andersen wrote:
>>>>> On Wed, Jul 15, 2015 at 01:01:15PM +0300, Pavel Emelyanov wrote:
>>>>>>
>>>>>>> Right, this fstat() fails because the previous readlink() returned a
>>>>>>> "(deleted)", i.e. rpath in the fstatat() call in check_path_remap()
>>>>>>> has this "(deleted)".
>>>>>>
>>>>>> No, we don't check for "(deleted)" to make any decisions. Only comparisons
>>>>>> of stat/fstat results. The only thing we do with it is strip it from the
>>>>>> file path if it's there :)
>>>>>
>>>>> Not to make any decisions, but the problem here is that the call in
>>>>> check_path_remap() looks like:
>>>>>
>>>>> fstatat(mntns_root, "/sys/fs/cgroup/systemd/lxc/w1-4 (deleted)", &pst, 0);
>>>>>
>>>>> which doesn't work.
>>>>
>>>> Exactly! The fstatat() reports there's no such file _name_ in the system.
>>>> Since there's the "(deleted)" suffix in the name, it means that the kernel
>>>> sees the respective dentry as unhashed. This happens when one unlink-s
>>>> the file.
>>>>
>>>> But if you say that the name w1-4 is there, this means that there's _one_
>>>> _more_ file with the same name in the tree.
>>>>
>>>> You can see this situation with these steps:
>>>>
>>>> term-1 $ cat > x
>>>> term-2 $ ls -l /proc/$(pidof cat)/fd
>>>> ...
>>>> ... 1 -> /home/x
>>>>
>>>> # so the cat has 1 pointing to /home/x, that's correct.
>>>>
>>>> term-2 $ ln x y
>>>> term-2 $ rm -f x
>>>> term-2 $ ls -l /proc/$(pidof cat)/fd
>>>> ...
>>>> ... 1 -> /home/x (deleted)
>>>>
>>>> # now x is deleted and criu will dump it as a link remap, since the name y
>>>> # still holds the inode and keeps the file's st_nlink at 1
>>>>
>>>> term-2 $ touch x
>>>> #or you can do
>>>> term-2 $ ln y x
>>>>
>>>> # now we have created another file named x, in the former case -- just
>>>> # a new file, in the latter -- another name for the same inode as old
>>>> # x used to have %)
>>>>
>>>> term-2 $ ls -l /proc/$(pidof cat)/fd
>>>> ...
>>>> ... 1 -> /home/x (deleted)
>>>>
>>>> # See? The cat's x file is still considered to be deleted as the dentry
>>>> # that is used by cat was unhashed with the rm -f command. And even if
>>>> # there's a new x file in /home, the cat's 1 descriptor should still be
>>>> # treated as unlinked.
>>>
>>> Right, except in this case if I readlink the fd that criu is failing
>>> on from /proc after criu fails, it doesn't have a "(deleted)", so I
>>
>> Wait, in my example the "(deleted)" is always there. And in your case
>> you also see the "(deleted)" one:
>>
>>    fstatat(mntns_root, "/sys/fs/cgroup/systemd/lxc/w1-4 (deleted)", &pst, 0);
> 
> Right, but if I readlink the fd that points to that after the restore
> fails (i.e. independent of CRIU), it doesn't show up as "(deleted)",
> even though the fd hasn't been touched. So it's appearing deleted to
> CRIU, but isn't deleted in reality.

Yes, that's strange. Maybe you're right and passing fd to another process
causes problems for fuse.
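
For what it's worth, the x/y experiment quoted above can be reproduced from a
single process, which makes it easier to poke at. A minimal Python sketch,
assuming Linux with /proc mounted; fd_link() and demo() are names made up for
illustration and this is not CRIU code:

```python
import os
import tempfile


def fd_link(fd):
    """Return what /proc/self/fd/<fd> points at, as the kernel reports it."""
    return os.readlink(f"/proc/self/fd/{fd}")


def demo():
    """Replay the x/y steps and record the fd's link target at each stage."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        x = os.path.join(d, "x")
        y = os.path.join(d, "y")
        # Like `cat > x`: hold an open descriptor on x.
        fd = os.open(x, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            results["open"] = fd_link(fd)
            os.link(x, y)    # ln x y
            os.unlink(x)     # rm -f x
            results["unlinked"] = fd_link(fd)
            os.link(y, x)    # ln y x: a new name for the same inode
            results["relinked"] = fd_link(fd)
            # The new x names the same inode that the fd holds ...
            results["same_inode"] = os.stat(x).st_ino == os.fstat(fd).st_ino
        finally:
            os.close(fd)
    return results


if __name__ == "__main__":
    for stage, value in demo().items():
        print(stage, "->", value)
```

Even though the new x is the same inode, the descriptor keeps pointing at the
old, unhashed dentry, so its link target should keep the "(deleted)" suffix.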

>>> think it's not been unlinked (indeed it can't be unlinked, lxcfs
>>> doesn't implement unlink). Truly perplexing :(
>>
>> Well, it doesn't necessarily have to be unlink(). Anything that unhashes the
>> dentry will do: it can also be rename() or some internal filesystem thing that
>> replaces one dentry with another (d_revalidate for example).
> 
> Ok, I'll have a look, thanks.
> 
> Tycho
> .
> 
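
To illustrate the "strip the suffix, then compare stat/fstat results" idea
mentioned earlier in the thread, here is a rough Python sketch. It is not the
actual check_path_remap() code; strip_deleted() and name_matches_fd() are
hypothetical helpers, and it assumes Linux with /proc mounted:

```python
import os

DELETED_SUFFIX = " (deleted)"


def strip_deleted(link):
    """Drop one trailing " (deleted)" from a /proc fd link target, if present."""
    if link.endswith(DELETED_SUFFIX):
        return link[:-len(DELETED_SUFFIX)]
    return link


def name_matches_fd(fd):
    """Check whether the name reported for fd still refers to the fd's inode.

    The suffix itself is not used for the decision; only the device and
    inode numbers from stat() on the name and fstat() on the fd are compared.
    """
    path = strip_deleted(os.readlink(f"/proc/self/fd/{fd}"))
    fst = os.fstat(fd)
    try:
        pst = os.stat(path)
    except FileNotFoundError:
        # No such name any more: the file is reachable only through the fd.
        return False
    return (pst.st_dev, pst.st_ino) == (fst.st_dev, fst.st_ino)
```

A False result here covers both cases from the thread: the name is gone
entirely, or the name exists again but belongs to a different file.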


