[CRIU] race dumping fds in wily LXC containers

Tycho Andersen tycho.andersen at canonical.com
Thu Jul 16 13:31:02 PDT 2015


On Thu, Jul 16, 2015 at 11:12:26PM +0300, Pavel Emelyanov wrote:
> On 07/16/2015 05:06 PM, Tycho Andersen wrote:
> > On Thu, Jul 16, 2015 at 12:41:40PM +0300, Pavel Emelyanov wrote:
> >> On 07/15/2015 11:39 PM, Tycho Andersen wrote:
> >>> On Wed, Jul 15, 2015 at 01:01:15PM +0300, Pavel Emelyanov wrote:
> >>>>
> >>>>> Right, this fstat() fails because the previous readlink() returned a
> >>>>> "(deleted)", i.e. rpath in the fstatat() call in check_path_remap()
> >>>>> has this "(deleted)".
> >>>>
> >>>> No, we don't check for "(deleted)" to make any decisions. Only stat/fstat
> >>>> result comparisons. The only thing we do with it is strip it from the
> >>>> file path if it's there :)
> >>>
> >>> Not to make any decisions, but the problem here is that the call in
> >>> check_path_remap() looks like:
> >>>
> >>> fstatat(mntns_root, "/sys/fs/cgroup/systemd/lxc/w1-4 (deleted)", &pst, 0);
> >>>
> >>> which doesn't work.
> >>
> >> Exactly! The fstatat() reports there's no such file _name_ in the system.
> >> The "(deleted)" suffix in the name means that the kernel sees the
> >> respective dentry as unhashed. This happens when one unlinks
> >> the file.
> >>
> >> But if you say that the name w1-4 is there, this means that there's _one_
> >> _more_ file with the same name in the tree.
> >>
> >> You can see this situation with these steps:
> >>
> >> term-1 $ cat > x
> >> term-2 $ ls -l /proc/$(pidof cat)/fd
> >> ...
> >> ... 1 -> /home/x
> >>
> >> # so the cat has 1 pointing to /home/x, that's correct.
> >>
> >> term-2 $ ln x y
> >> term-2 $ rm -f x
> >> term-2 $ ls -l /proc/$(pidof cat)/fd
> >> ...
> >> ... 1 -> /home/x (deleted)
> >>
> >> # now x is deleted and criu will dump it as a link remap, since the name y
> >> # still holds the inode and keeps the file's link count (st_nlink) at 1
> >>
> >> term-2 $ touch x
> >> # or you can do
> >> term-2 $ ln y x
> >>
> >> # now we have created another file named x, in the former case -- just
> >> # a new file, in the latter -- another name for the same inode as old
> >> # x used to have %)
> >>
> >> term-2 $ ls -l /proc/$(pidof cat)/fd
> >> ...
> >> ... 1 -> /home/x (deleted)
> >>
> >> # See? The cat's x file is still considered to be deleted as the dentry
> >> # that is used by cat was unhashed with the rm -f command. And even if
> >> # there's a new x file in /home, the cat's 1 descriptor should still be
> >> # treated as unlinked.
> > 
> > Right, except in this case if I readlink the fd that criu is failing
> > on from /proc after criu fails, it doesn't have a "(deleted)", so I
> 
> Wait, in my example the "(deleted)" is always there. And in your case
> you also see the "(deleted)" one:
> 
>    fstatat(mntns_root, "/sys/fs/cgroup/systemd/lxc/w1-4 (deleted)", &pst, 0);

Right, but if I readlink the fd that points to that after the restore
fails (i.e. independent of CRIU), it doesn't show up as "(deleted)",
even though the fd hasn't been touched. So it's appearing deleted to
CRIU, but isn't deleted in reality.
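
For reference, a minimal sketch of that kind of out-of-band check, assuming
Linux with /proc mounted and Python 3. fd_link_state() and demo() are
hypothetical helpers, not CRIU code, and the path is resolved in the caller's
mount namespace rather than relative to mntns_root as CRIU does. It reports
both the "(deleted)" marker and the stat comparison, and demo() replays the
x/y steps quoted above:

```python
import os
import tempfile

DELETED_SUFFIX = " (deleted)"


def fd_link_state(fd, pid="self"):
    """Return (path, marked_deleted, same_inode) for an open descriptor.

    path           -- readlink target with any " (deleted)" suffix stripped
    marked_deleted -- True if the kernel appended " (deleted)" to the link
    same_inode     -- True if that name currently resolves to the fd's inode
    """
    proc_link = f"/proc/{pid}/fd/{fd}"
    link = os.readlink(proc_link)
    marked_deleted = link.endswith(DELETED_SUFFIX)
    path = link[:-len(DELETED_SUFFIX)] if marked_deleted else link

    # stat() on the /proc link follows it to the open file itself,
    # regardless of what the name currently points to.
    fd_st = os.stat(proc_link)
    try:
        name_st = os.stat(path)
    except FileNotFoundError:
        return path, marked_deleted, False
    same_inode = (fd_st.st_dev, fd_st.st_ino) == (name_st.st_dev, name_st.st_ino)
    return path, marked_deleted, same_inode


def demo():
    """Replay the steps: open x, ln x y, rm x, ln y x; record the state each time."""
    with tempfile.TemporaryDirectory() as d:
        x = os.path.join(d, "x")
        y = os.path.join(d, "y")
        fd = os.open(x, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            states = [fd_link_state(fd)]      # freshly opened
            os.link(x, y)
            os.unlink(x)
            states.append(fd_link_state(fd))  # name x is gone
            os.link(y, x)
            states.append(fd_link_state(fd))  # name x is back, same inode
        finally:
            os.close(fd)
        return states


if __name__ == "__main__":
    for state in demo():
        print(state)
```

On an ordinary local filesystem I'd expect the last two states to be marked
deleted, with the final one still resolving to the same inode, which is the
case where the marker and the stat comparison disagree.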

> > think it's not been unlinked (indeed it can't be unlinked, lxcfs
> > doesn't implement unlink). Truly perplexing :(
> 
> Well, it doesn't necessarily have to be unlink(). Anything that unhashes the dentry
> will do: it can also be rename() or some internal filesystem thing that replaces one
> dentry with another (d_revalidate for example).
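
A small sketch of the rename() case, assuming Linux with /proc, Python 3 and
an ordinary local filesystem (this does not model lxcfs or d_revalidate).
link_after_rename_over() is a hypothetical helper: it holds a file open,
renames a second file over the same name, and returns what /proc reports for
the descriptor. unlink() is never called on the open file:

```python
import os
import tempfile


def link_after_rename_over():
    """Open a file, rename another file over its name, return the fd's /proc link."""
    with tempfile.TemporaryDirectory() as d:
        victim = os.path.join(d, "victim")
        other = os.path.join(d, "other")
        fd = os.open(victim, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            # Create a second, unrelated file.
            with open(other, "w"):
                pass
            # Replace the name "victim"; the dentry held by fd is displaced.
            os.rename(other, victim)
            return os.readlink(f"/proc/self/fd/{fd}")
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(link_after_rename_over())
```

I'd expect the printed link to end in " (deleted)". Which name precedes the
suffix may depend on the kernel version, so the sketch makes no claim about it.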

Ok, I'll have a look, thanks.

Tycho

