[CRIU] race dumping fds in wily LXC containers

Pavel Emelyanov xemul at parallels.com
Mon Jul 13 05:07:10 PDT 2015


On 07/01/2015 12:45 AM, Tycho Andersen wrote:
> Hi all,
> 
> I'm trying to debug a strange race that happens sometimes when
> checkpointing wily ubuntu LXC containers. The symptom of the race is:
> 
> (00.020270) Error (files-reg.c:527): Can't link remap to /sys/fs/cgroup/systemd/lxc/w1 (deleted): Operation not permitted

Ouch... This is a file on sysfs :)

> The problem here seems to be that the readlink on criu's
> /proc/self/fd/$the_fd_for_that_file gives a "(deleted)" result, which
> subsequently confuses things. (In fact, I'm a little confused about
> how dump_linked_remap() works at all, given that just before it is
> called the fstatat() fails; but let's ignore that for now.)
> 
> The strangest part of all this is that after the dump fails, I can
> attach to the container and do a readlink on the /proc/pid/fd/$fd for
> the pid in question, and it gives me the right (i.e. non-"(deleted)")
> answer.

The link remap idea is like this.

fd = open("/foo/bar", O_RDONLY);
link("/foo/bar", "/foo/bar2");
unlink("/foo/bar");

In this case the inode has been given two names -- "/foo/bar"
and "/foo/bar2" -- but the former name is no longer visible, since
you have unlinked it.
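
To make the sequence above concrete, here is a minimal C sketch of it.
The helper names open_link_unlink() and fd_nlink() are made up for
illustration and are not CRIU code; the paths are whatever the caller
passes in.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not CRIU code): create and open `name`, give the
 * inode a second name `alias`, then unlink the first name.
 * Returns the still-open descriptor, or -1 on error.
 */
int open_link_unlink(const char *name, const char *alias)
{
	int fd;

	assert(name != NULL && alias != NULL);

	fd = open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
	if (fd < 0)
		return -1;

	/* Second name for the same inode. */
	if (link(name, alias) < 0) {
		unlink(name);
		close(fd);
		return -1;
	}

	/* Drop the name the descriptor was opened through. */
	if (unlink(name) < 0) {
		unlink(alias);
		close(fd);
		return -1;
	}

	return fd;
}

/* Link count of the inode behind fd, or -1 on error. */
long fd_nlink(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return (long)st.st_nlink;
}
```

On an ordinary Linux filesystem, fd_nlink() on the descriptor returned
by open_link_unlink() should report 1: the inode is still reachable
through the alias, even though the name the file was opened by is gone.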

This situation will be detected by CRIU like this:

First we read the /proc/pid/fd/fd link to get the name. It
will be "/foo/bar (deleted)". The " (deleted)" suffix is added by the
kernel when it sees that the dentry in question is not hashed, which
is the case for the "bar" dentry.

Then CRIU will try to stat() this name to check whether the file can
still be accessed through it. For "/foo/bar" this will not be the case.

Then CRIU will fstat() the descriptor and will see the st_nlink count
being 1 (the "/foo/bar2" name is alive).

After this the link remap will be called.
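
The three checks above can be sketched in C roughly as follows. This is
an illustration only, not the code in files-reg.c: the function name
looks_like_link_remap() and the buffer sizes are assumptions. The sketch
also compares device and inode numbers when the name still resolves,
which is an extra precaution the description above does not mention.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define DELETED_SUFFIX " (deleted)"

/*
 * Hypothetical helper, not CRIU's implementation.
 * Returns 1 if fd looks like a link-remap candidate (its name is gone
 * but the inode still has links), 0 if not, -1 on error.
 */
int looks_like_link_remap(int fd)
{
	char link_path[64], name[4096];
	struct stat by_name, by_fd;
	size_t suffix_len = strlen(DELETED_SUFFIX);
	ssize_t len;

	assert(fd >= 0);

	/* Step 1: ask the kernel which name it reports for the descriptor. */
	snprintf(link_path, sizeof(link_path), "/proc/self/fd/%d", fd);
	len = readlink(link_path, name, sizeof(name) - 1);
	if (len < 0)
		return -1;
	name[len] = '\0';

	/* No " (deleted)" suffix: the dentry is still hashed. */
	if ((size_t)len <= suffix_len ||
	    strcmp(name + len - suffix_len, DELETED_SUFFIX) != 0)
		return 0;
	name[len - suffix_len] = '\0';

	if (fstat(fd, &by_fd) < 0)
		return -1;

	/* Step 2: if the stripped name still leads to this inode, no remap. */
	if (stat(name, &by_name) == 0 &&
	    by_name.st_dev == by_fd.st_dev &&
	    by_name.st_ino == by_fd.st_ino)
		return 0;

	/* Step 3: the name is gone; remap only if another link is alive. */
	return by_fd.st_nlink > 0;
}
```

Note that a file whose real name happens to end in " (deleted)" would
fool this sketch; it is only meant to show the order of the checks.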

In your case it seems the kernel is somehow spoofing the names of the
/sys/ files, so that criu is not able to stat() the name in the first
place.

> Any ideas as to what's going on here? My best guess is a kernel bug
> related to sending fds (the underlying filesystem is lxcfs, a fuse
> filesystem, not the traditional cgroup fs), but that's just a hunch.
> 
> Any thoughts would be appreciated.
> 
> Tycho
> _______________________________________________
> CRIU mailing list
> CRIU at openvz.org
> https://lists.openvz.org/mailman/listinfo/criu
> .
> 


