[CRIU] Ghost file: no such file or directory
Tycho Andersen
tycho.andersen at canonical.com
Wed Mar 30 09:42:57 PDT 2016
On Wed, Mar 30, 2016 at 06:15:20PM +0300, Pavel Emelyanov wrote:
> On 03/30/2016 06:10 PM, Tycho Andersen wrote:
> > On Wed, Mar 30, 2016 at 06:04:04PM +0300, Pavel Emelyanov wrote:
> >> On 03/30/2016 05:59 PM, Tycho Andersen wrote:
> >>> On Wed, Mar 30, 2016 at 05:50:32PM +0300, Pavel Emelyanov wrote:
> >>>> On 03/30/2016 05:46 PM, Tycho Andersen wrote:
> >>>>> On Wed, Mar 30, 2016 at 05:26:15PM +0300, Pavel Emelyanov wrote:
> >>>>>> On 03/30/2016 03:33 PM, Federico Reghenzani wrote:
> >>>>>>> Hi all!
> >>>>>>>
> >>>>>>> We have a problem restoring Open MPI daemons with child processes that use shared memory:
> >>>>>>>
> >>>>>>> (00.022447) 255: Opening ghost file 0x3 for tmp/openmpi-sessions-root@roaster-vm3_0/60995/1/shared_mem_pool.roaster-vm3.1
> >>>>>>> (00.022479) 255: Error (files-reg.c:139): Can't open ghost file //tmp/openmpi-sessions-root@roaster-vm3_0/60995/1/shared_mem_pool.roaster-vm3.1.cr.3.ghost: No such file or directory
> >>>>>>
> >>>>>> Can you check whether the dirname of this path exists?
> >>>>>> I mean this -- //tmp/openmpi-sessions-root@roaster-vm3_0/60995/1/
> >>>>>>
> >>>>>> Presumably this is the case when not only the file was removed, but also
> >>>>>> some dir components. And we've fixed it only in 2.0.
> >>>>>
> >>>>> The users who reported this to me were using post-2.0 versions, so there's some other
> >>>>> bug here. I'm trying to reproduce now again, but not having any luck
> >>>>> :(
> >>>>
> >>>> Do they also see the ENOENT errno from the open(O_CREAT) call?
> >>>
> >>> Which call do you mean here? I don't see anything close to this that
> >>> does an O_CREAT on dump.
> >>
> >> Yes, because the error is on restore :) For 1.8 this is files-reg.c create_ghost()'s
> >
> > Oh, derp, I misread the error. I've seen failures reported on dump
> > here:
> >
> > https://github.com/xemul/criu/blob/master/criu/files-reg.c#L628
>
> Ouch! And what was the errno?! I can hardly imagine the reason for a regular
> file open failure via proc :(

Even more bizarrely, it seems like the fd does actually exist. Here's
a log with a call to cr_system("ls -alh /proc/self/fd") just before we
try to open the file (fd 39):

(00.019478) Dumping path for 15 fd via self 39 [/var/log/upstart/systemd-logind.log.1 (deleted)]
(00.019480) Strip ' (deleted)' tag from './var/log/upstart/systemd-logind.log.1 (deleted)'
(00.019482) Dumping ghost file for fd 39 id 0xb
(00.019484) mnt: Path `/var/log/upstart/systemd-logind.log.1' resolved to `./' mountpoint
(00.019486) Dumping ghost file contents (id 0x2)
total 0
dr-x------ 2 root root 0 Mar 30 18:38 .
dr-xr-xr-x 9 root root 0 Mar 30 18:38 ..
lr-x------ 1 root root 64 Mar 30 18:38 0 -> /proc/20023/fd
l-wx------ 1 root root 64 Mar 30 18:38 1 -> /tmp/lxd_checkpoint_035782765/dump.log
l--------- 1 root root 64 Mar 30 18:38 10 -> /proc/20018
lr-x------ 1 root root 64 Mar 30 18:38 11 -> /lib/x86_64-linux-gnu/libnss_files-2.19.so
lr-x------ 1 root root 64 Mar 30 18:38 12 -> /lib/x86_64-linux-gnu/libnss_nis-2.19.so
lr-x------ 1 root root 64 Mar 30 18:38 13 -> /lib/x86_64-linux-gnu/libnsl-2.19.so
lr-x------ 1 root root 64 Mar 30 18:38 14 -> /lib/x86_64-linux-gnu/libnss_compat-2.19.so
lr-x------ 1 root root 64 Mar 30 18:38 15 -> /lib/x86_64-linux-gnu/libdl-2.19.so
lr-x------ 1 root root 64 Mar 30 18:38 16 -> /lib/x86_64-linux-gnu/libpcre.so.3.13.1
lr-x------ 1 root root 64 Mar 30 18:38 17 -> /lib/x86_64-linux-gnu/libpthread-2.19.so
lr-x------ 1 root root 64 Mar 30 18:38 18 -> /lib/x86_64-linux-gnu/libc-2.19.so
lr-x------ 1 root root 64 Mar 30 18:38 19 -> /lib/x86_64-linux-gnu/librt-2.19.so
l-wx------ 1 root root 64 Mar 30 18:38 2 -> /tmp/lxd_checkpoint_035782765/dump.log
lr-x------ 1 root root 64 Mar 30 18:38 20 -> /lib/x86_64-linux-gnu/libjson-c.so.2.0.0
lr-x------ 1 root root 64 Mar 30 18:38 21 -> /lib/x86_64-linux-gnu/libselinux.so.1
lr-x------ 1 root root 64 Mar 30 18:38 22 -> /lib/x86_64-linux-gnu/libdbus-1.so.3.7.6
lr-x------ 1 root root 64 Mar 30 18:38 23 -> /lib/x86_64-linux-gnu/libnih-dbus.so.1.0.0
lr-x------ 1 root root 64 Mar 30 18:38 24 -> /lib/x86_64-linux-gnu/libnih.so.1.0.0
lr-x------ 1 root root 64 Mar 30 18:38 25 -> /lib/x86_64-linux-gnu/ld-2.19.so
lr-x------ 1 root root 64 Mar 30 18:38 26 -> /lib/x86_64-linux-gnu/ld-2.19.so
l-wx------ 1 root root 64 Mar 30 18:38 27 -> /tmp/lxd_checkpoint_035782765/pipes.img
l-wx------ 1 root root 64 Mar 30 18:38 28 -> /tmp/lxd_checkpoint_035782765/inotify.img
l-wx------ 1 root root 64 Mar 30 18:38 29 -> /tmp/lxd_checkpoint_035782765/unixsk.img
l-wx------ 1 root root 64 Mar 30 18:38 3 -> /tmp/lxd_checkpoint_035782765/seccomp.img
lrwx------ 1 root root 64 Mar 30 18:38 30 -> socket:[18760]
lrwx------ 1 root root 64 Mar 30 18:38 31 -> socket:[18790]
lrwx------ 1 root root 64 Mar 30 18:38 32 -> socket:[18794]
lrwx------ 1 root root 64 Mar 30 18:38 33 -> socket:[18798]
l-wx------ 1 root root 64 Mar 30 18:38 34 -> /tmp/lxd_checkpoint_035782765/ghost-file-2.img
l-wx------ 1 root root 64 Mar 30 18:38 39 -> /var/log/upstart/systemd-logind.log.1 (deleted)
l--------- 1 root root 64 Mar 30 18:38 4 -> /proc/2139
l-wx------ 1 root root 64 Mar 30 18:38 40 -> /var/log/upstart/acpid.log.1 (deleted)
lrwx------ 1 root root 64 Mar 30 18:38 41 -> /dev/pts/ptmx
l-wx------ 1 root root 64 Mar 30 18:38 42 -> /tmp/lxd_checkpoint_035782765/remap-fpath.img
l-wx------ 1 root root 64 Mar 30 18:38 43 -> /tmp/lxd_checkpoint_035782765/reg-files.img
l-wx------ 1 root root 64 Mar 30 18:38 44 -> /tmp/lxd_checkpoint_035782765/fdinfo-2.img
l-wx------ 1 root root 64 Mar 30 18:38 45 -> /tmp/lxd_checkpoint_035782765/pipes-data.img
lrwx------ 1 root root 64 Mar 30 18:38 5 -> socket:[815446]
l-wx------ 1 root root 64 Mar 30 18:38 6 -> /tmp/lxd_checkpoint_035782765/ids-1.img
lrwx------ 1 root root 64 Mar 30 18:38 7 -> socket:[815426]
lrwx------ 1 root root 64 Mar 30 18:38 8 -> socket:[758994]
lr-x------ 1 root root 64 Mar 30 18:38 9 -> /sbin/init
(00.022884) Error (files-reg.c:631): Can't open ghost original file 39: No such file or directory
(00.022910) ----------------------------------------
(00.022914) Error (cr-dump.c:1303): Dump files (pid: 2139) failed with -1
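
For anyone who wants to poke at this outside CRIU, here is a minimal Python sketch of the operation that is failing for fd 39 above: reopening an already-unlinked file through its /proc/self/fd entry. This is not CRIU's code; the function name and the temp-file setup are made up for illustration, and it assumes a Linux box with /proc mounted.

```python
import os
import tempfile


def reopen_deleted_via_proc(payload=b"ghost contents\n"):
    """Hypothetical helper: unlink an open file, then reopen it via /proc.

    Returns (symlink target as shown by readlink, bytes read back).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # Remove the name; the open descriptor keeps the inode alive.
        os.unlink(path)

        link = "/proc/self/fd/%d" % fd
        # readlink reports the old path with the same " (deleted)" tag
        # that shows up in the ls output and the dump log.
        target = os.readlink(link)

        # Opening the /proc entry goes through the kernel's reference to
        # the open file rather than the (now missing) path name, and
        # yields a fresh file description positioned at offset 0.
        with open(link, "rb") as ghost:
            data = ghost.read()
        return target, data
    finally:
        os.close(fd)


if __name__ == "__main__":
    target, data = reopen_deleted_via_proc()
    print(target)
    print(data)
```

As far as I know this open normally succeeds on Linux, which is what makes the ENOENT in the log above so odd. Running the same sketch on the affected filesystem might help tell whether the problem is specific to that environment.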

No idea what's going on here...

Tycho