[CRIU] Ghost file: no such file or directory
Tycho Andersen
tycho.andersen at canonical.com
Wed Mar 30 09:53:22 PDT 2016
On Wed, Mar 30, 2016 at 10:42:57AM -0600, Tycho Andersen wrote:
> On Wed, Mar 30, 2016 at 06:15:20PM +0300, Pavel Emelyanov wrote:
> > On 03/30/2016 06:10 PM, Tycho Andersen wrote:
> > > On Wed, Mar 30, 2016 at 06:04:04PM +0300, Pavel Emelyanov wrote:
> > >> On 03/30/2016 05:59 PM, Tycho Andersen wrote:
> > >>> On Wed, Mar 30, 2016 at 05:50:32PM +0300, Pavel Emelyanov wrote:
> > >>>> On 03/30/2016 05:46 PM, Tycho Andersen wrote:
> > >>>>> On Wed, Mar 30, 2016 at 05:26:15PM +0300, Pavel Emelyanov wrote:
> > >>>>>> On 03/30/2016 03:33 PM, Federico Reghenzani wrote:
> > >>>>>>> Hi all!
> > >>>>>>>
> > >>>>>>> We have a problem restoring Open MPI daemons with child processes that use shared memory:
> > >>>>>>>
> > >>>>>>> (00.022447) 255: Opening ghost file 0x3 for tmp/openmpi-sessions-root@roaster-vm3_0/60995/1/shared_mem_pool.roaster-vm3.1
> > >>>>>>> (00.022479) 255: Error (files-reg.c:139): Can't open ghost file //tmp/openmpi-sessions-root@roaster-vm3_0/60995/1/shared_mem_pool.roaster-vm3.1.cr.3.ghost: No such file or directory
> > >>>>>>
> > >>>>>> Can you check whether the \dirname of this path exists?
> > >>>>>> I mean this -- //tmp/openmpi-sessions-root@roaster-vm3_0/60995/1/
> > >>>>>>
> > >>>>>> Presumably this is the case when not only the file was removed, but also
> > >>>>>> some dir components. And we've fixed it only in 2.0.
> > >>>>>
> > >>>>> The users I had report it were using post 2.0, so there's some other
> > >>>>> bug here. I'm trying to reproduce now again, but not having any luck
> > >>>>> :(
> > >>>>
> > >>>> Do they also see the ENOENT errno from the open(O_CREAT) call?
> > >>>
> > >>> Which call do you mean here? I don't see anything close to this that
> > >>> does an O_CREAT on dump.
> > >>
> > >> Yes, because the error is on restore :) For 1.8 this is files-reg.c create_ghost()'s
> > >
> > > Oh, derp, I misread the error. I've seen failures reported on dump
> > > here:
> > >
> > > https://github.com/xemul/criu/blob/master/criu/files-reg.c#L628
> >
> > Ouch! And what was the errno?! I can hardly imagine the reason for a regular
> > file open failure via proc :(
>
> Even more bizarrely, it seems like the fd does actually exist. Here's
> a log with a call to cr_system("ls -alh /proc/self/fd") just before we
> try to open the file (fd 39):

Hmm. Maybe we need an O_NOFOLLOW somehow on the re-open call for the
original ghost file? I'll play around with it.

Tycho
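
One thing to keep in mind when experimenting with this: the
/proc/self/fd/N entries are themselves symlinks, so per open(2) a plain
O_NOFOLLOW on such a path would be expected to fail with ELOOP instead of
opening the file. A small Python sketch to check that behaviour follows.
It is not CRIU code; the helper names and the temporary file are made up
for illustration, and it assumes a Linux system with /proc mounted.

```python
import errno
import os
import tempfile


def open_via_proc(fd, extra_flags=0):
    """Try to re-open fd through /proc/self/fd.

    Returns None on success, or the symbolic errno name on failure.
    """
    try:
        new_fd = os.open("/proc/self/fd/%d" % fd, os.O_RDONLY | extra_flags)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    os.close(new_fd)
    return None


def check_nofollow():
    """Compare a plain re-open with an O_NOFOLLOW re-open of a deleted file."""
    fd, path = tempfile.mkstemp()
    os.unlink(path)  # the fd now points at a deleted file, as in the log
    try:
        return open_via_proc(fd), open_via_proc(fd, os.O_NOFOLLOW)
    finally:
        os.close(fd)
```

Calling check_nofollow() returns a pair: the result of the plain re-open
and the result of the O_NOFOLLOW one.
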
> (00.019478) Dumping path for 15 fd via self 39 [/var/log/upstart/systemd-logind.log.1 (deleted)]
> (00.019480) Strip ' (deleted)' tag from './var/log/upstart/systemd-logind.log.1 (deleted)'
> (00.019482) Dumping ghost file for fd 39 id 0xb
> (00.019484) mnt: Path `/var/log/upstart/systemd-logind.log.1' resolved to `./' mountpoint
> (00.019486) Dumping ghost file contents (id 0x2)
> total 0
> dr-x------ 2 root root 0 Mar 30 18:38 .
> dr-xr-xr-x 9 root root 0 Mar 30 18:38 ..
> lr-x------ 1 root root 64 Mar 30 18:38 0 -> /proc/20023/fd
> l-wx------ 1 root root 64 Mar 30 18:38 1 -> /tmp/lxd_checkpoint_035782765/dump.log
> l--------- 1 root root 64 Mar 30 18:38 10 -> /proc/20018
> lr-x------ 1 root root 64 Mar 30 18:38 11 -> /lib/x86_64-linux-gnu/libnss_files-2.19.so
> lr-x------ 1 root root 64 Mar 30 18:38 12 -> /lib/x86_64-linux-gnu/libnss_nis-2.19.so
> lr-x------ 1 root root 64 Mar 30 18:38 13 -> /lib/x86_64-linux-gnu/libnsl-2.19.so
> lr-x------ 1 root root 64 Mar 30 18:38 14 -> /lib/x86_64-linux-gnu/libnss_compat-2.19.so
> lr-x------ 1 root root 64 Mar 30 18:38 15 -> /lib/x86_64-linux-gnu/libdl-2.19.so
> lr-x------ 1 root root 64 Mar 30 18:38 16 -> /lib/x86_64-linux-gnu/libpcre.so.3.13.1
> lr-x------ 1 root root 64 Mar 30 18:38 17 -> /lib/x86_64-linux-gnu/libpthread-2.19.so
> lr-x------ 1 root root 64 Mar 30 18:38 18 -> /lib/x86_64-linux-gnu/libc-2.19.so
> lr-x------ 1 root root 64 Mar 30 18:38 19 -> /lib/x86_64-linux-gnu/librt-2.19.so
> l-wx------ 1 root root 64 Mar 30 18:38 2 -> /tmp/lxd_checkpoint_035782765/dump.log
> lr-x------ 1 root root 64 Mar 30 18:38 20 -> /lib/x86_64-linux-gnu/libjson-c.so.2.0.0
> lr-x------ 1 root root 64 Mar 30 18:38 21 -> /lib/x86_64-linux-gnu/libselinux.so.1
> lr-x------ 1 root root 64 Mar 30 18:38 22 -> /lib/x86_64-linux-gnu/libdbus-1.so.3.7.6
> lr-x------ 1 root root 64 Mar 30 18:38 23 -> /lib/x86_64-linux-gnu/libnih-dbus.so.1.0.0
> lr-x------ 1 root root 64 Mar 30 18:38 24 -> /lib/x86_64-linux-gnu/libnih.so.1.0.0
> lr-x------ 1 root root 64 Mar 30 18:38 25 -> /lib/x86_64-linux-gnu/ld-2.19.so
> lr-x------ 1 root root 64 Mar 30 18:38 26 -> /lib/x86_64-linux-gnu/ld-2.19.so
> l-wx------ 1 root root 64 Mar 30 18:38 27 -> /tmp/lxd_checkpoint_035782765/pipes.img
> l-wx------ 1 root root 64 Mar 30 18:38 28 -> /tmp/lxd_checkpoint_035782765/inotify.img
> l-wx------ 1 root root 64 Mar 30 18:38 29 -> /tmp/lxd_checkpoint_035782765/unixsk.img
> l-wx------ 1 root root 64 Mar 30 18:38 3 -> /tmp/lxd_checkpoint_035782765/seccomp.img
> lrwx------ 1 root root 64 Mar 30 18:38 30 -> socket:[18760]
> lrwx------ 1 root root 64 Mar 30 18:38 31 -> socket:[18790]
> lrwx------ 1 root root 64 Mar 30 18:38 32 -> socket:[18794]
> lrwx------ 1 root root 64 Mar 30 18:38 33 -> socket:[18798]
> l-wx------ 1 root root 64 Mar 30 18:38 34 -> /tmp/lxd_checkpoint_035782765/ghost-file-2.img
> l-wx------ 1 root root 64 Mar 30 18:38 39 -> /var/log/upstart/systemd-logind.log.1 (deleted)
> l--------- 1 root root 64 Mar 30 18:38 4 -> /proc/2139
> l-wx------ 1 root root 64 Mar 30 18:38 40 -> /var/log/upstart/acpid.log.1 (deleted)
> lrwx------ 1 root root 64 Mar 30 18:38 41 -> /dev/pts/ptmx
> l-wx------ 1 root root 64 Mar 30 18:38 42 -> /tmp/lxd_checkpoint_035782765/remap-fpath.img
> l-wx------ 1 root root 64 Mar 30 18:38 43 -> /tmp/lxd_checkpoint_035782765/reg-files.img
> l-wx------ 1 root root 64 Mar 30 18:38 44 -> /tmp/lxd_checkpoint_035782765/fdinfo-2.img
> l-wx------ 1 root root 64 Mar 30 18:38 45 -> /tmp/lxd_checkpoint_035782765/pipes-data.img
> lrwx------ 1 root root 64 Mar 30 18:38 5 -> socket:[815446]
> l-wx------ 1 root root 64 Mar 30 18:38 6 -> /tmp/lxd_checkpoint_035782765/ids-1.img
> lrwx------ 1 root root 64 Mar 30 18:38 7 -> socket:[815426]
> lrwx------ 1 root root 64 Mar 30 18:38 8 -> socket:[758994]
> lr-x------ 1 root root 64 Mar 30 18:38 9 -> /sbin/init
> (00.022884) Error (files-reg.c:631): Can't open ghost original file 39: No such file or directory
> (00.022910) ----------------------------------------
> (00.022914) Error (cr-dump.c:1303): Dump files (pid: 2139) failed with -1
>
> no idea what's going on here...
>
> Tycho
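
For reference, the dump-side step that fails in the log above (stripping
the " (deleted)" tag from the link target, then re-opening the unlinked
file through its /proc/self/fd entry) can be sketched outside CRIU. This
is a standalone Python illustration of that sequence, not the code from
files-reg.c; reopen_deleted(), demo() and the temporary file are
hypothetical, and a Linux system with /proc mounted is assumed.

```python
import os
import tempfile

DELETED_TAG = " (deleted)"


def reopen_deleted(fd):
    """Re-open an already-open fd through /proc/self/fd.

    Returns (link target with the ' (deleted)' tag stripped, new fd).
    """
    proc_path = "/proc/self/fd/%d" % fd
    target = os.readlink(proc_path)
    if target.endswith(DELETED_TAG):
        target = target[:-len(DELETED_TAG)]
    # Opening the proc entry follows the link to the still-open inode,
    # even though the name no longer exists in the filesystem.
    new_fd = os.open(proc_path, os.O_RDONLY)
    return target, new_fd


def demo():
    """Create a file, unlink it while it is open, and read it back via /proc."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"ghost contents")
    os.unlink(path)
    target, new_fd = reopen_deleted(fd)
    try:
        data = os.read(new_fd, 64)
    finally:
        os.close(new_fd)
        os.close(fd)
    return path, target, data
```

On a healthy system the re-open in demo() succeeds, which is what makes
the ENOENT in the dump log surprising. Note that the sketch strips the
tag purely by suffix, so a file whose real name ends in " (deleted)"
would be misreported.
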