[CRIU] criu_restore() in Open MPI problems
Adrian Reber
adrian at lisas.de
Wed Mar 19 07:05:45 PDT 2014
On Wed, Mar 19, 2014 at 05:41:40PM +0400, Andrew Vagin wrote:
> On Wed, Mar 19, 2014 at 11:29:51AM +0100, Adrian Reber wrote:
> > On Wed, Mar 19, 2014 at 12:33:30PM +0400, Andrew Vagin wrote:
> > > Could you try out the attached patch?
> >
> > With this patch it actually tries to restore the process but fails with:
> >
> > (00.026193) 15852: tty: open type pts id 0x2 index 11 (master 0 sid 0 pgrp 0 inherit 1)
> > (00.026198) 15852: Error (tty.c:541): tty: Can't dup SELF_STDIN_OFF: Bad file descriptor
> > (00.026782) Error (cr-restore.c:1035): 15852 exited, status=255
> > (00.026810) Error (cr-restore.c:1577): Restoring FAILED.
> >
> > Full log at http://lisas.de/~adrian/criu.log
> >
> > This is probably related to the way Open MPI handles stdout/stderr of
> > its child processes. I need to find out exactly how this works.
>
> As far as I understand you are executing criu as a service, aren't you?
Yes. CRIU runs as a service and libcriu is linked into Open MPI. The code is
something like:

    criu_set_images_dir_fd(fd);
    criu_set_log_file(mca_crs_criu_component.log_file);
    criu_set_log_level(mca_crs_criu_component.log_level);
    criu_set_tcp_established(mca_crs_criu_component.tcp_established);
    criu_set_shell_job(mca_crs_criu_component.shell_job);
    criu_set_ext_unix_sk(mca_crs_criu_component.ext_unix_sk);
    criu_set_leave_running(mca_crs_criu_component.leave_running);
    ret = criu_restore();

> We have understood that the shell_job option on restore can't work
> correctly in this case, because the link to the parent and the session
> can't be restored correctly. Both of these parameters can only be
> inherited; they cannot be set.
>
> It looks like the only way is to execute "criu restore" directly.
> Maybe we will need to set the suid bit on criu, because it requires
> CAP_SYS_ADMIN and CAP_SYS_RESOURCE.
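
A minimal sketch of what "executing directly" could look like from the
calling program, assuming plain fork()/execvp()/waitpid(). The helper name
run_and_wait() is hypothetical and is not part of libcriu or Open MPI. The
point it illustrates is that a child started this way inherits the caller's
parent link, session and standard streams, which a request sent to the
service cannot provide.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] (looked up via PATH) as a child process
 * and wait for it to finish.
 *
 * Returns the child's exit status (0-255), or -1 if fork()/waitpid() failed
 * or the child was terminated by a signal. The child inherits the caller's
 * session, controlling terminal and standard streams.
 */
int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace the process image; only returns on failure. */
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    if (WIFEXITED(status))
        return WEXITSTATUS(status);

    return -1;
}
```

For the restore discussed here, the argument vector would presumably mirror
the command line quoted further down in this thread, for example
{ "criu", "restore", "-D", dir, "--shell-job", NULL }. Whether that works
without the capabilities mentioned above has not been verified.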
>
> Adrian, I want to know a bit more about the structure of the process tree.
> Could you provide some more info:
>
> * ps axf -o sid,gid,pid,cmd,uid,gid
9042 500 19148 500 500 | \_ /home/adrian/devel/openmpi-trunk/bin/orterun --am ft-enable-cr -np 1 orte-test2
9042 500 19151 500 500 | \_ orte-test2
> * lsof for a process and its parent
[adrian at dcbz ~]$ lsof -p 19148
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
orterun 19148 adrian cwd DIR 8,6 4096 4276256 /home/adrian/devel/mpitest
orterun 19148 adrian rtd DIR 8,6 4096 2 /
orterun 19148 adrian txt REG 8,6 130809 4853952 /home/adrian/devel/openmpi-trunk/bin/orterun
orterun 19148 adrian mem REG 8,6 57976 6436012 /usr/lib64/libnss_files-2.18.so
orterun 19148 adrian mem REG 8,6 69712 6457664 /usr/lib64/libprotobuf-c.so.0.0.0
orterun 19148 adrian mem REG 8,6 2097264 6426159 /usr/lib64/libc-2.18.so
orterun 19148 adrian mem REG 8,6 147544 6439986 /usr/lib64/libpthread-2.18.so
orterun 19148 adrian mem REG 8,6 1159944 6435339 /usr/lib64/libm-2.18.so
orterun 19148 adrian mem REG 8,6 14608 6440555 /usr/lib64/libutil-2.18.so
orterun 19148 adrian mem REG 8,6 113320 6435410 /usr/lib64/libnsl-2.18.so
orterun 19148 adrian mem REG 8,6 44048 6440309 /usr/lib64/librt-2.18.so
orterun 19148 adrian mem REG 8,6 19512 6433536 /usr/lib64/libdl-2.18.so
orterun 19148 adrian mem REG 8,6 31832 6422555 /usr/lib64/libcriu.so.1.0
orterun 19148 adrian mem REG 8,6 2615952 4725410 /home/adrian/devel/openmpi-trunk/lib/libopen-pal.so.0.0.0
orterun 19148 adrian mem REG 8,6 5260036 4726410 /home/adrian/devel/openmpi-trunk/lib/libopen-rte.so.0.0.0
orterun 19148 adrian mem REG 8,6 154992 6422554 /usr/lib64/ld-2.18.so
orterun 19148 adrian 0u CHR 136,6 0t0 9 /dev/pts/6
orterun 19148 adrian 1u CHR 136,6 0t0 9 /dev/pts/6
orterun 19148 adrian 2u CHR 136,6 0t0 9 /dev/pts/6
orterun 19148 adrian 3u unix 0xffff8802106faa00 0t0 10591539 socket
orterun 19148 adrian 4u unix 0xffff8802106fa680 0t0 10591540 socket
orterun 19148 adrian 5u a_inode 0,9 0 7173 [eventfd]
orterun 19148 adrian 6u REG 0,17 0 10591541 /dev/shm/open_mpi.0000 (deleted)
orterun 19148 adrian 7r FIFO 0,8 0t0 10591543 pipe
orterun 19148 adrian 8w FIFO 0,8 0t0 10591543 pipe
orterun 19148 adrian 9r DIR 8,6 4096 2 /
orterun 19148 adrian 10u IPv4 10590043 0t0 TCP *:53823 (LISTEN)
orterun 19148 adrian 11r FIFO 0,32 0t0 10590050 /tmp/openmpi-sessions-adrian at dcbz_0/42683/0/debugger_attach_fifo
orterun 19148 adrian 12u CHR 5,2 0t0 8661 /dev/ptmx
orterun 19148 adrian 13u IPv4 10592260 0t0 TCP edur0000.hs-esslingen.de:53823->edur0000.hs-esslingen.de:47855 (ESTABLISHED)
orterun 19148 adrian 15w FIFO 0,8 0t0 10590053 pipe
orterun 19148 adrian 16r FIFO 0,8 0t0 10590054 pipe
orterun 19148 adrian 18r FIFO 0,8 0t0 10590055 pipe
[adrian at dcbz ~]$ lsof -p 19151
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
orte-test 19151 adrian cwd DIR 8,6 4096 4276256 /home/adrian/devel/mpitest
orte-test 19151 adrian rtd DIR 8,6 4096 2 /
orte-test 19151 adrian txt REG 8,6 8550 4241596 /home/adrian/devel/mpitest/orte-test2
orte-test 19151 adrian mem REG 8,6 57976 6436012 /usr/lib64/libnss_files-2.18.so
orte-test 19151 adrian mem REG 8,6 69712 6457664 /usr/lib64/libprotobuf-c.so.0.0.0
orte-test 19151 adrian mem REG 8,6 1159944 6435339 /usr/lib64/libm-2.18.so
orte-test 19151 adrian mem REG 8,6 14608 6440555 /usr/lib64/libutil-2.18.so
orte-test 19151 adrian mem REG 8,6 113320 6435410 /usr/lib64/libnsl-2.18.so
orte-test 19151 adrian mem REG 8,6 44048 6440309 /usr/lib64/librt-2.18.so
orte-test 19151 adrian mem REG 8,6 19512 6433536 /usr/lib64/libdl-2.18.so
orte-test 19151 adrian mem REG 8,6 31832 6422555 /usr/lib64/libcriu.so.1.0
orte-test 19151 adrian mem REG 8,6 2615952 4725410 /home/adrian/devel/openmpi-trunk/lib/libopen-pal.so.0.0.0
orte-test 19151 adrian mem REG 8,6 5260036 4726410 /home/adrian/devel/openmpi-trunk/lib/libopen-rte.so.0.0.0
orte-test 19151 adrian mem REG 8,6 2097264 6426159 /usr/lib64/libc-2.18.so
orte-test 19151 adrian mem REG 8,6 147544 6439986 /usr/lib64/libpthread-2.18.so
orte-test 19151 adrian mem REG 8,6 19630134 4725415 /home/adrian/devel/openmpi-trunk/lib/libmpi.so.0.0.0
orte-test 19151 adrian mem REG 8,6 154992 6422554 /usr/lib64/ld-2.18.so
orte-test 19151 adrian 0r FIFO 0,8 0t0 10590053 pipe
orte-test 19151 adrian 1u CHR 136,11 0t0 14 /dev/pts/11
orte-test 19151 adrian 2w FIFO 0,8 0t0 10590054 pipe
orte-test 19151 adrian 3u unix 0xffff8803ea4f7800 0t0 10590635 socket
orte-test 19151 adrian 4u unix 0xffff8803ea4f7480 0t0 10590636 socket
orte-test 19151 adrian 5u a_inode 0,9 0 7173 [eventfd]
orte-test 19151 adrian 6u REG 0,17 0 10590637 /dev/shm/open_mpi.0000 (deleted)
orte-test 19151 adrian 7u unix 0xffff8801a2044000 0t0 10590639 socket
orte-test 19151 adrian 8u unix 0xffff8801a2046d80 0t0 10590640 socket
orte-test 19151 adrian 9u a_inode 0,9 0 7173 [eventfd]
orte-test 19151 adrian 10u IPv4 10590642 0t0 TCP *:38026 (LISTEN)
orte-test 19151 adrian 11u IPv4 10591547 0t0 TCP edur0000.hs-esslingen.de:47855->edur0000.hs-esslingen.de:53823 (ESTABLISHED)
orte-test 19151 adrian 12u IPv4 10590649 0t0 TCP *:1024 (LISTEN)
orte-test 19151 adrian 19w FIFO 0,8 0t0 10590055 pipe
[adrian at dcbz ~]$
> Thanks.
>
> >
> > > On Wed, Mar 19, 2014 at 12:19:43PM +0400, Andrew Vagin wrote:
> > > > On Tue, Mar 18, 2014 at 10:42:41PM +0400, Cyrill Gorcunov wrote:
> > > > > On Tue, Mar 18, 2014 at 07:22:55PM +0100, Adrian Reber wrote:
> > > > > > On Tue, Mar 18, 2014 at 09:15:04PM +0400, Cyrill Gorcunov wrote:
> > > > > > > On Tue, Mar 18, 2014 at 06:03:18PM +0100, Adrian Reber wrote:
> > > > > > > > > Now that dumping works from Open MPI I am trying to restore.
> > > > > > > > Right now it fails with:
> > > > > > > >
> > > > > > > > (00.000119) TCP queue memory limits are 2097152:3145728
> > > > > > > > (00.000303) cpu: fpu:1 fxsr:1 xsave:1
> > > > > > > > (00.000399) vdso: Parsing at 7fff84c27000 7fff84c29000
> > > > > > > > (00.000407) vdso: Base address ffffffffff700000
> > > > > > > > (00.000440) Reading image tree
> > > > > > > > (00.000468) Migrating process tree (GID 25983->29676 SID 9042->29676)
> > > > > > > > (00.000475) Will restore in 0 namespaces
> > > > > > > > (00.000479) NS mask to use 0
> > > > > > > > (00.000487) Collecting 41/21 (flags 0)
> > > > > > > > (00.000514) `- ... done
> > > > > > > > (00.000520) Error (tty.c:1213): tty: Standard stream is not a terminal, aborting
> > > > > > > >
> > > > > > > > > I am not sure what this really means, but I suspect it has
> > > > > > > > > something to do with dumping with criu_set_shell_job(true) and
> > > > > > > > > restoring from inside a program instead of the command line.
> > > > > > > > > Running the command line tool instead of criu_restore() works
> > > > > > > > > much better but fails in the end with:
> > > > > > >
> > > > > > > > Have you been dumping with the --shell-job option? If yes, would it
> > > > > > > > do the trick without this option?
> > > > > >
> > > > > > Yes, I dumped with the shell_job option. Without shell_job it does not dump:
> > > > > >
> > > > > > Error (pstree.c:196): The root process 26660 is not a session leader. Consider using --shell-job option
> > > > >
> > > > > Heh ;-) Could you please show ls -l /proc/<pid>/fd where <pid> is the pid of a process
> > > > > you're dumping (and also try dump with -v4 --shell-job and show complete dump log).
> > > >
> > > > Steps to reproduce:
> > > > sleep 1000 &> /dev/null < /dev/null &
> > > > ./criu dump -t $! -D tmp --shell-job
> > > > ./criu restore -D tmp -o r.log --shell-job < /dev/null &> /dev/null
> > > >
> > > > shell-job tries to find a current terminal even if it is not required
> > > > for restore.
> > > >
> > > > > _______________________________________________
> > > > > CRIU mailing list
> > > > > CRIU at openvz.org
> > > > > https://lists.openvz.org/mailman/listinfo/criu
> >
> > > diff --git a/tty.c b/tty.c
> > > index 5fca74c..660b847 100644
> > > --- a/tty.c
> > > +++ b/tty.c
> > > @@ -1215,8 +1215,8 @@ int tty_prep_fds(void)
> > > return 0;
> > >
> > > if (!isatty(STDIN_FILENO)) {
> > > - pr_err("Standard stream is not a terminal, aborting\n");
> > > - return -1;
> > > + pr_warn("Standard stream is not a terminal\n");
> > > + return 0;
> > > }
> > >
> > > if (install_service_fd(SELF_STDIN_OFF, STDIN_FILENO) < 0) {
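
The condition that the patch above relaxes can be reproduced outside CRIU.
A minimal sketch, assuming only POSIX open() and isatty(); the helper name
path_is_terminal() is hypothetical and does not exist in CRIU. It shows why
a restore whose stdin is redirected from /dev/null, or is a pipe as in the
lsof output for orte-test2, takes the "not a terminal" branch.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open 'path' and report whether it refers to a
 * terminal device.
 *
 * Returns 1 if it is a terminal, 0 if it is not, and -1 if the path
 * could not be opened.
 */
int path_is_terminal(const char *path)
{
    int ret;
    /* O_NOCTTY: never acquire the opened device as controlling terminal. */
    int fd = open(path, O_RDONLY | O_NOCTTY);

    if (fd < 0)
        return -1;

    ret = isatty(fd) ? 1 : 0;
    close(fd);
    return ret;
}
```

Calling path_is_terminal("/dev/null") returns 0, which corresponds to the
situation in the reproduction steps where stdin is redirected from
/dev/null.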
Adrian
--
Adrian Reber <adrian at lisas.de> http://lisas.de/~adrian/
Finagle's Second Law:
No matter what the anticipated result, there will always be
someone eager to (a) misinterpret it, (b) fake it, or (c) believe it
happened according to his own pet theory.