[CRIU] criu_restore() in Open MPI problems

Adrian Reber adrian at lisas.de
Wed Mar 19 07:05:45 PDT 2014


On Wed, Mar 19, 2014 at 05:41:40PM +0400, Andrew Vagin wrote:
> On Wed, Mar 19, 2014 at 11:29:51AM +0100, Adrian Reber wrote:
> > On Wed, Mar 19, 2014 at 12:33:30PM +0400, Andrew Vagin wrote:
> > > Could you try out the attached patch?
> > 
> > With this patch it actually tries to restore the process, but fails with:
> > 
> > (00.026193)  15852: tty: open type pts id 0x2 index 11 (master 0 sid 0 pgrp 0 inherit 1)
> > (00.026198)  15852: Error (tty.c:541): tty: Can't dup SELF_STDIN_OFF: Bad file descriptor
> > (00.026782) Error (cr-restore.c:1035): 15852 exited, status=255
> > (00.026810) Error (cr-restore.c:1577): Restoring FAILED.
> > 
> > Full log at http://lisas.de/~adrian/criu.log
> > 
> > This is probably related to the way Open MPI handles stdout/stderr of
> > its child processes. I need to find out exactly how this works.
> 
> As far as I understand you are executing criu as a service, aren't you?

Yes, CRIU runs as a service and libcriu is linked into Open MPI. The code
is something like:

    criu_set_images_dir_fd(fd);

    criu_set_log_file(mca_crs_criu_component.log_file);
    criu_set_log_level(mca_crs_criu_component.log_level);
    criu_set_tcp_established(mca_crs_criu_component.tcp_established);
    criu_set_shell_job(mca_crs_criu_component.shell_job);
    criu_set_ext_unix_sk(mca_crs_criu_component.ext_unix_sk);
    criu_set_leave_running(mca_crs_criu_component.leave_running);

    ret = criu_restore();


> We have understood that the shell_job option on restore can't work
> correctly in this case, because the link to the parent and the session
> can't be restored correctly. Both of these parameters can only be
> inherited; they cannot be set.
> 
> It looks like the only way is to execute "criu restore" directly.
> Maybe we will need to set the suid bit on criu, because it requires
> CAP_SYS_ADMIN and CAP_SYS_RESOURCE.
> 
> Adrian, I want to know a bit more about the structure of the process tree.
> Could you provide a bit more info:
> 
> * ps axf -o sid,gid,pid,cmd,uid,gid

 9042   500 19148   500   500  |   \_ /home/adrian/devel/openmpi-trunk/bin/orterun --am ft-enable-cr -np 1 orte-test2
 9042   500 19151   500   500  |       \_ orte-test2
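
The ps output above shows orterun and orte-test2 in the same session
(SID 9042). As a minimal sketch of the point that session and parent are
inherited rather than set: a forked child gets both from its parent, and
setsid() can only create a fresh session, not join an existing one. The
function names are made up for illustration; this is neither CRIU nor
Open MPI code.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for a child and map its exit code: 0 -> 1 (yes), other -> 0 (no). */
static int wait_for_answer(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}

/*
 * Fork a child and let it compare its session ID and parent PID with the
 * values the parent recorded before fork().
 * Returns 1 if both were inherited, 0 if not, -1 on error.
 */
int child_inherits_session_and_parent(void)
{
    pid_t parent_pid = getpid();
    pid_t parent_sid = getsid(0);
    pid_t pid;

    if (parent_sid < 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: nothing was set explicitly, both values come from fork(). */
        int same = getsid(0) == parent_sid && getppid() == parent_pid;
        _exit(same ? 0 : 1);
    }
    return wait_for_answer(pid);
}

/*
 * Fork a child that calls setsid().
 * Returns 1 if the child ended up as leader of a new session whose ID is
 * its own PID, 0 if not, -1 on error.
 */
int setsid_creates_new_session(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* A freshly forked child is not a process group leader,
         * so setsid() is allowed here. */
        pid_t sid = setsid();
        _exit(sid == getpid() ? 0 : 1);
    }
    return wait_for_answer(pid);
}
```

Both functions are expected to return 1 on Linux. A restorer that is
started by a service therefore begins with the service's session and
parent, not with those of the caller.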


> * lsof for a process and its parent

[adrian at dcbz ~]$ lsof -p 19148
COMMAND   PID   USER   FD      TYPE             DEVICE SIZE/OFF     NODE NAME
orterun 19148 adrian  cwd       DIR                8,6     4096  4276256 /home/adrian/devel/mpitest
orterun 19148 adrian  rtd       DIR                8,6     4096        2 /
orterun 19148 adrian  txt       REG                8,6   130809  4853952 /home/adrian/devel/openmpi-trunk/bin/orterun
orterun 19148 adrian  mem       REG                8,6    57976  6436012 /usr/lib64/libnss_files-2.18.so
orterun 19148 adrian  mem       REG                8,6    69712  6457664 /usr/lib64/libprotobuf-c.so.0.0.0
orterun 19148 adrian  mem       REG                8,6  2097264  6426159 /usr/lib64/libc-2.18.so
orterun 19148 adrian  mem       REG                8,6   147544  6439986 /usr/lib64/libpthread-2.18.so
orterun 19148 adrian  mem       REG                8,6  1159944  6435339 /usr/lib64/libm-2.18.so
orterun 19148 adrian  mem       REG                8,6    14608  6440555 /usr/lib64/libutil-2.18.so
orterun 19148 adrian  mem       REG                8,6   113320  6435410 /usr/lib64/libnsl-2.18.so
orterun 19148 adrian  mem       REG                8,6    44048  6440309 /usr/lib64/librt-2.18.so
orterun 19148 adrian  mem       REG                8,6    19512  6433536 /usr/lib64/libdl-2.18.so
orterun 19148 adrian  mem       REG                8,6    31832  6422555 /usr/lib64/libcriu.so.1.0
orterun 19148 adrian  mem       REG                8,6  2615952  4725410 /home/adrian/devel/openmpi-trunk/lib/libopen-pal.so.0.0.0
orterun 19148 adrian  mem       REG                8,6  5260036  4726410 /home/adrian/devel/openmpi-trunk/lib/libopen-rte.so.0.0.0
orterun 19148 adrian  mem       REG                8,6   154992  6422554 /usr/lib64/ld-2.18.so
orterun 19148 adrian    0u      CHR              136,6      0t0        9 /dev/pts/6
orterun 19148 adrian    1u      CHR              136,6      0t0        9 /dev/pts/6
orterun 19148 adrian    2u      CHR              136,6      0t0        9 /dev/pts/6
orterun 19148 adrian    3u     unix 0xffff8802106faa00      0t0 10591539 socket
orterun 19148 adrian    4u     unix 0xffff8802106fa680      0t0 10591540 socket
orterun 19148 adrian    5u  a_inode                0,9        0     7173 [eventfd]
orterun 19148 adrian    6u      REG               0,17        0 10591541 /dev/shm/open_mpi.0000 (deleted)
orterun 19148 adrian    7r     FIFO                0,8      0t0 10591543 pipe
orterun 19148 adrian    8w     FIFO                0,8      0t0 10591543 pipe
orterun 19148 adrian    9r      DIR                8,6     4096        2 /
orterun 19148 adrian   10u     IPv4           10590043      0t0      TCP *:53823 (LISTEN)
orterun 19148 adrian   11r     FIFO               0,32      0t0 10590050 /tmp/openmpi-sessions-adrian at dcbz_0/42683/0/debugger_attach_fifo
orterun 19148 adrian   12u      CHR                5,2      0t0     8661 /dev/ptmx
orterun 19148 adrian   13u     IPv4           10592260      0t0      TCP edur0000.hs-esslingen.de:53823->edur0000.hs-esslingen.de:47855 (ESTABLISHED)
orterun 19148 adrian   15w     FIFO                0,8      0t0 10590053 pipe
orterun 19148 adrian   16r     FIFO                0,8      0t0 10590054 pipe
orterun 19148 adrian   18r     FIFO                0,8      0t0 10590055 pipe
[adrian at dcbz ~]$ lsof -p 19151
COMMAND     PID   USER   FD      TYPE             DEVICE SIZE/OFF     NODE NAME
orte-test 19151 adrian  cwd       DIR                8,6     4096  4276256 /home/adrian/devel/mpitest
orte-test 19151 adrian  rtd       DIR                8,6     4096        2 /
orte-test 19151 adrian  txt       REG                8,6     8550  4241596 /home/adrian/devel/mpitest/orte-test2
orte-test 19151 adrian  mem       REG                8,6    57976  6436012 /usr/lib64/libnss_files-2.18.so
orte-test 19151 adrian  mem       REG                8,6    69712  6457664 /usr/lib64/libprotobuf-c.so.0.0.0
orte-test 19151 adrian  mem       REG                8,6  1159944  6435339 /usr/lib64/libm-2.18.so
orte-test 19151 adrian  mem       REG                8,6    14608  6440555 /usr/lib64/libutil-2.18.so
orte-test 19151 adrian  mem       REG                8,6   113320  6435410 /usr/lib64/libnsl-2.18.so
orte-test 19151 adrian  mem       REG                8,6    44048  6440309 /usr/lib64/librt-2.18.so
orte-test 19151 adrian  mem       REG                8,6    19512  6433536 /usr/lib64/libdl-2.18.so
orte-test 19151 adrian  mem       REG                8,6    31832  6422555 /usr/lib64/libcriu.so.1.0
orte-test 19151 adrian  mem       REG                8,6  2615952  4725410 /home/adrian/devel/openmpi-trunk/lib/libopen-pal.so.0.0.0
orte-test 19151 adrian  mem       REG                8,6  5260036  4726410 /home/adrian/devel/openmpi-trunk/lib/libopen-rte.so.0.0.0
orte-test 19151 adrian  mem       REG                8,6  2097264  6426159 /usr/lib64/libc-2.18.so
orte-test 19151 adrian  mem       REG                8,6   147544  6439986 /usr/lib64/libpthread-2.18.so
orte-test 19151 adrian  mem       REG                8,6 19630134  4725415 /home/adrian/devel/openmpi-trunk/lib/libmpi.so.0.0.0
orte-test 19151 adrian  mem       REG                8,6   154992  6422554 /usr/lib64/ld-2.18.so
orte-test 19151 adrian    0r     FIFO                0,8      0t0 10590053 pipe
orte-test 19151 adrian    1u      CHR             136,11      0t0       14 /dev/pts/11
orte-test 19151 adrian    2w     FIFO                0,8      0t0 10590054 pipe
orte-test 19151 adrian    3u     unix 0xffff8803ea4f7800      0t0 10590635 socket
orte-test 19151 adrian    4u     unix 0xffff8803ea4f7480      0t0 10590636 socket
orte-test 19151 adrian    5u  a_inode                0,9        0     7173 [eventfd]
orte-test 19151 adrian    6u      REG               0,17        0 10590637 /dev/shm/open_mpi.0000 (deleted)
orte-test 19151 adrian    7u     unix 0xffff8801a2044000      0t0 10590639 socket
orte-test 19151 adrian    8u     unix 0xffff8801a2046d80      0t0 10590640 socket
orte-test 19151 adrian    9u  a_inode                0,9        0     7173 [eventfd]
orte-test 19151 adrian   10u     IPv4           10590642      0t0      TCP *:38026 (LISTEN)
orte-test 19151 adrian   11u     IPv4           10591547      0t0      TCP edur0000.hs-esslingen.de:47855->edur0000.hs-esslingen.de:53823 (ESTABLISHED)
orte-test 19151 adrian   12u     IPv4           10590649      0t0      TCP *:1024 (LISTEN)
orte-test 19151 adrian   19w     FIFO                0,8      0t0 10590055 pipe
[adrian at dcbz ~]$ 
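
In the lsof output above, orterun holds /dev/ptmx (fd 12) and pipes,
while orte-test2 has a pipe on stdin and stderr and /dev/pts/11 on
stdout. That suggests, without confirming it, that orterun hands its
child a pty slave and pipes as standard streams. The following sketch
only illustrates how such a pair of descriptors answers isatty(); the
function name is hypothetical and the code is not taken from Open MPI.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate a pty master, open its slave, create a pipe and ask isatty()
 * about each of them.
 * Returns  1 if the slave is a terminal and both pipe ends are not,
 *          0 if isatty() answered differently,
 *         -1 if no pty or pipe could be set up (e.g. no /dev/ptmx).
 */
int pty_slave_is_tty_but_pipe_is_not(void)
{
    int pipefd[2];
    int master, slave, ok;
    const char *slave_name;

    master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;

    /* The slave must be granted and unlocked before it can be opened. */
    if (grantpt(master) < 0 || unlockpt(master) < 0) {
        close(master);
        return -1;
    }

    slave_name = ptsname(master);       /* e.g. /dev/pts/N */
    if (slave_name == NULL) {
        close(master);
        return -1;
    }

    slave = open(slave_name, O_RDWR | O_NOCTTY);
    if (slave < 0) {
        close(master);
        return -1;
    }

    if (pipe(pipefd) < 0) {
        close(slave);
        close(master);
        return -1;
    }

    ok = isatty(slave) == 1 &&
         isatty(pipefd[0]) == 0 &&
         isatty(pipefd[1]) == 0;

    close(pipefd[0]);
    close(pipefd[1]);
    close(slave);
    close(master);
    return ok;
}
```

With a layout like the one lsof shows for orte-test2, fd 0 would be the
non-terminal case and fd 1 the terminal case.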


> Thanks.
> 
> > 
> > > On Wed, Mar 19, 2014 at 12:19:43PM +0400, Andrew Vagin wrote:
> > > > On Tue, Mar 18, 2014 at 10:42:41PM +0400, Cyrill Gorcunov wrote:
> > > > > On Tue, Mar 18, 2014 at 07:22:55PM +0100, Adrian Reber wrote:
> > > > > > On Tue, Mar 18, 2014 at 09:15:04PM +0400, Cyrill Gorcunov wrote:
> > > > > > > On Tue, Mar 18, 2014 at 06:03:18PM +0100, Adrian Reber wrote:
> > > > > > > > Now that dumping works from Open MPI, I am trying to restore.
> > > > > > > > Right now it fails with:
> > > > > > > > 
> > > > > > > > (00.000119) TCP queue memory limits are 2097152:3145728
> > > > > > > > (00.000303) cpu: fpu:1 fxsr:1 xsave:1
> > > > > > > > (00.000399) vdso: Parsing at 7fff84c27000 7fff84c29000
> > > > > > > > (00.000407) vdso: Base address ffffffffff700000
> > > > > > > > (00.000440) Reading image tree
> > > > > > > > (00.000468) Migrating process tree (GID 25983->29676 SID 9042->29676)
> > > > > > > > (00.000475) Will restore in 0 namespaces
> > > > > > > > (00.000479) NS mask to use 0
> > > > > > > > (00.000487) Collecting 41/21 (flags 0)
> > > > > > > > (00.000514)  `- ... done
> > > > > > > > (00.000520) Error (tty.c:1213): tty: Standard stream is not a terminal, aborting
> > > > > > > > 
> > > > > > > > I am not sure what this really means, but I suspect it has something to
> > > > > > > > do with dumping with criu_set_shell_job(true) and restoring from
> > > > > > > > inside a program instead of the command line. Running the command line
> > > > > > > > tool instead of the criu_restore() works much better but fails in the
> > > > > > > > end with:
> > > > > > > 
> > > > > > > Did you dump with the --shell-job option? If so, does it work
> > > > > > > without this option?
> > > > > > 
> > > > > > Yes, I dumped with the shell_job option. Without shell_job it does not dump:
> > > > > > 
> > > > > > Error (pstree.c:196): The root process 26660 is not a session leader.  Consider using --shell-job option
> > > > > 
> > > > > Heh ;-) Could you please show ls -l /proc/<pid>/fd where <pid> is the pid of the process
> > > > > you're dumping (and also try to dump with -v4 --shell-job and show the complete dump log).
> > > > 
> > > > Steps to reproduce:
> > > > sleep 1000 &> /dev/null < /dev/null &
> > > > ./criu dump -t $! -D tmp --shell-job
> > > > ./criu restore -D tmp -o r.log --shell-job < /dev/null &> /dev/null
> > > > 
> > > > shell-job tries to find a current terminal even if it is not required
> > > > for restore.
> > > > 
> > 
> > > diff --git a/tty.c b/tty.c
> > > index 5fca74c..660b847 100644
> > > --- a/tty.c
> > > +++ b/tty.c
> > > @@ -1215,8 +1215,8 @@ int tty_prep_fds(void)
> > >  		return 0;
> > >  
> > >  	if (!isatty(STDIN_FILENO)) {
> > > -		pr_err("Standard stream is not a terminal, aborting\n");
> > > -		return -1;
> > > +		pr_warn("Standard stream is not a terminal\n");
> > > +		return 0;
> > >  	}
> > >  
> > >  	if (install_service_fd(SELF_STDIN_OFF, STDIN_FILENO) < 0) {
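
The reproducer and the patch above both hinge on isatty(STDIN_FILENO)
when stdin is not a terminal. A possible reading of the later "Can't dup
SELF_STDIN_OFF: Bad file descriptor" error is that the patched function
returns before install_service_fd() runs, so the descriptor that is
duplicated later was never opened. That is an assumption, not something
the log proves. The sketch below shows the two underlying libc
behaviours in isolation; the function names are hypothetical and no CRIU
code is used.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * What isatty() reports for a descriptor opened on /dev/null, which is
 * what stdin looks like after "< /dev/null".
 * Returns the isatty() result (0 or 1), or -1 if /dev/null can't be opened.
 */
int devnull_is_tty(void)
{
    int fd = open("/dev/null", O_RDONLY);
    int tty;

    if (fd < 0)
        return -1;
    tty = isatty(fd);
    close(fd);
    return tty;
}

/*
 * Call dup() on a descriptor number that is known to be closed.
 * Returns the errno set by dup(), 0 if dup() unexpectedly succeeded,
 * or -1 if the setup failed.
 */
int dup_of_closed_fd_errno(void)
{
    int fd = open("/dev/null", O_RDONLY);
    int copy;

    if (fd < 0)
        return -1;
    close(fd);              /* fd is now an unused descriptor number */

    errno = 0;
    copy = dup(fd);
    if (copy >= 0) {
        close(copy);
        return 0;
    }
    return errno;
}
```

devnull_is_tty() is expected to return 0 and dup_of_closed_fd_errno() is
expected to return EBADF, the same "Bad file descriptor" text that
appears in the restore log.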

		Adrian

-- 
Adrian Reber <adrian at lisas.de>            http://lisas.de/~adrian/
Finagle's Second Law:
	No matter what the anticipated result, there will always be
	someone eager to (a) misinterpret it, (b) fake it, or (c) believe it
	happened according to his own pet theory.

