[CRIU] criu_restore() in Open MPI problems
Adrian Reber
adrian at lisas.de
Wed Apr 9 06:51:01 PDT 2014
On Mon, Mar 24, 2014 at 03:07:11PM +0400, Andrew Vagin wrote:
> > [adrian at dcbz ~]$ lsof -p 19148
> > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> > orterun 19148 adrian cwd DIR 8,6 4096 4276256 /home/adrian/devel/mpitest
> > orterun 19148 adrian rtd DIR 8,6 4096 2 /
> > orterun 19148 adrian txt REG 8,6 130809 4853952 /home/adrian/devel/openmpi-trunk/bin/orterun
> > orterun 19148 adrian mem REG 8,6 57976 6436012 /usr/lib64/libnss_files-2.18.so
> > orterun 19148 adrian mem REG 8,6 69712 6457664 /usr/lib64/libprotobuf-c.so.0.0.0
> > orterun 19148 adrian mem REG 8,6 2097264 6426159 /usr/lib64/libc-2.18.so
> > orterun 19148 adrian mem REG 8,6 147544 6439986 /usr/lib64/libpthread-2.18.so
> > orterun 19148 adrian mem REG 8,6 1159944 6435339 /usr/lib64/libm-2.18.so
> > orterun 19148 adrian mem REG 8,6 14608 6440555 /usr/lib64/libutil-2.18.so
> > orterun 19148 adrian mem REG 8,6 113320 6435410 /usr/lib64/libnsl-2.18.so
> > orterun 19148 adrian mem REG 8,6 44048 6440309 /usr/lib64/librt-2.18.so
> > orterun 19148 adrian mem REG 8,6 19512 6433536 /usr/lib64/libdl-2.18.so
> > orterun 19148 adrian mem REG 8,6 31832 6422555 /usr/lib64/libcriu.so.1.0
> > orterun 19148 adrian mem REG 8,6 2615952 4725410 /home/adrian/devel/openmpi-trunk/lib/libopen-pal.so.0.0.0
> > orterun 19148 adrian mem REG 8,6 5260036 4726410 /home/adrian/devel/openmpi-trunk/lib/libopen-rte.so.0.0.0
> > orterun 19148 adrian mem REG 8,6 154992 6422554 /usr/lib64/ld-2.18.so
> > orterun 19148 adrian 0u CHR 136,6 0t0 9 /dev/pts/6
> > orterun 19148 adrian 1u CHR 136,6 0t0 9 /dev/pts/6
> > orterun 19148 adrian 2u CHR 136,6 0t0 9 /dev/pts/6
> > orterun 19148 adrian 3u unix 0xffff8802106faa00 0t0 10591539 socket
> > orterun 19148 adrian 4u unix 0xffff8802106fa680 0t0 10591540 socket
> > orterun 19148 adrian 5u a_inode 0,9 0 7173 [eventfd]
> > orterun 19148 adrian 6u REG 0,17 0 10591541 /dev/shm/open_mpi.0000 (deleted)
> > orterun 19148 adrian 7r FIFO 0,8 0t0 10591543 pipe
> > orterun 19148 adrian 8w FIFO 0,8 0t0 10591543 pipe
> > orterun 19148 adrian 9r DIR 8,6 4096 2 /
> > orterun 19148 adrian 10u IPv4 10590043 0t0 TCP *:53823 (LISTEN)
> > orterun 19148 adrian 11r FIFO 0,32 0t0 10590050 /tmp/openmpi-sessions-adrian at dcbz_0/42683/0/debugger_attach_fifo
> > orterun 19148 adrian 12u CHR 5,2 0t0 8661 /dev/ptmx
> > orterun 19148 adrian 13u IPv4 10592260 0t0 TCP edur0000.hs-esslingen.de:53823->edur0000.hs-esslingen.de:47855 (ESTABLISHED)
> > orterun 19148 adrian 15w FIFO 0,8 0t0 10590053 pipe
> > orterun 19148 adrian 16r FIFO 0,8 0t0 10590054 pipe
> > orterun 19148 adrian 18r FIFO 0,8 0t0 10590055 pipe
> > [adrian at dcbz ~]$ lsof -p 19151
> > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> > orte-test 19151 adrian cwd DIR 8,6 4096 4276256 /home/adrian/devel/mpitest
> > orte-test 19151 adrian rtd DIR 8,6 4096 2 /
> > orte-test 19151 adrian txt REG 8,6 8550 4241596 /home/adrian/devel/mpitest/orte-test2
> > orte-test 19151 adrian mem REG 8,6 57976 6436012 /usr/lib64/libnss_files-2.18.so
> > orte-test 19151 adrian mem REG 8,6 69712 6457664 /usr/lib64/libprotobuf-c.so.0.0.0
> > orte-test 19151 adrian mem REG 8,6 1159944 6435339 /usr/lib64/libm-2.18.so
> > orte-test 19151 adrian mem REG 8,6 14608 6440555 /usr/lib64/libutil-2.18.so
> > orte-test 19151 adrian mem REG 8,6 113320 6435410 /usr/lib64/libnsl-2.18.so
> > orte-test 19151 adrian mem REG 8,6 44048 6440309 /usr/lib64/librt-2.18.so
> > orte-test 19151 adrian mem REG 8,6 19512 6433536 /usr/lib64/libdl-2.18.so
> > orte-test 19151 adrian mem REG 8,6 31832 6422555 /usr/lib64/libcriu.so.1.0
> > orte-test 19151 adrian mem REG 8,6 2615952 4725410 /home/adrian/devel/openmpi-trunk/lib/libopen-pal.so.0.0.0
> > orte-test 19151 adrian mem REG 8,6 5260036 4726410 /home/adrian/devel/openmpi-trunk/lib/libopen-rte.so.0.0.0
> > orte-test 19151 adrian mem REG 8,6 2097264 6426159 /usr/lib64/libc-2.18.so
> > orte-test 19151 adrian mem REG 8,6 147544 6439986 /usr/lib64/libpthread-2.18.so
> > orte-test 19151 adrian mem REG 8,6 19630134 4725415 /home/adrian/devel/openmpi-trunk/lib/libmpi.so.0.0.0
> > orte-test 19151 adrian mem REG 8,6 154992 6422554 /usr/lib64/ld-2.18.so
> > orte-test 19151 adrian 0r FIFO 0,8 0t0 10590053 pipe
> > orte-test 19151 adrian 1u CHR 136,11 0t0 14 /dev/pts/11
> > orte-test 19151 adrian 2w FIFO 0,8 0t0 10590054 pipe
> > orte-test 19151 adrian 3u unix 0xffff8803ea4f7800 0t0 10590635 socket
> > orte-test 19151 adrian 4u unix 0xffff8803ea4f7480 0t0 10590636 socket
> > orte-test 19151 adrian 5u a_inode 0,9 0 7173 [eventfd]
> > orte-test 19151 adrian 6u REG 0,17 0 10590637 /dev/shm/open_mpi.0000 (deleted)
> > orte-test 19151 adrian 7u unix 0xffff8801a2044000 0t0 10590639 socket
> > orte-test 19151 adrian 8u unix 0xffff8801a2046d80 0t0 10590640 socket
> > orte-test 19151 adrian 9u a_inode 0,9 0 7173 [eventfd]
> > orte-test 19151 adrian 10u IPv4 10590642 0t0 TCP *:38026 (LISTEN)
> > orte-test 19151 adrian 11u IPv4 10591547 0t0 TCP edur0000.hs-esslingen.de:47855->edur0000.hs-esslingen.de:53823 (ESTABLISHED)
> > orte-test 19151 adrian 12u IPv4 10590649 0t0 TCP *:1024 (LISTEN)
> > orte-test 19151 adrian 19w FIFO 0,8 0t0 10590055 pipe
>
> You can see that orterun and orte-test2 have three pipes in common. They
> are created by orterun. As I understand it, orterun is not dumped, so
> these pipes are external resources for CRIU, and we will need to write a
> plugin to restore them.
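
The pipes shared by the two listings can be found by matching anonymous
pipes on the NODE column of the lsof output. The following is a minimal
Python sketch of that check, not part of CRIU or Open MPI. The helper name
pipe_nodes is hypothetical, and the sample lines are the FIFO rows from the
listings above with the quote prefixes removed.

```python
def pipe_nodes(lsof_lines):
    """Map each anonymous pipe's NODE to the FD entries that refer to it."""
    pipes = {}
    for line in lsof_lines:
        # Columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
        fields = line.split()
        if len(fields) >= 9 and fields[4] == "FIFO" and fields[8] == "pipe":
            pipes.setdefault(fields[7], []).append(fields[3])
    return pipes


ORTERUN = """\
orterun 19148 adrian 7r FIFO 0,8 0t0 10591543 pipe
orterun 19148 adrian 8w FIFO 0,8 0t0 10591543 pipe
orterun 19148 adrian 15w FIFO 0,8 0t0 10590053 pipe
orterun 19148 adrian 16r FIFO 0,8 0t0 10590054 pipe
orterun 19148 adrian 18r FIFO 0,8 0t0 10590055 pipe
""".splitlines()

ORTE_TEST = """\
orte-test 19151 adrian 0r FIFO 0,8 0t0 10590053 pipe
orte-test 19151 adrian 2w FIFO 0,8 0t0 10590054 pipe
orte-test 19151 adrian 19w FIFO 0,8 0t0 10590055 pipe
""".splitlines()

parent = pipe_nodes(ORTERUN)
child = pipe_nodes(ORTE_TEST)

# A pipe is shared when the same NODE shows up in both processes.
shared = {node: (parent[node], child[node])
          for node in sorted(set(parent) & set(child))}

for node, (parent_fds, child_fds) in shared.items():
    print(node, "orterun", parent_fds, "orte-test", child_fds)
```

For these listings the shared nodes are 10590053, 10590054, and 10590055,
which sit on fds 0, 2, and 19 in the orte-test2 process.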
>
> I think the restore scheme should look like this:
> We run orterun, which prepares the pipes and executes "criu restore".
> The Open MPI plugin takes the prepared pipes and restores them into the
> proper file descriptors.
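
The launcher half of this scheme, preparing pipes and handing them to an
exec()ed program on fixed descriptor numbers, can be sketched with plain
POSIX calls. This is a hedged illustration in Python rather than orterun or
CRIU code: spawn_with_pipes and TARGETS are hypothetical names, the fd
layout (0 read, 2 and 19 write) is copied from the orte-test2 listing, and
the command to exec is supplied by the caller. How CRIU or a plugin would
then pick up the inherited descriptors is not shown here.

```python
import fcntl
import os

# Layout taken from the orte-test2 listing: fd 0 reads from one pipe,
# fds 2 and 19 write to the other two. (Assumption: same layout on restore.)
TARGETS = {0: "r", 2: "w", 19: "w"}


def spawn_with_pipes(argv, targets=TARGETS):
    """Fork, place one pipe end on each requested fd number, then exec argv.

    Returns (pid, parent_ends); parent_ends maps each target fd number to
    the parent's end of the corresponding pipe.
    """
    child_ends, parent_ends = {}, {}
    for target, mode in targets.items():
        r, w = os.pipe()
        if mode == "r":
            child_ends[target], parent_ends[target] = r, w
        else:
            child_ends[target], parent_ends[target] = w, r

    pid = os.fork()
    if pid == 0:
        try:
            for fd in parent_ends.values():
                os.close(fd)
            # Move the child's ends above every target number first, so no
            # dup2() below can overwrite an end that is not yet placed.
            floor = max(targets) + 1
            moved = {}
            for target, fd in child_ends.items():
                moved[target] = fcntl.fcntl(fd, fcntl.F_DUPFD, floor)
                os.close(fd)
            for target, fd in moved.items():
                os.dup2(fd, target)  # the new fd survives exec
                os.close(fd)
            os.execvp(argv[0], argv)
        finally:
            # Only reached if something above failed.
            os._exit(127)

    for fd in child_ends.values():
        os.close(fd)
    return pid, parent_ends
```

The caller keeps the opposite end of each pipe, so it can write to the
child's fd 0 and read what the child writes to fds 2 and 19, matching the
direction of the pipes in the listing.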
It took me a while, but I have now tried to restore a process from
orterun/mpirun by exec()ing 'criu restore'. I still get this error:
(00.030652) 4277: tty: open type pts id 0x2 index 14 (master 0 sid 0 pgrp 0 inherit 1)
(00.030654) 4277: Error (tty.c:541): tty: Can't dup SELF_STDIN_OFF: Bad file descriptor
(00.031052) Error (cr-restore.c:1035): 4277 exited, status=255
(00.031072) Error (cr-restore.c:1577): Restoring FAILED.
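
The "Bad file descriptor" text in the log is the message for errno EBADF,
which dup() returns when the descriptor it is asked to copy is not open.
The snippet below is a small sketch that reproduces only that failure mode;
it is not CRIU code, and whether the descriptor behind SELF_STDIN_OFF was
actually closed in this run is an assumption, not something the log proves.

```python
import errno
import os


def dup_closed_fd():
    """Try to dup() a descriptor that has just been closed; return errno."""
    r, w = os.pipe()
    os.close(w)
    os.close(r)
    try:
        os.dup(r)  # r no longer refers to an open file
    except OSError as exc:
        return exc.errno
    return None


err = dup_closed_fd()
print(err == errno.EBADF, os.strerror(errno.EBADF))
```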
You mentioned a plugin which restores the missing pipes. What
would such a plugin have to look like? Do you have any examples of
re-creating pipes in a plugin?
Adrian