[CRIU] criu_restore() in Open MPI problems

Adrian Reber adrian at lisas.de
Wed Mar 19 03:29:51 PDT 2014


On Wed, Mar 19, 2014 at 12:33:30PM +0400, Andrew Vagin wrote:
> Could you try out the attached patch?

With this patch it actually tries to restore the process, but fails with:

(00.026193)  15852: tty: open type pts id 0x2 index 11 (master 0 sid 0 pgrp 0 inherit 1)
(00.026198)  15852: Error (tty.c:541): tty: Can't dup SELF_STDIN_OFF: Bad file descriptor
(00.026782) Error (cr-restore.c:1035): 15852 exited, status=255
(00.026810) Error (cr-restore.c:1577): Restoring FAILED.

Full log at http://lisas.de/~adrian/criu.log

This is probably related to the way Open MPI handles stdout/stderr of
its child processes. I need to find out exactly how this works.

> On Wed, Mar 19, 2014 at 12:19:43PM +0400, Andrew Vagin wrote:
> > On Tue, Mar 18, 2014 at 10:42:41PM +0400, Cyrill Gorcunov wrote:
> > > On Tue, Mar 18, 2014 at 07:22:55PM +0100, Adrian Reber wrote:
> > > > On Tue, Mar 18, 2014 at 09:15:04PM +0400, Cyrill Gorcunov wrote:
> > > > > On Tue, Mar 18, 2014 at 06:03:18PM +0100, Adrian Reber wrote:
> > > > > > > Now that dumping works from Open MPI, I am trying to restore.
> > > > > > Right now it fails with:
> > > > > > 
> > > > > > (00.000119) TCP queue memory limits are 2097152:3145728
> > > > > > (00.000303) cpu: fpu:1 fxsr:1 xsave:1
> > > > > > (00.000399) vdso: Parsing at 7fff84c27000 7fff84c29000
> > > > > > (00.000407) vdso: Base address ffffffffff700000
> > > > > > (00.000440) Reading image tree
> > > > > > (00.000468) Migrating process tree (GID 25983->29676 SID 9042->29676)
> > > > > > (00.000475) Will restore in 0 namespaces
> > > > > > (00.000479) NS mask to use 0
> > > > > > (00.000487) Collecting 41/21 (flags 0)
> > > > > > (00.000514)  `- ... done
> > > > > > (00.000520) Error (tty.c:1213): tty: Standard stream is not a terminal, aborting
> > > > > > 
> > > > > > > I am not sure what this really means, but I suspect it has something
> > > > > > > to do with dumping with criu_set_shell_job(true) and restoring from
> > > > > > inside a program instead of the command line. Running the command line
> > > > > > tool instead of the criu_restore() works much better but fails in the
> > > > > > end with:
> > > > > 
> > > > > > Have you been dumping with --shell-job option? If yes, would it do the
> > > > > trick without this option?
> > > > 
> > > > Yes, I dumped with the shell_job option. Without shell_job it does not dump:
> > > > 
> > > > Error (pstree.c:196): The root process 26660 is not a session leader.  Consider using --shell-job option
> > > 
> > > Heh ;-) Could you please show ls -l /proc/<pid>/fd where <pid> is the pid of a process
> > > you're dumping (and also try dump with -v4 --shell-job and show complete dump log).
> > 
> > Steps to reproduce:
> > sleep 1000 &> /dev/null < /dev/null &
> > ./criu dump -t $! -D tmp --shell-job
> > ./criu restore -D tmp -o r.log --shell-job < /dev/null &> /dev/null
> > 
> > shell-job tries to find a current terminal even if it is not required
> > for restore.
> > 
> > > _______________________________________________
> > > CRIU mailing list
> > > CRIU at openvz.org
> > > https://lists.openvz.org/mailman/listinfo/criu

> diff --git a/tty.c b/tty.c
> index 5fca74c..660b847 100644
> --- a/tty.c
> +++ b/tty.c
> @@ -1215,8 +1215,8 @@ int tty_prep_fds(void)
>  		return 0;
>  
>  	if (!isatty(STDIN_FILENO)) {
> -		pr_err("Standard stream is not a terminal, aborting\n");
> -		return -1;
> +		pr_warn("Standard stream is not a terminal\n");
> +		return 0;
>  	}
>  
>  	if (install_service_fd(SELF_STDIN_OFF, STDIN_FILENO) < 0) {

