[CRIU] criu_restore() in Open MPI problems

Andrew Vagin avagin at parallels.com
Wed Mar 19 06:41:40 PDT 2014


On Wed, Mar 19, 2014 at 11:29:51AM +0100, Adrian Reber wrote:
> On Wed, Mar 19, 2014 at 12:33:30PM +0400, Andrew Vagin wrote:
> > Could you try out the attached patch?
> 
> With this patch it actually tries to restore the process but fails with:
> 
> (00.026193)  15852: tty: open type pts id 0x2 index 11 (master 0 sid 0 pgrp 0 inherit 1)
> (00.026198)  15852: Error (tty.c:541): tty: Can't dup SELF_STDIN_OFF: Bad file descriptor
> (00.026782) Error (cr-restore.c:1035): 15852 exited, status=255
> (00.026810) Error (cr-restore.c:1577): Restoring FAILED.
> 
> Full log at http://lisas.de/~adrian/criu.log
> 
> This is probably related to the way Open MPI handles stdout/stderr of
> its child processes. I need to find out exactly how this works.

As far as I understand you are executing criu as a service, aren't you?

We have realized that the shell_job option on restore can't work
correctly in this case, because the link to the parent and the session
can't be restored correctly. Both of these parameters can only be
inherited; they cannot be set explicitly.
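
To illustrate the point about inherited parameters, here is a minimal sketch
using plain POSIX calls. It is not CRIU code, and child_keeps_session() is a
hypothetical helper name. It shows that a forked child gets its parent and
session from whoever forks it, and that setsid() can only create a new
session rather than join an existing one:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of CRIU.
 *
 * Forks a child and checks, from inside the child, that its parent is the
 * forking process and whether it is still in the session the parent had
 * before fork(). If detach is non-zero the child calls setsid() first,
 * which can only create a brand-new session led by the child itself.
 *
 * Returns 1 if the child is in the parent's session, 0 if it is in a
 * different one, -1 on error.
 */
int child_keeps_session(int detach)
{
	pid_t parent = getpid();
	pid_t parent_sid = getsid(0);
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* The child is not a group leader, so setsid() is allowed. */
		if (detach && setsid() < 0)
			_exit(2);
		/* The parent link comes from fork() and cannot be chosen. */
		if (getppid() != parent)
			_exit(3);
		_exit(getsid(0) == parent_sid ? 1 : 0);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;

	status = WEXITSTATUS(status);
	return status <= 1 ? status : -1;
}
```

Calling child_keeps_session(0) should report that the child stayed in the
caller's session, while child_keeps_session(1) should report a different
one. There is no call that would put the child back into an arbitrary
existing session, which is why a process restored from a service ends up
with the service's parent and session.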

It looks like the only option is to execute "criu restore" directly.
We may need to set the suid bit on criu, because it requires
CAP_SYS_ADMIN and CAP_SYS_RESOURCE.
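
As a rough way to check whether a process already has these two
capabilities, the sketch below reads the effective capability mask from the
CapEff line of /proc/self/status. This is only an illustration and not how
criu itself checks privileges. The helper names are hypothetical, and the
bit numbers (21 and 24) are the values from the Linux capability headers,
redefined here so the example has no extra dependencies:

```c
#include <stdio.h>

/* Bit positions as defined in <linux/capability.h>. */
#define CAP_SYS_ADMIN_BIT    21
#define CAP_SYS_RESOURCE_BIT 24

/* Hypothetical helper: returns 1 if capability bit cap is set in mask. */
int cap_in_mask(unsigned long long mask, int cap)
{
	return (int)((mask >> cap) & 1ULL);
}

/*
 * Hypothetical helper: reads the effective capability set of the calling
 * process from /proc/self/status. Returns 0 on success, -1 on failure.
 */
int read_effective_caps(unsigned long long *mask)
{
	char line[256];
	int ret = -1;
	FILE *f = fopen("/proc/self/status", "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		/* The line looks like "CapEff:\t0000003fffffffff". */
		if (sscanf(line, "CapEff: %llx", mask) == 1) {
			ret = 0;
			break;
		}
	}

	fclose(f);
	return ret;
}
```

A caller would run read_effective_caps() and then test both bits with
cap_in_mask() before attempting a restore.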

Adrian, I would like to know a bit more about the structure of the
process tree. Could you provide the following:

* ps axf -o sid,pgid,pid,cmd,uid,gid
* lsof for a process and its parent

Thanks.

> 
> > On Wed, Mar 19, 2014 at 12:19:43PM +0400, Andrew Vagin wrote:
> > > On Tue, Mar 18, 2014 at 10:42:41PM +0400, Cyrill Gorcunov wrote:
> > > > On Tue, Mar 18, 2014 at 07:22:55PM +0100, Adrian Reber wrote:
> > > > > On Tue, Mar 18, 2014 at 09:15:04PM +0400, Cyrill Gorcunov wrote:
> > > > > > On Tue, Mar 18, 2014 at 06:03:18PM +0100, Adrian Reber wrote:
> > > > > > > > Now that dumping works from Open MPI, I am trying to restore.
> > > > > > > Right now it fails with:
> > > > > > > 
> > > > > > > (00.000119) TCP queue memory limits are 2097152:3145728
> > > > > > > (00.000303) cpu: fpu:1 fxsr:1 xsave:1
> > > > > > > (00.000399) vdso: Parsing at 7fff84c27000 7fff84c29000
> > > > > > > (00.000407) vdso: Base address ffffffffff700000
> > > > > > > (00.000440) Reading image tree
> > > > > > > (00.000468) Migrating process tree (GID 25983->29676 SID 9042->29676)
> > > > > > > (00.000475) Will restore in 0 namespaces
> > > > > > > (00.000479) NS mask to use 0
> > > > > > > (00.000487) Collecting 41/21 (flags 0)
> > > > > > > (00.000514)  `- ... done
> > > > > > > (00.000520) Error (tty.c:1213): tty: Standard stream is not a terminal, aborting
> > > > > > > 
> > > > > > > > I am not sure what this really means, but I suspect it has
> > > > > > > > something to do with dumping with criu_set_shell_job(true) and restoring from
> > > > > > > inside a program instead of the command line. Running the command line
> > > > > > > tool instead of the criu_restore() works much better but fails in the
> > > > > > > end with:
> > > > > > 
> > > > > > Have you been dumping with --shell_job option? If yes, would it do the
> > > > > > trick without this option?
> > > > > 
> > > > > Yes, I dumped with the shell_job option. Without shell_job it does not dump:
> > > > > 
> > > > > Error (pstree.c:196): The root process 26660 is not a session leader.  Consider using --shell-job option
> > > > 
> > > > Heh ;-) Could you please show ls -l /proc/<pid>/fd where <pid> is the pid of a process
> > > > you're dumping (and also try dump with -v4 --shell-job and show complete dump log).
> > > 
> > > Steps to reproduce:
> > > sleep 1000 &> /dev/null < /dev/null &
> > > ./criu dump -t $! -D tmp --shell-job
> > > ./criu restore -D tmp -o r.log --shell-job < /dev/null &> /dev/null
> > > 
> > > shell-job tries to find a current terminal even if it is not required
> > > for restore.
> > > 
> 
> > diff --git a/tty.c b/tty.c
> > index 5fca74c..660b847 100644
> > --- a/tty.c
> > +++ b/tty.c
> > @@ -1215,8 +1215,8 @@ int tty_prep_fds(void)
> >  		return 0;
> >  
> >  	if (!isatty(STDIN_FILENO)) {
> > -		pr_err("Standard stream is not a terminal, aborting\n");
> > -		return -1;
> > +		pr_warn("Standard stream is not a terminal\n");
> > +		return 0;
> >  	}
> >  
> >  	if (install_service_fd(SELF_STDIN_OFF, STDIN_FILENO) < 0) {

