[CRIU] criu_restore() in Open MPI problems
Pavel Emelyanov
xemul at parallels.com
Tue Mar 18 11:46:20 PDT 2014
On 03/18/2014 10:41 PM, Adrian Reber wrote:
> On Tue, Mar 18, 2014 at 09:56:44PM +0400, Pavel Emelyanov wrote:
>> On 03/18/2014 09:03 PM, Adrian Reber wrote:
>>> Now that dumping works from Open MPI, I am trying to restore.
>>> Right now it fails with:
>>>
>>> (00.000119) TCP queue memory limits are 2097152:3145728
>>> (00.000303) cpu: fpu:1 fxsr:1 xsave:1
>>> (00.000399) vdso: Parsing at 7fff84c27000 7fff84c29000
>>> (00.000407) vdso: Base address ffffffffff700000
>>> (00.000440) Reading image tree
>>> (00.000468) Migrating process tree (GID 25983->29676 SID 9042->29676)
>>> (00.000475) Will restore in 0 namespaces
>>> (00.000479) NS mask to use 0
>>> (00.000487) Collecting 41/21 (flags 0)
>>> (00.000514) `- ... done
>>> (00.000520) Error (tty.c:1213): tty: Standard stream is not a terminal, aborting
>>>
>>> I am not sure what this really means, but I suspect it has something
>>> to do with dumping with criu_set_shell_job(true) and restoring from
>>> inside a program instead of the command line. Running the command line
>>
>> Do you run it with the -j option?
>
> Yes, I am using -j on the restore.
Then you should call criu_set_shell_job on restore too.
>>> tool instead of the criu_restore() works much better but fails in the
>>> end with:
>>>
>>> pie: Restoring EXE link
>>> pie: Restoring scheduler params 0.0.0
>>> pie: 25983: Restored
>>> pie: Error (pie/restorer.c:277): Thread pid mismatch 25986/25985
>>
>> Does it _always_ end up like this? This means that we've failed to obtain
>> the desired pid and task was created with another one.
>
> I thought it always ended up like this. But now that you said it I
> retried it and now it successfully restores the process. It then crashes
> somewhere in the Open MPI libraries because it needs to be restored by
> Open MPI which does set up the environment.
What kind of environment is it? Does it look like the issue we have with
LXC? I.e. -- when we restore a container, we need to "reparent" it to an LXC
daemon, so that it thinks the container is alive. But since the parent
of the newly restored CT is the crtools process, we plan to teach crtools
to call execve() on LXC after restore to tell it "hey, here's a restored
container, reattach to it and handle with care" :)
> So I can restore my process
> from the command line using criu but with the library not because of
>
> (00.000520) Error (tty.c:1213): tty: Standard stream is not a terminal, aborting
>
> It seems criu expects running on a terminal when using -j?
Well, yes. Not being a session leader was considered valid only when
run from a shell, i.e. -- with a terminal. In the Open MPI case the same
seems to be true.
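
To see what the restoring side actually has on its standard descriptors,
something like the rough sketch below can be called from the program that
invokes criu_restore(). It does not reproduce CRIU's own check; it only
uses isatty() on fds 0..2. The helper name count_tty_std_streams() is made
up for illustration and is not part of CRIU or libcriu.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of CRIU): report which of the three
 * standard descriptors are terminals. Returns how many are (0..3).
 * Pass out == NULL to suppress the report.
 */
int count_tty_std_streams(FILE *out)
{
	static const char *const names[] = { "stdin", "stdout", "stderr" };
	int fd, count = 0;

	for (fd = 0; fd < 3; fd++) {
		/* isatty() returns 1 for a terminal, 0 for a pipe, file or closed fd */
		int is_tty = (isatty(fd) == 1);

		if (out)
			fprintf(out, "%s (fd %d): %s\n", names[fd], fd,
				is_tty ? "terminal" : "not a terminal");
		count += is_tty;
	}
	return count;
}
```

Calling count_tty_std_streams(stderr) right before the restore shows
whether the caller has a terminal on its standard streams at all.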
Can you show us the process tree (with sids and pgids) before and after
you dump and restore the task? Please include the parent process of the
task you dump.
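
If it is easier to collect these ids from inside the program than from a
process listing, a minimal sketch like the one below prints pid, pgid and
sid for the calling process and its parent, using only getpgid() and
getsid(). The struct and function names are made up for illustration; the
sketch assumes Linux, where getsid() may be called on a process in
another session.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record for illustration; not a CRIU structure. */
struct task_ids {
	pid_t pid;
	pid_t pgid;
	pid_t sid;
};

/* Fill *ids for the given pid. Returns 0 on success, -1 if a lookup fails. */
int read_task_ids(pid_t pid, struct task_ids *ids)
{
	pid_t pgid = getpgid(pid);
	pid_t sid = getsid(pid);

	if (pgid == (pid_t)-1 || sid == (pid_t)-1)
		return -1;
	ids->pid = pid;
	ids->pgid = pgid;
	ids->sid = sid;
	return 0;
}

/* Print the ids of the calling process and of its parent. */
int print_self_and_parent(FILE *out)
{
	struct task_ids self, parent;

	if (read_task_ids(getpid(), &self) < 0 ||
	    read_task_ids(getppid(), &parent) < 0)
		return -1;
	fprintf(out, "self:   pid=%ld pgid=%ld sid=%ld\n",
		(long)self.pid, (long)self.pgid, (long)self.sid);
	fprintf(out, "parent: pid=%ld pgid=%ld sid=%ld\n",
		(long)parent.pid, (long)parent.pgid, (long)parent.sid);
	return 0;
}
```

Calling print_self_and_parent(stderr) in the task before the dump and
again after the restore gives the two sets of ids to compare.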
> Adrian
Thanks,
Pavel