[CRIU] About re-parenting
Pavel Emelyanov
xemul at parallels.com
Tue Mar 25 06:25:48 PDT 2014
On 03/25/2014 05:08 PM, Adrian Reber wrote:
> On Tue, Mar 25, 2014 at 01:54:53AM +0400, Pavel Emelyanov wrote:
>> On 03/25/2014 01:21 AM, Adrian Reber wrote:
>>> Thanks so far for all your help. I am still unsure how to handle
>>> restarting in Open MPI in regards to stdin/stdout redirection and
>>> re-parenting. This is partly due to the fact that I still do not
>>> understand it completely. Too many too complex components (Open MPI and
>>> criu). I think the stdin/stdout problem can be solved but I am not sure
>>> how the re-parenting can/must/should work.
>>>
>>> Open MPI has orte-restart which analyzes the metadata from the previous
>>> checkpoint and then starts mpirun with the correct parameters. mpirun
>>> starts the corresponding number of child processes by starting multiple
>>> copies of opal-restart. opal-restart then tries to restart the
>>> checkpointed process using CRIU. What it expects is that after some
>>> initialization it calls criu_restore() and is then replaced by the
>>> checkpointed process like exec() would do.
>>
>> Hmm... If I get it right, the restore process might look like this.
>>
>> 1. someone exec()-utes orte-restart with options
>> 2. orte-restart exec()-utes criu with suid bit on it and with action
>> "restore" and exec-cmd (recently committed this patch from Deyan)
>> telling that after restore it should proceed to step 4
>> 3. criu does regular restore process. With suid bit it should be possible
>> 4. criu calls exec() on orte-restart again with options telling it that
>> there's a new subtree alive under it
>>
>> The service would not be suitable for that, as it can only create a detached
>> subtree, and we don't have any API in the kernel to re-parent tasks :(
>
> Okay. Good to know.
>
>> IOW the syscalls and process tree would look like this
>>
>> 1. exec("orte-restart", "restore", ...)
>>
>> 12 open-mpi-engine
>> 13 `- orte-restart
>>
>> 2. orte calls criu with exec("criu", "restore", ...)
>>
>> 12 open-mpi-engine
>> 13 `- criu restore
>>
>> 3. criu does the restoring -- forking tasks and restoring them
>>
>> 12 open-mpi-engine
>> 13 `- criu restore
>> 125    `- my-openmpi-process
>>
>> 4. criu calls exec("orte-restart", "continue-watching-your-kids", ...)
>>
>> 12 open-mpi-engine
>> 13 `- orte-restart
>> 125    `- my-openmpi-process
>>
>> Does it look like what we want? The question of how to preserve the pipes is
>> still open, but let's sort out how to restore the process linkage first.
>
> This is almost what we want and I agree resolving process linkage is the
> more important part.
>
> At the end it should look like this:
>
> 12 open-mpi-engine
> 13 `- my-openmpi-process

Hm, but the pid of my-openmpi-process should not be generated, but
restored to the exact value it used to have, i.e. the tree should
look like

12 open-mpi-engine
125 `- my-openmpi-process

> The restart process should be replaced by the process which has been
> restored and the restored process should be the child of the Open MPI
> runtime.

I see. In other words orte-restart should transform itself into the
process we want to restore, right?
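
For reference, the exec() chain in steps 1-4 relies on one property of
exec(): the calling process keeps its pid and its children, only the
program image changes. Below is a minimal sketch of that property on a
POSIX system. It does not call criu; the two stage scripts and the
run_demo() helper are hypothetical stand-ins for "criu restore" forking a
task and then exec()-ing back into the launcher.

```python
import subprocess
import sys

# Stage 1 stands in for "criu restore": it forks a task, then replaces
# its own image with stage 2 via exec(), passing its pid and the child's.
STAGE1 = r"""
import os, sys, time
child = os.fork()
if child == 0:
    time.sleep(30)   # stand-in for the restored task
    os._exit(0)
os.execv(sys.executable,
         [sys.executable, "-c", sys.argv[1], str(os.getpid()), str(child)])
"""

# Stage 2 stands in for the launcher continuing after the restore: it
# checks that its pid is unchanged and that the task is still its child.
STAGE2 = r"""
import os, signal, sys
old_pid, child = int(sys.argv[1]), int(sys.argv[2])
same_pid = os.getpid() == old_pid
os.kill(child, signal.SIGTERM)
reaped, _ = os.waitpid(child, 0)   # raises ChildProcessError if not our child
print(same_pid, reaped == child)
"""

def run_demo():
    """Run stage 1 (which execs into stage 2) and return what stage 2 printed."""
    result = subprocess.run([sys.executable, "-c", STAGE1, STAGE2],
                            capture_output=True, text=True, check=True)
    return result.stdout.split()

if __name__ == "__main__":
    print(run_demo())
```

The same property is why the launcher cannot simply exec() into the
restored task: exec() would keep the launcher's pid (13 in the trees
above) instead of the original pid (125).
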
Thanks,
Pavel