[CRIU] criu_restore() in Open MPI problems

Adrian Reber adrian at lisas.de
Tue Apr 29 04:30:06 PDT 2014


On Thu, Apr 10, 2014 at 09:13:46PM +0400, Pavel Emelyanov wrote:
> On 04/10/2014 07:36 PM, Adrian Reber wrote:
> > On Wed, Apr 09, 2014 at 07:17:15PM +0400, Pavel Emelyanov wrote:
> >>>> I think the restore scheme should look like this:
> >>>> We run orterun, which prepares pipes and executes "CRIU restore".
> >>>> The OpenMPI plugin takes the prepared pipes and restores them into the
> >>>> proper file descriptors.
> >>>
> >>> It took me a while, but now I tried to restore a process from
> >>> orterun/mpirun with exec()ing 'criu restore' 
> >>
> >> Thanks for doing this! :)
> >>
> >>> I still get this error:
> >>>
> >>> (00.030652)   4277: tty: open type pts id 0x2 index 14 (master 0 sid 0 pgrp 0 inherit 1)
> >>> (00.030654)   4277: Error (tty.c:541): tty: Can't dup SELF_STDIN_OFF: Bad file descriptor
> >>> (00.031052) Error (cr-restore.c:1035): 4277 exited, status=255
> >>> (00.031072) Error (cr-restore.c:1577): Restoring FAILED.
> >>>
> >>> You are talking about a plugin which restores the missing pipes. What
> >>> would such a plugin have to look like? Do you have any examples of
> >>> re-creating pipes in a plugin?
> >>
> >> Well, we don't have any pipe-restoring plugins. But we have an example of
> >> an unknown file descriptor plugin at test/zdtm/live/static/criu-rtc.c. This
> >> one sits on the cr_plugin_dump_file/cr_plugin_restore_file hooks. I think
> >> it's perfectly possible to add what is needed for pipes.
> >>
> >> You might also be interested in this mailing thread:
> >> http://lists.openvz.org/pipermail/criu/2014-March/012929.html
> >>
> >> In it Cyrill tried to provide hooks for intercepting dump of TCP sockets.
> >> Intercepting of pipes dump should probably look similar.
> >>
> >> BTW, since pipes we're talking about are really external (one end sits
> >> outside of our dump tree) I think we should detect this fact and call
> >> plugins for external pipes.
> >>
> >> Hope that helps :) If you need more info -- feel free to ask.
> > 
> > Another thought. I am running mpirun which starts opal-restart which
> > execvp()'s 'criu restore'. opal-restart's FDs (stdout/stderr) are
> > already connected to the correct pipes of mpirun. Is there a way for
> > criu to inherit those FDs, so that instead of re-opening stdout/stderr
> > it just uses the FDs 'criu restore' already has? Maybe the plugin could
> > just detect that there is already a pipe at the same FD and re-use
> > that pipe?
> 
> I think that plugin should detect that we're having a PIPE via environment
> and use one on restore. What do you think, would that be possible?

I am now exporting the IDs of the pipes from Open MPI and can read
them in criu via getenv().
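
As a rough sketch of the exporting side (not the actual Open MPI code):
this assumes the pipe ID criu records is the inode number that fstat()
reports for the pipe. The variable name EXAMPLE_CRIU_PIPE_ID and both
helper functions are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variable name; neither CRIU nor Open MPI defines it. */
#define PIPE_ID_ENV "EXAMPLE_CRIU_PIPE_ID"

/*
 * If fd refers to a pipe, store its inode number in *id and return 0.
 * Return -1 if fstat() fails or fd is not a pipe/FIFO.
 */
static int pipe_id_of_fd(int fd, unsigned long *id)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	if (!S_ISFIFO(st.st_mode))
		return -1;
	*id = (unsigned long)st.st_ino;
	return 0;
}

/* Export the ID of the pipe behind fd as a hex string in PIPE_ID_ENV. */
static int export_pipe_id(int fd)
{
	unsigned long id;
	char buf[32];

	if (pipe_id_of_fd(fd, &id) < 0)
		return -1;
	snprintf(buf, sizeof(buf), "%#lx", id);
	return setenv(PIPE_ID_ENV, buf, 1);
}
```

Calling export_pipe_id(STDOUT_FILENO) before exec()ing the restore
command would make the ID visible to the child through getenv(). Both
ends of one pipe report the same inode, so either end can be used.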

I see pipes.c:305 open_pipe(), in which criu wants to restore the
pipes with the old ID. Would it be enough to replace pi->pe->pipe_id
with the new IDs of the pipes, or does it require more? Where would a
plugin need to hook in to detect that the environment variable exists
and replace the old pipe IDs with the new pipe IDs specified in it?
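
To make the question concrete, here is a minimal sketch of the lookup
such a plugin might do. It is not existing criu code. The variable name
EXAMPLE_CRIU_PIPE_MAP, its "old:new,old:new" hex format, and both
function names are assumptions for illustration only; whether swapping
the ID is sufficient for the restore is exactly the open question.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical variable name and format: "old:new,old:new" in hex. */
#define PIPE_MAP_ENV "EXAMPLE_CRIU_PIPE_MAP"

/*
 * Look up old_id in the map string. Return the replacement ID if an
 * entry matches; otherwise (no map, no match, or a malformed entry)
 * return old_id unchanged.
 */
static uint32_t remap_pipe_id(const char *map, uint32_t old_id)
{
	const char *p = map;
	char *end;

	while (p && *p) {
		unsigned long from, to;

		from = strtoul(p, &end, 16);
		if (end == p || *end != ':')
			break;		/* malformed entry, give up */
		p = end + 1;

		to = strtoul(p, &end, 16);
		if (end == p)
			break;		/* missing replacement ID */
		if (from == old_id)
			return (uint32_t)to;

		if (*end != ',')
			break;		/* end of the list */
		p = end + 1;
	}
	return old_id;
}

/* Same lookup, taking the map from the environment. */
static uint32_t remap_pipe_id_from_env(uint32_t old_id)
{
	return remap_pipe_id(getenv(PIPE_MAP_ENV), old_id);
}
```

With EXAMPLE_CRIU_PIPE_MAP=0x10:0x20 set, remap_pipe_id_from_env(0x10)
gives 0x20 and every other ID passes through unchanged, so the restore
would behave as before when the variable is absent.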

		Adrian

