[CRIU] CRIU <---> OpenMPI
Jeff Squyres (jsquyres)
jsquyres at cisco.com
Mon Nov 4 14:16:54 PST 2013
> From: Pavel Emelyanov [mailto:xemul at parallels.com]
>
>> "Support one of MPI implementations - worth starting with OpenMPI"
>> Is anyone actively working on this?
>
> Right now -- no :( At least in Parallels.
Greetings. I'm one of the developers of Open MPI.
FWIW: a user asked about CRIU integration on the Open MPI mailing lists a while ago, but I am not aware of anyone working on it.
> I've only heard from several people, that work with OpenMPI that they have plans to participate in CRIU development with this, but nothing more.
Can I ask who you talked to? As I said, I'm unaware of anyone working on CRIU integration with Open MPI, but my knowledge is certainly not absolute. :-)
>> In the recent LinuxCon/Plumbers conferences Pavel mentioned a few
>> times the HPC periodic snapshot use case. Is this use case related to
>> the above TODO item?
>
> As I see it -- yes. When we do periodic snapshot we would have to take the MPI context with us. But how should it look -- is to be found out.
Let me give a little background for checkpoint/restart support in Open MPI...
Capturing the MPI state is pretty difficult, especially in the presence of hardware-offload networks. Meaning: the state is not completely discoverable in software; there's a non-trivial amount of state in hardware, too. (...skipping a much larger discussion about this kind of stuff; I can explain if anyone cares...)
I forget what version introduced general checkpoint/restart support for parallel Open MPI jobs, but it's been available for quite a while.
Open MPI is fundamentally based on plugins, and support for the underlying checkpoint service is no exception. Open MPI has two existing plugins, with a third close to completion: BLCR (http://crd.lbl.gov/groups-depts/ftg/projects/current-projects/BLCR), "self" (basically for applications that can do self-checkpointing), and DMTCP (http://dmtcp.sourceforge.net/).
Adding support for CRIU to Open MPI is hypothetically pretty simple: just add a "crs" plugin for CRIU to Open MPI. The API that a CRS plugin has to support is relatively simple -- the main work is basically calling the underlying "checkpoint me now" functionality, and then providing a hook for when the process is restarted.
To be clear: all the other infrastructure for saving and restoring the MPI state is already provided by Open MPI (even for hardware-offload networks). That infrastructure basically calls the CRS plugin to actually do the checkpoint, come back from the restore, ...and a few other miscellaneous things.
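To make the division of labor above a bit more concrete, here is a rough sketch in Python. It is not Open MPI's actual CRS interface (that is a C component API); it only illustrates the shape: a plugin exposes a "checkpoint me now" entry point and a restart hook, and the surrounding infrastructure picks a plugin by name and calls into it. Every name here (CRSPlugin, FakeCriuPlugin, PLUGINS, run_checkpoint, run_restart) is hypothetical, and the checkpoint is simulated with a JSON file rather than a real call into CRIU.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class CRSPlugin(ABC):
    """Hypothetical checkpoint/restart service interface (not Open MPI's real API)."""

    @abstractmethod
    def checkpoint(self, pid: int, image_dir: str) -> str:
        """Ask the underlying checkpointer to snapshot `pid`; return the image path."""

    @abstractmethod
    def on_restart(self, image_path: str) -> dict:
        """Hook run after restore so the caller can rebuild its own state."""


class FakeCriuPlugin(CRSPlugin):
    """Stand-in backend: writes a JSON file instead of invoking a real checkpointer."""

    def checkpoint(self, pid: int, image_dir: str) -> str:
        image_path = os.path.join(image_dir, f"ckpt-{pid}.json")
        with open(image_path, "w") as f:
            json.dump({"pid": pid}, f)
        return image_path

    def on_restart(self, image_path: str) -> dict:
        with open(image_path) as f:
            return json.load(f)


# Plugins are looked up by name: one plugin per underlying checkpointer.
PLUGINS = {"fake_criu": FakeCriuPlugin}


def run_checkpoint(plugin_name: str, pid: int, image_dir: str) -> str:
    """Infrastructure side: quiesce the job (elided), then delegate to the plugin."""
    plugin = PLUGINS[plugin_name]()
    return plugin.checkpoint(pid, image_dir)


def run_restart(plugin_name: str, image_path: str) -> dict:
    """Infrastructure side: call the plugin's restart hook, then resume (elided)."""
    plugin = PLUGINS[plugin_name]()
    return plugin.on_restart(image_path)
```

For example, run_checkpoint("fake_criu", 1234, tempfile.mkdtemp()) returns the path of the simulated image, and passing that path to run_restart("fake_criu", ...) hands back the saved record. In a real plugin the two methods would wrap the checkpointer's own dump and restore calls; everything around them stays in the infrastructure.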
All that being said, we did a fairly major architecture revamp in our SVN development trunk (and v1.7 release branch) recently for one of Open MPI's main subsystems, and the checkpoint/restart infrastructure is currently flat-out broken. :-( It's on the to-do list to fix, but it's going to take a little while.
Hence, adding a new CRIU CRS plugin would need to be done on an older version of Open MPI -- the 1.6.x series. But the good news is that forward porting a CRIU CRS plugin from v1.6 to Open MPI's dev trunk/v1.7 branch is pretty straightforward (I'd even volunteer to help with that).
--
Jeff Squyres
jsquyres at cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/