[CRIU] CRIU <---> OpenMPI

Jeff Squyres (jsquyres) jsquyres at cisco.com
Mon Nov 4 14:16:54 PST 2013


> From: Pavel Emelyanov [mailto:xemul at parallels.com] 
> 
>> 	"Support one of MPI implementations  - worth starting with OpenMPI"
>> Is anyone actively working on this?
> 
> Right now -- no :( At least in Parallels.

Greetings.  I'm one of the developers of Open MPI.  

FWIW: a user asked about CRIU integration on the Open MPI mailing lists a while ago, but I am not aware of anyone working on it.

> I've only heard from several people, that work with OpenMPI that they have plans to participate in CRIU development with this, but nothing more.

Can I ask who you talked to?  As I said, I'm unaware of anyone working on CRIU integration with Open MPI, but my knowledge is certainly not absolute.  :-)

>> In the recent LinuxCon/Plumbers conferences Pavel mentioned a few 
>> times the HPC periodic snapshot use case. Is this use case related to 
>> the above TODO item?
> 
> As I see it -- yes. When we do periodic snapshot we would have to take the MPI context with us. But how should it look -- is to be found out.

Let me give a little background for checkpoint/restart support in Open MPI...

Capturing the MPI state is pretty difficult, especially in the presence of hardware-offload networks.  Meaning: the state is not completely discoverable in software; there's a non-trivial amount of state in hardware, too.  (...skipping a much larger discussion about this kind of stuff; I can explain if anyone cares...)

I forget what version introduced general checkpoint/restart support for parallel Open MPI jobs, but it's been available for quite a while.  

Open MPI is fundamentally based on plugins, and support for the underlying checkpoint service is no exception.  Open MPI has two existing plugins (with a third close to completion): BLCR (http://crd.lbl.gov/groups-depts/ftg/projects/current-projects/BLCR), "self" (basically for applications that can do self-checkpointing), and DMTCP (http://dmtcp.sourceforge.net/).

Adding support for CRIU to Open MPI is hypothetically pretty simple: just add a "crs" plugin for CRIU to Open MPI.  The API that a CRS plugin has to support is relatively simple -- the main work is basically calling the underlying "checkpoint me now" functionality, and then providing a hook for when the process is restarted.

To be clear: all the other infrastructure for saving and restoring the MPI state is already provided by Open MPI (even for hardware-offload networks).  That infrastructure basically calls the CRS plugin to actually do the checkpoint, come back from the restore, ...and a few other miscellaneous things.
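
To make the division of labor above a little more concrete, here is a minimal sketch in Python.  It is NOT Open MPI's actual CRS API (that is a C plugin interface); every name below (CRSPlugin, FakeCriuPlugin, run_checkpoint, quiesce, resume) is hypothetical and made up for illustration.  It only shows the shape of the idea: the surrounding infrastructure quiesces the MPI state, calls the plugin to do the actual checkpoint, and resumes afterwards; the plugin also exposes a hook that runs after a restart.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


class CRSPlugin(Protocol):
    """Hypothetical shape of a checkpoint/restart service (CRS) plugin."""

    name: str

    def checkpoint(self, pid: int, snapshot_dir: str) -> bool:
        """Ask the underlying checkpointer to snapshot the process."""
        ...

    def on_restart(self, pid: int) -> None:
        """Hook invoked after the process has been restored."""
        ...


@dataclass
class FakeCriuPlugin:
    """Stand-in for a CRIU-backed plugin; it only records what it was asked."""

    name: str = "criu"
    log: List[str] = field(default_factory=list)

    def checkpoint(self, pid: int, snapshot_dir: str) -> bool:
        # Assumption: a real plugin would invoke CRIU's dump here.
        self.log.append(f"checkpoint pid={pid} dir={snapshot_dir}")
        return True

    def on_restart(self, pid: int) -> None:
        # Assumption: a real plugin would notify the MPI layer here.
        self.log.append(f"restart pid={pid}")


def run_checkpoint(
    plugin: CRSPlugin,
    pid: int,
    snapshot_dir: str,
    quiesce: Callable[[], None],
    resume: Callable[[], None],
) -> bool:
    """Infrastructure side: quiesce MPI state, delegate, then resume."""
    quiesce()
    try:
        return plugin.checkpoint(pid, snapshot_dir)
    finally:
        # Resume even if the checkpointer raised an error.
        resume()
```

The point of the split is that the plugin stays small: everything MPI-specific lives in the quiesce/resume callbacks, which the plugin never sees.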

All that being said, we did a fairly major architecture revamp in our SVN development trunk (and v1.7 release branch) recently for one of Open MPI's main subsystems, and the checkpoint/restart infrastructure is currently flat-out broken.  :-(  It's on the to-do list to fix, but it's going to take a little while.  

Hence, adding a new CRIU CRS plugin would need to be done on an older version of Open MPI -- the v1.6.x series.  But the good news is that forward porting a CRIU CRS plugin from v1.6 to Open MPI's dev trunk/v1.7 branch is pretty straightforward (I'd even volunteer to help with that).

-- 
Jeff Squyres
jsquyres at cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/



