[CRIU] CRIU and Open MPI

Jeff Squyres (jsquyres) jsquyres at cisco.com
Tue Feb 18 10:18:11 PST 2014


On Feb 18, 2014, at 11:58 AM, Adrian Reber <adrian at lisas.de> wrote:

>> (00.034116) Dumping path for -3 fd via self 35 [/tmp/openmpi-sessions-adrian@dcbz_0/54685/1/shared_mem_pool.dcbz (deleted)]
>> (00.034120) Dumping ghost file for fd 35 id 0x17
>> (00.034123) Error (files-reg.c:305): Can't dump ghost file /tmp/openmpi-sessions-adrian@dcbz_0/54685/1/shared_mem_pool.dcbz (deleted) of 67108872 size
> 
> As far as I understand it this shared memory is used for the
> communication between the involved components in Open MPI.

Correct. That is an mmap'ed file that we attach to multiple processes on the same server and use for shared-memory-based communication.

Note that mmap isn't the only mechanism that OMPI uses for shared memory; it may also use SysV or POSIX shared memory. Specifically, we choose which mechanism to use during the MPI_INIT function call (which does all the setup/initialization) and then use that mechanism consistently for the rest of the life of the process. IIRC, mmap is the default on Linux.
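For comparison with the mmap-file approach, the SysV mechanism has its own create/attach/remove cycle. The following is a minimal sketch of the standard `shmget`/`shmat`/`shmctl(IPC_RMID)` calls, not Open MPI's sysv component; `struct sysv_demo_seg` and `sysv_rmid_demo` are hypothetical names. On Linux, `IPC_RMID` plays a role similar to `unlink()`: the segment is only marked for destruction and stays usable until the last process detaches. A forked child stands in for a second process on the same server and inherits the attachment.

```c
#define _XOPEN_SOURCE 700
#include <stdatomic.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo layout, not Open MPI's. */
struct sysv_demo_seg {
    atomic_int payload; /* written by the child, read by the parent */
};

/* Returns the value the child wrote (42) on success, -1 on error. */
static int sysv_rmid_demo(void)
{
    int id = shmget(IPC_PRIVATE, sizeof(struct sysv_demo_seg), IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    void *addr = shmat(id, NULL, 0);
    /* Mark for removal immediately. On Linux the segment survives until the
     * last detach; if shmat failed, this frees the segment right away. */
    shmctl(id, IPC_RMID, NULL);
    if (addr == (void *)-1)
        return -1;

    struct sysv_demo_seg *seg = addr;
    atomic_init(&seg->payload, 0);

    pid_t pid = fork();
    if (pid < 0) {
        shmdt(addr);
        return -1;
    }
    if (pid == 0) {
        /* Child: the attachment is inherited across fork(). */
        atomic_store(&seg->payload, 42);
        _exit(0);
    }

    int status = 0;
    waitpid(pid, &status, 0);
    int result = (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                     ? atomic_load(&seg->payload)
                     : -1;
    shmdt(addr); /* last detach: the kernel now destroys the segment */
    return result;
}
```

Whether `shmget` succeeds depends on the host's SysV IPC limits and, in containers, on the IPC namespace configuration.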

> Jeff, the checkpoint fails with --np > 1 because criu trips over the
> deleted file (see above). Can you tell us the meaning of this file?


Also IIRC (I'll have to check to be sure), we mmap the file into all MPI processes on the same server and then unlink it. Hence, all the MPI processes share the memory, but no remnant is left in the filesystem. I think there was a performance benefit to this (i.e., the filesystem doesn't keep trying to write out modifications to the file), and there's also the benefit that if an MPI process crashes (think: development and debugging), no giant file is left behind in the filesystem.
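The attach-then-unlink pattern described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not Open MPI's code: the struct layout, the `/tmp/shm-demo-XXXXXX` path template, and the function names are hypothetical, and a forked child stands in for a second MPI process on the same server. The child attaches by opening the file by name, the parent unlinks the name once the child has attached, and both processes keep communicating through the mapping afterwards.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout of the demo segment (not Open MPI's). */
struct demo_seg {
    atomic_int unlinked; /* parent sets this after removing the file name */
    atomic_int payload;  /* child writes this after the name is gone */
};

/* Open the backing file by name and map it MAP_SHARED, the way a separate
 * process on the same host could attach. Returns NULL on error. */
static struct demo_seg *attach_by_name(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, sizeof(struct demo_seg), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd); /* the mapping, not the fd, keeps the storage alive */
    return p == MAP_FAILED ? NULL : (struct demo_seg *)p;
}

/* Returns the child's payload (42) on success, -1 on error; sets *name_gone
 * to 1 if the path no longer exists after the unlink. */
static int mmap_then_unlink_demo(int *name_gone)
{
    char path[] = "/tmp/shm-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    int sized = ftruncate(fd, sizeof(struct demo_seg)); /* zero-filled */
    close(fd);
    struct demo_seg *seg = (sized == 0) ? attach_by_name(path) : NULL;
    if (seg == NULL) {
        unlink(path);
        return -1;
    }
    atomic_init(&seg->unlinked, 0);
    atomic_init(&seg->payload, 0);

    int ready[2];
    if (pipe(ready) != 0) {
        munmap(seg, sizeof *seg);
        unlink(path);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(ready[0]);
        close(ready[1]);
        munmap(seg, sizeof *seg);
        unlink(path);
        return -1;
    }
    if (pid == 0) {
        /* Child: attach independently by name, then tell the parent. */
        struct demo_seg *mine = attach_by_name(path);
        char ok = (mine != NULL);
        ssize_t w = write(ready[1], &ok, 1);
        (void)w;
        if (mine == NULL)
            _exit(1);
        while (atomic_load(&mine->unlinked) == 0)
            sched_yield(); /* wait until the file name has been removed */
        atomic_store(&mine->payload, 42); /* visible via the parent's mapping */
        _exit(0);
    }

    close(ready[1]);
    char ok = 0;
    ssize_t n = read(ready[0], &ok, 1);
    close(ready[0]);

    /* Every process has attached (or failed): drop the file name now. */
    unlink(path);
    *name_gone = (access(path, F_OK) != 0);
    atomic_store(&seg->unlinked, 1);

    int status = 0;
    waitpid(pid, &status, 0);
    int result = (n == 1 && ok && WIFEXITED(status) && WEXITSTATUS(status) == 0)
                     ? atomic_load(&seg->payload)
                     : -1;
    munmap(seg, sizeof *seg);
    return result;
}
```

While such mappings are live, `/proc/<pid>/maps` lists the backing file with a ` (deleted)` suffix, the same annotation that appears in the CRIU log quoted above.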

-- 
Jeff Squyres
jsquyres at cisco.com