[CRIU] CRIU and Open MPI

Pavel Emelyanov xemul at parallels.com
Tue Feb 18 10:52:55 PST 2014


On 02/18/2014 10:18 PM, Jeff Squyres (jsquyres) wrote:
> On Feb 18, 2014, at 11:58 AM, Adrian Reber <adrian at lisas.de> wrote:
> 
>>> (00.034116) Dumping path for -3 fd via self 35 [/tmp/openmpi-sessions-adrian at dcbz_0/54685/1/shared_mem_pool.dcbz (deleted)]
>>> (00.034120) Dumping ghost file for fd 35 id 0x17
>>> (00.034123) Error (files-reg.c:305): Can't dump ghost file /tmp/openmpi-sessions-adrian at dcbz_0/54685/1/shared_mem_pool.dcbz (deleted) of 67108872 size
>>
>> As far as I understand it this shared memory is used for the
>> communication between the involved components in Open MPI.
> 
> Correct.  That is an mmap'ed file that we attach to multiple processes on the same server and use for shared memory-based
> communication.
> 
> Note that mmap isn't the only mechanism that OMPI uses for shared memory -- it may also use SysV or POSIX shared memory.
> Specifically, we choose which mechanism to use during the MPI_INIT function call (which does all the setup/initialization), and 
> then use that consistently for the rest of the life of the process.  IIRC, mmap is the default on Linux.
> 
>> Jeff, the checkpoint fails with --np > 1 because criu trips over the
>> deleted file (see above). Can you tell us the meaning of this file?
> 
> 
> Also IIRC (I'll have to check to be sure), I believe we mmap attach the file to all MPI processes on the same server, and then we 
> unlink the file.  Hence, all the MPI processes are sharing memory, but there's no leftover remnant in the filesystem.  I think there
> was a performance benefit to this (i.e., the filesystem wasn't actually trying to continually write out modifications to the file),
> and there's also the benefit that if the MPI process crashes (think: development and debugging), there's no giant file left in the
> filesystem.
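
The pattern described above can be sketched in a few lines. This is a
minimal Python illustration, not Open MPI's actual code: the helper name
create_unlinked_shared_mapping, the file prefix and the 4096-byte size are
made up for the example. It maps a file with MAP_SHARED, unlinks it, and
shows that the mapping stays usable while the path is gone. On Linux, the
/proc/self/fd link for the descriptor then carries the same "(deleted)"
suffix that appears in the dump log above.

```python
import mmap
import os
import tempfile


def create_unlinked_shared_mapping(size=4096):
    """Map a temporary file with MAP_SHARED, then unlink its path.

    Hypothetical helper for illustration only; the prefix merely
    mimics the file name seen in the log.
    """
    fd, path = tempfile.mkstemp(prefix="shared_mem_pool.")
    os.ftruncate(fd, size)                      # give the file its size
    mapping = mmap.mmap(fd, size, flags=mmap.MAP_SHARED)
    os.unlink(path)                             # remove the name, keep the data
    return fd, path, mapping


fd, path, mapping = create_unlinked_shared_mapping()

# The mapping remains readable and writable after the unlink.
mapping[:5] = b"hello"
data = mapping[:5]

# The name is gone from the filesystem.
path_exists = os.path.exists(path)

# On Linux, the symlink for the open descriptor ends in " (deleted)".
# On systems without /proc this stays None.
proc_link = f"/proc/self/fd/{fd}"
link_target = os.readlink(proc_link) if os.path.islink(proc_link) else None

print("path still exists:", path_exists)
print("data in mapping:  ", data)
print("fd link target:   ", link_target)
```

Nothing is left behind in /tmp once every process holding the descriptor
or the mapping has exited.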

Ah, I see. Thanks for the explanation.

In that case we definitely shouldn't treat that file as a ghost file. Otherwise we
would detach that shared memory from its other holders. Instead, we should properly
configure the name remapping for that file on dump and attach it back to the proper
shared memory on restore.
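
To illustrate why the other holders matter, here is a rough sketch that
uses no CRIU code at all. It assumes a POSIX system with fork(); the names
make_unlinked_mapping, shared and private_copy are hypothetical. A child
process writes through the shared mapping, and the parent sees the write in
that mapping but not in a separately recreated copy of the file contents,
which stands in for a process that was reattached to its own file rather
than to the common one.

```python
import mmap
import os
import tempfile

SIZE = 4096  # arbitrary size chosen for the example


def make_unlinked_mapping(initial=b"", size=SIZE):
    """Create a MAP_SHARED mapping of a temporary file and unlink the file."""
    fd, path = tempfile.mkstemp(prefix="shared_mem_pool.")
    os.ftruncate(fd, size)
    if initial:
        os.write(fd, initial)       # seed the file with existing contents
    mapping = mmap.mmap(fd, size, flags=mmap.MAP_SHARED)
    os.unlink(path)
    os.close(fd)                    # the mapping outlives the descriptor
    return mapping


# The mapping that all cooperating processes are supposed to share.
shared = make_unlinked_mapping()

# A second file recreated from the same contents. It is a different file,
# so later writes to `shared` never reach it.
private_copy = make_unlinked_mapping(initial=shared[:])

pid = os.fork()
if pid == 0:
    # Child: write through the inherited shared mapping and exit at once.
    shared[:4] = b"ping"
    os._exit(0)
os.waitpid(pid, 0)

seen_in_shared = shared[:4]
seen_in_copy = private_copy[:4]

print("parent sees in shared mapping:", seen_in_shared)
print("parent sees in private copy:  ", seen_in_copy)
```

The point of the sketch is only that identical contents are not enough:
the processes have to end up attached to the same underlying object.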

Thanks,
Pavel

