[CRIU] CRIU and Open MPI

Adrian Reber adrian at lisas.de
Tue Feb 18 08:58:34 PST 2014


On Tue, Feb 18, 2014 at 07:42:34PM +0400, Pavel Emelyanov wrote:
> On 02/18/2014 07:38 PM, Cyrill Gorcunov wrote:
> > On Tue, Feb 18, 2014 at 07:32:42PM +0400, Pavel Emelyanov wrote:
> >>
> >> This error means that there's a file opened and unlinked and it weighs 67 MB.
> >> Such files (opened and unlinked) should be copied to the images dir, since they
> >> will be removed from disk once tasks we dump die.
> >>
> >> I've put a ... magic constant :) limiting the size of such files to a couple of
> >> megs just not to put too big files into images, since such copying will take time.
> > 
> > Maybe we could pass a command line argument which would allow setting the size limit?
> 
> I want to say "no way", but I'd like to know more about this strange file first.

> (00.034116) Dumping path for -3 fd via self 35 [/tmp/openmpi-sessions-adrian@dcbz_0/54685/1/shared_mem_pool.dcbz (deleted)]
> (00.034120) Dumping ghost file for fd 35 id 0x17
> (00.034123) Error (files-reg.c:305): Can't dump ghost file /tmp/openmpi-sessions-adrian@dcbz_0/54685/1/shared_mem_pool.dcbz (deleted) of 67108872 size

As far as I understand it, this shared memory file is used for
communication between the involved components in Open MPI.
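
To illustrate what "opened and unlinked" means for such a file, here is a
minimal Python sketch. It is not CRIU code: the helper names and the 2 MiB
limit are made up for illustration (the real limit is only described as "a
couple of megs" above), and it assumes a local Linux filesystem where an
unlinked but still open file reports a link count of zero.

```python
import os
import tempfile

# Hypothetical limit for this sketch ("a couple of megs"); not CRIU's constant.
GHOST_LIMIT_BYTES = 2 * 1024 * 1024


def describe_open_file(fd):
    """Return (is_unlinked, size_in_bytes) for an open file descriptor."""
    st = os.fstat(fd)
    # st_nlink == 0 means no directory entry refers to the inode any more.
    return st.st_nlink == 0, st.st_size


def make_unlinked_file(size):
    """Create a temporary file of the given size, unlink it, keep the fd."""
    fd, path = tempfile.mkstemp(prefix="ghost-demo-")
    os.ftruncate(fd, size)  # sparse file, no data is actually written
    os.unlink(path)         # the name is gone, the open fd keeps the inode
    return fd


if __name__ == "__main__":
    fd = make_unlinked_file(67108872)  # size reported in the log above
    unlinked, size = describe_open_file(fd)
    print(unlinked, size, size > GHOST_LIMIT_BYTES)
    os.close(fd)
```

The data of such a file disappears as soon as the last descriptor is closed,
which is why a checkpoint has to copy the contents into the images directory
if the file is to exist again after restore.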

I am including Jeff in this discussion because he probably knows best.

Jeff, the checkpoint fails with --np > 1 because criu trips over the
deleted file (see above). Can you tell us what this file is used for?

		Adrian

