[Devel] Re: [RFC][PATCH 2/2] CR: handle a single task with private memory maps
Louis Rilling
Louis.Rilling at kerlabs.com
Thu Aug 7 02:29:43 PDT 2008
On Wed, Aug 06, 2008 at 09:15:46AM -0700, Joseph Ruscio wrote:
>
> On Aug 5, 2008, at 9:23 AM, Dave Hansen wrote:
>
>> On Mon, 2008-08-04 at 20:51 -0700, Joseph Ruscio wrote:
>>> It might be desirable for the checkpointing implementation to be
>>> modular enough that a userspace application or library could choose
>>> to handle certain resources on its own. Memory is the primary one that
>>> comes to mind.
>>
>> How would you propose making it modular?
>>
>> -- Dave
>>
>
>
> Well, it seems to me that the initial focus here is on live migration of
> traditional enterprise applications, e.g. databases, app servers, etc. I
> think this is the right focus given how much utility the general
> enterprise is finding in capabilities like VMotion. Providing this
> mobility to applications without the overhead of traditional VMs would
> be very valuable.
>
> On the other hand, I've been primarily focused on checkpointing
> large-scale MPI jobs to provide fault tolerance, and that use case is
> somewhat different from the live-migration one. These jobs have huge RAM
> footprints (in-core checkpointing is not an option), require
> coordination across large numbers of servers, keep some number of files
> open on an enormous parallel filesystem, and keep some scratch files open
> on the local disk/ramdisk. They generally have very simple process trees:
> one process per core, or one process with a thread for each core.
>
> To support these kinds of jobs, one would ideally instruct the Container
> checkpointer to ignore network resources, dynamically allocated private
> memory, and the contents of open files. You'd be relying on the Container
> checkpointer to recreate processes, open file descriptors, threads,
> thread synchronization primitives, and IPC mechanisms (including shm).
>
> As far as the mechanism is concerned, I'd defer to the more experienced
> kernel developers here. I assume that passing a bitmask of flags as an
> argument to the checkpoint syscall would be frowned upon, and redundant
> anyway, as it's unlikely that the mask would change within a container
> from checkpoint to checkpoint. If each container is going to have a
> cgroup filesystem directory, then we could have one or more files, along
> the lines of /proc/sys/kernel/randomize_va_space, that turn features off
> for that Container. The default setting after Container creation would be
> a complete in-kernel checkpoint/migration.

Have you thought about mechanisms/interfaces that would let the kernel's
checkpointing subsystem and the application/run-time interact to build the
checkpoint image efficiently and to restart from it?

Louis
--
Dr Louis Rilling
Skype: louis.rilling
Phone: (+33|0) 6 80 89 08 23
http://www.kerlabs.com/

Kerlabs
Batiment Germanium
80 avenue des Buttes de Coesmes
35700 Rennes
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers