[Devel] Re: [RFC][PATCH 2/2] CR: handle a single task with private memory maps
Joseph Ruscio
jruscio at evergrid.com
Wed Aug 6 09:15:46 PDT 2008
On Aug 5, 2008, at 9:23 AM, Dave Hansen wrote:
> On Mon, 2008-08-04 at 20:51 -0700, Joseph Ruscio wrote:
>> It might be desirable for the checkpointing implementation to be
>> modular enough that a userspace application or library could select
>> to handle certain resources on their own. Memory is the primary one
>> that comes to mind.
>
> How would you propose making it modular?
>
> -- Dave
>
Well, it seems to me that the initial focus here is on live migration
of traditional enterprise applications, e.g. databases, app servers,
etc. I think this is the right focus given how much utility the
general enterprise is finding in capabilities like VMotion. Providing
this mobility to applications without the overhead of traditional VMs
would be very valuable.

On the other hand, I've been primarily focused on checkpointing
large-scale MPI jobs to provide fault tolerance, and that use case is
somewhat different than the live-migration one. These jobs have huge
RAM footprints (in-core checkpointing is not an option), require
coordination across large numbers of servers, and keep some number of
files open on an enormous parallel filesystem plus some scratch files
open on the local disk/ramdisk. They generally have very simple
process trees with one process per core, or one process with a thread
for each core.

To support these kinds of jobs, one would ideally instruct the
container checkpointer to ignore network resources, dynamically
allocated private memory, and the contents of open files, leaving
those to the application or library. You'd be relying on the container
checkpointer to recreate processes, open file descriptors, threads,
thread synchronization primitives, and IPC mechanisms (including shm).
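
To make that split concrete, here is a rough sketch of the two sets of
resources. The names are purely illustrative (nothing like this exists
in the patch set), and it says nothing about how the choice would be
communicated to the kernel.

```python
from enum import Flag, auto


class Resource(Flag):
    """Hypothetical resource classes; names are illustrative only."""
    PROCESSES = auto()
    FILE_DESCRIPTORS = auto()
    THREADS = auto()
    THREAD_SYNC = auto()
    IPC = auto()               # including shm
    NETWORK = auto()
    PRIVATE_MEMORY = auto()    # dynamically allocated private memory
    FILE_CONTENTS = auto()     # contents of open files


# Everything a default, complete checkpoint would cover.
ALL = (Resource.PROCESSES | Resource.FILE_DESCRIPTORS | Resource.THREADS
       | Resource.THREAD_SYNC | Resource.IPC | Resource.NETWORK
       | Resource.PRIVATE_MEMORY | Resource.FILE_CONTENTS)

# What an MPI-style job would handle itself in userspace.
APP_HANDLED = (Resource.NETWORK | Resource.PRIVATE_MEMORY
               | Resource.FILE_CONTENTS)

# What is left for the container checkpointer to recreate.
KERNEL_HANDLED = ALL & ~APP_HANDLED
```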

As far as the mechanism is concerned, I'd defer to the more
experienced kernel developers here. I assume that passing a bitmask of
flags as an argument into the checkpoint syscall would be frowned
upon, and in any case redundant, as it's unlikely that the mask would
change within a container from checkpoint to checkpoint. If each
container is going to have a cgroup filesystem directory, then we
could have one or more files there, along the lines of
/proc/sys/kernel/randomize_va_space, that turn features off for that
container. The default settings after container creation would be a
complete in-kernel checkpoint/migration.
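
From userspace, that could look something like the sketch below. It
only simulates the idea in an ordinary temporary directory; the file
names are made up for illustration and do not correspond to any
existing kernel interface.

```python
import os
import tempfile

# Hypothetical per-container control files; none of these exist in any
# kernel. "1" means the in-kernel checkpointer handles the resource.
FEATURE_FILES = (
    "checkpoint_network",
    "checkpoint_private_memory",
    "checkpoint_file_contents",
)


def create_container_dir(path):
    """Simulate container creation: every feature defaults to on."""
    os.makedirs(path, exist_ok=True)
    for name in FEATURE_FILES:
        with open(os.path.join(path, name), "w") as f:
            f.write("1\n")


def set_feature(path, name, enabled):
    """Turn one feature on or off, as `echo 0 > file` would."""
    if name not in FEATURE_FILES:
        raise ValueError("unknown feature file: " + name)
    with open(os.path.join(path, name), "w") as f:
        f.write("1\n" if enabled else "0\n")


def read_features(path):
    """Return a {feature name: bool} mapping for the container."""
    features = {}
    for name in FEATURE_FILES:
        with open(os.path.join(path, name)) as f:
            features[name] = f.read().strip() == "1"
    return features


# An MPI job that saves its own memory switches that feature off once;
# the setting then persists across every later checkpoint.
with tempfile.TemporaryDirectory() as root:
    container = os.path.join(root, "mpi_job")
    create_container_dir(container)
    set_feature(container, "checkpoint_private_memory", False)
    print(read_features(container))
```

The point of the files over a syscall argument is that the setting is
made once per container rather than repeated at every checkpoint.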
thanks,
Joe