[Devel] Re: [PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation
Oren Laadan
orenl at cs.columbia.edu
Wed Mar 18 02:15:05 PDT 2009
An alternative: the task that created the container is the parent
(outside the container) of the container init(1). In turn, init(1) creates
a special 'monitor' thread that monitors the restart, and the outside task
reaps the exit status of that thread (and only that thread).
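
A minimal user-space sketch of that arrangement, under stated assumptions:
no real container or pid namespace is created, a plain fork()ed child
stands in for the container init(1), and the monitor is created with
CLONE_PARENT so that it becomes a child of the outside task, which can
then waitpid() on it directly. The use of CLONE_PARENT, the pipe used to
pass the monitor's pid, and the names monitor_fn() and
reap_monitor_status() are all hypothetical; the mail above does not say
how the outside task would actually get hold of the monitor.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical monitor: a real one would watch the restart; this one
 * just exits with the status it was told to report. */
static int monitor_fn(void *arg)
{
    return *(int *)arg;
}

/* Plays the role of the outside task. Returns the monitor's exit
 * status as reaped by the outside task, or -1 on error. */
int reap_monitor_status(int status_to_report)
{
    int pipefd[2];
    pid_t init_pid, mon = -1;
    ssize_t n;
    int status;

    if (pipe(pipefd) < 0)
        return -1;

    init_pid = fork();
    if (init_pid < 0)
        return -1;

    if (init_pid == 0) {
        /* Stand-in for the container init(1). */
        char *stack = malloc(STACK_SIZE);

        close(pipefd[0]);
        if (!stack)
            _exit(1);
        /* Assumption: CLONE_PARENT makes the monitor a sibling of
         * init, i.e. a child of the outside task. */
        mon = clone(monitor_fn, stack + STACK_SIZE,
                    CLONE_PARENT | SIGCHLD, &status_to_report);
        /* Tell the outside task which pid to wait for (-1 on failure). */
        if (write(pipefd[1], &mon, sizeof(mon)) != sizeof(mon))
            _exit(1);
        _exit(mon < 0 ? 1 : 0);
    }

    /* Outside task: learn the monitor's pid from init. */
    close(pipefd[1]);
    n = read(pipefd[0], &mon, sizeof(mon));
    close(pipefd[0]);

    /* Reap the init stand-in so it does not linger as a zombie; its
     * status is deliberately ignored here. */
    waitpid(init_pid, NULL, 0);

    if (n != (ssize_t)sizeof(mon) || mon < 0)
        return -1;

    /* Reap the monitor, and only the monitor. */
    if (waitpid(mon, &status, 0) != mon)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling reap_monitor_status(3) from a small main() lets the outside task
observe the monitor's status separately from that of the init stand-in.
Whether a real container init(1) should be allowed to use CLONE_PARENT
at all is a separate question.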
[Hmmm... thinking about this: what happens if the container init(1) calls
clone() with CLONE_PARENT? Does that not create a sort of competing
container init(1)?]
Oren.
Cedric Le Goater wrote:
>> Again, how would 'cr' obtain exit status for these tasks, and how would
>> it distinguish failure from normal operation?
>
> Here's our solution to this issue.
>
> mcr maintains in its kernel container object an exitcode attribute for
> the mcr-restart process. This process is detached from the fork tree of
> the restarted application.
>
> When the restart is finished, an mcr-wait command can be called to reap
> this exitcode. This makes it possible to distinguish an exit of the
> application process from an exit of the mcr-restart process.
>
> This is a must-have for batch managers in an HPC environment.
>
> Cheers,
>
> C.
>
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers