[Devel] Re: [PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation

Oren Laadan orenl at cs.columbia.edu
Wed Mar 18 02:15:05 PDT 2009


An alternative: the task that created the container is the parent
(outside the container) of the container init(1). In turn, init(1) creates
a special 'monitor' thread that monitors the restart, and the outside task
reaps the exit status of that thread (and only that thread).
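
To make the "reap that thread and only that thread" part concrete, here is
a rough userspace sketch. It is not code from the patch set: both helper
names are made up for illustration, a plain fork()ed child stands in for the
monitor, and it assumes the outside task can waitpid() on it directly. The
point is only that waiting on one specific pid, and then checking
WIFEXITED()/WEXITSTATUS(), lets the caller tell a clean exit from a failure.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that stands in for the 'monitor'
 * and exits immediately with `code`. Returns the child's pid in the
 * parent, or -1 if fork() failed.
 */
static pid_t spawn_child_exiting_with(int code)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(code);	/* child: exit without running atexit handlers */
	return pid;
}

/*
 * Hypothetical helper: reap exactly `pid` and no other child.
 * Returns the exit code (0..255) if the task exited normally, or -1 if
 * it was killed by a signal, `pid` is invalid, or waitpid() failed.
 */
static int reap_exit_code(pid_t pid)
{
	int status;
	pid_t r;

	if (pid <= 0)
		return -1;	/* 0 and -1 would mean "any child" to waitpid() */

	do {
		r = waitpid(pid, &status, 0);
	} while (r == -1 && errno == EINTR);	/* retry if interrupted */

	if (r != pid)
		return -1;
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	return -1;	/* terminated by a signal: treat as failure */
}
```

With these, something like reap_exit_code(spawn_child_exiting_with(3))
should hand back 3 to the caller, while exits of other children are left
untouched for whoever is meant to collect them.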

[Hmmm... thinking about this: what happens if the container init(1) calls
clone() with CLONE_PARENT? Does that not create a sort of competing
container init(1)?]

Oren.


Cedric Le Goater wrote:
>> Again, how would 'cr' obtain exit status for these tasks, and how would
>> it distinguish failure from normal operation?
> 
> Here's our solution to this issue.
> 
> mcr maintains in its kernel container object an exitcode attribute for 
> the mcr-restart process. This process is detached from the fork tree of 
> the restarted application.  
> 
> When the restart is finished, an mcr-wait command can be called to reap 
> this exitcode. This makes it possible to distinguish an exit of the 
> application process from an exit of the mcr-restart process.
> 
> This is a must-have for batch managers in an HPC environment. 
> 
> Cheers,
> 
> C.
> 