[Devel] Re: [RFC v14][PATCH 00/54] Kernel based checkpoint/restart

Matthieu Fertré matthieu.fertre at kerlabs.com
Mon May 4 01:06:19 PDT 2009


Hi,

Louis Rilling wrote:
> On 29/04/09 18:47 -0400, Oren Laadan wrote:
>> Hi Louis,
>>
>> Louis Rilling wrote:
>>> Hi,
>>>
>>> On 28/04/09 19:23 -0400, Oren Laadan wrote:
>>>> Here is the latest and greatest of checkpoint/restart (c/r) patchset.
>>>> The logic and image format reworked and simplified, code refactored,
>>>> support for PPC, s390, sysvipc, shared memory of all sorts, namespaces
>>>> (uts and ipc).
>>> I should have asked before, but what are the reasons to checkpoint SYSV IPCs
>>> in the same file/stream as tasks? Would it be better to checkpoint them
>>> independently, like the file system state?
>>>
>>> In Kerrighed we chose to checkpoint SYSV IPCs independently, a bit like the file
>>> system state, because the lifetime of SYSV IPC objects does not depend on task
>>> lifetime, and we can gain more flexibility this way. In particular we envision
>>> cases in which two applications share a state in a SYSV SHM (something like a
>>> producer-consumer scheme), but do not need to be checkpointed together. In such
>>> a case the SYSV SHM itself could even need more high-availability (using
>>> active replication) than a checkpoint/restart facility.
>>>
>> Thanks for the feedback, this is actually an interesting idea.
>>
>> Indeed in the past I also considered SYSV IPC to be a "global" resource
>> that was checkpointed before iterating through the tasks.
>>
>> However, in the presence of namespaces, the lifetime of an IPC namespace
>> does depend on task lifetime: when the last task referring to a
>> given namespace exits, that namespace is destroyed. Of course, the
>> root namespace is truly global, because init(1) never exits.
>>
>> What would 'checkpoint them independently' mean in this case ?
> 
> I mean that the producer and the consumer could have separate checkpointing
> policies (if any), and the IPC SHM as well.
> 
>> In your use-case, can you restart either application without first
>> restoring the relevant SYSVIPC ?
> 
> Probably not.
> 

Well, it depends. It makes no sense to restart the application without
restoring the relevant SHM, but it may make sense for a message queue (this is
application-specific, of course). A message queue is not tied to the
process; it can disappear during the life of the application.

>> Can you think of other use-cases for such a division ?  Am I right to
>> guess that your use case is specific to the distributed (and SSI-)
>> nature of your system ?  (Active-replication of SYSV_SHM sounds
>> awfully related to DSM :)
> 
> The case of active-replication may be specific to DSM-based systems, but the
> case of independent policies is already interesting in standalone boxes.
> 
>>
>> While not focusing on such use cases, I want to keep the design flexible
>> enough to not exclude them a-priori, and be able to address them later
>> on. Indeed, the code is split such that the function to save a given
>> IPC namespace does not depend on the task that uses it. Future code
>> could easily use the same functionality.
>>
>> One way to be flexible enough to support your use case is to have some
>> mechanism in place to select whether a resource (virtually any) is
>> to be checkpointed/restored.
>>
>> For example, you could imagine checkpoint(..., CHECKPOINT_SYSVIPC)
>> to checkpoint (also) IPC, and not checkpoint IPC in its absence.
>>
>> So normally you'd have checkpoint(..., CHECKPOINT_ALL). When you don't
>> want IPC, you'd use CHECKPOINT_ALL & ~CHECKPOINT_SYSVIPC. When you
>> want only IPC, you'd use CHECKPOINT_SYSVIPC only.
>>
>> Same thing for restart, only that it will get trickier in the "only IPC"
>> case, since you will need to tell which IPC namespace is affected.
>>
>> Also, I envision a task saying cradvise(CHECKPOINT_SYSVIPC, false),
>> telling the kernel to not c/r its IPC namespace. (Or any other
>> resource). Again there would need to be a way to add a restored
>> namespace.
>>
>> Does this address your concerns ?
> 
> Yes this sounds flexible enough. Thanks for taking this into account.

I see one drawback with this approach if you allow checkpointing an
application that is not isolated in a container. In that case, you may
want to select which IPC objects to dump, so as not to dump all the IPC
objects living in the system. Indeed, this is why we chose in Kerrighed to
checkpoint IPC objects independently of tasks, since we currently have no
container/namespace support.

Regards,

Matthieu
