[Devel] Re: [RFC][PATCH 2/2] CR: handle a single task with private memory maps
Serge E. Hallyn
serue at us.ibm.com
Thu Jul 31 14:25:35 PDT 2008
Quoting Oren Laadan (orenl at cs.columbia.edu):
>
>
> Serge E. Hallyn wrote:
>> Quoting Oren Laadan (orenl at cs.columbia.edu):
>>> +int do_checkpoint(struct cr_ctx *ctx)
>>> +{
>>> +	int ret;
>>> +
>>> +	/* FIX: need to test whether container is checkpointable */
>>> +
>>> +	ret = cr_write_hdr(ctx);
>>> +	if (!ret)
>>> +		ret = cr_write_task(ctx, current);
>>> +	if (!ret)
>>> +		ret = cr_write_tail(ctx);
>>> +
>>> +	/* on success, return (unique) checkpoint identifier */
>>> +	if (!ret)
>>> +		ret = ctx->crid;
>>
>> Does this crid have a purpose?
>
> yes, at least three; all are for the future, but it is important to fix
> the meaning of the syscall's return value now. The "crid" is
> the CR-identifier that identifies the checkpoint. Every checkpoint is
> assigned a unique number (using an atomic counter).
>
> 1) if a checkpoint is taken and kept in memory (instead of to a file) then
> this will be the identifier with which the restart (or cleanup) would refer
> to the (in memory) checkpoint image
>
> 2) to reduce downtime of the checkpoint, data will be aggregated on the
> checkpoint context, along with references to (COW-ed) pages. This data can
> persist between calls to sys_checkpoint(), and the 'crid', again, will be
> used to identify the (in-memory-to-be-dumped-to-storage) context.
>
> 3) for incremental checkpoint (where a successive checkpoint will only
> save what has changed since the previous checkpoint) there will be a need
> to identify the previous checkpoints (to be able to know where to take
> data from during restart). Again, a 'crid' is handy.
>
> [in fact, for the 3rd use, it will make sense to write that number as
> part of the checkpoint image header]
>
> Note that by doing so, a process that checkpoints itself (in its own
> context), can use code that is similar to the logic of fork():
>
> ...
> crid = checkpoint(...);
> switch (crid) {
> case -1:
>     perror("checkpoint failed");
>     break;
> default:
>     fprintf(stderr, "checkpoint succeeded, CRID=%d\n", crid);
>     /* proceed with execution after checkpoint */
>     ...
>     break;
> case 0:
>     fprintf(stderr, "returned after restart\n");
>     /* proceed with action required following a restart */
>     ...
>     break;
> }
> ...
Thanks - for this and the later explanations in replies to Louis.
Really I had no doubt it had a purpose :) but wasn't sure what it was.
Quite clear now. Thanks.
-serge
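
For readers following the thread, the return-value convention described
above (a unique positive crid on success, 0 when execution resumes after a
restart, -1 on failure) can be sketched in plain user-space C. This is only
an illustration under stated assumptions: fake_checkpoint(), classify_crid()
and enum cr_outcome are hypothetical names, not part of the patch, and the
C11 atomic counter merely stands in for whatever atomic counter the kernel
side uses. The stub cannot actually restart a task, so the 0 case is only
exercised by passing 0 to the classifier directly.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical outcome labels for the three cases in the quoted switch. */
enum cr_outcome { CR_FAILED, CR_RESTARTED, CR_CHECKPOINTED };

/*
 * Assumption: crids start at 1 so that 0 stays reserved for
 * "returned after restart" and -1 for failure.
 */
static atomic_int next_crid = 1;

/*
 * Hypothetical stand-in for checkpoint(); NOT the proposed syscall.
 * Returns a fresh, unique positive crid, or -1 if 'fail' is nonzero.
 */
static int fake_checkpoint(int fail)
{
    int crid;

    if (fail)
        return -1;

    /* atomic_fetch_add returns the value held before the increment */
    crid = atomic_fetch_add(&next_crid, 1);
    assert(crid > 0);
    return crid;
}

/* Mirrors the fork()-like switch: -1 failed, 0 restarted, else a crid. */
static enum cr_outcome classify_crid(int crid)
{
    switch (crid) {
    case -1:
        return CR_FAILED;
    case 0:
        return CR_RESTARTED;
    default:
        return CR_CHECKPOINTED;
    }
}
```

A caller would pass the result of fake_checkpoint(0) to classify_crid() and
branch on the outcome, much as a fork() caller branches on the returned pid.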
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers