[Devel] Re: [RFC][PATCH 2/3][cr][v2]: Checkpoint/restart file leases
Oren Laadan
orenl at cs.columbia.edu
Fri Jul 30 21:52:49 PDT 2010
On 07/30/2010 08:35 PM, Sukadev Bhattiprolu wrote:
> Oren Laadan [orenl at cs.columbia.edu] wrote:
>>
>>
>> john stultz wrote:
>>> On Fri, 2010-07-30 at 15:45 -0400, Oren Laadan wrote:
>>>> Sukadev Bhattiprolu wrote:
>>>>> Oren Laadan [orenl at cs.columbia.edu] wrote:
>>>>> | | | > h->fl_type = lock->fl_type;
>>>>> | > + h->fl_type_prev = lock->fl_type_prev;
>>>>> | > h->fl_flags = lock->fl_flags;
>>>>> | > + if (h->fl_type & F_INPROGRESS && | >
>>>>> + (lock->fl_break_time > jiffies))
>>>>> | > + h->fl_rem_lease = (lock->fl_break_time - jiffies) / HZ;
>>>>> | | Hmmm -- we have a ctx->ktime_begin marking the start of the
>>>>> checkpoint.
>>>>> | It is used for relative-time calculations, for example, the expiry of
>>>>> | restart-blocks and timeouts. I suggest that we use it here too to be
>>>>> | consistent.
>>>>>
>>>>> Well, I started off using ktime_begin but discussed this with John Stultz
>>>>> (CC'd here) who pointed out that mixing different domains of time is likely
>>>>> to cause errors. ktime is an absolute time and fl_break_time uses jiffies -
>>>>> a relative time.
>>>>>
>>>>> I think use of ktime_begin for restart_blocks is fine (since they use
>>>>> ktime_t) but using ktime_t for file leases and converting between jiffies
>>>>> and nanoseconds could be a problem, unless we convert fl_break_time to
>>>>> seconds.
>>>>>
>>>>> IOW, can we leave the above computation of ->fl_rem_lease for now ?
>>>> The data on restart_blocks keep relative time - it's the the time
>>>> to expiry relative to ktime_begin (which is absolute).
>>>>
>>>> ktime_begin merely gives a reference point in time against which
>>>> all other time-related values should be saved. The advantage is
>>>> that all time computation are relative to the same point in time
>>>> at checkpoint/restart - the time when the ktime_begin is set. It's
>>>> more consistent this way.
>>>
>>> First, forgive me for not being very aware of the checkpoint/restart
>>> code.
>>
>> On the contrary, forgive me if I'm stating the obvious below ...
>>
>>>
>>> So, ktime_begin is an absolute CLOCK_MONOTONIC time, relative to the
>>> time the system booted (more or less). And it represents the checkpoint
>>> time, correct?
>>
>> As a rule, all time measurements in the checkpoint image are
>> saved as relative values, using the start-of-checkpoint as the
>> reference point in time (*).
>>
>> So at checkpoint, every absolute time value should be converted
>> to a value relative to the start-of-checkpoint. At restart, every
>> relative time value from the image is converted back (if needed)
>> to an absolute time value using a respective start-of-restart.
>
> One general observation, slightly off-topic. You mention
> "start-of-restart" here and ...
>>
>> This makes sense for the most common case, where if a process
>> had 5 more seconds to sleep at the time of checkpoint, we would
>> like it to have 5 more seconds to sleep after it restarts.
>
> ... "after it restarts" here. These two can be quite different if,
> as you mention below, the C/R is a lengthy operation.
>
> If application had 5 seconds remaining on the lease, and C/R takes
> more than 5 seconds, the application would have spent the remaining
> lease frozen. Of course trying to compute the values relative to the
> "end-of-restart" is theoretically impossible :-).
>
> Maybe the user can adjust the lease-expiry in the checkpoint-image
> to allow for this as well.
Heh ... you cannot pre-adjust it since you don't know how long
restart would take :)
It would certainly be better if we could restore all the expiry
times to be relative to the time when the restart _succeeds_ (and
just before all tasks resume execution), as opposed to when the
restart _begins_. Restart could take long time, and by the time it
concludes leases (and timers) could expire.
However, to implement such behavior, we'll need to keep a list of
expiry-values-to-adjust throughout the restart, then when restart
is about to complete, iterate through the list and adjust all of
them at once relative to the (near) end-time.
While this is possible, I don't think it is worth the additional
complexity in the code. Most restarts will not take that long.
Moreover, it can be viewed as a restart that succeeds quickly
but then the processes are frozen/stopped/descheduled for some
time so leases/timers expire.
The requirement that all expiries will be measured (on checkpoint
and restart) relative to the same point in time is very important
(while the above is nice-to-have).
Oren.
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list