[Devel] Re: [RFC][PATCH 2/3][cr][v2]: Checkpoint/restart file leases
Matt Helsley
matthltc at us.ibm.com
Fri Jul 30 18:36:53 PDT 2010
On Fri, Jul 30, 2010 at 05:35:04PM -0700, Sukadev Bhattiprolu wrote:
> Oren Laadan [orenl at cs.columbia.edu] wrote:
> >
> >
> > john stultz wrote:
> >> On Fri, 2010-07-30 at 15:45 -0400, Oren Laadan wrote:
> >>> Sukadev Bhattiprolu wrote:
> >>>> Oren Laadan [orenl at cs.columbia.edu] wrote:
> >>>> | | | > h->fl_type = lock->fl_type;
> >>>> | > + h->fl_type_prev = lock->fl_type_prev;
> >>>> | > h->fl_flags = lock->fl_flags;
> >>>> | > + if (h->fl_type & F_INPROGRESS && | >
> >>>> + (lock->fl_break_time > jiffies))
> >>>> | > + h->fl_rem_lease = (lock->fl_break_time - jiffies) / HZ;
> >>>> | | Hmmm -- we have a ctx->ktime_begin marking the start of the
> >>>> checkpoint.
> >>>> | It is used for relative-time calculations, for example, the expiry of
> >>>> | restart-blocks and timeouts. I suggest that we use it here too to be
> >>>> | consistent.
> >>>>
> >>>> Well, I started off using ktime_begin but discussed this with John Stultz
> >>>> (CC'd here) who pointed out that mixing different domains of time is likely
> >>>> to cause errors. ktime is an absolute time and fl_break_time uses jiffies -
> >>>> a relative time.
> >>>>
> >>>> I think use of ktime_begin for restart_blocks is fine (since they use
> >>>> ktime_t) but using ktime_t for file leases and converting between jiffies
> >>>> and nanoseconds could be a problem, unless we convert fl_break_time to
> >>>> seconds.
> >>>>
> >>>> IOW, can we leave the above computation of ->fl_rem_lease for now ?
> >>> The data on restart_blocks keep relative time - it's the the time
> >>> to expiry relative to ktime_begin (which is absolute).
> >>>
> >>> ktime_begin merely gives a reference point in time against which
> >>> all other time-related values should be saved. The advantage is
> >>> that all time computation are relative to the same point in time
> >>> at checkpoint/restart - the time when the ktime_begin is set. It's
> >>> more consistent this way.
> >>
> >> First, forgive me for not being very aware of the checkpoint/restart
> >> code.
> >
> > On the contrary, forgive me if I'm stating the obvious below ...
> >
> >>
> >> So, ktime_begin is an absolute CLOCK_MONOTONIC time, relative to the
> >> time the system booted (more or less). And it represents the checkpoint
> >> time, correct?
> >
> > As a rule, all time measurements in the checkpoint image are
> > saved as relative values, using the start-of-checkpoint as the
> > reference point in time (*).
> >
> > So at checkpoint, every absolute time value should be converted
> > to a value relative to the start-of-checkpoint. At restart, every
> > relative time value from the image is converted back (if needed)
> > to an absolute time value using a respective start-of-restart.
>
> One general observation, slightly off-topic. You mention
> "start-of-restart" here and ...
> >
> > This makes sense for the most common case, where if a process
> > had 5 more seconds to sleep at the time of checkpoint, we would
> > like it to have 5 more seconds to sleep after it restarts.
>
> ... "after it restarts" here. These two can be quite different if,
> as you mention below, the C/R is a lengthy operation.
Worse! I think the big concern is not the duration of checkpoint but the
amount of time userspace expects to store a checkpoint before
restarting.
It's an arbitrary amount of time. A user could restart 50 days later.
Or 100. etc. Sure, it's _unlikely_, but we (checkpoint/restart implementers)
shouldn't count on seeing only "reasonable" times between completion of
checkpoint and initiation of restart. We need to be robust for all
the times we see.
I think that's why Oren made the choices he did. Making times relative
to ktime_begin certainly helps keep the time values semi-sane for those
crazy arbitrary cases in addition to being nice for the "reasonable"
cases.
Cheers,
-Matt
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list