[Devel] Re: [PATCH RFC] Send checkpoint and restart debug info to a log file (v2)

Mon Oct 26 14:52:38 PDT 2009

Quoting Oren Laadan (orenl at librato.com):
> 
> 
> Matt Helsley wrote:
> > On Wed, Oct 21, 2009 at 07:51:57PM -0500, Serge E. Hallyn wrote:
> >> Quoting Oren Laadan (orenl at librato.com):
> > 
> > <snip>
> > 
> >>>> More practically, requiring userspace to pass over a flag
> >>>> consisting of CKPT_DBG_MEM|CKPT_DBG|FILE|CKPT_DBG|TASK, and
> >>>> handle corresponding usage flags, is not nice.
> >>> I agree with you on about this. Maybe we want a better
> >>> interface ?
> >>>
> >>> Which brings me to this random thought: maybe we want to
> >>> make the fourth argument of sys_{checkpoint,restart} a
> >>> structure, to make it easier to extend it in the future
> >>> without having to go throw a clone3-like hell...
> > 
> > Adding new kernel interfaces is supposed to be somewhat hellish.
> > 
> >>> Specifically, this structure could now be:
> >>>
> >>> struct ckpt_args {
> >>> 	int version;
> >>> 	int logfd;
> >>> 	int logmask;
> >>> };
> >>>
> >>> (or use union checkpoint {} and union restart {} to tell
> >>> between checkpoint- and restart-related args.
> >> Well I don't like passing structs to the kernel actually (and
> > 
> > Let's not do this. I agree that passing structs, when unnecessary,
> > is gross. Especially if it gets used to extend the arguments
> > passed via the syscall interface (new flag values I don't mind).
> 
> Ok, we already allow future extension by being strict about
> which flags are taken or not.
> 
> Then what do we do with logmask ?   I prefer it to be a per-syscall
> value as opposed to a system-wise setting.

Ok well let's be honest with ourselves - we'd like to give end-users
reasonable debug info to figure out common problems.  But we will
always have cases where we have to hand-instrument the kernel for
our own advanced debugging.  Cases where you must print out only a
specific info to catch a race condition probably will fit into
such a case.  So I'd prefer to aim for giving the end-user just one
or a few options - nothing, errors only, or all debugging info that
we care to insert in everyday kernels (which hopefully isn't too
much, but let's a user figure out approximately where the problem
happened, i.e. recreating a socket, opening a file, etc).  Otherwise
too much debugging will (a) deter from the readability of the code
(b) bulk up the kernel and (c) make the api cumberson (too many
flags to checkpoint/restart userspace programs, and passing structs
to syscall).

In fact I'm wondering whether for the shipped c/r kernel we want
to have only ckpt_error()s and make sure we record the location in
the restart file where we errored out.  The ckpt_write_err() info
is already very helpful in tracking down checkpoint errors.  Of course
this becomes a pain whenever we introduce a subte bug and have to
hand-instrument everything.  And then I have to curse all the printks
I have to add.  But it's not reasonabe to have every other line of
c/r code be a debug statement either.

-serge
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers