[Devel] Re: [PATCH RFC] Send checkpoint and restart debug info to a log file (v2)
Serge E. Hallyn
serue at us.ibm.com
Mon Oct 26 14:52:38 PDT 2009
Quoting Oren Laadan (orenl at librato.com):
>
>
> Matt Helsley wrote:
> > On Wed, Oct 21, 2009 at 07:51:57PM -0500, Serge E. Hallyn wrote:
> >> Quoting Oren Laadan (orenl at librato.com):
> >
> > <snip>
> >
> >>>> More practically, requiring userspace to pass over a flag
> >>>> consisting of CKPT_DBG_MEM|CKPT_DBG|FILE|CKPT_DBG|TASK, and
> >>>> handle corresponding usage flags, is not nice.
> >>> I agree with you on about this. Maybe we want a better
> >>> interface ?
> >>>
> >>> Which brings me to this random thought: maybe we want to
> >>> make the fourth argument of sys_{checkpoint,restart} a
> >>> structure, to make it easier to extend it in the future
> >>> without having to go throw a clone3-like hell...
> >
> > Adding new kernel interfaces is supposed to be somewhat hellish.
> >
> >>> Specifically, this structure could now be:
> >>>
> >>> struct ckpt_args {
> >>> int version;
> >>> int logfd;
> >>> int logmask;
> >>> };
> >>>
> >>> (or use union checkpoint {} and union restart {} to tell
> >>> between checkpoint- and restart-related args.
> >> Well I don't like passing structs to the kernel actually (and
> >
> > Let's not do this. I agree that passing structs, when unnecessary,
> > is gross. Especially if it gets used to extend the arguments
> > passed via the syscall interface (new flag values I don't mind).
>
> Ok, we already allow future extension by being strict about
> which flags are taken or not.
>
> Then what do we do with logmask ? I prefer it to be a per-syscall
> value as opposed to a system-wise setting.
Ok well let's be honest with ourselves - we'd like to give end-users
reasonable debug info to figure out common problems. But we will
always have cases where we have to hand-instrument the kernel for
our own advanced debugging. Cases where you must print out only a
specific info to catch a race condition probably will fit into
such a case. So I'd prefer to aim for giving the end-user just one
or a few options - nothing, errors only, or all debugging info that
we care to insert in everyday kernels (which hopefully isn't too
much, but let's a user figure out approximately where the problem
happened, i.e. recreating a socket, opening a file, etc). Otherwise
too much debugging will (a) deter from the readability of the code
(b) bulk up the kernel and (c) make the api cumberson (too many
flags to checkpoint/restart userspace programs, and passing structs
to syscall).
In fact I'm wondering whether for the shipped c/r kernel we want
to have only ckpt_error()s and make sure we record the location in
the restart file where we errored out. The ckpt_write_err() info
is already very helpful in tracking down checkpoint errors. Of course
this becomes a pain whenever we introduce a subte bug and have to
hand-instrument everything. And then I have to curse all the printks
I have to add. But it's not reasonabe to have every other line of
c/r code be a debug statement either.
-serge
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list