[Devel] Re: [PATCH RFC] Send checkpoint and restart debug info to a log file (v2)

Mon Oct 26 16:39:10 PDT 2009

Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl at librato.com):
>>
>> Matt Helsley wrote:
>>> On Wed, Oct 21, 2009 at 07:51:57PM -0500, Serge E. Hallyn wrote:
>>>> Quoting Oren Laadan (orenl at librato.com):
>>> <snip>
>>>
>>>>>> More practically, requiring userspace to pass over a flag
>>>>>> consisting of CKPT_DBG_MEM|CKPT_DBG|FILE|CKPT_DBG|TASK, and
>>>>>> handle corresponding usage flags, is not nice.
>>>>> I agree with you on about this. Maybe we want a better
>>>>> interface ?
>>>>>
>>>>> Which brings me to this random thought: maybe we want to
>>>>> make the fourth argument of sys_{checkpoint,restart} a
>>>>> structure, to make it easier to extend it in the future
>>>>> without having to go throw a clone3-like hell...
>>> Adding new kernel interfaces is supposed to be somewhat hellish.
>>>
>>>>> Specifically, this structure could now be:
>>>>>
>>>>> struct ckpt_args {
>>>>> 	int version;
>>>>> 	int logfd;
>>>>> 	int logmask;
>>>>> };
>>>>>
>>>>> (or use union checkpoint {} and union restart {} to tell
>>>>> between checkpoint- and restart-related args.
>>>> Well I don't like passing structs to the kernel actually (and
>>> Let's not do this. I agree that passing structs, when unnecessary,
>>> is gross. Especially if it gets used to extend the arguments
>>> passed via the syscall interface (new flag values I don't mind).
>> Ok, we already allow future extension by being strict about
>> which flags are taken or not.
>>
>> Then what do we do with logmask ?   I prefer it to be a per-syscall
>> value as opposed to a system-wise setting.
> 
> Ok well let's be honest with ourselves - we'd like to give end-users
> reasonable debug info to figure out common problems.  But we will
> always have cases where we have to hand-instrument the kernel for
> our own advanced debugging.  Cases where you must print out only a
> specific info to catch a race condition probably will fit into
> such a case.  So I'd prefer to aim for giving the end-user just one
> or a few options - nothing, errors only, or all debugging info that
> we care to insert in everyday kernels (which hopefully isn't too
> much, but let's a user figure out approximately where the problem
> happened, i.e. recreating a socket, opening a file, etc).  Otherwise

I'm ok with this nothing-error-all scheme.

Two questions are:

What's "reasonable debug info" ?

How does it differ than looking at the image (checkpoint/restart) at
the specific file position and error message ?

> too much debugging will (a) deter from the readability of the code
> (b) bulk up the kernel and (c) make the api cumberson (too many
> flags to checkpoint/restart userspace programs, and passing structs
> to syscall).
> 
> In fact I'm wondering whether for the shipped c/r kernel we want
> to have only ckpt_error()s and make sure we record the location in
> the restart file where we errored out.  The ckpt_write_err() info
> is already very helpful in tracking down checkpoint errors.  Of course
> this becomes a pain whenever we introduce a subte bug and have to
> hand-instrument everything.  And then I have to curse all the printks
> I have to add.  But it's not reasonabe to have every other line of
> c/r code be a debug statement either.

So you suggest ckpt_error() and ckpt_debug(). Fair enough.

Yet I'm not willing to give up my existing (too noisy for the user)
debug statements for a chance to curse at their removal.

So I suggest also _ckpt_debug(), only with CONFIG_CHECKPOINT_DEBUG.
They are there for developers, they can be used by a user who is
having an issue and works with a developer, and they prevent Serge
from cursing unnecessarily :p

(or can change the naming:  ckpt_error, ckpt_log, ckpt_debug, etc).

Oren.
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers