[Devel] Re: [PATCH RFC] Send checkpoint and restart debug info to a log file (v2)
Oren Laadan
orenl at librato.com
Mon Oct 26 16:39:10 PDT 2009
Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl at librato.com):
>>
>> Matt Helsley wrote:
>>> On Wed, Oct 21, 2009 at 07:51:57PM -0500, Serge E. Hallyn wrote:
>>>> Quoting Oren Laadan (orenl at librato.com):
>>> <snip>
>>>
>>>>>> More practically, requiring userspace to pass over a flag
>>>>>> consisting of CKPT_DBG_MEM|CKPT_DBG|FILE|CKPT_DBG|TASK, and
>>>>>> handle corresponding usage flags, is not nice.
>>>>> I agree with you on about this. Maybe we want a better
>>>>> interface ?
>>>>>
>>>>> Which brings me to this random thought: maybe we want to
>>>>> make the fourth argument of sys_{checkpoint,restart} a
>>>>> structure, to make it easier to extend it in the future
>>>>> without having to go throw a clone3-like hell...
>>> Adding new kernel interfaces is supposed to be somewhat hellish.
>>>
>>>>> Specifically, this structure could now be:
>>>>>
>>>>> struct ckpt_args {
>>>>> int version;
>>>>> int logfd;
>>>>> int logmask;
>>>>> };
>>>>>
>>>>> (or use union checkpoint {} and union restart {} to tell
>>>>> between checkpoint- and restart-related args.
>>>> Well I don't like passing structs to the kernel actually (and
>>> Let's not do this. I agree that passing structs, when unnecessary,
>>> is gross. Especially if it gets used to extend the arguments
>>> passed via the syscall interface (new flag values I don't mind).
>> Ok, we already allow future extension by being strict about
>> which flags are taken or not.
>>
>> Then what do we do with logmask ? I prefer it to be a per-syscall
>> value as opposed to a system-wise setting.
>
> Ok well let's be honest with ourselves - we'd like to give end-users
> reasonable debug info to figure out common problems. But we will
> always have cases where we have to hand-instrument the kernel for
> our own advanced debugging. Cases where you must print out only a
> specific info to catch a race condition probably will fit into
> such a case. So I'd prefer to aim for giving the end-user just one
> or a few options - nothing, errors only, or all debugging info that
> we care to insert in everyday kernels (which hopefully isn't too
> much, but let's a user figure out approximately where the problem
> happened, i.e. recreating a socket, opening a file, etc). Otherwise
I'm ok with this nothing-error-all scheme.
Two questions are:
What's "reasonable debug info" ?
How does it differ than looking at the image (checkpoint/restart) at
the specific file position and error message ?
> too much debugging will (a) deter from the readability of the code
> (b) bulk up the kernel and (c) make the api cumberson (too many
> flags to checkpoint/restart userspace programs, and passing structs
> to syscall).
>
> In fact I'm wondering whether for the shipped c/r kernel we want
> to have only ckpt_error()s and make sure we record the location in
> the restart file where we errored out. The ckpt_write_err() info
> is already very helpful in tracking down checkpoint errors. Of course
> this becomes a pain whenever we introduce a subte bug and have to
> hand-instrument everything. And then I have to curse all the printks
> I have to add. But it's not reasonabe to have every other line of
> c/r code be a debug statement either.
So you suggest ckpt_error() and ckpt_debug(). Fair enough.
Yet I'm not willing to give up my existing (too noisy for the user)
debug statements for a chance to curse at their removal.
So I suggest also _ckpt_debug(), only with CONFIG_CHECKPOINT_DEBUG.
They are there for developers, they can be used by a user who is
having an issue and works with a developer, and they prevent Serge
from cursing unnecessarily :p
(or can change the naming: ckpt_error, ckpt_log, ckpt_debug, etc).
Oren.
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list