[Devel] Re: [PATCH 2/7] restart.c: use ckpt_err

Mon Nov 16 08:43:14 PST 2009

Quoting Oren Laadan (orenl at cs.columbia.edu):
> Is it your intent to entirely get rid of ckpt_debug() ?

Replace with a new ckpt_log(), yes.  I think that's too much to do
all at once so figured ckpt_err() in v19, then start adding ckpt_log(),
and converting callers one file at a time.

> 
> We originally discussed two levels of details: only error status
> or a detailed log (and we also thought of a detailed debug, that can
> be compiled out to save space). How does that fit with the patch(es) ?

FIts perfectly.  ckpt_err() always is dumped, ckpt_log() can be deemed
'informative' and optionally dumped.  When we implement it.

> To "define" what's "error status" and what's "log" (and maybe what's
> "debug"), I suggest a test like:
> 
> 1) error status: what conveys the most specific reason of failure,
> e.g. "failed to open file to restore fd";  The caller should be able
> to assume that the total message(s) length will not exceed a pipe
> buffer.
> 
> 2) log status: that gives status about progress, or what lead to and
> what followed an error, e.g. file open failure may have happened
> when restoring a file descriptor, or when restore a vma, so a log
> like "failed to restore vma" would be helpful.

I think 'failed to open file' should always be 'error', so that we
know which file failed to open.  If all we print is a generic
'failed to restore open files' then the user isn't much better off
than getting -EBADF for sys_restart().

> 3) debug status: that we want to be able to compile out without having
> to reintroduce it for every bug that it may help us debug.

<shrug>  This may be useful and good, but in any case starting
with just implementing (1) seemed like the most practical approach.
The patchset accomplishes getting rid of ckpt_write_err(), and sending
error messages to the user logfile, so I think it's plenty useful
without trying to do everything (with resulting in all the extra
patch churn).

> > diff --git a/checkpoint/restart.c b/checkpoint/restart.c
> > index 130b4b2..e1bd0ad 100644
> > --- a/checkpoint/restart.c
> > +++ b/checkpoint/restart.c
> > @@ -64,7 +64,7 @@ static int restore_debug_task(struct ckpt_ctx *ctx, int flags)
> >  
> >  	s = kmalloc(sizeof(*s), GFP_KERNEL);
> >  	if (!s) {
> > -		ckpt_debug("no memory to register ?!\n");
> > +		ckpt_err(ctx, 0, "no memory to register ?!\n");
> >  		return -ENOMEM;
> 
> What is the purpose in passing '0' instead of -ENOMEM to ckpt_err() ?
> (a few more instances below).

Hmm, I think that can pass errno now.  I probably had done that bc
originally ckpt_err() was going to do the restore_notify_error
too.

> Are you still concerned about the increase in code size with c/r ?

Yes, I am.  But our first priority should be to empower a user to
debug why a checkpoint or restart failed.  Once we're settled with
that, we can look at how to decrease code size.  Compiling out the
log and debug messages is fair game imo, but compiling out ckpt_err()
is not.  If users can't tell that checkpoint failed because they had
an unlinked file which used to be called .vimrc open, then I don't
think we can reasonably hope to get this upstream (as per previous
'toy implementation' arguments).

-serge
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers