[Devel] Re: [PATCH 2/2] Add checkpoint/restart support for epoll files.
Matt Helsley
matthltc at us.ibm.com
Fri Oct 2 13:27:26 PDT 2009
On Fri, Oct 02, 2009 at 02:22:00PM -0400, Oren Laadan wrote:
>
>
> Matt Helsley wrote:
> > On Tue, Sep 29, 2009 at 06:56:06PM -0400, Oren Laadan wrote:
> >>
> >> Matt Helsley wrote:
> >>> Save/restore epoll items during checkpoint/restart respectively.
> >>> kmalloc failures should be dealt with more kindly than just error-out
> >>> because epoll is made to poll many thousands of file descriptors.
> >>> Subsequent patches will change epoll c/r to "chunk" its output/input
> >>> respectively.
>
> [...]
>
> >>> + for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
> >>> + struct epitem *epi;
> >>> +
> >>> + epi = rb_entry(rbp, struct epitem, rbn);
> >>> + ret = ckpt_obj_collect(ctx, epi->ffd.file, CKPT_OBJ_FILE);
> >> s/ckpt_obj_collect/ckpt_collect_file/
> >
> > OK. Seems like we might recurse here though if the file is an epoll
> > file and there's a cycle in the epoll sets (e.g. epoll set A includes the fd
> > for epoll set B, epoll set B includes the fd for epoll set A).
> > So I think I'll do:
>
> I doubt if an epoll file can epoll an epoll file ...
> (eg. see line 1265 in fs/eventpoll.c)
I am certain it can. The testcase I have proves it and the man page mentions
it too. Since it has a poll op I believe it can be epoll'd (line 691 of
fs/eventpoll.c).
I haven't double checked, but I believe the line you cited just prevents the
epoll set from including its own fd.
<snip>
> >>> + ckpt_write_err(ctx, "ep_items_checkpoint(): checkpoint leak detected.\n", "");
> >> It would be useful (for the user) to know which fd/item caused the
> >> leak, or whether the leak was because i != h->num_items (You could
> >> use two ckpt_write_err(), one inside the mutex and on here).
> >
> > That's a good idea. Note the "fd" in the item isn't necessarily what's
> > in the fd table and the index "i" isn't a stable indicator of where we
> > are in the epoll set. So I'm going to have it print out the path to the
> > "file" rather than either of those things.
> >
>
> Good point. Then please note both - the (original) fd may be a
> useful fact for the developer trying to understand what failed.
True. I suppose if the fd is inconsistent with the file's path then
they'll have enough information to eventually figure it out.
<snip>
> > I can't help but think there's more improvement possible. Some less
> > well-thought-out ideas:
> >
> > It might be even better to remove the prefmt string and have ckpt_write_err
> > variants for specialized areas of the code. If area "foo" could use an
> > err, objref, and file flags you might define:
> >
> > int ckpt_write_foo_err(struct ckpt_ctx *ctx, int err, __s32 objref, int
> > flags, char *fmt, ...);
>
> I thought about it, but wished to avoid proliferation of these
> variants.
Makes sense.
> > Otherwise I wonder if it would be better to join the prefmt and fmt
> > strings by just defining our own, new, conversion specifiers.
> >
>
> Thought about it, but that would limit us in the conversion specifiers
> that we can use, because many are used by printf already. Perhaps one
> way to do it is use
> "%Zxyz"
> where "%Z" tells the following letters indicate a ckpt_write_err()
> format, and then "xyz" are the letters from the current @prefmt.
Would %( work better? The ( suggests that there's more to it.
<snip>
> > I think I see a bug in either ckpt_read_obj or its kerneldocs:
> >
> > /**
> > * ckpt_read_obj - allocate and read an object (ckpt_hdr followed by
> > * payload)
> > * @ctx: checkpoint context
> > * @h: object descriptor
> > * @len: desired payload length (if 0, flexible)
> > * @max: maximum payload length
> > *
> > * Return: new buffer allocated on success, error pointer otherwise
> > */
> > static void *ckpt_read_obj(struct ckpt_ctx *ctx, int len, int max)
> > {
> > struct ckpt_hdr hh;
> > struct ckpt_hdr *h;
> > int ret;
> >
> > again:
> > ret = ckpt_kread(ctx, &hh, sizeof(hh));
> > if (ret < 0)
> > return ERR_PTR(ret);
> > _ckpt_debug(CKPT_DRW, "type %d len %d(%d,%d)\n",
> > hh.type, hh.len, len, max);
> > if (hh.len < sizeof(*h))
> > return ERR_PTR(-EINVAL);
> >
> > if (hh.type == CKPT_HDR_OBJREF) {
> > ret = _ckpt_read_objref(ctx, &hh);
> > if (ret < 0)
> > return ERR_PTR(ret);
> > goto again;
> > }
> >
> > /* if len specified, enforce, else if maximum specified, enforce */
> > if ((len && hh.len != len) || (!len && max && hh.len > max))
> > return ERR_PTR(-EINVAL);
> >
> > The last test really should be:
> >
> > if ((len && hh.len != len) ||
> > (!len && max && (hh.len - sizeof(*h)) > max))
> > return ERR_PTR(-EINVAL);
> >
> > Otherwise it's not really the maximum payload length and the "bug" is
> > just the comment.
>
> The bug is in the comment: @max is intended to tell the max (total)
> buffer size returned to the user. Will fix.
OK. That means I need to fix my code too then. Thanks for fixing these
up.
>
> [...]
>
> >>> + ret = 0;
> >>> + /* Restore the epoll items/watches */
> >>> + for (i = 0; !ret && i < h->num_items; i++) {
> >>> + struct epoll_event epev;
> >>> + struct file *tfile;
> >>> +
> >>> + /* Get the file* for the target file */
> >>> + if (h->items[i].file_objref <= 0) {
> >>> + ret = -EINVAL;
> >>> + break;
> >>> + }
> >> I should probably change the code elsewhere too, but this test
> >> is unnecessary because ckpt_obj_fetch() will complain anyway.
> >
> > I don't think it will complain since I don't see anything in the read or hash
> > code that checks for negative objrefs. However moving this into
>
> What is "this" that you want to move ?
I thought it would be necessary to move the < 0 test above into the
objhash code because...
>
> > ckpt_obj_fetch() would get rid of alot of code much like passing NULL into
> > kfree() does, so I'll remove this test.
>
> ckpt_obj_fetch() won't complain about a negative value a-priori, but
> the search is certain to fail. Nevertheless, I'll tighten the restart
> related calls in objhash to ensure that.
Hmm, I didn't see how it was certain to fail. I guess I'll have to look
again.
Cheers,
-Matt Helsley
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list