[Devel] Re: C/R: File substitution at restart
Matt Helsley
matthltc at us.ibm.com
Thu Sep 9 04:02:20 PDT 2010
On Thu, Sep 09, 2010 at 12:37:20PM +0200, Louis Rilling wrote:
> On 08/09/10 21:06 -0700, Matt Helsley wrote:
> > On Wed, Sep 08, 2010 at 08:03:52PM -0500, Serge E. Hallyn wrote:
> > > Quoting Matt Helsley (matthltc at us.ibm.com):
> > > > On Wed, Sep 08, 2010 at 08:09:31AM -0500, Serge E. Hallyn wrote:
> > > > I think it can be split into two composable pieces which may also be
> > > > useful independently.
> > > >
> > > > The first uses the fcntl() interface to add a flag like
> > > > O_CLOEXEC. Unlike O_CLOEXEC it marks an fd for preservation during
> > > > restart. That way we don't have to specify an fd number and a "source"
> > > > to the kernel. Just tell the kernel to keep the fd. The source can
> > > > be opened and dup2'd via userspace. This is useful without the
> > > > second piece if we want to simply add rather than replace an fd.
> > >
> > > Can you think of any other use for this flag other than restart?
> >
> > <joking>
> > I can't think of any other uses for O_CLOEXEC.
> > </joking>
> >
> > Seriously though, restart will be used _much_ less often than exec so yes
> > it does seem like a waste of a valuable bit and something that wouldn't
> > quite belong in an fcntl interface.
> >
> > However, we can try to be a tad clever: we could (ab|re)use O_CLOEXEC.
> > Right now restart closes all file descriptors and pays absolutely
> > no attention to O_CLOEXEC. We could reuse O_CLOEXEC to mean O_CLOREST
> > too. Have user-cr's restart tool mark all unwanted fds O_CLOEXEC. Any we
> > want to keep we do not mark with O_CLOEXEC.
>
> This would also be useful at checkpoint, to tell sys_checkpoint() which fds
> should be ignored, whether because they are not supported or because the
> application has a better way to deal with them.
True. Though, unlike at restart, I don't think we can just (ab|re)use
O_CLOEXEC for that purpose.
>
> >
> >
> > Here's another idea which I haven't fully thought out yet.
> >
> > We could introduce the concept of object id substitutions in the image.
> > So the image would look like (going from file pos 0 at the top..):
> >
> > 0 +-------------------------------+
> >   |                               |
> >   .....
> >   +-------------------------------+
> >   |      <substitute object>      | <--- object with id == <substitute id>
> >   .....
> >   +---------------+---------------+
> >   |  <object id>  |<substitute id>|
> >   +---------------+---------------+
> >   .....
> >   +---------------+---------------+
> >   |      <object to ignore>       | <-- object with id == <object id>
> >   .....
> >
> > (The above ignores the ckpt_hdr fields.)
> >
> > When we read the image during restart we use the substitute ids to
> > create indirect objhash entries. When we encounter an obj id and
> > it refers to an indirect entry we first parse the object (ignoring
> > errors and dropping references on new objhash insertions), flip
> > a bit on the indirect entry (indicating the object has been parsed),
> > and then lookup the substitute id and return whatever that resolved to.
> >
> > We can ignore the new objhash objects by making the objhash have its
> > own operation struct. When we're parsing an object that's been
> > substituted we just temporarily set the objhash add/lookup operations
> > to something suitable for properly dropping references to the new
> > object(s). This way we don't have to add checks for this peculiar
> > need all over the checkpoint/restart code. Sure it'll be slower...
>
> If at checkpoint we can take care to ignore files that we know will be
> substituted, this should not be that much slower.
So, would you say typically it's the application developer who knows
what to ignore? Are we expecting distros/packagers to be able to set
that up? Admins? These specific optimizations seem like they would be a
bit fragile unless the application developer is involved.
Cheers,
-Matt Helsley
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers