[Devel] Re: [RFC v14][PATCH 00/54] Kernel based checkpoint/restart

Louis Rilling Louis.Rilling at kerlabs.com
Tue May 5 07:26:33 PDT 2009


On 05/05/09  8:49 -0500, Serge E. Hallyn wrote:
> Quoting Louis Rilling (Louis.Rilling at kerlabs.com):
> > On 04/05/09  8:01 -0500, Serge E. Hallyn wrote:
> > > Quoting Oren Laadan (orenl at cs.columbia.edu):
> > > > > I see one drawback with this approach if you allow checkpointing an
> > > > > application that is not isolated in a container. In that case, you may
> > > > > want to select which IPC objects to dump, rather than dumping all the IPC objects
> > > > > living in the system. Indeed, this is why we have chosen in Kerrighed to
> > > > > checkpoint IPC objects independently of tasks, since we have no
> > > > > container/namespaces support currently.
> > > > 
> > > > I assume that in this case it will be the application itself that
> > > > will somehow tell the system which specific sysvipc objects (ids) it
> > > > cares about.
> > > > 
> > > > (I'm not sure how the system would otherwise know what to dump and
> > > > what to leave out).
> > > > 
> > > > I originally proposed the construct of cradvise() syscall to handle
> > > > exactly those cases where the application would like to advise the
> > > > kernel about certain resources. So, extending the previous example,
> > > > a task may call something like:
> > > > 
> > > >    cradvise(CHECKPOINT_SYSVIPC_SHM, false);  /* generally skip shm */
> > > >    cradvise(CHECKPOINT_SYSVIPC_SHMID, id, true);  /* but include this */
> > > > 
> > > > or:
> > > >    cradvise(CHECKPOINT_SYSVIPC_SHM, true);  /* generally include shm */
> > > >    cradvise(CHECKPOINT_SYSVIPC_SHMID, id, false);  /* but skip this */
> > > > 
> > > > Anyway, these are just examples of the concept and what sort of generic
> > > > interface can be used to implement it; don't pick on the details...
> > > > 
> > > > Oren.
> > > 
> > > Oren, I have to be honest:  I could of course be wrong, but imo there
> > > is 0 chance of such a bigger-and-uglier-than-ioctl syscall as cradvise
> > > being accepted upstream.  There may be good uses for it, but I think
> > > it's worthwhile thinking of ways around it whenever possible.
> > > 
> > > In this particular case, wouldn't it be better to do something like:
> > > 
> > > 	1. freeze + checkpoint full application + container (== C1)
> > > 	2. continue application, which does a clone(CLONE_COPYIPC) (*1)
> > > 	3. application removes all shms except the one to be
> > > 	checkpointed
> > > 	4. freeze + checkpoint application again ( == C2)
> > > 	5. restart application from C1
> > > 
> > 
> > Besides COW issues mentioned by Oren in his reply, this approach does not
> > seem to provide the required flexibility. The point is to avoid checkpointing
> > some IPC objects together with the application,
> 
> ... avoided at step 3 ...
> 
> > but we still need those IPC
> > objects, and the application still uses them.
> 
> ... step 5 ...

But this involves changing the application. What I describe requires that the
application is not changed (that is, checkpoint/restart is transparent). The
actual policy is handled by some helper, configured by the sysadmin for instance.

> 
> > Moreover, on restart the
> > administrator should be able to first install the required IPC objects, e.g.
> > re-create them from scratch, or restore them from another checkpoint, and second
> > restart the application, linking it to the previously
> > re-created/restored/whatever SHMs.
> 
> Of course he can do that.
> 
> Anyway I'm not setting off to implement the clone(COPY_IPC)
> functionality, and Oren might be right that cradvise would
> be deemed different from ioctl.  I just thought I'd give a
> warning, and (being a productive type :) give an alternative...

Sure, I never doubted the constructiveness of your reply :) I just pointed
out that this alternative may not be acceptable.

> 
> By the way, another alternative to all of the cr_advise()
> stuff is to have userspace programs carve up your checkpoint
> images.  It's been talked about before, but I believe Nathan
> in particular is worried about what this says about kernel-user
> API.

With large IPC SHMs, as a user I would not want to pay the price of
checkpointing them if I do not need to.

If an approach à la cr_advise() still looks too close to ioctl(), I would argue
in favor of implementing as many syscalls as needed to "cleanly" obtain the same
flexibility.

To me cr_advise() looks closer to fcntl() or prctl() than to ioctl(). There are
not that many types of objects for which optional checkpoint should be considered
(at least not as many as device types that could be invented), and the set of
advice values for a given object will probably be limited to {CHECKPOINT,
DO_NOT_CHECKPOINT} for checkpoint, and {RESTART, REPLACE_WITH} for restart.
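
To illustrate how small such an interface could stay, here is a minimal
userspace sketch of the checkpoint-side advice only. It is a model of the idea,
not a proposed kernel interface: every name in it (cr_policy, cr_advise_type,
cr_advise_id, cr_should_checkpoint, the CR_* constants) is hypothetical, and
the fixed-size rule table is an assumption made to keep the example short.
Restart-side advice (RESTART, REPLACE_WITH) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none of them is an existing kernel API. */

enum cr_obj_type { CR_OBJ_SYSVIPC_SHM, CR_OBJ_TYPE_MAX };

/* The two checkpoint-time advice values mentioned in the text. */
enum cr_ckpt_advice { CR_CHECKPOINT, CR_DO_NOT_CHECKPOINT };

#define CR_MAX_RULES 16

/* Per-object override of the per-type default. */
struct cr_rule {
	enum cr_obj_type type;
	int id;
	enum cr_ckpt_advice advice;
};

struct cr_policy {
	enum cr_ckpt_advice type_default[CR_OBJ_TYPE_MAX];
	struct cr_rule rules[CR_MAX_RULES];
	size_t nrules;
};

/* Start with "checkpoint everything" and no per-object overrides. */
static void cr_policy_init(struct cr_policy *p)
{
	int i;

	for (i = 0; i < CR_OBJ_TYPE_MAX; i++)
		p->type_default[i] = CR_CHECKPOINT;
	p->nrules = 0;
}

/* Set the default advice for every object of a given type. */
static void cr_advise_type(struct cr_policy *p, enum cr_obj_type type,
			   enum cr_ckpt_advice advice)
{
	assert(type < CR_OBJ_TYPE_MAX);
	p->type_default[type] = advice;
}

/* Override the default for one object; returns -1 if the table is full. */
static int cr_advise_id(struct cr_policy *p, enum cr_obj_type type, int id,
			enum cr_ckpt_advice advice)
{
	size_t i;

	for (i = 0; i < p->nrules; i++) {
		if (p->rules[i].type == type && p->rules[i].id == id) {
			p->rules[i].advice = advice;
			return 0;
		}
	}
	if (p->nrules == CR_MAX_RULES)
		return -1;
	p->rules[p->nrules].type = type;
	p->rules[p->nrules].id = id;
	p->rules[p->nrules].advice = advice;
	p->nrules++;
	return 0;
}

/* A per-object rule wins over the per-type default. */
static bool cr_should_checkpoint(const struct cr_policy *p,
				 enum cr_obj_type type, int id)
{
	size_t i;

	for (i = 0; i < p->nrules; i++)
		if (p->rules[i].type == type && p->rules[i].id == id)
			return p->rules[i].advice == CR_CHECKPOINT;
	return p->type_default[type] == CR_CHECKPOINT;
}
```

With this model, "generally skip shm, but include this one" is one
cr_advise_type() call followed by one cr_advise_id() call, and the whole
vocabulary is two enums, which is the sense in which it resembles prctl()
more than ioctl().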

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes