[Devel] Re: [BIG RFC] Filesystem-based checkpoint
Serge E. Hallyn
serue at us.ibm.com
Thu Oct 30 12:28:17 PDT 2008
Quoting Oren Laadan (orenl at cs.columbia.edu):
>
> I'm not sure why you say it's "un-linux-y" to begin with. But to the
The thing that is un-linux-y is specifically having user-space pass an
fd to the kernel from which it reads/writes. LSMs had to go to a lot of
pain to avoid doing that for reading policy configuration at boot.
Of course it's now several years later, and moods and tastes change in
the kernel community, but I suspect it's still frowned upon.
> point, here are my thoughts:
>
>
> 1. What you suggest is to expose the internal data to user space and
> pull it. Isn't that what cryo tried to do ? And the conclusion was
> that it takes too many interfaces to work out, code in, provide, and
> maintain forever, with issues related to backward compatibility and
> what not. In fact, the conclusion was "let's do a kernel-blob" !
Right, the problem with cryo was that it tried to do the checkpoint and
restart itself, at too fine-grained a level in terms of the kernel-user
API.
What Dave is suggesting (as I understand it) is just changing the way
the data is shipped between kernel and user-space, while continuing to
use sys_checkpoint() and sys_restart(). So I think it's a less
fundamental change than you are thinking.
Now maybe eventually he's going to propose something more esoteric where
doing the mount() actually starts the checkpoint (that's where I figured
he'd be heading), but I think it would still be one action on the part
of userspace telling the kernel "do a checkpoint".
(Or am I wrong on that, Dave?)
[...]
(I'll let Dave respond to your other questions i.e. about what you gain)
> If this is only to be able to parallelize checkpoint - then let's discuss
> the problem, not a specific solution.
The specific problem is that you have userspace pass a file fd to the
kernel, and the kernel reading from or writing to it, which is
un-linux-y.
> > It enables us to do all the I/O from userspace: no in-kernel
> > sys_read/write().
>
> What's so wrong with in-kernel vfs_read/write() ? You mentioned deadlocks,
It's un-linux-y :)
[...]
> 5. Your suggestion leaves too many details out. Yes, it's a call for
> discussion. But still. Zap, OpenVZ and other systems build on experience
> and working code. We know how to do incremental, live, and other goodies.
> I'm not sure how these would work with your scheme.
Not sure what problems you envision, but take the specific example of
a pre-dump to prepare for a quick live migration. I could envision a
pre_checkpoint() system call creating the checkpoint data directory,
starting to dump out the data, and starting to copy that data over the
network (optimistically). Afterwards, the do_checkpoint() syscall checks
file timestamps and quickly dumps and network-copies only the data which
has changed up until the container was frozen.
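
To make the timestamp idea concrete, here is a rough userspace-only sketch
of the two phases. It is an illustration under assumptions, not anything
from the proposed patches: the checkpoint data directory is modelled as an
ordinary directory of files, the "network copy" is a local copy into
dst_dir, and the names pre_copy() and final_copy() are hypothetical
stand-ins for whatever would run around pre_checkpoint() and
do_checkpoint().

```python
import os
import shutil


def snapshot_mtimes(src_dir):
    """Map each file's path (relative to src_dir) to its mtime in nanoseconds."""
    mtimes = {}
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            path = os.path.join(root, name)
            mtimes[os.path.relpath(path, src_dir)] = os.stat(path).st_mtime_ns
    return mtimes


def copy_files(src_dir, dst_dir, rel_paths):
    """Copy the named files from src_dir to dst_dir (stand-in for the network copy)."""
    for rel in rel_paths:
        dst = os.path.join(dst_dir, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(src_dir, rel), dst)


def pre_copy(src_dir, dst_dir):
    """Optimistic phase: copy everything while the container is still running.

    Timestamps are recorded *before* copying, so a file modified during the
    copy will look changed later and simply be copied again.
    """
    seen = snapshot_mtimes(src_dir)
    copy_files(src_dir, dst_dir, seen)
    return seen


def final_copy(src_dir, dst_dir, seen):
    """Final phase (container frozen): re-copy only new or modified files."""
    now = snapshot_mtimes(src_dir)
    changed = sorted(rel for rel, mtime in now.items() if seen.get(rel) != mtime)
    copy_files(src_dir, dst_dir, changed)
    return changed
```

The point of the split is that the second pass only has to move whatever
was dirtied between the pre-dump and the freeze, so the time the container
spends frozen depends on the amount of changed data rather than on the
total checkpoint size. Deleted files are ignored in this sketch.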
-serge