[Devel] Re: [BIG RFC] Filesystem-based checkpoint
Eric W. Biederman
ebiederm at xmission.com
Thu Oct 30 16:33:16 PDT 2008
Dave Hansen <dave at linux.vnet.ibm.com> writes:
> I hate the syscall. It's a very un-Linux-y way of doing things. There,
> I said it. Here's an alternative. It still uses the syscall to
> initiate things, but it uses debugfs to transport the data instead.
> This is just a concept demonstration. It doesn't actually work, and I
> wouldn't be using debugfs in practice.
A syscall is a very Linux-y way to do it.
If you called it a core dump instead of a checkpoint, you would have exactly
the same set of issues.
I don't understand why we are doing vfs_write instead of file->f_op->write.
> System calls in Linux are fast. Doing lots of them is not a problem.
> If it becomes one, we can always export a condensed version of this
> format next to the expanded one, kinda like ftrace does. Atomicity with
> this approach is also not a problem. The system call in this approach
> doesn't return until the checkpoint is completely written out.
Extra copies of something (memory) that you want to transfer quickly
and efficiently are a problem.
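
As a rough illustration of the copy cost, here is a minimal sketch that
assumes an ordinary file stands in for the checkpoint image. The function
names and the CHUNK size are hypothetical, and os.sendfile between two
regular files is assumed to be available (Linux 2.6.33 or later). The first
function relays every byte through a userspace buffer; the second asks the
kernel to move the data itself.

```python
import os

CHUNK = 64 * 1024  # hypothetical transfer size, chosen only for illustration


def copy_via_userspace(src_path, dst_path):
    """Relay data through a userspace buffer: kernel -> user -> kernel."""
    relayed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(CHUNK)   # copy 1: kernel into a user buffer
            if not buf:
                break
            dst.write(buf)          # copy 2: user buffer back into the kernel
            relayed += len(buf)
    return relayed


def copy_in_kernel(src_path, dst_path):
    """Let the kernel move the bytes without a userspace buffer (sendfile)."""
    sent = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        while sent < size:
            n = os.sendfile(dst.fileno(), src.fileno(), sent, size - sent)
            if n == 0:              # source ended early; stop instead of spinning
                break
            sent += n
    return sent
```

Both functions produce the same output file; the difference is that the
first one touches every byte twice in userspace-visible copies, which is
the overhead being objected to here.
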
Reading the memory of another process is a problem, to the point
that the /proc/<pid>/mem interface has been removed from the kernel.
> This lets userspace pick and choose what parts of the checkpoint it
> cares about. It enables us to do all the I/O from userspace: no
> in-kernel sys_read/write(). I think this interface is much more
> flexible than a plain syscall.
Then get with Roland McGrath and build the next-generation user
space debugging interface.
> Want to do a fast checkpoint? Fine, copy all data, use a lot of memory,
> store it in-kernel. Dump that out when the filesystem is accessed.
> Destroy it when userspace asks.
> So, why not?
Besides creating a bunch of questionable interfaces
that we would need to support forever?
Ultimately the question is how you do checkpoint/restore, and I just
don't see that happening with a filesystem interface. It needs way, way, way
too many dangerous syscalls that are only needed for one thing.
Checkpoint/restore is an atomic operation, and filesystems suck at building
high-level atomic primitives.
Eric
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers