[Devel] Re: [C/R v20][PATCH 38/96] c/r: dump open file descriptors
Matt Helsley
matthltc at us.ibm.com
Sun Mar 21 18:06:06 PDT 2010
On Sun, Mar 21, 2010 at 05:27:03PM +0000, Jamie Lokier wrote:
> Matt Helsley wrote:
> > > That said, if the intent is to allow the restore to be done on
> > > another node with a "similar" filesystem (e.g. created by rsync/node
> > > image), instead of having a coherent distributed filesystem on all
> > > of the nodes then the filename makes sense.
> >
> > Yes, this is the intent.
>
> I would worry about programs which are using files which have been
> deleted, renamed, or (very common) renamed-over by another process
> after being opened, as there's a good chance they will successfully
> open the wrong file after c/r, and corrupt state from then on.
The code in the patches does check for unlinked files and refuses
to checkpoint if an unlinked file is open. Yes, this limits the usefulness
of the code somewhat but it's a problem we can solve and c/r is still quite
useful without the solution.
My favorite solution for unlinked files is keeping the contents of the file
in the checkpoint image. Another solution is relinking it to a new "safe"
location in the filesystem. Determining the "safe" location is not very clean
because we need one "safe" location per filesystem being backed up. Hence I
tend to favor the first approach. Neither solution has been implemented
and thoroughly tested yet, though.
These solutions are needed because the data is not available via a normal
filesystem backup. Renames are dealt with by requiring userspace to freeze
and/or safely take a snapshot of the filesystem as with any backup.
> This can be avoided by ensuring every checkpointed application is
> specially "c/r aware", but that makes the feature a lot less
> attractive, as well as uncomfortably unsafe to use on arbitrary
We avoided using that solution for the very flaws you point out.
In fact, so far we've managed to avoid requiring cooperation with
the tasks being checkpointed.
> processes. Ideally, c/r would fail on some types of process
> (e.g. using sockets), but at least fail in a safe way that does not
> lead to quiet data corruption.
We've done our best to try and reach that ideal. You're welcome to have a
look at the code to see if you can find any ways in which we haven't.
Here's the code that refuses to checkpoint unsupported files. I think
it's pretty easy to read:
int checkpoint_file(struct ckpt_ctx *ctx, void *ptr)
{
	struct file *file = (struct file *) ptr;
	int ret;

	if (!file->f_op || !file->f_op->checkpoint) {
		ckpt_err(ctx, -EBADF, "%(T)%(P)%(V)f_op lacks checkpoint\n",
			 file, file->f_op);
		return -EBADF;
	}
	if (is_dnotify_attached(file)) {
		ckpt_err(ctx, -EBADF, "%(T)%(P)dnotify unsupported\n", file);
		return -EBADF;
	}
	ret = file->f_op->checkpoint(ctx, file);
	if (ret < 0)
		ckpt_err(ctx, ret, "%(T)%(P)file checkpoint failed\n", file);
	return ret;
}
(As Serge noted, we don't support inotify. inotify and fanotify require
an fd to register the fsnotify marks, and the struct file associated with
that fd lacks the f_op->checkpoint operation. Hence that will cause
checkpoint to fail too and, again, there will be no silent corruption.)
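
The refuse-rather-than-guess dispatch in checkpoint_file() can be modeled
in userspace as below. This is only a sketch: every mock_* name and the two
example handlers are hypothetical stand-ins, not kernel interfaces. The
point it shows is that a missing handler yields -EBADF and a handler's own
error is passed through unchanged.

```c
#include <errno.h>
#include <stddef.h>

struct mock_file;

/* Hypothetical stand-in for an ops table with an optional hook. */
struct mock_file_ops {
	int (*checkpoint)(struct mock_file *file);	/* may be NULL */
};

struct mock_file {
	const struct mock_file_ops *ops;		/* may be NULL */
};

/* Example handlers, for illustration only. */
int mock_ok_checkpoint(struct mock_file *file)
{
	(void)file;
	return 0;
}

int mock_failing_checkpoint(struct mock_file *file)
{
	(void)file;
	return -EIO;
}

/* Refuse files whose type never opted in; otherwise delegate. */
int mock_checkpoint_file(struct mock_file *file)
{
	if (!file->ops || !file->ops->checkpoint)
		return -EBADF;
	return file->ops->checkpoint(file);
}
```

Because support is opt-in per file type, a type nobody has looked at fails
loudly instead of being saved incorrectly.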
Negative return values cause sys_checkpoint() to stop checkpointing and
return the given errno. The f_op->checkpoint is often a generic operation
which ensures that the file is not unlinked before it saves things like
the position of the file (checkpoint_file_common()) and the path to the file
(checkpoint_fname()):
int generic_file_checkpoint(struct ckpt_ctx *ctx, struct file *file)
{
	struct ckpt_hdr_file_generic *h;
	int ret;

	/*
	 * FIXME: when we'll add support for unlinked files/dirs, we'll
	 * need to distinguish between unlinked files and unlinked dirs.
	 */
	if (d_unlinked(file->f_dentry)) {
		ckpt_err(ctx, -EBADF, "%(T)%(P)Unlinked files unsupported\n",
			 file);
		return -EBADF;
	}

	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_FILE);
	if (!h)
		return -ENOMEM;

	h->common.f_type = CKPT_FILE_GENERIC;

	ret = checkpoint_file_common(ctx, file, &h->common);
	if (ret < 0)
		goto out;
	ret = ckpt_write_obj(ctx, &h->common.h);
	if (ret < 0)
		goto out;
	ret = checkpoint_fname(ctx, &file->f_path, &ctx->root_fs_path);
 out:
	ckpt_hdr_put(ctx, h);
	return ret;
}
EXPORT_SYMBOL(generic_file_checkpoint);
I wrote a simple script to look for missing operations in things like
file_operations. It can output counts in directories/files or show the
spot in the files where the struct is defined and a little context.
I used that script to check which files and protocols aren't supported
(for 2.6.33-rc8). I placed a histogram of the output in the wiki, and I've
tried to keep it up to date.
https://ckpt.wiki.kernel.org/index.php/UncheckpointableFilesystems
https://ckpt.wiki.kernel.org/index.php/UncheckpointableProtocols
The script is also there for anyone who wants to use it on newer kernels.
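
For a rough idea of what such a scan involves, here is a minimal sketch in
C. It is not the script from the wiki; count_missing_checkpoint() is a
hypothetical name and the matching is purely textual. It counts
`struct file_operations ... = { ... };` initializers in a source buffer and
reports how many never mention ".checkpoint", assuming the initializer has
no nested braces.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only. Stores the number of
 * file_operations initializers found in *total and returns how many of
 * them lack a ".checkpoint" member. Does not parse C; it only matches text.
 */
int count_missing_checkpoint(const char *src, int *total)
{
	const char *needle = "struct file_operations";
	const char *p = src;
	int missing = 0;

	*total = 0;
	while ((p = strstr(p, needle)) != NULL) {
		const char *stop, *close, *hit;

		p += strlen(needle);

		/* Declarations and parameters hit ';', ')' or ',' before '{'. */
		stop = strpbrk(p, ";{),");
		if (!stop || *stop != '{')
			continue;

		/* Require '=' before '{' so a struct definition is skipped. */
		if (!memchr(p, '=', (size_t)(stop - p)))
			continue;

		close = strstr(stop, "};");
		if (!close)
			break;

		(*total)++;
		hit = strstr(stop, ".checkpoint");
		if (!hit || hit > close)
			missing++;
		p = close;
	}
	return missing;
}
```

Running something like this per file and summing per directory would give
counts of the same general shape as the histogram, though a textual match
will miss ops assigned at runtime.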
Here's the output which is of interest to folks on linux-fsdevel for anyone
who doesn't wish to follow a link -- the number of file_operations
structures missing the .checkpoint operation:
    162 arch
      3 block
      1 crypto
      1 Documentation
    718 drivers
    178 fs
          3 9p
          8 afs
          1 autofs
          3 autofs4
          1 bad_inode.c
          3 binfmt_misc.c
          1 block_dev.c
          2 cachefiles
          1 char_dev.c
         15 cifs
          4 coda
          2 configfs
          3 debugfs
          8 dlm
          1 ext4
          1 fifo.c
          1 filesystems.c
          3 fscache
          9 fuse
          5 gfs2
          1 hugetlbfs
          1 jbd2
          6 jfs
          1 libfs.c
          1 locks.c
          2 ncpfs
          2 nfs
          5 nfsd
          1 no-block.c
          1 notify
          1 ntfs
         15 ocfs2
         55 proc
          1 reiserfs
          1 signalfd.c
          2 smbfs
          3 sysfs
          1 timerfd.c
          3 xfs
      1 include
      4 ipc
     88 kernel
      3 lib
     12 mm
    164 net
      1 samples
     35 security
     29 sound
      4 virt
Notes:
1. The missing checkpoint file operation in fs/fifo.c is only an artifact of
the unusual way fifo file ops are assigned. FIFOs are supported.
2. The ext4 missing file operation is for the multiblock groups file in /proc.
IMHO trying to checkpoint the contents of /proc files is usually a bad
idea. Thankfully, most programs don't hold these files open for very
long.
Cheers,
-Matt Helsley