[Devel] Re: [RFC v10][PATCH 08/13] Dump open file descriptors

Dave Hansen dave at linux.vnet.ibm.com
Mon Dec 1 13:25:45 PST 2008


On Mon, 2008-12-01 at 13:02 -0800, Linus Torvalds wrote:
> On Mon, 1 Dec 2008, Dave Hansen wrote:
> > 
> > Why is this done in two steps?  It first grabs a list of fd numbers
> > which needs to be validated, then goes back and turns those into 'struct
> > file's which it saves off.  Is there a problem with doing that
> > fd->'struct file' conversion under the files->file_lock?
> 
> Umm, why do we even worry about this?
> 
> Wouldn't it be much better to make sure that all other threads are 
> stopped before we snapshot, and if we cannot account for some thread (ie 
> there's some elevated count in the fs/files/mm structures that we cannot 
> see from the threads we've stopped), just refuse to dump.

My guess is that the mm is probably ok here, but we'll need some work on
the vfs structures, at least eventually.

The mm is nice that it has ->count separated from ->users.  We can
easily compare the sum of mm->users to the number of tasks we've frozen
since mm->users has nice defined behavior.

But, we've got suckers like proc_fd_info() that do a
get/put_files_struct().  So, somebody just doing lots of looks in /proc
could stop us from ever checkpointing.  Unfortunately, I know of a
number of "monitoring" programs that do just that. :)

I guess we can use the plain counts for now, and add something like
mm->users for the vfs structures in a bit.  

-- Dave

_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers




More information about the Devel mailing list