[Devel] [PATCH] fuse: hotfix truncate_pagecache() issue

Maxim Patlasov mpatlasov at parallels.com
Thu Aug 29 06:01:06 PDT 2013


Hi,

08/29/2013 01:25 PM, Miklos Szeredi пишет:
> On Wed, Aug 28, 2013 at 04:21:46PM +0400, Maxim Patlasov wrote:
>> The way how fuse calls truncate_pagecache() from fuse_change_attributes()
>> is completely wrong. Because, w/o i_mutex held, we never sure whether
>> 'oldsize' and 'attr->size' are valid by the time of execution of
>> truncate_pagecache(inode, oldsize, attr->size). In fact, as soon as we
>> released fc->lock in the middle of fuse_change_attributes(), we completely
>> loose control of actions which may happen with given inode until we reach
>> truncate_pagecache. The list of potentially dangerous actions includes mmap-ed
>> reads and writes, ftruncate(2) and write(2) extending file size.
>>
>> The typical outcome of doing truncate_pagecache() with outdated arguments is
>> data corruption from user point of view. This is (in some sense) acceptable
>> in cases when the issue is triggered by a change of the file on the server
>> (i.e. externally wrt fuse operation), but it is absolutely intolerable in
>> scenarios when a single fuse client modifies a file without any external
>> intervention. A real life case I discovered by fsx-linux looked like this:
>>
>> 1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sends
>> FUSE_SETATTR to the server synchronously, but before getting fc->lock ...
>> 2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_LOOKUP
>> to the server synchronously, then calls fuse_change_attributes(). The latter
>> updates i_size, releases fc->lock, but before comparing oldsize vs attr->size..
>> 3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and
>> updating attributes and i_size, but now oldsize is equal to outarg.attr.size
>> because i_size has just been updated (step 2). Hence, fuse_do_setattr()
>> returns w/o calling truncate_pagecache().
>> 4. As soon as ftruncate(2) completes, the user extends file size by write(2)
>> making a hole in the middle of file, then reads data from the hole either by
>> read(2) or mmap-ed read. The user expects to get zero data from the hole, but
>> gets stale data because truncate_pagecache() is not executed yet.
>>
>> The patch is a hotfix resolving the issue in a simplistic way: let's skip
>> dangerous i_size update and truncate_pagecache if an operation changing file
>> size is in progress. This simplistic approach looks correct for the cases
>> w/o external changes. And to handle them properly, more sophisticated and
>> intrusive techniques (e.g. NFS-like one) would be required. I'd like
>> to postpone it until the issue is well discussed on the mailing list(s).
> Thanks for the analysis!
>
> Okay, what about this even more simplistic approach?
>
> Not tested, but I think it addresses the very crux of the issue: not truncating
> the page cache even though we should.

The patch looks fine, but it solves only one side of the problem 
(exactly what you formulated above). Another side is opposite: 
truncating page cache too late, when the state of inode changed 
significantly. The beginning may be as in the scenario above: 
fuse_dentry_revalidate() discovered that i_size changed (due to our own 
fuse_do_setattr()) and is going to call truncate_pagecache() for some 
'new_size' it believes valid right now. But by the time that particular 
truncate_pagecache() is called, a lot of things may happen:

1) fuse_do_setattr() called truncate_pagecache() according to your patch
2) the file was extended either by write(2) or ftruncate(2) or fallocate(2).
3) mmap-ed write make a page in the extended region dirty

The result will be the lost of data user wrote on the step '3)'. (my 
patch solves the issue at least for all cases w/o external changes)

>
> AFAICS there's no such issue with write(2) or fallocate(2).  But I haven't
> thought about this very deeply.

I added bits to the fuse_perform_write() to address that other side of 
the issue. Fixing fallocate is also required, but I postponed it until 
you include that another fix for fallocate (incorrect use of 
fuse_set_nowrite()).

Thanks,
Maxim



More information about the Devel mailing list