[Devel] [PATCH vz7] fuse: relax i_mutex coverage in fuse_fsync
Maxim Patlasov
mpatlasov at virtuozzo.com
Wed Nov 30 10:01:48 PST 2016
Alexey,
You're right. And while composing the patch I well understood that it's
possible to rework fuse_sync_writes() using a counter instead of
negative bias. But the problem with flush_mtime still exists anyway.
Think about it: we firstly acquire local mtime from local inode, then
fill and submit mtime-update-request. Since then, we don't know when
exactly fuse daemon will apply that new mtime to its metadata
structures. If another mtime-update is generated in-between (e.g. "touch
-d <date> file", or even simplier -- just a single direct write
implicitly updating mtime), we wouldn't know which of those two
mtime-update-requests are processed by fused first. That comes from a
general FUSE protocol limitation: when kernel fuse queues request A,
then request B, it cannot be sure if they will be processed by userspace
as <A, then B> or <B, then A>.
The big advantage of the patch I sent is that it's very simple,
straightforward and presumably will remove 99% of contention between
fsync and io_submit (assuming we spend most of time waiting for
userspace ACK for FUSE_FSYNC request. There are actually three questions
to answer:
1) Do we really must honor a crazy app who mixes a lot of fsyncs with a
lot of io_submits? The goal of fsync is to ensure that some state is
actually went to platters. An app who races io_submit-s with fsync-s
actually doesn't care which state will come to platters. I'm not sure
that it's reasonable to work very hard to achieve the best possible
performance for such a marginal app.
2) Will the patch (in the form I sent it) break something? I think no.
If you know some usecase that can be broken, let's discuss it in more
details.
3) Should we expect some noticeable (or significant) improvement in
performance comparing fuse_fsync with no locking at all vs. the locking
we have with that patch applied? I tend to think that the answer is "no"
because handling FUSE_FSYNC is notoriously heavy-weight operation. If
you disagree, let's firstly measure that difference in performance
(simply commenting out lock/unlock(i_mutex) in fuse_fsync) and then
start to think if it's really worthy to fully re-work locking scheme to
preserve flush_mtime correctness w/o i_mutex.
Thanks,
Maxim
On 11/30/2016 05:09 AM, Alexey Kuznetsov wrote:
> Sorry, missed that pair fuse_set_nowrite/fuse_release_writes
> can be done only under i_mutex.
>
> IMHO it is only due to bad implementation.
> If fuse_set_nowrite would be done with separate
> count instead of adding negative bias, it would
> be possible.
>
>
> On Wed, Nov 30, 2016 at 3:47 PM, Alexey Kuznetsov <kuznet at virtuozzo.com> wrote:
>> Hello!
>>
>> I do not think you got it right.
>>
>> i_mutex in fsync is not about some atomicity,
>> it is about stopping data feed while fsync is executed
>> to prevent livelock.
>>
>> I cannot tell anything about mtime update, it is just some voodoo
>> magic for me.
>>
>> What's about fsync semantics, I see two different ways:
>>
>> A.
>>
>> 1. Remove useless write_inode_now. Its work is done
>> by filemap_write_and_wait_range(), there is no need to repeat it
>> under mutex.
>> 2. move mutex_lock _after_ fuse_sync_writes(), which is essentially
>> fuse continuation forfilemap_write_and_wait_range().
>> 3. i_mutex is preserved only around fsync call.
>>
>> B.
>> 1. Remove write_inode_now as well.
>> 2. Remove i_mutex _completely_. (No idea about mtime voodo though)
>> 2. Replace fuse_sync_writes() with fuse_set_nowrite()
>> and add release after call to FSYNC.
>>
>> Both prevent livelock. B is obviosly optimal.
>>
>> But A preserves historic fuse protocol semantics.
>> F.e. I have no idea would user space survive truncate
>> racing with fsync. pstorage should survice, though this
>> path was never tested.
>>
>>
>>
>>
>>
>> On Wed, Nov 30, 2016 at 4:02 AM, Maxim Patlasov <mpatlasov at virtuozzo.com> wrote:
>>> fuse_fsync_common() does need i_mutex for fuse_sync_writes() and
>>> fuse_flush_mtime(). But when those operations are done, it's actually
>>> doesn't matter whether to hold the lock over fuse_request_send(FUSE_FSYNC)
>>> or not: we ensured that all relevant data were already seen by
>>> userspace fuse daemon, and so it will sync them (by handling FUSE_FSYNC)
>>> anyway; if the user screws up by leaking new data updates in-between, it
>>> is up to the user and doesn't violate fsync(2) semantics.
>>>
>>> https://jira.sw.ru/browse/PSBM-55919
>>>
>>> Signed-off-by: Maxim Patlasov <mpatlasov at virtuozzo.com>
>>> ---
>>> fs/fuse/file.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>> index 464b2f5..559dfd9 100644
>>> --- a/fs/fuse/file.c
>>> +++ b/fs/fuse/file.c
>>> @@ -697,6 +697,8 @@ int fuse_fsync_common(struct file *file, loff_t start, loff_t end,
>>> goto out;
>>> }
>>>
>>> + mutex_unlock(&inode->i_mutex);
>>> +
>>> memset(&inarg, 0, sizeof(inarg));
>>> inarg.fh = ff->fh;
>>> inarg.fsync_flags = datasync ? 1 : 0;
>>> @@ -715,6 +717,7 @@ int fuse_fsync_common(struct file *file, loff_t start, loff_t end,
>>> fc->no_fsync = 1;
>>> err = 0;
>>> }
>>> + return err;
>>> out:
>>> mutex_unlock(&inode->i_mutex);
>>> return err;
>>>
More information about the Devel
mailing list