[Devel] [vzlin-dev] [PATCH vz7] fuse: relax i_mutex coverage in fuse_fsync

Maxim Patlasov mpatlasov at virtuozzo.com
Thu Dec 1 11:30:22 PST 2016


On 12/01/2016 11:27 AM, Denis V. Lunev wrote:

> On 12/01/2016 10:09 PM, Maxim Patlasov wrote:
>> On 12/01/2016 12:06 AM, Dmitry Monakhov wrote:
>>
>>> Maxim Patlasov <mpatlasov at virtuozzo.com> writes:
>>>
>>>> Alexey,
>>>>
>>>>
>>>> You're right. And while composing the patch I was well aware that
>>>> it's possible to rework fuse_sync_writes() using a counter instead
>>>> of a negative bias. But the problem with flush_mtime still exists
>>>> anyway. Think about it: we first read the mtime from the local
>>>> inode, then fill and submit an mtime-update request. From that point
>>>> on, we don't know when exactly the fuse daemon will apply that new
>>>> mtime to its metadata structures. If another mtime update is
>>>> generated in between (e.g. "touch -d <date> file", or even simpler
>>>> -- just a single direct write implicitly updating mtime), we can't
>>>> know which of the two mtime-update requests the fuse daemon will
>>>> process first. That comes from a general FUSE protocol limitation:
>>>> when kernel fuse queues request A, then request B, it cannot be
>>>> sure whether userspace will process them as <A, then B> or
>>>> <B, then A>.
>>>>
>>>>
>>>> The big advantage of the patch I sent is that it's very simple and
>>>> straightforward, and it will presumably remove 99% of the contention
>>>> between fsync and io_submit (assuming we spend most of the time
>>>> waiting for the userspace ACK of the FUSE_FSYNC request). There are
>>>> actually three questions to answer:
>>>>
>>>> 1) Must we really honor a crazy app that mixes a lot of fsyncs with
>>>> a lot of io_submits? The goal of fsync is to ensure that some state
>>>> has actually gone to the platters. An app that races io_submit-s
>>>> against fsync-s doesn't actually care which state ends up on the
>>>> platters. I'm not sure it's reasonable to work very hard to achieve
>>>> the best possible performance for such a marginal app.
>>> Obviously any filesystem behaves like this.
>>> Task A (a mail server) may perform write/fsync while task B (mysql)
>>> does a lot of io_submit-s. All that IO may happen in parallel; the
>>> fs guarantees only that metadata will be serialized. So all that
>>> concurrent IO flows to the block device, which has no i_mutex, so
>>> all the IO indeed happens concurrently.
>> It looks like you're comparing an app doing POSIX
>> open/read/write/fsync/close with an fs doing submit_bio. That's a
>> stretch, but OK, there is a similarity. Still, I don't think this
>> rather vague similarity proves anything.
> we are speaking about a VM process, which essentially
> re-submits IO from the guest to the host as described above.
> QEMU and VM_app certainly have this IO pattern, so this
> pattern MUST be optimized, as it is one of our main
> workloads.

Yes, I agree. That's exactly why I wrote in the same email (next paragraph):

> This really makes sense. If an app inside a VM loops over ordinary
> direct writes while another app (in the same VM) does fsync, it's not
> fair to suspend the first app for a long while just because fuse
> holds i_mutex for a long time somewhere deep in fuse_fsync.
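
To make the idea concrete, here is roughly the shape of the change: hold
i_mutex only while flushing dirty pages and waiting for in-flight
writeback, and send the FUSE_FSYNC request -- the part where we wait for
the userspace ACK -- after dropping the lock. This is only a sketch, not
the patch itself; fuse_send_fsync() below is a placeholder for "build the
FUSE_FSYNC request and wait for the reply", and error handling is trimmed.

static int fuse_fsync_sketch(struct file *file, loff_t start, loff_t end,
                             int datasync)
{
        struct inode *inode = file->f_mapping->host;
        int err;

        mutex_lock(&inode->i_mutex);

        /* Under i_mutex: push dirty pages and wait for writeback that is
         * already in flight, so the fsync describes a stable state. */
        err = filemap_write_and_wait_range(file->f_mapping, start, end);
        if (err) {
                mutex_unlock(&inode->i_mutex);
                return err;
        }
        fuse_sync_writes(inode);
        /* ... flush_mtime handling elided ... */

        mutex_unlock(&inode->i_mutex);

        /* Outside i_mutex: send FUSE_FSYNC and wait for the userspace
         * ACK. This is where most of the time is spent, and concurrent
         * direct writes no longer stall behind it. */
        return fuse_send_fsync(file, datasync);
}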
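
And for completeness, the counter-based rework of fuse_sync_writes() I
mentioned above (in reply to Alexey) could look roughly like the sketch
below: keep a plain count of in-flight writeback requests and let fsync
wait for it to drop to zero. The struct and helper names are made up for
illustration; they are not the actual vz7 code.

/* Hypothetical per-inode state, for illustration only. */
struct fuse_inode_sketch {
        spinlock_t              lock;
        long                    writectr;   /* in-flight writeback requests */
        wait_queue_head_t       page_waitq;
};

static void fuse_writepage_start(struct fuse_inode_sketch *fi)
{
        spin_lock(&fi->lock);
        fi->writectr++;
        spin_unlock(&fi->lock);
}

static void fuse_writepage_end(struct fuse_inode_sketch *fi)
{
        spin_lock(&fi->lock);
        if (--fi->writectr == 0)
                wake_up(&fi->page_waitq);
        spin_unlock(&fi->lock);
}

/* Counter-based fuse_sync_writes(): wait until nothing is in flight.
 * wait_event() re-checks the condition after every wakeup. */
static void fuse_sync_writes_sketch(struct fuse_inode_sketch *fi)
{
        wait_event(fi->page_waitq, fi->writectr == 0);
}

Note that this on its own does not address the flush_mtime ordering
problem described above.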

Max

>
> That is why I think that this case is not marginal
> but important.
>
> Den


