[Devel] [vzlin-dev] [PATCH vz7] fuse: relax i_mutex coverage in fuse_fsync

Thu Dec 1 11:27:11 PST 2016

On 12/01/2016 10:09 PM, Maxim Patlasov wrote:
> On 12/01/2016 12:06 AM, Dmitry Monakhov wrote:
>
>> Maxim Patlasov <mpatlasov at virtuozzo.com> writes:
>>
>>> Alexey,
>>>
>>>
>>> You're right. While composing the patch I understood well that it's
>>> possible to rework fuse_sync_writes() using a counter instead of a
>>> negative bias. But the problem with flush_mtime still exists anyway.
>>> Think about it: we first take the mtime from the local inode, then
>>> fill in and submit an mtime-update request. After that, we don't know
>>> exactly when the fuse daemon will apply the new mtime to its metadata
>>> structures. If another mtime update is generated in between (e.g.
>>> "touch -d <date> file", or even simpler, a single direct write
>>> implicitly updating mtime), we wouldn't know which of those two
>>> mtime-update requests is processed by fused first. That comes from a
>>> general FUSE protocol limitation: when kernel fuse queues request A,
>>> then request B, it cannot be sure whether they will be processed by
>>> userspace as <A, then B> or <B, then A>.
>>>
>>>
>>> The big advantage of the patch I sent is that it's very simple and
>>> straightforward, and it will presumably remove 99% of the contention
>>> between fsync and io_submit (assuming we spend most of the time waiting
>>> for the userspace ACK of the FUSE_FSYNC request). There are actually
>>> three questions to answer:
>>>
>>> 1) Must we really honor a crazy app that mixes a lot of fsyncs with a
>>> lot of io_submits? The goal of fsync is to ensure that some state has
>>> actually reached the platters. An app that races io_submit-s with
>>> fsync-s doesn't actually care which state reaches the platters. I'm not
>>> sure that it's reasonable to work very hard to achieve the best possible
>>> performance for such a marginal app.
>> Obviously, any filesystem behaves like this.
>> Task A (mail server) may perform write/fsync while task B (mysql) does
>> a lot of io_submit-s.
>> All that IO may happen in parallel; the fs guarantees only that metadata
>> will be serialized. So all that concurrent IO flows to the block device,
>> which does not have an i_mutex, so all IO does indeed happen concurrently.
>
> It looks like you're comparing an app doing POSIX
> open/read/write/fsync/close with an fs doing submit_bio. This is a
> stretch. OK, there is a similarity, but I don't think this rather
> vague similarity proves anything.

We are talking about a VM process, which essentially
re-submits IO from the guest to the host, as described above.
QEMU and VM_app certainly have this IO pattern, so this
pattern MUST be optimized: it is one of our
main workloads.

That is why I think this case is not marginal
but important.
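
For illustration only, here is a rough userspace sketch of that IO pattern:
one thread keeps writing blocks to a file while another thread fsyncs the same
file concurrently. It is a simplified stand-in, not the actual QEMU code path.
Assumptions: plain synchronous os.pwrite() is used in place of io_submit, and
the names run_pattern, writer, syncer, BLOCK and WRITES are hypothetical.

```python
import os
import threading

BLOCK = 4096   # hypothetical block size
WRITES = 64    # hypothetical number of data writes


def writer(fd, done):
    """Stand-in for the data path: many independent block writes."""
    buf = b"\xab" * BLOCK
    for i in range(WRITES):
        os.pwrite(fd, buf, i * BLOCK)
    done.set()


def syncer(fd, done, counter):
    """Stand-in for flush requests: fsync while writes are in flight."""
    while not done.is_set():
        os.fsync(fd)
        counter[0] += 1
    os.fsync(fd)  # final flush after the last write
    counter[0] += 1


def run_pattern(path):
    """Run writer and syncer concurrently on one file.

    Returns (final file size, number of fsync calls issued).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        done = threading.Event()
        counter = [0]
        threads = [
            threading.Thread(target=writer, args=(fd, done)),
            threading.Thread(target=syncer, args=(fd, done, counter)),
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return os.fstat(fd).st_size, counter[0]
    finally:
        os.close(fd)
```

If fsync holds a per-inode lock for its whole duration, the writer in a
pattern like this stalls behind every flush, which is the contention being
discussed in this thread.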

Den