[Devel] [PATCH] fuse kio: Add comment about why we need to wait pending read requests

Tue Oct 23 17:37:33 MSK 2018

On 23.10.2018 16:38, Kirill Tkhai wrote:
> On 23.10.2018 16:08, Pavel Butsykin wrote:
>>
>>
>> On 23.10.2018 13:19, Kirill Tkhai wrote:
>>> On 23.10.2018 13:15, Pavel Butsykin wrote:
>>>>
>>>>
>>>> On 23.10.2018 13:01, Kirill Tkhai wrote:
>>>>> Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
>>>>> ---
>>>>>     fs/fuse/kio/pcs/pcs_fuse_kdirect.c |    7 ++++++-
>>>>>     1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c
>>>>> index 3940d6c255ba..9b45fcbf8941 100644
>>>>> --- a/fs/fuse/kio/pcs/pcs_fuse_kdirect.c
>>>>> +++ b/fs/fuse/kio/pcs/pcs_fuse_kdirect.c
>>>>> @@ -1081,7 +1081,12 @@ static void pcs_kio_setattr_handle(struct fuse_inode *fi, struct fuse_req *req)
>>>>>     	r->end = req->end;
>>>>>     	if (di->size.op == PCS_SIZE_SHRINK) {
>>>>>     		BUG_ON(!mutex_is_locked(&req->io_inode->i_mutex));
>>>>> -		/* wait for aio reads in flight */
>>>>> +		/*
>>>>> +		 * Wait for already submitted aio reads. Further reads
>>>>> +		 * (including already queued to bg_queue) will be stopped
>>>>> +		 * by wait_shrink(), and they will be processed from
>>>>> +		 * _pcs_shrink_end().
>>>>> +		 */
>>>>
>>>> Why do we have to wait already submitted aio reads ?
>>>
>>> Because there could be grow requests. This place is broken, I'm fixing that at the moment.
>>> One-dimensional size.op is not enough to drive race between grow and shrink, we need to
>>> convert PCS_SIZE_* into bit fields like it used to be before.
>>>
>>
>> I don't understand. There are no chances to races between grow and
>> shrink, because the shrink is protected from simultaneous execution with
>> write requests at the fuse level. No write requests - no grows.
> 
> Not exactly. Currently we have fallocate requests, which may spawn a grow operation.
> Even FALLOC_FL_KEEP_SIZE may do that -- then inode mutex is not held. So, the question
> is not so easy, what is correct, and which is place should be fixed.
> 

So, we have two places where possible fallocate without mutex:

1.
static size_t fuse_send_unmap(struct fuse_req *req, struct fuse_io_priv *io,
		loff_t pos, size_t count, fl_owner_t owner)
{

..
	inarg->mode = FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE;
	req->in.h.opcode = FUSE_FALLOCATE;
..

	if (io->async)
		return fuse_async_req_send(fc, req, count, io);

	fuse_request_send(fc, req);
	return count;
}

It's definitely a mistake, we should protect this fallocate
from simultaneous execution with shrink.

2.
static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
				loff_t length)
{
...
	bool lock_inode = !(mode & FALLOC_FL_KEEP_SIZE) ||
			   (mode & (FALLOC_FL_PUNCH_HOLE|FALLOC_FL_ZERO_RANGE));

	/* Return error if mode is not supported */
	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
		     FALLOC_FL_ZERO_RANGE))
		return -EOPNOTSUPP;

...

So, fallocate without mutex only possible
if (mode & FALLOC_FL_KEEP_SIZE) == FALLOC_FL_KEEP_SIZE.

Hmm, what does it mean ? Looks like it will be the same as
FALLOC_FL_ZERO_RANGE|FALLOC_FL_KEEP_SIZE.

I wonder, Do we need to handle such fallocate for vstorage at all?

It seems that NO, it's just PCS_IREQ_NOOP.

Alexey?

>>>>>     		inode_dio_wait(req->io_inode);
>>>>>     		/*
>>>>>     		 * Writebackcache was flushed already so it is safe to
>>>>>