[Devel] [PATCH rh7] ploop: io_direct: delay f_op->fsync() until index_update for reloc requests

Dmitry Monakhov dmonakhov at openvz.org
Wed Jul 20 04:27:44 PDT 2016


Maxim Patlasov <mpatlasov at virtuozzo.com> writes:

> Dima,
>
>
> I have not heard from you since 07/06/2016. Do you agree with that 
> reasoning I provided in last email? What's your objection against the 
> patch now?
Max, this patch looks ugly because it mix many things in one place.
In order to do this in right way let's introduce fsync-pended eng-state
where we can queue our requests and let fsync_thread will handle this.
Thre are three places where we need such functionality.

ENTRY: for req with FUA and IO_FSYNC_PENDED
PLOOP_E_DATA_WBI: for reqs with FUA
PLOOP_E_NULLIFY: 
PLOOP_E_COMPLETE: for reqs with FUA
Let's do it one once and it will works fine for all cases.
>
>
> Thanks,
>
> Maxim
>
>
> On 07/06/2016 11:10 AM, Maxim Patlasov wrote:
>> Dima,
>>
>> On 07/06/2016 04:58 AM, Dmitry Monakhov wrote:
>>
>>> Maxim Patlasov <mpatlasov at virtuozzo.com> writes:
>>>
>>>> Commit 9f860e606 introduced an engine to delay fsync: doing
>>>> fallocate(FALLOC_FL_CONVERT_UNWRITTEN) dio_post_submit marks
>>>> io as PLOOP_IO_FSYNC_DELAYED to ensure that fsync happens
>>>> later, when incoming FLUSH|FUA comes.
>>>>
>>>> That was deemed as important because (PSBM-47026):
>>>>
>>>>> This optimization becomes more important due to the fact that 
>>>>> customers tend to use pcompact heavily => ploop images grow each day.
>>>> Now, we can easily re-use the engine to delay fsync for reloc
>>>> requests as well. As explained in the description of commit
>>>> 5aa3fe09:
>>>>
>>>>>      1->read_data_from_old_post
>>>>>      2->write_to_new_pos
>>>>>        ->sumbit_alloc
>>>>>           ->submit_pad
>>>>>       ->post_submit->convert_unwritten
>>>>>      3->update_index ->write_page with FLUSH|FUA
>>>>>      4->nullify_old_pos
>>>>>     5->issue_flush
>>>> by the time of step 3 extent coversion is not yet stable because
>>>> belongs to uncommitted transaction. But instead of doing fsync
>>>> inside ->post_submit, we can fsync later, as the very first step
>>>> of write_page for index_update.
>>> NAK from me. What is advantage of this patch?
>>
>> The advantage is the following: in case of BAT multi-updates, instead 
>> of doing many fsync-s (one per dio_post_submit), we'll do only one 
>> (when final ->write_page is called).
>>
>>> Does it makes code more optimal? No
>>
>> Yes, it does. In the same sense as 9f860e606: saving some fsync-s.
>>
>>> Does it makes main ploop more asynchronous? No.
>>
>> Correct, the patch optimizes ploop in the other way. It's not about 
>> making ploop more asynchronous.
>>
>>
>>>
>>> If you want to make optimization then it is reasonable to
>>> queue preq with PLOOP_IO_FSYNC_DELAYED to top_io->fsync_queue
>>> before processing PLOOP_E_DATA_WBI  state for  preq with FUA
>>> So sequence will looks like follows:
>>> ->sumbit_alloc
>>>    ->submit_pad
>>>    ->post_submit->convert_unwritten-> tag PLOOP_IO_FSYNC_DELAYED
>>> ->ploop_req_state_process
>>>    case PLOOP_E_DATA_WBI:
>>>    if (preq->start & PLOOP_IO_FSYNC_DELAYED_FL) {
>>>        preq->start &= ~PLOOP_IO_FSYNC_DELAYED_FL
>>>        list_add_tail(&preq->list, &top_io->fsync_queue)
>>>        return;
>>>     }
>>> ##Let fsync_thread do it's work
>>> ->ploop_req_state_process
>>>     case LOOP_E_DATA_WBI:
>>>     update_index->write_page with FUA (FLUSH is not required because 
>>> we  already done fsync)
>>
>> That's another type of optimization: making ploop more asynchronous. I 
>> thought about it, but didn't come to conclusion whether it's worthy 
>> w.r.t. adding more complexity to ploop-state-machine and possible bugs 
>> introduced with that.
>>
>> Thanks,
>> Maxim
>>
>>>
>>>> https://jira.sw.ru/browse/PSBM-47026
>>>>
>>>> Signed-off-by: Maxim Patlasov <mpatlasov at virtuozzo.com>
>>>> ---
>>>>   drivers/block/ploop/dev.c       |    4 ++--
>>>>   drivers/block/ploop/io_direct.c |   25 ++++++++++++++++++++++++-
>>>>   drivers/block/ploop/io_kaio.c   |    3 ++-
>>>>   drivers/block/ploop/map.c       |   17 ++++++++++++-----
>>>>   include/linux/ploop/ploop.h     |    3 ++-
>>>>   5 files changed, 42 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
>>>> index e5f010b..40768b6 100644
>>>> --- a/drivers/block/ploop/dev.c
>>>> +++ b/drivers/block/ploop/dev.c
>>>> @@ -4097,7 +4097,7 @@ static void ploop_relocate(struct ploop_device 
>>>> * plo)
>>>>       preq->bl.tail = preq->bl.head = NULL;
>>>>       preq->req_cluster = 0;
>>>>       preq->req_size = 0;
>>>> -    preq->req_rw = WRITE_SYNC|REQ_FUA;
>>>> +    preq->req_rw = WRITE_SYNC;
>>>>       preq->eng_state = PLOOP_E_ENTRY;
>>>>       preq->state = (1 << PLOOP_REQ_SYNC) | (1 << PLOOP_REQ_RELOC_A);
>>>>       preq->error = 0;
>>>> @@ -4401,7 +4401,7 @@ static void ploop_relocblks_process(struct 
>>>> ploop_device *plo)
>>>>           preq->bl.tail = preq->bl.head = NULL;
>>>>           preq->req_cluster = ~0U; /* uninitialized */
>>>>           preq->req_size = 0;
>>>> -        preq->req_rw = WRITE_SYNC|REQ_FUA;
>>>> +        preq->req_rw = WRITE_SYNC;
>>>>           preq->eng_state = PLOOP_E_ENTRY;
>>>>           preq->state = (1 << PLOOP_REQ_SYNC) | (1 << 
>>>> PLOOP_REQ_RELOC_S);
>>>>           preq->error = 0;
>>>> diff --git a/drivers/block/ploop/io_direct.c 
>>>> b/drivers/block/ploop/io_direct.c
>>>> index 1086850..0a5fb15 100644
>>>> --- a/drivers/block/ploop/io_direct.c
>>>> +++ b/drivers/block/ploop/io_direct.c
>>>> @@ -1494,13 +1494,36 @@ dio_read_page(struct ploop_io * io, struct 
>>>> ploop_request * preq,
>>>>     static void
>>>>   dio_write_page(struct ploop_io * io, struct ploop_request * preq,
>>>> -           struct page * page, sector_t sec, unsigned long rw)
>>>> +           struct page * page, sector_t sec, unsigned long rw,
>>>> +           int do_fsync_if_delayed)
>>>>   {
>>>>       if (!(io->files.file->f_mode & FMODE_WRITE)) {
>>>>           PLOOP_FAIL_REQUEST(preq, -EBADF);
>>>>           return;
>>>>       }
>>>>   +    if (do_fsync_if_delayed &&
>>>> +        test_bit(PLOOP_IO_FSYNC_DELAYED, &io->io_state)) {
>>>> +        struct ploop_device * plo = io->plo;
>>>> +        u64 io_count;
>>>> +        int err;
>>>> +
>>>> +        spin_lock_irq(&plo->lock);
>>>> +        io_count = io->io_count;
>>>> +        spin_unlock_irq(&plo->lock);
>>>> +
>>>> +        err = io->ops->sync(io);
>>>> +        if (err) {
>>>> +            PLOOP_FAIL_REQUEST(preq, -EBADF);
>>>> +            return;
>>>> +        }
>>>> +
>>>> +        spin_lock_irq(&plo->lock);
>>>> +        if (io_count == io->io_count && !(io_count & 1))
>>>> +            clear_bit(PLOOP_IO_FSYNC_DELAYED, &io->io_state);
>>>> +        spin_unlock_irq(&plo->lock);
>>>> +    }
>>>> +
>>>>       dio_io_page(io, rw | WRITE | REQ_SYNC, preq, page, sec);
>>>>   }
>>>>   diff --git a/drivers/block/ploop/io_kaio.c 
>>>> b/drivers/block/ploop/io_kaio.c
>>>> index 85863df..0d731ef 100644
>>>> --- a/drivers/block/ploop/io_kaio.c
>>>> +++ b/drivers/block/ploop/io_kaio.c
>>>> @@ -614,7 +614,8 @@ kaio_read_page(struct ploop_io * io, struct 
>>>> ploop_request * preq,
>>>>     static void
>>>>   kaio_write_page(struct ploop_io * io, struct ploop_request * preq,
>>>> -         struct page * page, sector_t sec, unsigned long rw)
>>>> +        struct page * page, sector_t sec, unsigned long rw,
>>>> +        int do_fsync_if_delayed)
>>>>   {
>>>>       ploop_prepare_tracker(preq, sec);
>>>>   diff --git a/drivers/block/ploop/map.c b/drivers/block/ploop/map.c
>>>> index 1883674..96e428b 100644
>>>> --- a/drivers/block/ploop/map.c
>>>> +++ b/drivers/block/ploop/map.c
>>>> @@ -910,6 +910,7 @@ void ploop_index_update(struct ploop_request * 
>>>> preq)
>>>>       sector_t sec;
>>>>       unsigned long rw;
>>>>       unsigned long state = READ_ONCE(preq->state);
>>>> +    int do_fsync_if_delayed = 0;
>>>>         /* No way back, we are going to initiate index write. */
>>>>   @@ -970,10 +971,13 @@ void ploop_index_update(struct ploop_request 
>>>> * preq)
>>>>       preq->req_rw &= ~REQ_FLUSH;
>>>>         /* Relocate requires consistent index update */
>>>> -    if (state & (PLOOP_REQ_RELOC_A_FL|PLOOP_REQ_RELOC_S_FL))
>>>> +    if (state & (PLOOP_REQ_RELOC_A_FL|PLOOP_REQ_RELOC_S_FL)) {
>>>>           rw |= (REQ_FLUSH | REQ_FUA);
>>>> +        do_fsync_if_delayed = 1;
>>>> +    }
>>>>   - top_delta->io.ops->write_page(&top_delta->io, preq, page, sec, rw);
>>>> + top_delta->io.ops->write_page(&top_delta->io, preq, page, sec, rw,
>>>> +                      do_fsync_if_delayed);
>>>>         put_page(page);
>>>>       return;
>>>> @@ -1096,6 +1100,7 @@ static void map_wb_complete(struct map_node * 
>>>> m, int err)
>>>>       unsigned int idx;
>>>>       sector_t sec;
>>>>       unsigned long rw;
>>>> +    int do_fsync_if_delayed = 0;
>>>>         /* First, complete processing of written back indices,
>>>>        * finally instantiate indices in mapping cache.
>>>> @@ -1193,8 +1198,10 @@ static void map_wb_complete(struct map_node * 
>>>> m, int err)
>>>>                 state = READ_ONCE(preq->state);
>>>>               /* Relocate requires consistent index update */
>>>> -            if (state & (PLOOP_REQ_RELOC_A_FL|PLOOP_REQ_RELOC_S_FL))
>>>> +            if (state & (PLOOP_REQ_RELOC_A_FL|PLOOP_REQ_RELOC_S_FL)) {
>>>>                   rw |= (REQ_FLUSH | REQ_FUA);
>>>> +                do_fsync_if_delayed = 1;
>>>> +            }
>>>>                 preq->eng_state = PLOOP_E_INDEX_WB;
>>>>               get_page(page);
>>>> @@ -1221,8 +1228,8 @@ static void map_wb_complete(struct map_node * 
>>>> m, int err)
>>>>       plo->st.map_multi_writes++;
>>>>       top_delta->ops->map_index(top_delta, m->mn_start, &sec);
>>>>   - top_delta->io.ops->write_page(&top_delta->io, main_preq, page, sec,
>>>> -                      rw);
>>>> + top_delta->io.ops->write_page(&top_delta->io, main_preq, page, 
>>>> sec, rw,
>>>> +                      do_fsync_if_delayed);
>>>>       put_page(page);
>>>>   }
>>>>   diff --git a/include/linux/ploop/ploop.h 
>>>> b/include/linux/ploop/ploop.h
>>>> index deee8a7..b03565b 100644
>>>> --- a/include/linux/ploop/ploop.h
>>>> +++ b/include/linux/ploop/ploop.h
>>>> @@ -164,7 +164,8 @@ struct ploop_io_ops
>>>>       void    (*read_page)(struct ploop_io * io, struct 
>>>> ploop_request * preq,
>>>>                    struct page * page, sector_t sec);
>>>>       void    (*write_page)(struct ploop_io * io, struct 
>>>> ploop_request * preq,
>>>> -                  struct page * page, sector_t sec, unsigned long rw);
>>>> +                  struct page * page, sector_t sec, unsigned long rw,
>>>> +                  int do_fsync_if_delayed);
>>>>           int    (*sync_read)(struct ploop_io * io, struct page * page,
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 472 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/devel/attachments/20160720/2bef7118/attachment-0001.sig>


More information about the Devel mailing list