[Devel] [PATCH 3/3] ploop: fixup FORCE_{FLUSH,FUA} handling v2

Maxim Patlasov mpatlasov at virtuozzo.com
Tue Jun 21 09:52:24 PDT 2016


On 06/21/2016 12:25 AM, Dmitry Monakhov wrote:
> Maxim Patlasov <mpatlasov at virtuozzo.com> writes:
>
>> Dima,
>>
>> I agree with general approach of this patch, but there are some
>> (easy-to-fix) issues. See, please, inline comments below...
>>
>> On 06/20/2016 11:58 AM, Dmitry Monakhov wrote:
>>> barrier code is broken in many ways:
>>> Currently only ->dio_submit() handles PLOOP_REQ_FORCE_{FLUSH,FUA} correctly.
>>> But request also can goes though ->dio_submit_alloc()->dio_submit_pad and write_page (for indexes)
>>> So in case of grow_dev we have following sequance:
>>>
>>> E_RELOC_DATA_READ:
>>>                ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>>>                 ->delta->allocate
>>>                    ->io->submit_allloc: dio_submit_alloc
>>>                      ->dio_submit_pad
>>> E_DATA_WBI : data written, time to update index
>>>                 ->delta->allocate_complete:ploop_index_update
>>>                   ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>>>                   ->write_page
>>>                   ->ploop_map_wb_complete
>>>                     ->ploop_wb_complete_post_process
>>>                       ->set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>>> E_RELOC_NULLIFY:
>>>
>>>                  ->submit()
>>>
>>> BUG#2: currecntly kaio write_page silently ignores REQ_FUA
>> Sorry, I can't agree, it actually does not ignore:
> I've misstyped. I ment to say REQ_FLUSH.
>>> static void
>>> kaio_write_page(struct ploop_io * io, struct ploop_request * preq,
>>>           struct page * page, sector_t sec, int fua)
>>> {
>>>      /* No FUA in kaio, convert it to fsync */
>>>      if (fua)
>>>          set_bit(PLOOP_REQ_KAIO_FSYNC, &preq->state);
>>
>>> BUG#3: io_direct:dio_submit  if fua_delay is not possible we MUST tag all bios via REQ_FUA
>>>          not just latest one.
>> No need to tag *all*. See inline comments below.
>>
>>> This patch unify barrier handling like follows:
>>> - Get rid of FORCE_{FLUSH,FUA}
>>> - Introduce DELAYED_FLUSH, currecntly it supported only by io_direct
>>> - fix up fua handling for dio_submit
>>>
>>> This makes reloc sequence optimal:
>>> io_direct
>>> RELOC_S: R1, W2, WBI:FLUSH|FUA
>>> RELOC_A: R1, W2, WBI:FLUSH|FUA, W1:NULLIFY|FUA
>>> io_kaio
>>> RELOC_S: R1, W2:FUA, WBI:FUA
>>> RELOC_A: R1, W2:FUA, WBI:FUA, W1:NULLIFY|FUA
>>>
>>> https://jira.sw.ru/browse/PSBM-47107
>>> Signed-off-by: Dmitry Monakhov <dmonakhov at openvz.org>
>>> ---
>>>    drivers/block/ploop/dev.c       |  8 +++++---
>>>    drivers/block/ploop/io_direct.c | 29 +++++++++-----------------
>>>    drivers/block/ploop/io_kaio.c   | 17 ++++++++++------
>>>    drivers/block/ploop/map.c       | 45 ++++++++++++++++++++++-------------------
>>>    include/linux/ploop/ploop.h     |  8 ++++----
>>>    5 files changed, 54 insertions(+), 53 deletions(-)
>>>
>>> diff --git a/drivers/block/ploop/dev.c b/drivers/block/ploop/dev.c
>>> index 96f7850..fbc5f2f 100644
>>> --- a/drivers/block/ploop/dev.c
>>> +++ b/drivers/block/ploop/dev.c
>>> @@ -1224,6 +1224,9 @@ static void ploop_complete_request(struct ploop_request * preq)
>>>    
>>>    	__TRACE("Z %p %u\n", preq, preq->req_cluster);
>>>    
>>> +	if (!preq->error) {
>>> +		WARN_ON(test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state));
>>> +	}
>>>    	while (preq->bl.head) {
>>>    		struct bio * bio = preq->bl.head;
>>>    		preq->bl.head = bio->bi_next;
>>> @@ -2530,9 +2533,8 @@ restart:
>>>    		top_delta = ploop_top_delta(plo);
>>>    		sbl.head = sbl.tail = preq->aux_bio;
>>>    
>>> -		/* Relocated data write required sync before BAT updatee */
>>> -		set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
>>> -
>>> +		/* Relocated data write required sync before BAT updatee
>>> +		 * this will happen inside index_update */
>>>    		if (test_bit(PLOOP_REQ_RELOC_S, &preq->state)) {
>>>    			preq->eng_state = PLOOP_E_DATA_WBI;
>>>    			plo->st.bio_out++;
>>> diff --git a/drivers/block/ploop/io_direct.c b/drivers/block/ploop/io_direct.c
>>> index a6d83fe..d7ecd4a 100644
>>> --- a/drivers/block/ploop/io_direct.c
>>> +++ b/drivers/block/ploop/io_direct.c
>>> @@ -90,21 +90,12 @@ dio_submit(struct ploop_io *io, struct ploop_request * preq,
>>>    	trace_submit(preq);
>>>    
>>>    	preflush = !!(rw & REQ_FLUSH);
>>> -
>>> -	if (test_and_clear_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state))
>>> -		preflush = 1;
>>> -
>>> -	if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state))
>>> -		postfua = 1;
>>> -
>>> -	if (!postfua && ploop_req_delay_fua_possible(rw, preq)) {
>>> -
>>> +	postfua = !!(rw & REQ_FUA);
>>> +	if (ploop_req_delay_fua_possible(rw, preq)) {
>>>    		/* Mark req that delayed flush required */
>>> -		set_bit(PLOOP_REQ_FORCE_FLUSH, &preq->state);
>>> -	} else if (rw & REQ_FUA) {
>>> -		postfua = 1;
>>> +		set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state);
>>> +		postfua = 0;
>>>    	}
>> "postfua" is a horrible name, let us see if we can get rid of it
>> completely. Also, the way how ploop_req_delay_fua_possible implemented
>> is prone to errors (see below an issue in kaio_complete_io_state). Let's
>> rework it like this:
> Let it be postflush.
>>> static inline bool ploop_req_delay_fua_possible(struct ploop_request
>>> *preq)
>>> {
>>>      return preq->eng_state == PLOOP_E_DATA_WBI;
>>> }
>> Then, that chunk in the dio_submit above might look as:
>>
>>>      /* If we can delay, mark req that delayed flush required */
>>>      if ((rw & REQ_FUA) && ploop_req_delay_fua_possible(preq))
>>>          set_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state);
>>
>>
>>> -
>>>    	rw &= ~(REQ_FLUSH | REQ_FUA);
>>>    
>>>    
>>> @@ -238,14 +229,15 @@ flush_bio:
>>>    			rw2 |= REQ_FLUSH;
>>>    			preflush = 0;
>>>    		}
>>> -		if (unlikely(postfua && !bl.head))
>>> -			rw2 |= (REQ_FUA | ((bio_num) ? REQ_FLUSH : 0));
>>> +		/* Very unlikely, but correct.
>>> +		 * TODO: Optimize postfua via DELAY_FLUSH for any req state */
>>> +		if (unlikely(!postfua))
>>> +			rw2 |= REQ_FUA;
>> If test_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state), do nothing here --
>> that's obvious. So let's assume !test_bit(PLOOP_REQ_DELAYED_FLUSH,
>> &preq->state)...
>>
>> There is a call to bio_add_page in dio_submit before that loop. The
>> moment bio_add_page returned zero (i.e. OK), we know for sure which bio
>> it came from: it is bw.cur. Then it's enough to mark new bio like this:
> OK
>>>      bio->bi_rw |= bw.cur & REQ_FUA;
>> Then, instead of "submit_bio(rw2, b);", use "submit_bio(rw2 | b->bi_rw,
>> b);".
>>
>> This way we'll mark with FUA only those of new bio-s that has at least
>> some data (pages) from original bio-s with FUA. Makes sense?
>>
>> Btw, dio_io_page() must be fixed similarly (get rid of postfua, mark
>> with FUA only what we really need).
>>
>>>    
>>>    		ploop_acc_ff_out(preq->plo, rw2 | b->bi_rw);
>>>    		submit_bio(rw2, b);
>>>    		bio_num++;
>>>    	}
>>> -
>> I liked this empty line, please keep it! ;)
>>
>>>    	ploop_complete_io_request(preq);
>>>    	return;
>>>    
>>> @@ -1520,15 +1512,14 @@ dio_read_page(struct ploop_io * io, struct ploop_request * preq,
>>>    
>>>    static void
>>>    dio_write_page(struct ploop_io * io, struct ploop_request * preq,
>>> -	       struct page * page, sector_t sec, int fua)
>>> +	       struct page * page, sector_t sec, unsigned long rw)
>>>    {
>>>    	if (!(io->files.file->f_mode & FMODE_WRITE)) {
>>>    		PLOOP_FAIL_REQUEST(preq, -EBADF);
>>>    		return;
>>>    	}
>>>    
>>> -	dio_io_page(io, WRITE | (fua ? REQ_FUA : 0) | REQ_SYNC,
>>> -		    preq, page, sec);
>>> +	dio_io_page(io, rw | WRITE | REQ_SYNC, preq, page, sec);
>>>    }
>>>    
>>>    static int
>>> diff --git a/drivers/block/ploop/io_kaio.c b/drivers/block/ploop/io_kaio.c
>>> index de26319..0b93e13 100644
>>> --- a/drivers/block/ploop/io_kaio.c
>>> +++ b/drivers/block/ploop/io_kaio.c
>>> @@ -72,13 +72,17 @@ static void kaio_complete_io_state(struct ploop_request * preq)
>>>    	}
>>>    
>>>    	/* Convert requested fua to fsync */
>>> -	if (test_and_clear_bit(PLOOP_REQ_FORCE_FUA, &preq->state) ||
>>> +	if (test_and_clear_bit(PLOOP_REQ_DELAYED_FLUSH, &preq->state) ||
>> io_kaio has nothing to do with DELAYED_FLUSH (only dio sets it). Hence,
>> no need to check it here.
> Why not. delayed flush is possible for !reloc writes. In such cases
> we may postpone data flush till index_update. So from kaio point of view
> it will looks like follows: W1,WBI,FSYNC. AFAIU this is legal because
> even if power failure happens we may have WBI, but not W1. In such case
> index will point to zeroed data

OK, then we must have corresponding code in kaio_submit setting 
DELAYED_FLUSH.

Max


More information about the Devel mailing list