[Devel] [PATCH RHEL7 COMMIT] ploop: mark reloc reqs to force FUA/fsync(kaio) for index update I/O
Konstantin Khorenko
khorenko at odin.com
Mon May 18 21:27:16 PDT 2015
The commit is pushed to "branch-rh7-3.10.0-123.1.2-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-123.1.2.vz7.5.1
------>
commit 197e223da01fd641d5d5f0df6dbf3968aef56c9c
Author: Andrey Smetanin <asmetanin at virtuozzo.com>
Date: Tue May 19 08:27:15 2015 +0400
ploop: mark reloc reqs to force FUA/fsync(kaio) for index update I/O
Series description:
During relocation of ploop clusters (resize/balloon) we need to FUA/fsync
the image file after the following operations:
a) a new data block is written
b) the BAT is updated
c) the old data block is nullified on BAT grow. The nullify of the old
data block is already done in the format module's ->complete_grow callback.
This patch forces fsync (kaio) or FUA (direct) of relocation write I/O to
the image by marking such reloc reqs (RELOC_A|RELOC_S) with the appropriate
flags. The kaio and direct modules are tuned by this patch to force
fsync/FUA when these flags are set. This covers cases a) and b); case c)
is already implemented.
The patch also fixes inconsistent FUA processing of a bio list in the
direct module. The problem is that for a batch of bios we set FUA only on
the last bio. In case of a power outage it is possible that the last bio
is stored while the previous ones are not, because they only reside in the
disk write cache at the time of the failure. To solve this, the patch
marks the last bio as FLUSH|FUA if there is more than one bio in the list.
Moreover, for kaio, if an fsync is possible at the BAT update stage, we do
a single fsync there, as in the direct case, instead of two fsyncs. For the
direct case, if we are going to issue FUA at the BAT update only (an
optimization trick that already exists), we need to mark the request to
FLUSH the previously written (non-FUA) data.
Performance:
Overall resize time (including an EXT4 resize up to 16T) degraded by
about 5%.
https://jira.sw.ru/browse/PSBM-31222
https://jira.sw.ru/browse/PSBM-31225
https://jira.sw.ru/browse/PSBM-31321
Signed-off-by: Andrey Smetanin <asmetanin at parallels.com>
Andrey Smetanin (7):
ploop: define struct ploop_request->state flags to force pre FLUSH
before write IO and FUA/fsync at I/O complete
ploop: mark reloc reqs to force FUA/fsync(kaio) for index update I/O
ploop: mark reloc reqs to force FUA before write of relocated data
ploop: direct: to support truly FLUSH/FUA of req we need mark first
bio FLUSH, write all bios and mark last bio as FLUSH/FUA
ploop: added ploop_req_delay_fua_possible() func that detects possible
delaying of upcoming FUA to index update stage. This function will
be later used in direct/kaio code to detect and delay FUA
ploop: make image fsync at I/O complete if it's required by FUA/fsync
force flag or by req->req_rw
ploop: do preflush or postfua according force FUA/flush flags, and
delay FUA if possible but add force FLUSH to req if so
This patch description:
We need to force FUA/fsync of index update I/O for a consistent resize.
https://jira.sw.ru/browse/PSBM-31222
https://jira.sw.ru/browse/PSBM-31225
https://jira.sw.ru/browse/PSBM-31321
Signed-off-by: Andrey Smetanin <asmetanin at parallels.com>
Reviewed-by: Andrew Vagin <avagin at parallels.com>
---
drivers/block/ploop/map.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/block/ploop/map.c b/drivers/block/ploop/map.c
index 2e971cd..67e2852 100644
--- a/drivers/block/ploop/map.c
+++ b/drivers/block/ploop/map.c
@@ -953,6 +953,11 @@ void ploop_index_update(struct ploop_request * preq)
__TRACE("wbi %p %u %p\n", preq, preq->req_cluster, m);
plo->st.map_single_writes++;
top_delta->ops->map_index(top_delta, m->mn_start, &sec);
+ /* Relocate requires consistent writes, mark such reqs appropriately */
+ if (test_bit(PLOOP_REQ_RELOC_A, &preq->state) ||
+ test_bit(PLOOP_REQ_RELOC_S, &preq->state))
+ set_bit(PLOOP_REQ_FORCE_FUA, &preq->state);
+
top_delta->io.ops->write_page(&top_delta->io, preq, page, sec,
!!(preq->req_rw & REQ_FUA));
put_page(page);
@@ -1050,6 +1055,11 @@ static void map_wb_complete_post_process(struct ploop_map *map,
memset(page_address(preq->aux_bio->bi_io_vec[i].bv_page),
0, PAGE_SIZE);
+ /*
+ * FUA of this data occurs in the format driver's ->complete_grow(),
+ * via a full image sync. After that the header size is increased so
+ * this cluster can be used as a BAT cluster.
+ */
top_delta->io.ops->submit(&top_delta->io, preq, preq->req_rw,
&sbl, preq->iblock, 1<<plo->cluster_log);
}
@@ -1064,7 +1074,7 @@ static void map_wb_complete(struct map_node * m, int err)
int delayed = 0;
unsigned int idx;
sector_t sec;
- int fua;
+ int fua, force_fua;
/* First, complete processing of written back indices,
* finally instantiate indices in mapping cache.
@@ -1135,6 +1145,7 @@ static void map_wb_complete(struct map_node * m, int err)
main_preq = NULL;
fua = 0;
+ force_fua = 0;
list_for_each_safe(cursor, tmp, &m->io_queue) {
struct ploop_request * preq;
@@ -1156,6 +1167,10 @@ static void map_wb_complete(struct map_node * m, int err)
if (preq->req_rw & REQ_FUA)
fua = 1;
+ if (test_bit(PLOOP_REQ_RELOC_A, &preq->state) ||
+ test_bit(PLOOP_REQ_RELOC_S, &preq->state))
+ force_fua = 1;
+
preq->eng_state = PLOOP_E_INDEX_WB;
get_page(page);
preq->sinfo.wi.tpage = page;
@@ -1180,6 +1195,10 @@ static void map_wb_complete(struct map_node * m, int err)
__TRACE("wbi2 %p %u %p\n", main_preq, main_preq->req_cluster, m);
plo->st.map_multi_writes++;
top_delta->ops->map_index(top_delta, m->mn_start, &sec);
+
+ if (force_fua)
+ set_bit(PLOOP_REQ_FORCE_FUA, &main_preq->state);
+
top_delta->io.ops->write_page(&top_delta->io, main_preq, page, sec, fua);
put_page(page);
}