[Devel] [PATCH RHEL9 COMMIT] dm-ploop: combine processing of pios thru prepare list and remove fsync worker
Konstantin Khorenko
khorenko at virtuozzo.com
Mon Jan 27 16:12:35 MSK 2025
The commit is pushed to "branch-rh9-5.14.0-427.44.1.vz9.80.x-ovz" and will appear at git at bitbucket.org:openvz/vzkernel.git
after rh9-5.14.0-427.44.1.vz9.80.6
------>
commit 2aa2dd0074abe2df067201bc6ab6e9dd15f9cb33
Author: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
Date: Fri Jan 24 17:35:43 2025 +0200
dm-ploop: combine processing of pios thru prepare list and remove fsync worker
Currently, data pios and flushes are separated into different lists before
being handed to the workqueue. This can lead to flushes executing before the
relevant data pios, and the worker has no way to recover that ordering
dependency.
So put both data and flush pios into the prepare list. This way the worker
gets a single list of pios and can manage ordering while executing.
Now we can remove the fsync_worker, and the worker can queue back more
work without problems.
https://virtuozzo.atlassian.net/browse/VSTOR-91820
Signed-off-by: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
======
Patchset description:
ploop: optimisations and scaling
Ploop processes requests in different threads in parallel
where possible, which results in a significant improvement in
performance and makes further optimisations possible.
Known bugs:
- delayed metadata writeback is not working and is missing error handling
- patch to disable it until fixed
- fast path is not working - causes rcu lockups - patch to disable it
Further improvements:
- optimize md pages lookups
Alexander Atanasov (50):
dm-ploop: md_pages map all pages at creation time
dm-ploop: Use READ_ONCE/WRITE_ONCE to access md page data
dm-ploop: fsync after all pios are sent
dm-ploop: move md status to use proper bitops
dm-ploop: convert wait_list and wb_batch_llist to use lockless lists
dm-ploop: convert enospc handling to use lockless lists
dm-ploop: convert suspended_pios list to use lockless list
dm-ploop: convert the rest of the lists to use llist variant
dm-ploop: combine processing of pios thru prepare list and remove
fsync worker
dm-ploop: move from wq to kthread
dm-ploop: move preparations of pios into the caller from worker
dm-ploop: fast path execution for reads
dm-ploop: do not use a wrapper for set_bit to make a page writeback
dm-ploop: BAT use only one list for writeback
dm-ploop: make md writeback timeout to be per page
dm-ploop: add interface to disable bat writeback delay
dm-ploop: convert wb_batch_list to lockless variant
dm-ploop: convert high_prio to status
dm-ploop: split cow processing into two functions
dm-ploop: convert md page rw lock to spin lock
dm-ploop: convert bat_rwlock to bat_lock spinlock
dm-ploop: prepare bat updates under bat_lock
dm-ploop: make ploop_bat_write_complete ready for parallel pio
completion
dm-ploop: make ploop_submit_metadata_writeback return number of
requests sent
dm-ploop: introduce pio runner threads
dm-ploop: add pio list ids to be used when passing pios to runners
dm-ploop: process pios via runners
dm-ploop: disable metadata writeback delay
dm-ploop: disable fast path
dm-ploop: use lockless lists for chained cow updates list
dm-ploop: use lockless lists for data ready pios
dm-ploop: give runner threads better name
dm-ploop: resize operation - add holes bitmap locking
dm-ploop: remove unnecessary operations
dm-ploop: use filp per thread
dm-ploop: catch if we try to advance pio past bio end
dm-ploop: support REQ_FUA for data pios
dm-ploop: proplerly access nr_bat_entries
dm-ploop: fix locking and improve error handling when submitting pios
dm-ploop: fix how ENOTBLK is handled
dm-ploop: sync when suspended or stopping
dm-ploop: rework bat completion logic
dm-ploop: rework logic in pio processing
dm-ploop: end fsync pios in parallel
dm-ploop: make filespace preallocations async
dm-ploop: resubmit enospc pios from dispatcher thread
dm-ploop: dm-ploop: simplify discard completion
dm-ploop: use GFP_ATOMIC instead of GFP_NOIO
dm-ploop: fix locks used in mixed context
dm-ploop: fix how current flags are managed inside threads
Andrey Zhadchenko (13):
dm-ploop: do not flush after metadata writes
dm-ploop: set IOCB_DSYNC on all FUA requests
dm-ploop: remove extra ploop_cluster_is_in_top_delta()
dm-ploop: introduce per-md page locking
dm-ploop: reduce BAT accesses on discard completion
dm-ploop: simplify llseek
dm-ploop: speed up ploop_prepare_bat_update()
dm-ploop: make new allocations immediately visible in BAT
dm-ploop: drop ploop_cluster_is_in_top_delta()
dm-ploop: do not wait for BAT update for non-FUA requests
dm-ploop: add delay for metadata writeback
dm-ploop: submit all postponed metadata on REQ_OP_FLUSH
dm-ploop: handle REQ_PREFLUSH
Feature: dm-ploop: ploop target driver
---
drivers/md/dm-ploop-map.c | 62 ++++++++++++++------------------------------
drivers/md/dm-ploop-target.c | 1 -
drivers/md/dm-ploop.h | 2 --
3 files changed, 20 insertions(+), 45 deletions(-)
diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
index 28244755f3ce..93def46f15b4 100644
--- a/drivers/md/dm-ploop-map.c
+++ b/drivers/md/dm-ploop-map.c
@@ -340,17 +340,14 @@ static int ploop_split_pio_to_list(struct ploop *ploop, struct pio *pio,
}
ALLOW_ERROR_INJECTION(ploop_split_pio_to_list, ERRNO);
-static void ploop_dispatch_pio(struct ploop *ploop, struct pio *pio,
- bool *is_data, bool *is_flush)
+static void ploop_dispatch_pio(struct ploop *ploop, struct pio *pio)
{
struct llist_head *list = (struct llist_head *)&ploop->pios[pio->queue_list_id];
WARN_ON_ONCE(pio->queue_list_id >= PLOOP_LIST_COUNT);
if (pio->queue_list_id == PLOOP_LIST_FLUSH)
- *is_flush = true;
- else
- *is_data = true;
+ list = (struct llist_head *)&ploop->pios[PLOOP_LIST_PREPARE];
llist_add((struct llist_node *)(&pio->list), list);
}
@@ -358,19 +355,14 @@ static void ploop_dispatch_pio(struct ploop *ploop, struct pio *pio,
void ploop_dispatch_pios(struct ploop *ploop, struct pio *pio,
struct list_head *pio_list)
{
- bool is_data = false, is_flush = false;
-
if (pio)
- ploop_dispatch_pio(ploop, pio, &is_data, &is_flush);
+ ploop_dispatch_pio(ploop, pio);
if (pio_list) {
while ((pio = ploop_pio_list_pop(pio_list)) != NULL)
- ploop_dispatch_pio(ploop, pio, &is_data, &is_flush);
+ ploop_dispatch_pio(ploop, pio);
}
- if (is_data)
- queue_work(ploop->wq, &ploop->worker);
- else if (is_flush)
- queue_work(ploop->wq, &ploop->fsync_worker);
+ queue_work(ploop->wq, &ploop->worker);
}
static bool ploop_delay_if_md_busy(struct ploop *ploop, struct md_page *md,
@@ -808,10 +800,9 @@ static void ploop_advance_local_after_bat_wb(struct ploop *ploop,
wait_llist_pending = llist_del_all(&md->wait_llist);
if (wait_llist_pending) {
- wait_llist_pending = llist_reverse_order(wait_llist_pending);
llist_for_each_safe(pos, t, wait_llist_pending) {
pio = list_entry((struct list_head *)pos, typeof(*pio), list);
- list_add_tail(&pio->list, &list);
+ list_add(&pio->list, &list);
}
}
@@ -1689,7 +1680,11 @@ static void ploop_prepare_embedded_pios(struct ploop *ploop,
llist_for_each_safe(pos, t, pios) {
pio = list_entry((struct list_head *)pos, typeof(*pio), list);
INIT_LIST_HEAD(&pio->list); /* until type is changed */
- ploop_prepare_one_embedded_pio(ploop, pio, deferred_pios);
+ if (pio->queue_list_id != PLOOP_LIST_FLUSH)
+ ploop_prepare_one_embedded_pio(ploop, pio, deferred_pios);
+ else
+ llist_add((struct llist_node *)(&pio->list),
+ &ploop->pios[PLOOP_LIST_FLUSH]);
}
}
@@ -1808,6 +1803,9 @@ static void process_ploop_fsync_work(struct ploop *ploop)
llflush_pios = llist_del_all(&ploop->pios[PLOOP_LIST_FLUSH]);
+ if (!llflush_pios)
+ return;
+
file = ploop_top_delta(ploop)->file;
/* All flushes are done as one */
ret = vfs_fsync(file, 0);
@@ -1833,32 +1831,25 @@ void do_ploop_work(struct work_struct *ws)
struct llist_node *lldiscard_pios;
struct llist_node *llcow_pios;
struct llist_node *llresubmit;
- bool do_fsync = false;
unsigned int old_flags = current->flags;
current->flags |= PF_IO_THREAD|PF_LOCAL_THROTTLE|PF_MEMALLOC_NOIO;
- spin_lock_irq(&ploop->deferred_lock);
llembedded_pios = llist_del_all(&ploop->pios[PLOOP_LIST_PREPARE]);
lldeferred_pios = llist_del_all(&ploop->pios[PLOOP_LIST_DEFERRED]);
lldiscard_pios = llist_del_all(&ploop->pios[PLOOP_LIST_DISCARD]);
llcow_pios = llist_del_all(&ploop->pios[PLOOP_LIST_COW]);
llresubmit = llist_del_all(&ploop->llresubmit_pios);
- if (!llist_empty(&ploop->pios[PLOOP_LIST_FLUSH]))
- do_fsync = true;
-
- spin_unlock_irq(&ploop->deferred_lock);
-
/* add old deferred to the list */
if (lldeferred_pios) {
struct llist_node *pos, *t;
struct pio *pio;
- llist_for_each_safe(pos, t, llist_reverse_order(lldeferred_pios)) {
+ llist_for_each_safe(pos, t, lldeferred_pios) {
pio = list_entry((struct list_head *)pos, typeof(*pio), list);
INIT_LIST_HEAD(&pio->list);
- list_add_tail(&pio->list, &deferred_pios);
+ list_add(&pio->list, &deferred_pios);
}
}
@@ -1867,7 +1858,6 @@ void do_ploop_work(struct work_struct *ws)
if (llresubmit)
ploop_process_resubmit_pios(ploop, llist_reverse_order(llresubmit));
-
ploop_process_deferred_pios(ploop, &deferred_pios);
if (lldiscard_pios)
@@ -1878,33 +1868,21 @@ void do_ploop_work(struct work_struct *ws)
ploop_submit_metadata_writeback(ploop);
- current->flags = old_flags;
-
- if (do_fsync)
+ if (!llist_empty(&ploop->pios[PLOOP_LIST_FLUSH]))
process_ploop_fsync_work(ploop);
-}
-
-void do_ploop_fsync_work(struct work_struct *ws)
-{
- struct ploop *ploop = container_of(ws, struct ploop, fsync_worker);
-
- process_ploop_fsync_work(ploop);
+ current->flags = old_flags;
}
static void ploop_submit_embedded_pio(struct ploop *ploop, struct pio *pio)
{
struct ploop_rq *prq = pio->endio_cb_data;
struct request *rq = prq->rq;
- struct work_struct *worker;
- unsigned long flags;
if (blk_rq_bytes(rq)) {
pio->queue_list_id = PLOOP_LIST_PREPARE;
- worker = &ploop->worker;
} else {
WARN_ON_ONCE(pio->bi_op != REQ_OP_FLUSH);
pio->queue_list_id = PLOOP_LIST_FLUSH;
- worker = &ploop->fsync_worker;
}
if (unlikely(ploop->stop_submitting_pios)) {
@@ -1913,9 +1891,9 @@ static void ploop_submit_embedded_pio(struct ploop *ploop, struct pio *pio)
}
ploop_inc_nr_inflight(ploop, pio);
- llist_add((struct llist_node *)(&pio->list), &ploop->pios[pio->queue_list_id]);
+ llist_add((struct llist_node *)(&pio->list), &ploop->pios[PLOOP_LIST_PREPARE]);
- queue_work(ploop->wq, worker);
+ queue_work(ploop->wq, &ploop->worker);
}
void ploop_submit_embedded_pios(struct ploop *ploop, struct list_head *list)
diff --git a/drivers/md/dm-ploop-target.c b/drivers/md/dm-ploop-target.c
index f12c6912f8d0..ea9af6b6abe9 100644
--- a/drivers/md/dm-ploop-target.c
+++ b/drivers/md/dm-ploop-target.c
@@ -384,7 +384,6 @@ static int ploop_ctr(struct dm_target *ti, unsigned int argc, char **argv)
timer_setup(&ploop->enospc_timer, ploop_enospc_timer, 0);
INIT_WORK(&ploop->worker, do_ploop_work);
- INIT_WORK(&ploop->fsync_worker, do_ploop_fsync_work);
INIT_WORK(&ploop->event_work, do_ploop_event_work);
init_completion(&ploop->inflight_bios_ref_comp);
diff --git a/drivers/md/dm-ploop.h b/drivers/md/dm-ploop.h
index 0cd18c0c7bfa..1ba91cbc4f04 100644
--- a/drivers/md/dm-ploop.h
+++ b/drivers/md/dm-ploop.h
@@ -181,7 +181,6 @@ struct ploop {
struct workqueue_struct *wq;
struct work_struct worker;
- struct work_struct fsync_worker;
struct work_struct event_work;
struct completion inflight_bios_ref_comp;
@@ -568,7 +567,6 @@ extern void ploop_submit_embedded_pios(struct ploop *ploop,
extern void ploop_dispatch_pios(struct ploop *ploop, struct pio *pio,
struct list_head *pio_list);
extern void do_ploop_work(struct work_struct *ws);
-extern void do_ploop_fsync_work(struct work_struct *ws);
extern void do_ploop_event_work(struct work_struct *work);
extern int ploop_clone_and_map(struct dm_target *ti, struct request *rq,
union map_info *map_context,