[Devel] Fwd: [PATCH RH9] dm-ploop: port the standby mode feature.
Alexander Atanasov
alexander.atanasov at virtuozzo.com
Wed Oct 12 22:04:18 MSK 2022
Hello,
On 12.10.22 21:08, Konstantin Khorenko wrote:
> Sashas, can you please review the patch from Liu?
>
> And i wonder if we need similar patch for dm-qcow driver.
comments bellow.
>
> -------- Forwarded Message --------
> Subject: [Devel] [PATCH RH9] dm-ploop: port the standby mode feature.
> Date: Mon, 3 Oct 2022 15:42:32 +0800
> From: Liu Kui <Kui.Liu at acronis.com>
> To: devel at acronis.com
>
> ==========================
> commit f5df00558e645959ab08b198681004679d4a3595
> Author: Anton Nefedov <anton.nefedov at virtuozzo.com>
> Date: Fri Nov 16 12:46:28 2018 +0300
>
> ploop: move to standby mode after fsync() error too
>
> Extend bdadef81aba5 'ploop: add a standby mode' so it covers fsync().
>
> https://pmc.acronis.com/browse/VSTOR-17613
>
> Signed-off-by: Anton Nefedov <anton.nefedov at virtuozzo.com>
> Acked-by: Andrey Zaitsev <azaitsev at virtuozzo.com>
> Acked-by: Sergey Lysanov <slysanov at virtuozzo.com>
>
> ===========================
> Patchset description:
>
> ploop: more cases for standby mode
>
> standby mode motivation:
>
> commit bdadef81aba572fdcfd59e4c5c7b18736c962ebd
> Author: Andrei Vagin <avagin at openvz.org>
> Date: Thu Mar 1 11:07:34 2018 +0300
>
> ploop: add a standby mode
>
> This mode shows that a delta lease was stolen and it is
> impossible to
> handle any requests.
>
> We want to know about this situation from the iscsi target.
> When HA
> decides that the current target is broken, it can initialize
> another
> target with the same delta. In this case, the first target has
> to complete
> all in-porgress commands and set the
> ASCQ_04H_ALUA_TG_PT_STANDBY bit in
> their status.
>
> In Linux, bio-s are always completed with EIO in error cases,
> so we need
> another way how to determine this state. This patch addes a new
> block
> queue flag QUEUE_FLAG_STANDBY.
>
> Anton Nefedov (3):
> ploop: move standby mode setting to separate function
> ploop: move to standby mode after fsync() error too
> ploop: move to standby after -ENOTCONN too
>
> Signed-off-by: Liu Kui <Kui.Liu at acronis.com>
> ---
> drivers/md/dm-ploop-map.c | 25 ++++++++++++++++++++++++-
> drivers/md/dm-ploop-target.c | 4 ++++
> drivers/md/dm-ploop.h | 2 ++
> include/linux/blkdev.h | 2 ++
> 4 files changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
> index e8288e522e28..8f1f614b7746 100644
> --- a/drivers/md/dm-ploop-map.c
> +++ b/drivers/md/dm-ploop-map.c
> @@ -1163,6 +1163,23 @@ static void ploop_queue_resubmit(struct pio *pio)
> queue_work(ploop->wq, &ploop->worker);
> }
> +static void check_standby_mode(struct ploop *ploop, long res) {
> + struct request_queue *q = ploop->queue;
> + int prev;
> +
> + /* move to standby if delta lease was stolen or mount is gone */
> + if (res != -EBUSY && res != -ENOTCONN && res != -EIO) {
> + return;
> + }
> +
> + spin_lock_irq(&q->queue_lock);
> + prev = blk_queue_flag_test_and_set(QUEUE_FLAG_STANDBY, q);
> + spin_unlock_irq(&q->queue_lock);
> +
> + if (!prev)
> + pr_info("ploop: switch into standby mode\n");
> +}
What guarantees that we got EBUSY, EIO or ENOTCONN for the reasons
listed in the description - "This mode shows that a delta lease was
stolen and it is impossible to handle any requests." - what if we got
such an error for other reason? If the problem comes from nfs why not
suspend device mapper from there ?
Also do this errors actually reach dm-ploop ? From my recent tests i
observed that if you have and EIO the filesystem gets it and remounts RO
- it might not reach ploop code at all. Even if you suspend/resume the
queue it would not help with the RO fs.
> +
> static void ploop_data_rw_complete(struct pio *pio)
> {
> bool completed;
> @@ -1175,6 +1192,7 @@ static void ploop_data_rw_complete(struct pio *pio)
> ploop_queue_resubmit(pio);
> return;
> }
> + check_standby_mode(pio->ploop, pio->ret);
> pio->bi_status = errno_to_blk_status(pio->ret);
> }
> @@ -1815,8 +1833,10 @@ void do_ploop_fsync_work(struct work_struct *ws)
> ret = vfs_fsync(file, 0);
> while ((pio = ploop_pio_list_pop(&flush_pios)) != NULL) {
> - if (unlikely(ret))
> + if (unlikely(ret)) {
> + check_standby_mode(ploop, ret);
> pio->bi_status = errno_to_blk_status(ret);
> + }
> ploop_pio_endio(pio);
> }
> }
> @@ -1869,6 +1889,9 @@ int ploop_clone_and_map(struct dm_target *ti,
> struct request *rq,
> struct ploop_rq *prq;
> struct pio *pio;
> + if (blk_queue_standby(ploop->queue))
> + return DM_MAPIO_KILL;
> +
> if (blk_rq_bytes(rq) && ploop_rq_valid(ploop, rq) < 0)
> return DM_MAPIO_KILL;
> diff --git a/drivers/md/dm-ploop-target.c b/drivers/md/dm-ploop-target.c
> index 4b50487b7fb5..677ecb3432f4 100644
> --- a/drivers/md/dm-ploop-target.c
> +++ b/drivers/md/dm-ploop-target.c
> @@ -20,6 +20,7 @@
> #include <linux/uio.h>
> #include <linux/error-injection.h>
> #include "dm-ploop.h"
> +#include "dm-core.h"
> #define DM_MSG_PREFIX "ploop"
> @@ -363,6 +364,9 @@ static int ploop_ctr(struct dm_target *ti,
> unsigned int argc, char **argv)
> INIT_WORK(&ploop->event_work, do_ploop_event_work);
> init_completion(&ploop->inflight_bios_ref_comp);
> + ploop->queue = ti->table->md->queue;
This looks wrong, ploop struct alread have a ptr to ti.
Tables are reloaded in some cases and it is possible that table/md/queue
change.
> + blk_queue_flag_clear(QUEUE_FLAG_STANDBY, ploop->queue);
> +
> for (i = 0; i < 2; i++) {
> release = i ? ploop_inflight_bios_ref_exit1 :
> ploop_inflight_bios_ref_exit0;
> diff --git a/drivers/md/dm-ploop.h b/drivers/md/dm-ploop.h
> index 5d953278a976..2502718c131f 100644
> --- a/drivers/md/dm-ploop.h
> +++ b/drivers/md/dm-ploop.h
> @@ -229,6 +229,8 @@ struct ploop {
> struct timer_list enospc_timer;
> bool event_enospc;
> +
> + struct request_queue *queue;
> };
> struct ploop_rq {
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 8fcd753c70b0..c2ebcdae3770 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -434,6 +434,7 @@ struct request_queue {
> #define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */
> #define QUEUE_FLAG_HCTX_ACTIVE 28 /* at least one blk-mq hctx is
> active */
> #define QUEUE_FLAG_NOWAIT 29 /* device supports NOWAIT */
> +#define QUEUE_FLAG_STANDBY 30 /* unable to handle read/write
> requests */
> #define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
> (1 << QUEUE_FLAG_SAME_COMP) | \
> @@ -480,6 +481,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag,
> struct request_queue *q);
> #define blk_queue_fua(q) test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
> #define blk_queue_registered(q) test_bit(QUEUE_FLAG_REGISTERED,
> &(q)->queue_flags)
> #define blk_queue_nowait(q) test_bit(QUEUE_FLAG_NOWAIT,
> &(q)->queue_flags)
> +#define blk_queue_standby(q) test_bit(QUEUE_FLAG_STANDBY,
> &(q)->queue_flags)
> extern void blk_set_pm_only(struct request_queue *q);
> extern void blk_clear_pm_only(struct request_queue *q);
Why introduce a new queue flag?
Why can't device mapper suspend be used ?
Why not use the existing ploop code to defer and retry pios?
--
Regards,
Alexander Atanasov
More information about the Devel
mailing list