[Devel] [PATCH RH9] dm-ploop: port the standby mode feature.

Thu Oct 13 16:33:02 MSK 2022

Hello, 

First of all, please bear in mind that this patch is a port of a patchset currently present in VZ 7 kernel. 
Original patches were added just to address a problem that would only happen in our particular use case, 
where we need to use ploop devices as the back store for  iSCSI target, while ploop devices are backed 
by files stored in vstorage filesystem.   

With this background info, please see my comments below. 

-----Original Message-----
From: Konstantin Khorenko <khorenko at virtuozzo.com>
Date: Thursday, 13 October 2022 at 5:23 PM
To: Kui Liu <Kui.Liu at acronis.com>, Alexey Kuznetsov <kuznet at acronis.com>
Cc: Devel <devel at openvz.org>, Alexander Atanasov <alexander.atanasov at virtuozzo.com>, Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
Subject: Re: Fwd: [Devel] [PATCH RH9] dm-ploop: port the standby mode feature.

    Adding Alexey and Liu from Acronis.

    Lesha, Liu, please check questions from Alexander in the comments below.

    --
    Best regards,

    Konstantin Khorenko,
    Virtuozzo Linux Kernel Team

    On 12.10.2022 21:04, Alexander Atanasov wrote:
    > Hello,
    > 
    > On 12.10.22 21:08, Konstantin Khorenko wrote:
    >> Sashas, can you please review the patch from Liu?
    >>
    >> And i wonder if we need similar patch for dm-qcow driver.
    > 
    comments bellow.

    >>
    >> -------- Forwarded Message --------
    >> Subject: [Devel] [PATCH RH9] dm-ploop: port the standby mode feature.
    >> Date: Mon, 3 Oct 2022 15:42:32 +0800
    >> From: Liu Kui <Kui.Liu at acronis.com>
    >> To: devel at acronis.com
    >>
    >> ==========================
    >> commit f5df00558e645959ab08b198681004679d4a3595
    >> Author: Anton Nefedov <anton.nefedov at virtuozzo.com>
    >> Date:   Fri Nov 16 12:46:28 2018 +0300
    >>
    >>       ploop: move to standby mode after fsync() error too
    >>
    >>       Extend bdadef81aba5 'ploop: add a standby mode' so it covers fsync().
    >>
    >>       https://pmc.acronis.com/browse/VSTOR-17613
    >>
    >>       Signed-off-by: Anton Nefedov <anton.nefedov at virtuozzo.com>
    >>       Acked-by: Andrey Zaitsev <azaitsev at virtuozzo.com>
    >>       Acked-by: Sergey Lysanov <slysanov at virtuozzo.com>
    >>
    >>       ===========================
    >>       Patchset description:
    >>
    >>       ploop: more cases for standby mode
    >>
    >>       standby mode motivation:
    >>
    >>         commit bdadef81aba572fdcfd59e4c5c7b18736c962ebd
    >>         Author: Andrei Vagin <avagin at openvz.org>
    >>         Date:   Thu Mar 1 11:07:34 2018 +0300
    >>
    >>           ploop: add a standby mode
    >>
    >>           This mode shows that a delta lease was stolen and it is
    >> impossible to
    >>           handle any requests.
    >>
    >>           We want to know about this situation from the iscsi target.
    >> When HA
    >>           decides that the current target is broken, it can initialize
    >> another
    >>           target with the same delta. In this case, the first target has
    >> to complete
    >>           all in-porgress commands and set the
    >> ASCQ_04H_ALUA_TG_PT_STANDBY bit in
    >>           their status.
    >>
    >>           In Linux, bio-s are always completed with EIO in error cases,
    >> so we need
    >>           another way how to determine this state. This patch addes a new
    >> block
    >>           queue flag QUEUE_FLAG_STANDBY.
    >>
    >>       Anton Nefedov (3):
    >>         ploop: move standby mode setting to separate function
    >>         ploop: move to standby mode after fsync() error too
    >>         ploop: move to standby after -ENOTCONN too
    >>
    >> Signed-off-by: Liu Kui <Kui.Liu at acronis.com>
    >> ---
    >>    drivers/md/dm-ploop-map.c    | 25 ++++++++++++++++++++++++-
    >>    drivers/md/dm-ploop-target.c |  4 ++++
    >>    drivers/md/dm-ploop.h        |  2 ++
    >>    include/linux/blkdev.h       |  2 ++
    >>    4 files changed, 32 insertions(+), 1 deletion(-)
    >>
    >> diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
    >> index e8288e522e28..8f1f614b7746 100644
    >> --- a/drivers/md/dm-ploop-map.c
    >> +++ b/drivers/md/dm-ploop-map.c
    >> @@ -1163,6 +1163,23 @@ static void ploop_queue_resubmit(struct pio *pio)
    >>        queue_work(ploop->wq, &ploop->worker);
    >>    }
    >>    +static void check_standby_mode(struct ploop *ploop, long res) {
    >> +    struct request_queue *q = ploop->queue;
    >> +    int prev;
    >> +
    >> +    /* move to standby if delta lease was stolen or mount is gone */
    >> +    if (res != -EBUSY && res != -ENOTCONN && res != -EIO) {
    >> +        return;
    >> +    }
    >> +
    >> +    spin_lock_irq(&q->queue_lock);
    >> +    prev = blk_queue_flag_test_and_set(QUEUE_FLAG_STANDBY, q);
    >> +    spin_unlock_irq(&q->queue_lock);
    >> +
    >> +    if (!prev)
    >> +        pr_info("ploop: switch into standby mode\n");
    >> +}

    What guarantees that we got EBUSY, EIO or ENOTCONN for the reasons
    listed in the description - "This mode shows that a delta lease was
    stolen and it is impossible to handle any requests." - what if we got
    such an error for other reason? If the problem comes from nfs why not
    suspend device mapper from there ?

[LIU]: I would assume vstorage filesystem guarantees that only EBUSY, EIO, or ENOTCONN 
will be returned when the described event happens, and it doesn't matter whether we could 
get these errors for other reasons.  And what matters is, in our use case, once the 3 errors are
returned, the ploop device need to be flagged as 'standby mode' which needs to be passed to 
 iSCSI target driver.  As for what happens when used with other filesystems, we actually don't care,
 however you can see that the changes don't affect ploop's normal behaviour either.  

    Also do this errors actually reach dm-ploop ? From my recent tests i
    observed that if you have and EIO the filesystem gets it and remounts RO
    - it might not reach ploop code at all. Even if you suspend/resume the
    queue it would not help with the RO fs.

[LIU]:  When used with vstorage filesystem, these errors do reach dm-ploop.  Again, we really 
don’t care about other filesystems as long as it doesn't break anything. 

    >> +
    >>    static void ploop_data_rw_complete(struct pio *pio)
    >>    {
    >>        bool completed;
    >> @@ -1175,6 +1192,7 @@ static void ploop_data_rw_complete(struct pio *pio)
    >>                ploop_queue_resubmit(pio);
    >>                return;
    >>            }
    >> +        check_standby_mode(pio->ploop, pio->ret);
    >>            pio->bi_status = errno_to_blk_status(pio->ret);
    >>        }
    >>    @@ -1815,8 +1833,10 @@ void do_ploop_fsync_work(struct work_struct *ws)
    >>        ret = vfs_fsync(file, 0);
    >>         while ((pio = ploop_pio_list_pop(&flush_pios)) != NULL) {
    >> -        if (unlikely(ret))
    >> +        if (unlikely(ret)) {
    >> +            check_standby_mode(ploop, ret);
    >>                pio->bi_status = errno_to_blk_status(ret);
    >> +        }
    >>            ploop_pio_endio(pio);
    >>        }
    >>    }
    >> @@ -1869,6 +1889,9 @@ int ploop_clone_and_map(struct dm_target *ti,
    >> struct request *rq,
    >>        struct ploop_rq *prq;
    >>        struct pio *pio;
    >>    +    if (blk_queue_standby(ploop->queue))
    >> +        return DM_MAPIO_KILL;
    >> +
    >>        if (blk_rq_bytes(rq) && ploop_rq_valid(ploop, rq) < 0)
    >>            return DM_MAPIO_KILL;
    >>    diff --git a/drivers/md/dm-ploop-target.c b/drivers/md/dm-ploop-target.c
    >> index 4b50487b7fb5..677ecb3432f4 100644
    >> --- a/drivers/md/dm-ploop-target.c
    >> +++ b/drivers/md/dm-ploop-target.c
    >> @@ -20,6 +20,7 @@
    >>    #include <linux/uio.h>
    >>    #include <linux/error-injection.h>
    >>    #include "dm-ploop.h"
    >> +#include "dm-core.h"
    >>     #define DM_MSG_PREFIX "ploop"
    >>    @@ -363,6 +364,9 @@ static int ploop_ctr(struct dm_target *ti,
    >> unsigned int argc, char **argv)
    >>        INIT_WORK(&ploop->event_work, do_ploop_event_work);
    >>        init_completion(&ploop->inflight_bios_ref_comp);
    >>    +    ploop->queue = ti->table->md->queue;
    > 
    > This looks wrong, ploop struct alread have a ptr to ti.
    > Tables are reloaded in some cases and it is possible that table/md/queue
    > change.
    > 
    >> +    blk_queue_flag_clear(QUEUE_FLAG_STANDBY, ploop->queue);
    >> +
    >>        for (i = 0; i < 2; i++) {
    >>            release = i ? ploop_inflight_bios_ref_exit1 :
    >>                      ploop_inflight_bios_ref_exit0;
    >> diff --git a/drivers/md/dm-ploop.h b/drivers/md/dm-ploop.h
    >> index 5d953278a976..2502718c131f 100644
    >> --- a/drivers/md/dm-ploop.h
    >> +++ b/drivers/md/dm-ploop.h
    >> @@ -229,6 +229,8 @@ struct ploop {
    >>         struct timer_list enospc_timer;
    >>        bool event_enospc;
    >> +
    >> +    struct request_queue *queue;
    >>    };
    >>     struct ploop_rq {
    >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
    >> index 8fcd753c70b0..c2ebcdae3770 100644
    >> --- a/include/linux/blkdev.h
    >> +++ b/include/linux/blkdev.h
    >> @@ -434,6 +434,7 @@ struct request_queue {
    >>    #define QUEUE_FLAG_RQ_ALLOC_TIME 27    /* record rq->alloc_time_ns */
    >>    #define QUEUE_FLAG_HCTX_ACTIVE    28    /* at least one blk-mq hctx is
    >> active */
    >>    #define QUEUE_FLAG_NOWAIT       29    /* device supports NOWAIT */
    >> +#define QUEUE_FLAG_STANDBY      30    /* unable to handle read/write
    >> requests */
    >>     #define QUEUE_FLAG_MQ_DEFAULT    ((1 << QUEUE_FLAG_IO_STAT) |        \
    >>                     (1 << QUEUE_FLAG_SAME_COMP) |        \
    >> @@ -480,6 +481,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag,
    >> struct request_queue *q);
    >>    #define blk_queue_fua(q)    test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
    >>    #define blk_queue_registered(q)    test_bit(QUEUE_FLAG_REGISTERED,
    >> &(q)->queue_flags)
    >>    #define blk_queue_nowait(q)    test_bit(QUEUE_FLAG_NOWAIT,
    >> &(q)->queue_flags)
    >> +#define blk_queue_standby(q)    test_bit(QUEUE_FLAG_STANDBY,
    >> &(q)->queue_flags)
    >>     extern void blk_set_pm_only(struct request_queue *q);
    >>    extern void blk_clear_pm_only(struct request_queue *q);

    Why introduce a new queue flag?
    Why can't device mapper suspend be used ?
    Why not use the existing ploop code to defer and retry pios?

[LIU]:  The original commit does give some explanation on why  introduce the new queue flag, because we need to 
transit to another state, aka standby, once the 3 errors are returned, then the state needs to be aware by iSCSI target 
driver.  while we cannot use the bio error code to pass the event to iSCSI target driver (adding new error code is not an 
option here  because it would break other ploop users) . Since the only data struct that are shared between dm-ploop 
and iSCSI target driver is 'struct request_queue', therefore use a new flag is the most convenient way, this is my understanding. 

We can't use device mapper suspend flag because it is not aware by iSCSI target driver. 

Defer and retry pios will be unnecessary once these errors is returned in our used case.