[Devel] [PATCH v3 vz9/vz10] dm-ploop: fallback to kvmalloc for large bvec allocations

Wed Oct 22 20:20:04 MSK 2025

Exactly. Do you understand better why I was so agitated about such
stupid thing? :-)

IMHO the only way to avoid disaster is do to use kvmalloc with
anything different of pure GFP_KERNEL/GFP_NOIO.

And when playing with gfp flags never try to outsmart youself. Linux
mm is a _mess_. The only way to make some
"smart trick" working and not just today is to copypaste from some
common widely used place, like net/core/skbuff.c

On Thu, Oct 23, 2025 at 12:56 AM Andrey Zhadchenko
<andrey.zhadchenko at virtuozzo.com> wrote:
>
>
>
> On 10/22/25 17:20, Vasileios Almpanis wrote:
> > When handling multiple concurrent dm-ploop requests, large bio_vec arrays
> > can be allocated during request processing. These allocations are currently
> > done with kmalloc_array(GFP_ATOMIC), which can fail under memory pressure
> > for higher orders (order >= 6, ~256KB). Such failures result in partial or
> > corrupted I/O, leading to EXT4 directory checksum errors and read-only
> > remounts under heavy parallel workloads.
> >
> > This patch adds a fallback mechanism to use kvmalloc_array for
> > large or failed allocations in case the kmalloc_array allocation fails.
> > This avoids high-order GFP_ATOMIC allocations from interrupt context
> > and ensures more reliable memory allocation behavior.
> >
> > Signed-off-by: Vasileios Almpanis <vasileios.almpanis at virtuozzo.com>
> > Acked-by: Denis V. Lunev <den at openvz.org>
> >
> > Feature: dm-ploop: ploop target driver
> > ---
> >   drivers/md/dm-ploop-map.c | 27 ++++++++++++++++++---------
> >   1 file changed, 18 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
> > index 3fb841f8bcea..8e40497b837c 100644
> > --- a/drivers/md/dm-ploop-map.c
> > +++ b/drivers/md/dm-ploop-map.c
> > @@ -194,7 +194,7 @@ static void ploop_prq_endio(struct pio *pio, void *prq_ptr,
> >       struct request *rq = prq->rq;
> >
> >       if (prq->bvec)
> > -             kfree(prq->bvec);
> > +             kvfree(prq->bvec);
> >       if (prq->css)
> >               css_put(prq->css);
> >       /*
> > @@ -1963,7 +1963,7 @@ void ploop_index_wb_submit(struct ploop *ploop, struct ploop_index_wb *piwb)
> >       ploop_runners_add_work(ploop, pio);
> >   }
> >
> > -static struct bio_vec *ploop_create_bvec_from_rq(struct request *rq)
> > +static struct bio_vec *ploop_create_bvec_from_rq(struct request *rq, gfp_t flags)
> >   {
> >       struct bio_vec bv, *bvec, *tmp;
> >       struct req_iterator rq_iter;
> > @@ -1972,8 +1972,7 @@ static struct bio_vec *ploop_create_bvec_from_rq(struct request *rq)
> >       rq_for_each_bvec(bv, rq, rq_iter)
> >               nr_bvec++;
> >
> > -     bvec = kmalloc_array(nr_bvec, sizeof(struct bio_vec),
> > -                          GFP_ATOMIC);
> > +     bvec = kvmalloc_array(nr_bvec, sizeof(struct bio_vec), flags);
>
> I am not sure that will happen here. Imagine kvmalloc(GFP_ATOMIC) will
> firstly try kmalloc and fail. What's next? There are checks for
> gfpflags_allow_blocking(flags), but this is __GFP_DIRECT_RECLAIM and
> GFP_ATOMIC is (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM).
> In general GFP_ATOMIC is not allowed for kvmalloc but looks like the
> check is somewhere deep and I can't find it.
> I just hope we don't explode like in VSTOR-98291
>
> >       if (!bvec)
> >               goto out;
> >
> > @@ -1989,7 +1988,8 @@ ALLOW_ERROR_INJECTION(ploop_create_bvec_from_rq, NULL);
> >
> >   static void ploop_prepare_one_embedded_pio(struct ploop *ploop,
> >                                          struct pio *pio,
> > -                                        struct llist_head *lldeferred_pios)
> > +                                        struct llist_head *lldeferred_pios,
> > +                                        gfp_t flags)
> >   {
> >       struct ploop_rq *prq = pio->endio_cb_data;
> >       struct request *rq = prq->rq;
> > @@ -2003,9 +2003,18 @@ static void ploop_prepare_one_embedded_pio(struct ploop *ploop,
> >                * Transform a set of bvec arrays related to bios
> >                * into a single bvec array (which we can iterate).
> >                */
> > -             bvec = ploop_create_bvec_from_rq(rq);
> > -             if (!bvec)
> > +             bvec = ploop_create_bvec_from_rq(rq, flags);
> > +             if (!bvec) {
> > +                     /*
> > +                      * If allocation in atomic context fails defer
> > +                      * it to blocking context.
> > +                      */
> > +                     if (!gfpflags_allow_blocking(flags)) {
>
> See the previous comment about GFP_ATOMIC and __GFP_DIRECT_RECLAIM
> Maybe use ((flags & GFP_ATOMIC) == GFP_ATOMIC) or smth like this.
>
> > +                             llist_add((struct llist_node *)(&pio->list), &ploop->pios[PLOOP_LIST_PREPARE]);
> > +                             return;
> > +                     }
> >                       goto err_nomem;
> > +             }
> >               prq->bvec = bvec;
> >   skip_bvec:
> >               pio->bi_iter.bi_size = blk_rq_bytes(rq);
> > @@ -2044,7 +2053,7 @@ static void ploop_prepare_embedded_pios(struct ploop *ploop,
> >               pio = list_entry((struct list_head *)pos, typeof(*pio), list);
> >               INIT_LIST_HEAD(&pio->list); /* until type is changed */
> >               if (pio->queue_list_id != PLOOP_LIST_FLUSH)
> > -                     ploop_prepare_one_embedded_pio(ploop, pio, deferred_pios);
> > +                     ploop_prepare_one_embedded_pio(ploop, pio, deferred_pios, GFP_NOIO);
> >               else
> >                       llist_add((struct llist_node *)(&pio->list),
> >                                 &ploop->pios[PLOOP_LIST_FLUSH]);
> > @@ -2615,7 +2624,7 @@ static void ploop_submit_embedded_pio(struct ploop *ploop, struct pio *pio)
> >               goto out;
> >       }
> >
> > -     ploop_prepare_one_embedded_pio(ploop, pio, &deferred_pios);
> > +     ploop_prepare_one_embedded_pio(ploop, pio, &deferred_pios, GFP_ATOMIC | __GFP_NOWARN);
> >       /*
> >        * Disable fast path due to rcu lockups fs -> ploop -> fs - fses are not reentrant
> >        * we can however try another fast path skip dispatcher thread and pass directly to
>