[Devel] [PATCH vz10 3/4] vhost-blk: re-validate vq index in vhost_blk_setup() (double-fetch)
Vladimir Riabchun
vladimir.riabchun at virtuozzo.com
Mon Jun 8 12:18:50 MSK 2026
On 6/5/26 19:49, Konstantin Khorenko wrote:
> vhost_blk_setup() uses the virtqueue index supplied by userspace as an
> array index into blk->vqs[] without bounds checking:
>
> if (copy_from_user(&s, argp, sizeof(s)))
> return -EFAULT;
> if (blk->vqs[s.index].req) /* s.index unchecked */
> return 0;
> blk->vqs[s.index].req = kvmalloc(...);
>
> blk->vqs[] is a fixed array of VHOST_BLK_VQ_MAX (32) inline
> struct vhost_blk_vq elements embedded in struct vhost_blk. sizeof()
> of each element is large (it contains iov[UIO_MAXIOV]), so an
> out-of-range s.index points many megabytes outside the allocation:
> reading blk->vqs[s.index].req typically oopses, and if that address
> happens to be mapped and reads as NULL, the following assignment
> performs a wild kernel write -> memory corruption.
>
> At first glance s.index looks safe because vhost_blk_setup() is only
> reached from the VHOST_SET_VRING_NUM path:
>
> ret = vhost_vring_ioctl(&blk->dev, ioctl, argp);
> if (!ret && ioctl == VHOST_SET_VRING_NUM)
> ret = vhost_blk_setup(blk, argp);
>
> and vhost_vring_ioctl() -> vhost_get_vq_from_user() already validates
> the index (idx >= dev->nvqs -> -ENOBUFS) and even applies
> array_index_nospec().
>
> The catch is that this is a classic double-fetch (TOCTOU, CWE-367).
> argp is a pointer into the *caller's* address space. The index is read
> from it twice:
>
> 1. vhost_get_vq_from_user() does get_user(idx, idxp) and validates it;
> 2. vhost_blk_setup() does a *second* copy_from_user() of the same
> struct and uses s.index directly.
>
> blk->dev.mutex is held across both reads, but that mutex only serializes
> other in-kernel ioctls on this fd. It does not stop a *different thread
> of the same process* (which shares the address space) from overwriting
> the struct between the two reads with a plain userspace store - that
> thread never enters the kernel at all:
>
> Thread T1: Thread T2 (same process):
> state->index = 0;
> ioctl(fd, VHOST_SET_VRING_NUM, state)
> get_user(index) == 0 -> OK
> state->index = 0x41414141;
> copy_from_user(&s, argp)
> s.index == 0x41414141
> blk->vqs[0x41414141] <- out of bounds
>
> So the value validated in step 1 is not necessarily the value used in
> step 2.
>
> Practical exposure is limited: the trigger requires a process that
> already owns the vhost-blk fd (a host-side VMM, not the guest) and that
> wins a narrow race window, so this is local hardening on the
> kernel/userspace boundary rather than a guest-triggerable escape. It is
> still memory-unsafe and must be fixed.
>
> Re-validate s.index against VHOST_BLK_VQ_MAX after the copy_from_user(),
> right where it is used. This is the canonical double-fetch fix: never
> trust a previously validated copy of userspace data, validate the value
> you actually use. s.num needs no such treatment - a raced value only
> feeds the kvmalloc() size (a bogus value just fails with -ENOMEM), while
> the real ring size (vq->num) was set from vhost_vring_ioctl()'s own
> validated copy.
>
> Fixes: 40a5928ec730 ("drivers/vhost: vhost-blk accelerator for virtio-blk guests")
>
> Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
> Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
> ---
> drivers/vhost/blk.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
> index 61de56cd73a0a..f73867b45f7a0 100644
> --- a/drivers/vhost/blk.c
> +++ b/drivers/vhost/blk.c
> @@ -847,6 +847,15 @@ static int vhost_blk_setup(struct vhost_blk *blk, void __user *argp)
> if (copy_from_user(&s, argp, sizeof(s)))
> return -EFAULT;
>
> + /*
> + * s.index was validated by vhost_vring_ioctl() before we got here,
> + * but that was a separate read of the same userspace memory. This is
> + * a fresh copy_from_user(), so re-validate it before using it as an
> + * array index (see the commit message for the double-fetch race).
> + */
> + if (s.index >= VHOST_BLK_VQ_MAX)
> + return -ENOBUFS;
> +
Since s.index is practically always < VHOST_BLK_VO_MAX, branch predictor
will be very confident here. Maybe it's worth adding array_index_nospec
here as well to protect from updated value?
> if (blk->vqs[s.index].req)
> return 0;
>
--
Best regards, Riabchun Vladimir
Linux Kernel Developer, Virtuozzo
More information about the Devel
mailing list