[Devel] [PATCH vz10 3/4] vhost-blk: re-validate vq index in vhost_blk_setup() (double-fetch)

Fri Jun 5 20:49:08 MSK 2026

vhost_blk_setup() uses the virtqueue index supplied by userspace as an
array index into blk->vqs[] without bounds checking:

	if (copy_from_user(&s, argp, sizeof(s)))
		return -EFAULT;
	if (blk->vqs[s.index].req)		/* s.index unchecked */
		return 0;
	blk->vqs[s.index].req = kvmalloc(...);

blk->vqs[] is a fixed array of VHOST_BLK_VQ_MAX (32) inline
struct vhost_blk_vq elements embedded in struct vhost_blk.  sizeof()
of each element is large (it contains iov[UIO_MAXIOV]), so an
out-of-range s.index points many megabytes outside the allocation:
reading blk->vqs[s.index].req typically oopses, and if that address
happens to be mapped and reads as NULL, the following assignment
performs a wild kernel write -> memory corruption.

At first glance s.index looks safe because vhost_blk_setup() is only
reached from the VHOST_SET_VRING_NUM path:

	ret = vhost_vring_ioctl(&blk->dev, ioctl, argp);
	if (!ret && ioctl == VHOST_SET_VRING_NUM)
		ret = vhost_blk_setup(blk, argp);

and vhost_vring_ioctl() -> vhost_get_vq_from_user() already validates
the index (idx >= dev->nvqs -> -ENOBUFS) and even applies
array_index_nospec().

The catch is that this is a classic double-fetch (TOCTOU, CWE-367).
argp is a pointer into the *caller's* address space.  The index is read
from it twice:

  1. vhost_get_vq_from_user() does get_user(idx, idxp) and validates it;
  2. vhost_blk_setup() does a *second* copy_from_user() of the same
     struct and uses s.index directly.

blk->dev.mutex is held across both reads, but that mutex only serializes
other in-kernel ioctls on this fd.  It does not stop a *different thread
of the same process* (which shares the address space) from overwriting
the struct between the two reads with a plain userspace store - that
thread never enters the kernel at all:

	Thread T1:                          Thread T2 (same process):
	  state->index = 0;
	  ioctl(fd, VHOST_SET_VRING_NUM, state)
	    get_user(index) == 0 -> OK
	                                      state->index = 0x41414141;
	    copy_from_user(&s, argp)
	    s.index == 0x41414141
	    blk->vqs[0x41414141]      <- out of bounds

So the value validated in step 1 is not necessarily the value used in
step 2.

Practical exposure is limited: the trigger requires a process that
already owns the vhost-blk fd (a host-side VMM, not the guest) and that
wins a narrow race window, so this is local hardening on the
kernel/userspace boundary rather than a guest-triggerable escape.  It is
still memory-unsafe and must be fixed.

Re-validate s.index against VHOST_BLK_VQ_MAX after the copy_from_user(),
right where it is used.  This is the canonical double-fetch fix: never
trust a previously validated copy of userspace data, validate the value
you actually use.  s.num needs no such treatment - a raced value only
feeds the kvmalloc() size (a bogus value just fails with -ENOMEM), while
the real ring size (vq->num) was set from vhost_vring_ioctl()'s own
validated copy.

Fixes: 40a5928ec730 ("drivers/vhost: vhost-blk accelerator for virtio-blk guests")

Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
---
 drivers/vhost/blk.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
index 61de56cd73a0a..f73867b45f7a0 100644
--- a/drivers/vhost/blk.c
+++ b/drivers/vhost/blk.c
@@ -847,6 +847,15 @@ static int vhost_blk_setup(struct vhost_blk *blk, void __user *argp)
 	if (copy_from_user(&s, argp, sizeof(s)))
 		return -EFAULT;
 
+	/*
+	 * s.index was validated by vhost_vring_ioctl() before we got here,
+	 * but that was a separate read of the same userspace memory.  This is
+	 * a fresh copy_from_user(), so re-validate it before using it as an
+	 * array index (see the commit message for the double-fetch race).
+	 */
+	if (s.index >= VHOST_BLK_VQ_MAX)
+		return -ENOBUFS;
+
 	if (blk->vqs[s.index].req)
 		return 0;
 
-- 
2.43.0