[Devel] [PATCH RHEL9 COMMIT] drivers/vhost: add ioctl to increase the number of workers
Konstantin Khorenko
khorenko at virtuozzo.com
Thu Sep 15 20:25:50 MSK 2022
The commit is pushed to "branch-rh9-5.14.0-70.22.1.vz9.17.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-70.22.1.vz9.17.3
------>
commit 83860720697fc673ea32138b92835f40ebdbea60
Author: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
Date: Thu Sep 8 18:32:53 2022 +0300
drivers/vhost: add ioctl to increase the number of workers
Finally add an ioctl to allow userspace to create additional workers.
For now, only increasing the number of workers is allowed.
https://jira.sw.ru/browse/PSBM-139414
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
======
Patchset description:
vhost-blk: in-kernel accelerator for virtio-blk guests
Although QEMU virtio-blk is quite fast, there is still some room for
improvement. Disk latency can be reduced if we handle virtio-blk
requests in the host kernel, so we avoid a lot of syscalls and context
switches.
The idea is quite simple - QEMU gives us a block device and we translate
any incoming virtio requests into bios and push them into the bdev.
The biggest disadvantage of this vhost-blk flavor is that it only supports
the raw format.
Luckily Kirill Thai proposed a device-mapper driver for the QCOW2 format
that attaches files as block devices:
https://www.spinics.net/lists/kernel/msg4292965.html
Also, by using kernel modules we can bypass the iothread limitation and
finally scale block requests with CPUs for high-performance devices.
There have already been several attempts to write vhost-blk:
Asias' version: https://lkml.org/lkml/2012/12/1/174
Badari's version: https://lwn.net/Articles/379864/
Vitaly's version: https://lwn.net/Articles/770965/
The main difference between them is the API used to access the backend file.
The fastest one is Asias's version with the bio flavor. It is also the most
reviewed and has the most features, so the vhost_blk module is partially
based on it. Multiple virtqueue support was added and some places were
reworked. Support for several vhost workers was also added.
Test setup and results:
fio --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=128
QEMU drive options: cache=none
filesystem: xfs
SSD:
               | randread, IOPS | randwrite, IOPS |
Host           | 95.8k          | 85.3k           |
QEMU virtio    | 57.5k          | 79.4k           |
QEMU vhost-blk | 95.6k          | 84.3k           |
RAMDISK (vq == vcpu):
                 | randread, IOPS | randwrite, IOPS |
virtio, 1vcpu    | 123k           | 129k            |
virtio, 2vcpu    | 253k (??)      | 250k (??)       |
virtio, 4vcpu    | 158k           | 154k            |
vhost-blk, 1vcpu | 110k           | 113k            |
vhost-blk, 2vcpu | 247k           | 252k            |
vhost-blk, 8vcpu | 497k           | 469k            | *single kernel thread
vhost-blk, 8vcpu | 730k           | 701k            | *two kernel threads
v2:
patch 1/10
- removed unused VHOST_BLK_VQ
- reworked bio handling a bit: now add all pages from a single iov into a
single bio instead of allocating one bio per page
- changed how to calculate sector incrementation
- check move_iovec() in vhost_blk_req_handle()
- removed the snprintf check and check the return value of copy_to_iter() for
VIRTIO_BLK_ID_BYTES requests more carefully
- discard the vq request if vhost_blk_req_handle() returned a negative code
- forbid changing a nonzero backend in vhost_blk_set_backend(). First of
all, QEMU sets the backend only once. Also, if we want to change the backend
while requests are already running, we need to be much more careful in
vhost_blk_handle_guest_kick() as it does not take any references. If
userspace wants to change the backend that badly, it can always reset the device.
- removed EXPERIMENTAL from Kconfig
patch 3/10
- don't bother with checking dev->workers[0].worker since dev->nworkers
will always contain 0 in this case
patch 6/10
- Make the code do what the docs suggest. Previously the ioctl-supplied
number of workers was treated as an amount that should be added. Use the
new number as a ceiling instead and add workers up to that number.
https://jira.sw.ru/browse/PSBM-139414
Andrey Zhadchenko (10):
drivers/vhost: vhost-blk accelerator for virtio-blk guests
drivers/vhost: use array to store workers
drivers/vhost: adjust vhost to flush all workers
drivers/vhost: rework attaching cgroups to be worker aware
drivers/vhost: rework worker creation
drivers/vhost: add ioctl to increase the number of workers
drivers/vhost: assign workers to virtqueues
drivers/vhost: add API to queue work at virtqueue worker
drivers/vhost: allow polls to be bound to workers via vqs
drivers/vhost: queue vhost_blk works at vq workers
Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
---
drivers/vhost/vhost.c | 32 +++++++++++++++++++++++++++++++-
include/uapi/linux/vhost.h | 9 +++++++++
2 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 0542eab1e815..9066241e8dc6 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -656,6 +656,25 @@ static int vhost_add_worker(struct vhost_dev *dev)
return err;
}
+static int vhost_set_workers(struct vhost_dev *dev, int n)
+{
+ int i, ret = 0;
+
+ if (n > dev->nvqs)
+ n = dev->nvqs;
+
+ if (n > VHOST_MAX_WORKERS)
+ n = VHOST_MAX_WORKERS;
+
+ for (i = 0; i < n - dev->nworkers; i++) {
+ ret = vhost_add_worker(dev);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
/* Caller should have device mutex */
long vhost_dev_set_owner(struct vhost_dev *dev)
{
@@ -1809,7 +1828,7 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
struct eventfd_ctx *ctx;
u64 p;
long r;
- int i, fd;
+ int i, fd, n;
/* If you are not the owner, you can become one */
if (ioctl == VHOST_SET_OWNER) {
@@ -1866,6 +1885,17 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp)
if (ctx)
eventfd_ctx_put(ctx);
break;
+ case VHOST_SET_NWORKERS:
+ r = get_user(n, (int __user *)argp);
+ if (r < 0)
+ break;
+ if (n < d->nworkers) {
+ r = -EINVAL;
+ break;
+ }
+
+ r = vhost_set_workers(d, n);
+ break;
default:
r = -ENOIOCTLCMD;
break;
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 13caf114bcde..d6d87f6315f6 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -71,6 +71,15 @@
#define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
#define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
+/* Set the number of vhost workers
+ * Currently the number of vhost workers can only be increased.
+ * All workers are freed upon reset.
+ * If the value is too big, it is silently truncated to the maximum number of
+ * supported vhost workers.
+ * Even if an error is returned, it is possible that some workers were created.
+ */
+#define VHOST_SET_NWORKERS _IOW(VHOST_VIRTIO, 0x1F, int)
+
/* The following ioctls use eventfd file descriptors to signal and poll
* for events. */
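
Usage sketch (not part of the patch): a minimal userspace call sequence for the
new ioctl. The device node path /dev/vhost-blk and the requested count of 4 are
assumptions for illustration only, and the VHOST_SET_NWORKERS fallback define
merely mirrors the uapi addition above for headers that do not carry this patch.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

#ifndef VHOST_SET_NWORKERS
/* Fallback matching the definition added by this patch. */
#define VHOST_SET_NWORKERS _IOW(VHOST_VIRTIO, 0x1F, int)
#endif

int main(void)
{
	int nworkers = 4;	/* requested total; silently clamped to nvqs and VHOST_MAX_WORKERS */
	int fd = open("/dev/vhost-blk", O_RDWR);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* The device needs an owner before it accepts other ioctls. */
	if (ioctl(fd, VHOST_SET_OWNER) < 0) {
		perror("VHOST_SET_OWNER");
		close(fd);
		return 1;
	}

	/* Asking for fewer workers than currently exist fails with EINVAL;
	 * on other errors some workers may still have been created. */
	if (ioctl(fd, VHOST_SET_NWORKERS, &nworkers) < 0)
		perror("VHOST_SET_NWORKERS");

	close(fd);
	return 0;
}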