[Devel] [PATCH RHEL9 COMMIT] drivers/vhost: rework worker creation

Konstantin Khorenko khorenko at virtuozzo.com
Thu Sep 15 20:25:49 MSK 2022


The commit is pushed to "branch-rh9-5.14.0-70.22.1.vz9.17.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-70.22.1.vz9.17.3
------>
commit c4284dad02c76eb6e28a6776700258b92e529d61
Author: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
Date:   Thu Sep 8 18:32:52 2022 +0300

    drivers/vhost: rework worker creation
    
    Add a function, vhost_add_worker(), that creates a vhost worker and adds
    it to the device. Rework vhost_dev_set_owner() to use it.
    
    https://jira.sw.ru/browse/PSBM-139414
    Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
    
    ======
    Patchset description:
    vhost-blk: in-kernel accelerator for virtio-blk guests
    
    Although QEMU virtio-blk is quite fast, there is still some room for
    improvement. Disk latency can be reduced if we handle virtio-blk
    requests in the host kernel, which avoids a lot of syscalls and context
    switches.
    The idea is quite simple - QEMU gives us a block device and we translate
    any incoming virtio requests into bios and push them into the bdev.
    The biggest disadvantage of this vhost-blk flavor is that it supports
    only the raw format.
    
    Luckily Kirill Thai proposed device mapper driver for QCOW2 format to
    attach files as block devices:
    https://www.spinics.net/lists/kernel/msg4292965.html
    
    Also, by using kernel modules we can bypass the iothread limitation and
    finally scale block requests with CPUs for high-performance devices.
    
    There have already been several attempts to write vhost-blk:
    
    Asias'   version:       https://lkml.org/lkml/2012/12/1/174
    Badari's version:       https://lwn.net/Articles/379864/
    Vitaly's version:       https://lwn.net/Articles/770965/
    
    The main difference between them is the API used to access the backend
    file. The fastest one is Asias's version with the bio flavor. It is also
    the most reviewed and has the most features, so the vhost_blk module is
    partially based on it. Multiple virtqueue support was added, some places
    were reworked, and support for several vhost workers was added.
    
    test setup and results:
      fio --direct=1 --rw=randread  --bs=4k  --ioengine=libaio --iodepth=128
    QEMU drive options: cache=none
    filesystem: xfs
    
    SSD:
                   | randread, IOPS  | randwrite, IOPS |
    Host           |      95.8k      |      85.3k      |
    QEMU virtio    |      57.5k      |      79.4k      |
    QEMU vhost-blk |      95.6k      |      84.3k      |
    
    RAMDISK (vq == vcpu):
                     | randread, IOPS | randwrite, IOPS |
    virtio, 1vcpu    |      123k      |      129k       |
    virtio, 2vcpu    |      253k (??) |      250k (??)  |
    virtio, 4vcpu    |      158k      |      154k       |
    vhost-blk, 1vcpu |      110k      |      113k       |
    vhost-blk, 2vcpu |      247k      |      252k       |
    vhost-blk, 8vcpu |      497k      |      469k       | *single kernel thread
    vhost-blk, 8vcpu |      730k      |      701k       | *two kernel threads
    
    v2:
    
    patch 1/10
     - removed unused VHOST_BLK_VQ
     - reworked bio handling a bit: now add all pages from a single iov into
    a single bio instead of allocating one bio per page
     - changed how the sector increment is calculated
     - check move_iovec() in vhost_blk_req_handle()
     - remove snprintf check and better check ret from copy_to_iter for
    VIRTIO_BLK_ID_BYTES requests
     - discard vq request if vhost_blk_req_handle() returned negative code
     - forbid changing a nonzero backend in vhost_blk_set_backend(). First of
    all, QEMU sets the backend only once. Also, if we want to change the
    backend while requests are already running, we need to be much more
    careful in vhost_blk_handle_guest_kick() as it is not taking any
    references. If userspace wants to change the backend that badly, it can
    always reset the device.
     - removed EXPERIMENTAL from Kconfig
    
    patch 3/10
     - don't bother with checking dev->workers[0].worker since dev->nworkers
    will always contain 0 in this case
    
    patch 6/10
     - Make code do what docs suggest. Previously the ioctl-supplied new
    number of workers was treated as an amount to be added. Use the new
    number as a ceiling instead and add workers up to that number.
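
The ceiling semantics described for patch 6/10 can be illustrated with a small
user-space sketch. This is not the code from that patch: `sketch_dev`,
`sketch_add_worker()`, `sketch_set_nworkers()` and `SKETCH_MAX_WORKERS` are
hypothetical stand-ins, and the behaviour for requests below the current count
or above the maximum is an assumption.

```c
#include <errno.h>

/* Hypothetical stand-in for VHOST_MAX_WORKERS; the real value is not shown here. */
#define SKETCH_MAX_WORKERS 4

/* Hypothetical, heavily reduced model of a device that only counts workers. */
struct sketch_dev {
	int nworkers;
};

/* Mirrors the shape of vhost_add_worker(): refuse once the array is full. */
static int sketch_add_worker(struct sketch_dev *dev)
{
	if (dev->nworkers == SKETCH_MAX_WORKERS)
		return -E2BIG;

	dev->nworkers++;
	return 0;
}

/*
 * Treat `requested` as the target total (a ceiling), not as an increment:
 * workers are added only until the device has `requested` of them.
 * Assumption: a request at or below the current count is a no-op.
 */
static int sketch_set_nworkers(struct sketch_dev *dev, int requested)
{
	while (dev->nworkers < requested) {
		int err = sketch_add_worker(dev);

		if (err)
			return err;
	}

	return 0;
}
```

With this reading, asking for 2 workers and then for 3 leaves the device with 3
workers, whereas the earlier additive interpretation would have produced 5.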
    
    https://jira.sw.ru/browse/PSBM-139414
    
    Andrey Zhadchenko (10):
      drivers/vhost: vhost-blk accelerator for virtio-blk guests
      drivers/vhost: use array to store workers
      drivers/vhost: adjust vhost to flush all workers
      drivers/vhost: rework attaching cgroups to be worker aware
      drivers/vhost: rework worker creation
      drivers/vhost: add ioctl to increase the number of workers
      drivers/vhost: assign workers to virtqueues
      drivers/vhost: add API to queue work at virtqueue worker
      drivers/vhost: allow polls to be bound to workers via vqs
      drivers/vhost: queue vhost_blk works at vq workers
    
    Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
---
 drivers/vhost/vhost.c | 64 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5199495d948a..0542eab1e815 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -625,53 +625,65 @@ static void vhost_detach_mm(struct vhost_dev *dev)
 	dev->mm = NULL;
 }
 
+static int vhost_add_worker(struct vhost_dev *dev)
+{
+	struct vhost_worker *w = &dev->workers[dev->nworkers];
+	struct task_struct *worker;
+	int err;
+
+	if (dev->nworkers == VHOST_MAX_WORKERS)
+		return -E2BIG;
+
+	worker = kthread_create(vhost_worker, w,
+				"vhost-%d-%d", current->pid, dev->nworkers);
+	if (IS_ERR(worker))
+		return PTR_ERR(worker);
+
+	w->worker = worker;
+	wake_up_process(worker); /* avoid contributing to loadavg */
+
+	err = vhost_worker_attach_cgroups(w);
+	if (err)
+		goto cleanup;
+
+	dev->nworkers++;
+	return 0;
+
+cleanup:
+	kthread_stop(worker);
+	w->worker = NULL;
+
+	return err;
+}
+
 /* Caller should have device mutex */
 long vhost_dev_set_owner(struct vhost_dev *dev)
 {
-	struct task_struct *worker;
 	int err;
 
 	/* Is there an owner already? */
-	if (vhost_dev_has_owner(dev)) {
-		err = -EBUSY;
-		goto err_mm;
-	}
+	if (vhost_dev_has_owner(dev))
+		return -EBUSY;
 
 	vhost_attach_mm(dev);
 
 	dev->kcov_handle = kcov_common_handle();
 	if (dev->use_worker) {
-		worker = kthread_create(vhost_worker, dev,
-					"vhost-%d", current->pid);
-		if (IS_ERR(worker)) {
-			err = PTR_ERR(worker);
-			goto err_worker;
-		}
-
-		dev->workers[0].worker = worker;
-		dev->nworkers = 1;
-		wake_up_process(worker); /* avoid contributing to loadavg */
-
-		err = vhost_worker_attach_cgroups(&dev->workers[0]);
+		err = vhost_add_worker(dev);
 		if (err)
-			goto err_cgroup;
+			goto err_mm;
 	}
 
 	err = vhost_dev_alloc_iovecs(dev);
 	if (err)
-		goto err_cgroup;
+		goto err_worker;
 
 	return 0;
-err_cgroup:
-	dev->nworkers = 0;
-	if (dev->workers[0].worker) {
-		kthread_stop(dev->workers[0].worker);
-		dev->workers[0].worker = NULL;
-	}
 err_worker:
+	vhost_cleanup_workers(dev);
+err_mm:
 	vhost_detach_mm(dev);
 	dev->kcov_handle = 0;
-err_mm:
 	return err;
 }
 EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
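
The reworked error path in vhost_dev_set_owner() unwinds in reverse order of
setup: a worker failure only detaches the mm, while an iovec allocation failure
first cleans up workers and then detaches the mm. A minimal user-space sketch
of that label ordering follows. All `demo_*` names and the failure-injection
flags are hypothetical, and the helpers only flip fields instead of doing real
kernel work.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the device state touched by set_owner. */
struct demo_dev {
	bool has_mm;       /* stands in for dev->mm being attached */
	int nworkers;      /* stands in for dev->nworkers */
	bool has_iovecs;   /* stands in for the allocated iovecs */
	bool fail_worker;  /* test knob: make worker creation fail */
	bool fail_iovecs;  /* test knob: make iovec allocation fail */
};

static int demo_add_worker(struct demo_dev *d)
{
	if (d->fail_worker)
		return -ENOMEM;
	d->nworkers++;
	return 0;
}

static void demo_cleanup_workers(struct demo_dev *d)
{
	d->nworkers = 0;
}

static int demo_alloc_iovecs(struct demo_dev *d)
{
	if (d->fail_iovecs)
		return -ENOMEM;
	d->has_iovecs = true;
	return 0;
}

static int demo_set_owner(struct demo_dev *d)
{
	int err;

	if (d->has_mm)          /* already owned: nothing to undo */
		return -EBUSY;

	d->has_mm = true;       /* attach mm */

	err = demo_add_worker(d);
	if (err)
		goto err_mm;    /* no worker exists yet, only undo the mm */

	err = demo_alloc_iovecs(d);
	if (err)
		goto err_worker;

	return 0;

err_worker:
	demo_cleanup_workers(d);
err_mm:
	d->has_mm = false;      /* detach mm */
	return err;
}
```

Because each label undoes exactly one setup step and falls through to the next,
a failed call leaves the device without an owner, so a later call can succeed.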

