[Devel] [PATCH RH9 v5 00/10] vhost-blk: in-kernel accelerator for virtio-blk guests
Andrey Zhadchenko
andrey.zhadchenko at virtuozzo.com
Fri Nov 11 12:55:46 MSK 2022
Although QEMU virtio-blk is quite fast, there is still some room for
improvement. Disk latency can be reduced if we handle virtio-blk requests
in the host kernel, avoiding a lot of syscalls and context switches.
The idea is quite simple - QEMU gives us a block device and we translate
any incoming virtio requests into bios and push them into the bdev.
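To make the data path concrete, below is a minimal sketch of that
translation for a single page, assuming a recent kernel's bio API;
vhost_blk_submit() and vhost_blk_bio_done() are illustrative names, not
the actual patch code:

#include <linux/bio.h>
#include <linux/blkdev.h>

/* Completion stub: the real driver completes the virtio request and
 * signals the guest from here. */
static void vhost_blk_bio_done(struct bio *bio)
{
        bio_put(bio);
}

/* Wrap one page of a guest request in a bio and submit it straight to
 * the backing bdev - no syscalls, no context switch to userspace. */
static int vhost_blk_submit(struct block_device *bdev, struct page *page,
                            unsigned int len, unsigned int offset,
                            sector_t sector, blk_opf_t opf)
{
        struct bio *bio;

        bio = bio_alloc(bdev, 1, opf, GFP_KERNEL);
        if (!bio)
                return -ENOMEM;

        bio->bi_iter.bi_sector = sector;
        if (bio_add_page(bio, page, len, offset) != len) {
                bio_put(bio);
                return -EIO;
        }
        bio->bi_end_io = vhost_blk_bio_done;
        submit_bio(bio);
        return 0;
}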
The biggest disadvantage of this vhost-blk flavor is that it supports only
the raw format. Luckily, Kirill Tkhai proposed a device mapper driver for
the QCOW2 format that attaches files as block devices:
https://www.spinics.net/lists/kernel/msg4292965.html
Also, by using kernel modules we can bypass the iothread limitation and
finally scale block request handling with the number of CPUs for
high-performance devices.
There have already been several attempts to write vhost-blk:
Asias' version: https://lkml.org/lkml/2012/12/1/174
Badari's version: https://lwn.net/Articles/379864/
Vitaly's version: https://lwn.net/Articles/770965/
The main difference between them is the API used to access the backend
file. The fastest one is Asias' version with the bio flavor. It is also the
most reviewed and has the most features, so this vhost_blk module is
partially based on it. Multiple virtqueue support was added, some places
were reworked, and support for several vhost workers was added (see the
sketch below).
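Schematically, the scaling boils down to binding each virtqueue to one of
several kernel workers, roughly as below; the field names follow this
series, but the real assignment logic lives in the patches themselves:

/* Illustrative only: spread virtqueues across the device's workers so
 * that different queues can be served on different CPUs. */
static void vhost_dev_bind_vqs(struct vhost_dev *dev)
{
        int i;

        for (i = 0; i < dev->nvqs; i++)
                dev->vqs[i]->worker = &dev->workers[i % dev->nworkers];
}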
test setup and results:
fio --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=128
QEMU drive options: cache=none
filesystem: xfs
SSD:
               | randread, IOPS | randwrite, IOPS |
Host           | 95.8k          | 85.3k           |
QEMU virtio    | 57.5k          | 79.4k           |
QEMU vhost-blk | 95.6k          | 84.3k           |
RAMDISK (vq == vcpu == numjobs):
                 | randread, IOPS | randwrite, IOPS |
virtio, 1vcpu    | 133k           | 133k            |
virtio, 2vcpu    | 305k           | 306k            |
virtio, 4vcpu    | 310k           | 298k            |
virtio, 8vcpu    | 271k           | 252k            |
vhost-blk, 1vcpu | 110k           | 113k            |
vhost-blk, 2vcpu | 247k           | 252k            |
vhost-blk, 4vcpu | 558k           | 556k            |
vhost-blk, 8vcpu | 576k           | 575k            | (single kernel thread)
vhost-blk, 8vcpu | 803k           | 779k            | (two kernel threads)
v2:
patch 1/10
- removed unused VHOST_BLK_VQ
- reworked bio handling a bit: now add all pages from a single iov into a
single bio instead of allocating one bio per page
- changed how the sector increment is calculated
- check move_iovec() in vhost_blk_req_handle()
- removed the snprintf check; check the return value of copy_to_iter()
more carefully for VIRTIO_BLK_ID_BYTES requests
- discard the vq request if vhost_blk_req_handle() returned a negative code
- forbid changing a nonzero backend in vhost_blk_set_backend(); see the
sketch after this list. First of all, QEMU sets the backend only once.
Also, if we want to change the backend while requests are already running,
we need to be much more careful in vhost_blk_handle_guest_kick(), as it
does not take any references. If userspace wants to change the backend
that badly, it can always reset the device.
- removed EXPERIMENTAL from Kconfig
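The set-once backend rule from the list above could be sketched as
follows; the names are illustrative and the real ioctl handler does more
work (file lookup, attaching the backing device):

static long vhost_blk_set_backend(struct vhost_blk *blk, int fd)
{
        long ret = 0;

        mutex_lock(&blk->dev.mutex);
        if (blk->backend) {
                /* in-flight requests take no references, so replacing a
                 * live backend is unsafe; userspace must reset instead */
                ret = -EBUSY;
                goto out_unlock;
        }
        /* ... look up fd and attach the backing block device ... */
out_unlock:
        mutex_unlock(&blk->dev.mutex);
        return ret;
}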
patch 3/10
- don't bother with checking dev->workers[0].worker since dev->nworkers
will always contain 0 in this case
patch 6/10
- Make the code do what the docs suggest; see the sketch below. Previously
the ioctl-supplied new number of workers was treated as an amount to be
added. Use the new number as a ceiling instead and add workers up to that
number.
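The new worker-count semantics, as a sketch (vhost_add_worker() is a
hypothetical helper, not the patch code):

/* The ioctl argument is a target total, not an increment: add workers
 * until the device reaches the requested ceiling. */
static int vhost_set_nworkers(struct vhost_dev *dev, unsigned int target)
{
        int ret = 0;

        while (dev->nworkers < target) {
                ret = vhost_add_worker(dev);
                if (ret)
                        break;
        }
        return ret;
}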
v3:
patch 1/10
- reworked bio handling a bit - now create a new bio only if the previous
one is full
patch 2/10
- set vq->worker = NULL in vhost_vq_reset()
v4:
patch 1/10
- vhost_blk_req_done() now won't hide errors for multi-bio requests
- vhost_blk_prepare_req() now better estimates bio_len
- allocate a bio for at most pages_nr_total pages instead of nr_pages
- added a new ioctl, VHOST_BLK_SET_SERIAL, to set the serial
- reworked the flush algorithm a bit - now use two bins, "new req" and
"for flush", and swap them at the start of a flush (see the sketch after
this list)
- moved the backing file dereference to vhost_blk_req_submit(), after the
request has been added to the flush bin, to avoid a race with
vhost_blk_release(). Now even if we dropped the backend and started a
flush, the request will either be tracked by the flush or be rolled back
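Schematically, the two-bin flush looks like this (all names are
illustrative):

/* Submissions land in the "new req" bin; a flush swaps the bins under
 * the lock and then waits only for the "for flush" bin to drain. */
static void vhost_blk_flush_reqs(struct vhost_blk *blk)
{
        spin_lock(&blk->flush_lock);
        swap(blk->new_bin, blk->flush_bin);
        spin_unlock(&blk->flush_lock);

        wait_event(blk->flush_wq, vhost_blk_bin_empty(blk->flush_bin));
}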
patch 2/10
- moved vq->worker = NULL to patch #7 where this field is
introduced.
patch 7/10
- Set vq->worker = NULL in vhost_vq_reset. This will fix both
https://jira.sw.ru/browse/PSBM-142058
https://jira.sw.ru/browse/PSBM-142852
v5:
patch 1/10
- several code style/spacing fixes
- added a WARN_ON() for vhost_blk_flush()
Andrey Zhadchenko (10):
drivers/vhost: vhost-blk accelerator for virtio-blk guests
drivers/vhost: use array to store workers
drivers/vhost: adjust vhost to flush all workers
drivers/vhost: rework attaching cgroups to be worker aware
drivers/vhost: rework worker creation
drivers/vhost: add ioctl to increase the number of workers
drivers/vhost: assign workers to virtqueues
drivers/vhost: add API to queue work at virtqueue worker
drivers/vhost: allow polls to be bound to workers via vqs
drivers/vhost: queue vhost_blk works at vq workers
drivers/vhost/Kconfig | 12 +
drivers/vhost/Makefile | 3 +
drivers/vhost/blk.c | 860 +++++++++++++++++++++++++++++++++++++
drivers/vhost/vhost.c | 253 ++++++++---
drivers/vhost/vhost.h | 21 +-
include/uapi/linux/vhost.h | 17 +
6 files changed, 1104 insertions(+), 62 deletions(-)
create mode 100644 drivers/vhost/blk.c
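For orientation, the per-virtqueue queueing API added in patches 8-10
could plausibly take the following shape; the exact signatures live in the
patches, and vhost_worker_queue() is an illustrative name:

/* Queue work at the worker bound to this virtqueue instead of the
 * single device-wide worker. */
void vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work)
{
        vhost_worker_queue(vq->worker, work);
}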
--
2.31.1