[Devel] [PATCH RHEL9 COMMIT] ms/iov_iter: Add a function to extract a page list from an iterator

Konstantin Khorenko khorenko at virtuozzo.com
Thu Jan 9 17:58:36 MSK 2025


The commit is pushed to "branch-rh9-5.14.0-427.44.1.vz9.80.x-ovz" and will appear at git at bitbucket.org:openvz/vzkernel.git
after rh9-5.14.0-427.44.1.vz9.80.3
------>
commit 61e320116173ca8941116e0ec16bbb46fb8b4eb9
Author: David Howells <dhowells at redhat.com>
Date:   Mon Dec 30 13:39:50 2024 +0800

    ms/iov_iter: Add a function to extract a page list from an iterator
    
    Add a function, iov_iter_extract_pages(), to extract a list of pages from
    an iterator.  The pages may be returned with a pin added or nothing,
    depending on the type of iterator.
    
    Add a second function, iov_iter_extract_will_pin(), to determine how the
    cleanup should be done.
    
    There are two cases:
    
     (1) ITER_IOVEC or ITER_UBUF iterator.
    
         Extracted pages will have pins (FOLL_PIN) obtained on them so that a
         concurrent fork() will forcibly copy the page so that DMA is done
         to/from the parent's buffer and is unavailable to/unaffected by the
         child process.
    
         iov_iter_extract_will_pin() will return true for this case.  The
         caller should use something like unpin_user_page() to dispose of the
         page.
    
     (2) Any other sort of iterator.
    
         No refs or pins are obtained on the pages; the caller is assumed to
         manage page retention.
    
         iov_iter_extract_will_pin() will return false.  The pages don't need
         additional disposal.
    
    Signed-off-by: David Howells <dhowells at redhat.com>
    Reviewed-by: Christoph Hellwig <hch at lst.de>
    Reviewed-by: Jens Axboe <axboe at kernel.dk>
    cc: Al Viro <viro at zeniv.linux.org.uk>
    cc: John Hubbard <jhubbard at nvidia.com>
    cc: David Hildenbrand <david at redhat.com>
    cc: Matthew Wilcox <willy at infradead.org>
    cc: linux-fsdevel at vger.kernel.org
    cc: linux-mm at kvack.org
    Signed-off-by: Steve French <stfrench at microsoft.com>
    
    +++++
    iov_iter: Fix iov_iter_extract_pages() with zero-sized entries
    
    iov_iter_extract_pages() doesn't correctly handle skipping over initial
    zero-length entries in ITER_KVEC and ITER_BVEC-type iterators.
    
    The problem is that it accidentally reduces maxsize to 0 while skipping
    and thus runs to the end of the array and returns 0.
    
    Fix this by sticking the calculated size-to-copy in a new variable
    rather than back in maxsize.
    
    Fixes: 7d58fe731028 ("iov_iter: Add a function to extract a page list from an iterator")
    Signed-off-by: David Howells <dhowells at redhat.com>
    Reviewed-by: Christoph Hellwig <hch at lst.de>
    Cc: Christian Brauner <brauner at kernel.org>
    Cc: Jens Axboe <axboe at kernel.dk>
    Cc: Al Viro <viro at zeniv.linux.org.uk>
    Cc: David Hildenbrand <david at redhat.com>
    Cc: John Hubbard <jhubbard at nvidia.com>
    Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
    
    We want bio_iov_iter_get_pages() to accept kvec iterators. Mainstream
    already has patches for this. Backport only the part with
    iov_iter_extract_kvec_pages() and its fixes.
    
    https://virtuozzo.atlassian.net/browse/PSBM-157752
    (cherry picked from commit 7d58fe731028128f3a7e20b9c492be48aae133ee)
    (cherry picked from commit f741bd7178c95abd7aeac5a9d933ee542f9a5509)
    Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    ======
    Patchset description:
    vhost-blk: bounce buffer for unaligned requests
    
    Andrey Zhadchenko (2):
      vhost-blk: rework iov and bio handling
      vhost-blk: add bounce-buffer for non-aligned requests
    
    David Howells (1):
      iov_iter: Add a function to extract a page list from an iterator
    
    Pavel Tikhomirov (1):
      vhost-blk: remove excess vhost_blk_req.use_inline
    
    Feature: vhost-blk: in-kernel accelerator for virtio-blk guests
---
 lib/iov_iter.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
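
Note on the zero-sized-entries fix described in the commit message: the
difference between the pre-fix and post-fix skip loops can be modelled in
userspace. This is only a sketch under stated assumptions. struct seg,
min_size(), first_chunk_buggy() and first_chunk_fixed() are hypothetical
names that do not exist in the kernel, and the model ignores iov_offset and
iterator advancement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kvec; not the kernel type. */
struct seg {
	const char *base;
	size_t len;
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Models the pre-fix loop: the clamped length is written back into maxsize,
 * so one zero-length segment zeroes the budget for every later segment.
 */
static size_t first_chunk_buggy(const struct seg *v, size_t nr, size_t maxsize)
{
	assert(v != NULL || nr == 0);
	for (; nr; v++, nr--) {
		maxsize = min_size(maxsize, v->len);
		if (maxsize)
			return maxsize;
	}
	return 0;
}

/*
 * Models the post-fix loop: the clamp goes into a separate variable, so
 * maxsize survives the skipped zero-length segments.
 */
static size_t first_chunk_fixed(const struct seg *v, size_t nr, size_t maxsize)
{
	size_t size;

	assert(v != NULL || nr == 0);
	for (; nr; v++, nr--) {
		size = min_size(maxsize, v->len);
		if (size)
			return size;
	}
	return 0;
}
```

With a two-entry array whose first entry has length 0 and whose second has
length 6, and a maxsize of 4, the buggy model returns 0 while the fixed model
returns 4.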

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5df03dedfc01..de88467a4ea7 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1432,6 +1432,11 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
 	return page;
 }
 
+static ssize_t iov_iter_extract_kvec_pages(struct iov_iter *i,
+					   struct page ***pages, size_t maxsize,
+					   unsigned int maxpages,
+					   size_t *offset0);
+
 static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   unsigned int maxpages, size_t *start)
@@ -1493,6 +1498,8 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		return pipe_get_pages(i, pages, maxsize, maxpages, start);
 	if (iov_iter_is_xarray(i))
 		return iter_xarray_get_pages(i, pages, maxsize, maxpages, start);
+	if (iov_iter_is_kvec(i))
+		return iov_iter_extract_kvec_pages(i, pages, maxsize, maxpages, start);
 	return -EFAULT;
 }
 
@@ -1907,3 +1914,58 @@ void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
 		i->__iov -= state->nr_segs - i->nr_segs;
 	i->nr_segs = state->nr_segs;
 }
+
+/*
+ * Extract a list of virtually contiguous pages from an ITER_KVEC iterator.
+ * This does not get references on the pages, nor does it get a pin on them.
+ */
+static ssize_t iov_iter_extract_kvec_pages(struct iov_iter *i,
+					   struct page ***pages, size_t maxsize,
+					   unsigned int maxpages,
+					   size_t *offset0)
+{
+	struct page **p, *page;
+	const void *kaddr;
+	size_t skip = i->iov_offset, offset, len, size;
+	int k;
+
+	for (;;) {
+		if (i->nr_segs == 0)
+			return 0;
+		size = min(maxsize, i->kvec->iov_len - skip);
+		if (size)
+			break;
+		i->iov_offset = 0;
+		i->nr_segs--;
+		i->kvec++;
+		skip = 0;
+	}
+
+	kaddr = i->kvec->iov_base + skip;
+	offset = (unsigned long)kaddr & ~PAGE_MASK;
+	*offset0 = offset;
+
+	maxpages = want_pages_array(pages, size, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+
+	kaddr -= offset;
+	len = offset + size;
+	for (k = 0; k < maxpages; k++) {
+		size_t seg = min_t(size_t, len, PAGE_SIZE);
+
+		if (is_vmalloc_or_module_addr(kaddr))
+			page = vmalloc_to_page(kaddr);
+		else
+			page = virt_to_page(kaddr);
+
+		p[k] = page;
+		len -= seg;
+		kaddr += PAGE_SIZE;
+	}
+
+	size = min_t(size_t, size, maxpages * PAGE_SIZE - offset);
+	iov_iter_advance(i, size);
+	return size;
+}
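
Note on the page arithmetic in iov_iter_extract_kvec_pages(): the relation
between the in-page offset, the number of pages and the byte count returned
can be checked with a small userspace model. This is a sketch, not kernel
code. MODEL_PAGE_SIZE, struct span and model_span() are hypothetical names,
a 4 KiB page size is assumed, and want_pages_array() is assumed here to
round offset + size up to whole pages and clamp the result to maxpages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical result type for the model; not a kernel structure. */
struct span {
	size_t offset;		/* offset of the data within the first page */
	unsigned int npages;	/* number of page slots that would be filled */
	size_t bytes;		/* number of bytes the iterator would advance */
};

static struct span model_span(uintptr_t addr, size_t size,
			      unsigned int maxpages)
{
	struct span s;
	size_t count, room;

	assert(size > 0 && maxpages > 0);

	/* Same idea as: offset = (unsigned long)kaddr & ~PAGE_MASK */
	s.offset = addr & (MODEL_PAGE_SIZE - 1);

	/* Pages needed to cover [offset, offset + size), clamped to maxpages. */
	count = (s.offset + size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	s.npages = count < maxpages ? (unsigned int)count : maxpages;

	/* Same idea as: size = min(size, maxpages * PAGE_SIZE - offset) */
	room = (size_t)s.npages * MODEL_PAGE_SIZE - s.offset;
	s.bytes = size < room ? size : room;
	return s;
}
```

For example, 8192 bytes starting 256 bytes into a page span three pages. If
the caller allows only two pages, the model reports 7936 bytes, which is the
two pages minus the leading 256-byte offset.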

