[Devel] [PATCH RHEL9 COMMIT] fs/fuse kio: Do not use sendpage on pages which cannot be sendpaged

Konstantin Khorenko khorenko at virtuozzo.com
Wed Apr 19 18:46:10 MSK 2023


The commit is pushed to "branch-rh9-5.14.0-162.18.1.vz9.19.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-162.18.1.vz9.19.5
------>
commit cd8916e4a421dfe7ab9a59cb25221a84d4a21b67
Author: Liu Kui <Kui.Liu at acronis.com>
Date:   Wed Apr 19 18:49:03 2023 +0800

    fs/fuse kio: Do not use sendpage on pages which cannot be sendpaged
    
    It looks like the new API iov_iter_get_pages() is not safe for use
    when trying to get a page from a bvec to be sent by kernel_sendpage().
    
    So just revert back our own implentation where we check the page
    before making get_page(), if the page can't be sendpaged, fall back
    to copy mode.
    
    Affects: #PSBM-146846
    https://jira.vzint.dev/browse/PSBM-146846
    
    Signed-off-by: Liu Kui <Kui.Liu at acronis.com>
    Acked-by: Alexey Kuznetsov <kuznet at acronis.com>
    
    Feature: vStorage
    
    Related notes by Kui Liu and Alexey Kuznetsov:
     - we use the same approach as in iscsi iscsi_tcp_segment_map() which
       makes the same operation in the same context.
    
     - a comment in iscsi_tcp_segment_map() explains why slab mem cannot be
       sent to tcp: network send can coalesce neighbour slabs to single page
       fragment which "triggers one of hardened usercopy checks" whatever it is.
    
     - it is not clear why sendpage_ok() is used here in the part of
       checking page_count(page) >= 1. We do not understand why it is
       possible that 0-ref page would end up here, but if that is possible,
       it will be handled correctly with this patch.
    
       The check page_count(page) >= 1 wants to tell that get_page() on page
       with 0 refcnt is illegal. But fuse already did get_page() at top
       level, it uses iov_iter_get_pages() in fuse_get_user_pages(), so
       that:
    
       1. If it is not allowed it is already broken by the time the page
          arrives to us.
       2. It is impossible to have page_count == 0 in kernel_sendpage, it is
          grabbed by top level.
    
     - it is possible current kernels are safe wrt get/put_page, but there
       is some other reason to prohibit page_count(page) == 0 only for
       networking while other components could be happy and would fail when
       we fail iov_iter_get_pages().
    
     - in vz7 we added iov_iter_get_pages() infrastructure ourselves and we
       did it in safe way.
       We checked page before doing any manipulations on it right in core of
       iov_iter_get_pages() and terminated iov_iter_get_pages() if the page
       looked dubious.
       We cannot make this now in vz9 because this function is used in many
       places over kernel and not all the places are sensitive to origin of
       pages but all of them definitely will crash if iov_iter_get_pages()
       returns error.
    
     - only single page bvec is currently supported in do_send_one_seg() and
       page array is not used in bvec NOW, we can't guarantee it won't
       change in the future, and formally we should assert
         it->iov_offset + i->bvec->bv_offset < PAGE_SIZE
       but this iterator is our internal thing, we created it ourselves and
       use it as single paged => we can omit the assertion.
---
 fs/fuse/kio/pcs/pcs_sock_io.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/fs/fuse/kio/pcs/pcs_sock_io.c b/fs/fuse/kio/pcs/pcs_sock_io.c
index 552b82ab398b..710940eafa34 100644
--- a/fs/fuse/kio/pcs/pcs_sock_io.c
+++ b/fs/fuse/kio/pcs/pcs_sock_io.c
@@ -143,16 +143,30 @@ static int do_send_one_seg(struct socket *sock, struct iov_iter *it, size_t left
 	}
 
 	if (iov_iter_is_bvec(it)) {
-		/* Zerocopy */
 		size_t offset;
 		ssize_t len;
 		struct page *page;
 
-		len = iov_iter_get_pages(it, &page, size, 1, &offset);
-		BUG_ON(len <= 0);
-
-		ret = kernel_sendpage(sock, page, offset, len, flags);
-		put_page(page);
+		/* Only support single page bvec here */
+		page = it->bvec->bv_page;
+		offset = it->bvec->bv_offset + it->iov_offset;
+		len = min(size, it->bvec->bv_len - it->iov_offset);
+
+		if (sendpage_ok(page)) {
+			/* Zero copy */
+			get_page(page);
+			ret = kernel_sendpage(sock, page, offset, len, flags);
+			put_page(page);
+		} else {
+			/* Fall back to copy mode */
+			struct msghdr msg = { .msg_flags = flags };
+			struct kvec kv;
+
+			kv.iov_base = kmap(page) + offset;
+			kv.iov_len = len;
+			ret = kernel_sendmsg(sock, &msg, &kv, 1, size);
+			kunmap(page);
+		}
 	}
 
 out:


More information about the Devel mailing list