[Devel] [PATCH RHEL9 COMMIT] fs/fuse kio: Do not use sendpage on pages which cannot be sendpaged
Konstantin Khorenko
khorenko at virtuozzo.com
Wed Apr 19 18:46:10 MSK 2023
The commit is pushed to "branch-rh9-5.14.0-162.18.1.vz9.19.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-162.18.1.vz9.19.5
------>
commit cd8916e4a421dfe7ab9a59cb25221a84d4a21b67
Author: Liu Kui <Kui.Liu at acronis.com>
Date: Wed Apr 19 18:49:03 2023 +0800
fs/fuse kio: Do not use sendpage on pages which cannot be sendpaged
It looks like the new API iov_iter_get_pages() is not safe for use
when trying to get a page from a bvec to be sent by kernel_sendpage().
So just revert back our own implentation where we check the page
before making get_page(), if the page can't be sendpaged, fall back
to copy mode.
Affects: #PSBM-146846
https://jira.vzint.dev/browse/PSBM-146846
Signed-off-by: Liu Kui <Kui.Liu at acronis.com>
Acked-by: Alexey Kuznetsov <kuznet at acronis.com>
Feature: vStorage
Related notes by Kui Liu and Alexey Kuznetsov:
- we use the same approach as in iscsi iscsi_tcp_segment_map() which
makes the same operation in the same context.
- a comment in iscsi_tcp_segment_map() explains why slab mem cannot be
sent to tcp: network send can coalesce neighbour slabs to single page
fragment which "triggers one of hardened usercopy checks" whatever it is.
- it is not clear why sendpage_ok() is used here in the part of
checking page_count(page) >= 1. We do not understand why it is
possible that 0-ref page would end up here, but if that is possible,
it will be handled correctly with this patch.
The check page_count(page) >= 1 wants to tell that get_page() on page
with 0 refcnt is illegal. But fuse already did get_page() at top
level, it uses iov_iter_get_pages() in fuse_get_user_pages(), so
that:
1. If it is not allowed it is already broken by the time the page
arrives to us.
2. It is impossible to have page_count == 0 in kernel_sendpage, it is
grabbed by top level.
- it is possible current kernels are safe wrt get/put_page, but there
is some other reason to prohibit page_count(page) == 0 only for
networking while other components could be happy and would fail when
we fail iov_iter_get_pages().
- in vz7 we added iov_iter_get_pages() infrastructure ourselves and we
did it in safe way.
We checked page before doing any manipulations on it right in core of
iov_iter_get_pages() and terminated iov_iter_get_pages() if the page
looked dubious.
We cannot make this now in vz9 because this function is used in many
places over kernel and not all the places are sensitive to origin of
pages but all of them definitely will crash if iov_iter_get_pages()
returns error.
- only single page bvec is currently supported in do_send_one_seg() and
page array is not used in bvec NOW, we can't guarantee it won't
change in the future, and formally we should assert
it->iov_offset + i->bvec->bv_offset < PAGE_SIZE
but this iterator is our internal thing, we created it ourselves and
use it as single paged => we can omit the assertion.
---
fs/fuse/kio/pcs/pcs_sock_io.c | 26 ++++++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/fs/fuse/kio/pcs/pcs_sock_io.c b/fs/fuse/kio/pcs/pcs_sock_io.c
index 552b82ab398b..710940eafa34 100644
--- a/fs/fuse/kio/pcs/pcs_sock_io.c
+++ b/fs/fuse/kio/pcs/pcs_sock_io.c
@@ -143,16 +143,30 @@ static int do_send_one_seg(struct socket *sock, struct iov_iter *it, size_t left
}
if (iov_iter_is_bvec(it)) {
- /* Zerocopy */
size_t offset;
ssize_t len;
struct page *page;
- len = iov_iter_get_pages(it, &page, size, 1, &offset);
- BUG_ON(len <= 0);
-
- ret = kernel_sendpage(sock, page, offset, len, flags);
- put_page(page);
+ /* Only support single page bvec here */
+ page = it->bvec->bv_page;
+ offset = it->bvec->bv_offset + it->iov_offset;
+ len = min(size, it->bvec->bv_len - it->iov_offset);
+
+ if (sendpage_ok(page)) {
+ /* Zero copy */
+ get_page(page);
+ ret = kernel_sendpage(sock, page, offset, len, flags);
+ put_page(page);
+ } else {
+ /* Fall back to copy mode */
+ struct msghdr msg = { .msg_flags = flags };
+ struct kvec kv;
+
+ kv.iov_base = kmap(page) + offset;
+ kv.iov_len = len;
+ ret = kernel_sendmsg(sock, &msg, &kv, 1, size);
+ kunmap(page);
+ }
}
out:
More information about the Devel
mailing list