[Devel] [PATCH RHEL9 COMMIT] oracle/mm: avoid early cow when copying ptes for MADV_DOEXEC
Konstantin Khorenko
khorenko at virtuozzo.com
Thu Jan 23 23:35:47 MSK 2025
The commit is pushed to "branch-rh9-5.14.0-427.44.1.vz9.80.x-ovz" and will appear at git at bitbucket.org:openvz/vzkernel.git
after rh9-5.14.0-427.44.1.vz9.80.5
------>
commit a8cc8c6ac35ccc81b8c16425596cac77df058dee
Author: Anthony Yznaga <anthony.yznaga at oracle.com>
Date: Thu Jan 26 15:41:44 2023 -0800
oracle/mm: avoid early cow when copying ptes for MADV_DOEXEC
When a VMA preserved via MADV_DOEXEC is copied to the new mm during
exec, copy_page_range() is called to copy the pagetable entries.
Commit 70e806e4 ("mm: Do early cow for pinned pages during fork()
for ptes") changed how pinned pages encountered by copy_page_range()
are handled. A copy of the page is made immediately rather than
write-protecting it for later COW. This breaks MADV_DOEXEC when the
memory to preserve is pinned (e.g. the guest memory of a VFIO-enabled
guest). Ensure that this page copying will not be done when copying
pagetable entries for preservation by adding a check for VM_EXEC_KEEP.
Orabug: 35054621
Signed-off-by: Anthony Yznaga <anthony.yznaga at oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett at oracle.com>
https://virtuozzo.atlassian.net/browse/VSTOR-96305
Porting notes:
RedHat has applied
rh commit: d8f21270d397 ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()")
ms commit: fb3d824d1a46 ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()")
and
rh commit: 85f85f728ec6 ("mm/memory: slightly simplify copy_present_pte()")
ms commit: b51ad4f8679e ("mm/memory: slightly simplify copy_present_pte()")
So the check from the copy_present_page() has been moved to
copy_present_pte().
(cherry picked from Oracle commit a904d4d4c24126a64b6d8aa0658425f4964ce674)
Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
Feature: oracle/mm: MADV_DOEXEC madvise() flag
---
mm/memory.c | 6 +++++-
mm/mmap.c | 10 +++++++---
2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 88b1aead060f..ebd08a1f2c9a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -915,9 +915,12 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
unsigned long vm_flags = src_vma->vm_flags;
pte_t pte = *src_pte;
struct page *page;
+ bool is_exec_keep;
page = vm_normal_page(src_vma, addr, pte);
if (page && PageAnon(page)) {
+ is_exec_keep = dst_vma->vm_flags & VM_EXEC_KEEP ? true : false;
+
/*
* If this page may have been pinned by the parent process,
* copy the page immediately for the child so that we'll always
@@ -925,7 +928,8 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
* future.
*/
get_page(page);
- if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) {
+ if (unlikely(page_try_dup_anon_rmap(page, false, src_vma)) &&
+ !is_exec_keep) {
/* Page maybe pinned, we have to copy. */
put_page(page);
return copy_present_page(dst_vma, src_vma, dst_pte, src_pte,
diff --git a/mm/mmap.c b/mm/mmap.c
index f87d284bd17b..9bb2382d9101 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3300,10 +3300,11 @@ int vma_dup(struct vm_area_struct *old_vma, struct mm_struct *mm)
/*
* Clear functionality that should not carry over to the new
- * process.any memory locking, userfaultfd, and preservation over
- * exec flags.
+ * process. Note that VM_EXEC_KEEP is cleared later to allow
+ * code called by copy_page_range to infer that the copying is
+ * for preserving over exec and not for process forking.
*/
- vma->vm_flags &= ~(VM_LOCKED|VM_LOCKONFAULT|VM_UFFD_MISSING|VM_UFFD_WP|VM_EXEC_KEEP);
+ vma->vm_flags &= ~(VM_LOCKED|VM_LOCKONFAULT|VM_UFFD_MISSING|VM_UFFD_WP);
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
__insert_vm_struct(mm, vma);
@@ -3318,6 +3319,9 @@ int vma_dup(struct vm_area_struct *old_vma, struct mm_struct *mm)
old_vma->vm_flags &= ~VM_ACCOUNT;
ret = copy_page_range(vma, old_vma);
+
+ vma->vm_flags &= ~VM_EXEC_KEEP;
+
return ret;
fail_nomem_anon_vma_fork:
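
The combined effect of the two hunks above can be illustrated with a small
user-space model. This is only a sketch and not kernel code: the model_*
names and the MODEL_VM_* flag values are hypothetical, and the boolean
maybe_pinned stands in for a nonzero return from page_try_dup_anon_rmap().
It shows that the early copy is skipped while VM_EXEC_KEEP is still set on
the destination VMA, and that the flag is cleared only after the copy step.

```c
/* User-space model of the patched logic; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, chosen only for this illustration. */
#define MODEL_VM_LOCKED    0x1UL
#define MODEL_VM_EXEC_KEEP 0x2UL

struct model_vma {
	unsigned long vm_flags;
};

/*
 * Models the patched condition in copy_present_pte(): copy the page
 * immediately only if it may be pinned AND the destination VMA is not
 * being preserved over exec.
 */
static bool model_needs_early_copy(const struct model_vma *dst_vma,
				   bool maybe_pinned)
{
	bool is_exec_keep = (dst_vma->vm_flags & MODEL_VM_EXEC_KEEP) != 0;

	return maybe_pinned && !is_exec_keep;
}

/*
 * Models the ordering in vma_dup(): flags that must not carry over are
 * cleared first, VM_EXEC_KEEP stays set while the copy decision is made,
 * and is cleared only afterwards. Returns whether an early copy happened.
 */
static bool model_vma_dup(const struct model_vma *old_vma,
			  struct model_vma *new_vma, bool maybe_pinned)
{
	bool copied_early;

	assert(old_vma && new_vma);

	new_vma->vm_flags = old_vma->vm_flags & ~MODEL_VM_LOCKED;
	copied_early = model_needs_early_copy(new_vma, maybe_pinned);
	new_vma->vm_flags &= ~MODEL_VM_EXEC_KEEP;

	return copied_early;
}
```

With a source VMA that has MODEL_VM_EXEC_KEEP set, model_vma_dup() reports
no early copy even for a possibly pinned page, and the resulting VMA no
longer carries the flag. Without the flag (the fork-like case), a possibly
pinned page is still copied early.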