[Devel] [PATCH RHEL8 COMMIT] userfaultfd: wp: hook userfault handler to write protection fault

Konstantin Khorenko khorenko at virtuozzo.com
Mon Apr 20 10:34:29 MSK 2020


The commit is pushed to "branch-rh8-4.18.0-80.1.2.vz8.3.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-80.1.2.vz8.3.6
------>
commit 3e4e926a3403f9d1de17ddc1208d15c5b9f80fa2
Author: Andrea Arcangeli <aarcange at redhat.com>
Date:   Mon Apr 20 10:34:29 2020 +0300

    userfaultfd: wp: hook userfault handler to write protection fault
    
    There are several cases in which a write protection fault can happen.
    It could be a write to the zero page, to a swapped-out page, or to a
    userfault write-protected page.  When the fault happens, there is no
    way to know whether userfaultfd write-protected the page before.  Here
    we just blindly issue a userfault notification for any vma with
    VM_UFFD_WP, regardless of whether the application has write-protected
    the page yet.  The application should be ready to handle such wp
    faults.
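
    A minimal userspace sketch of the intended flow (not part of this
    patch): it assumes the UFFD_PAGEFAULT_FLAG_WP message flag and the
    UFFDIO_WRITEPROTECT ioctl wired up by later patches in this series,
    takes an already-initialized userfaultfd (UFFDIO_API handshake done),
    and drops all error handling:

        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <linux/userfaultfd.h>

        /* Register @area for wp faults and resolve the first one.  The
         * read() would normally run in a dedicated fault-handling thread
         * while another thread performs the write. */
        static void wp_fault_roundtrip(int uffd, void *area, size_t len)
        {
            struct uffdio_register reg = {
                .range = { (unsigned long)area, len },
                .mode  = UFFDIO_REGISTER_MODE_WP,
            };
            struct uffd_msg msg;

            ioctl(uffd, UFFDIO_REGISTER, &reg);

            /* With this patch, any write to a VM_UFFD_WP vma raises a
             * notification, even if this page was never write-protected. */
            read(uffd, &msg, sizeof(msg));
            if (msg.event == UFFD_EVENT_PAGEFAULT &&
                (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
                /* Un-protect (mode 0) so the faulting write can retry. */
                struct uffdio_writeprotect wp = {
                    .range = { msg.arg.pagefault.address & ~0xfffUL, 0x1000 },
                    .mode  = 0,
                };
                ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
            }
        }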
    
    In the swapin case, always swap in as read-only.  This will cause
    false positive userfaults.  We need to decide later whether to
    eliminate them with a flag like soft-dirty in the swap entry (see
    _PAGE_SWP_SOFT_DIRTY).
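
    To illustrate the soft-dirty-style idea, a bit in the swap pte could
    record the wp state across swapout, so swapin wouldn't have to assume
    read-only and raise false-positive userfaults.  The sketch below is
    hypothetical (not part of this patch); _PAGE_SWP_UFFD_WP and the
    helpers are named by analogy with the x86 _PAGE_SWP_SOFT_DIRTY
    helpers pte_set_flags()/pte_flags():

        /* Hypothetical, modeled on x86's _PAGE_SWP_SOFT_DIRTY: a pte bit
         * unused in swap entries carries the wp state across swapout. */
        #define _PAGE_SWP_UFFD_WP   _PAGE_USER

        static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
        {
            return pte_set_flags(pte, _PAGE_SWP_UFFD_WP);
        }

        static inline int pte_swp_uffd_wp(pte_t pte)
        {
            return pte_flags(pte) & _PAGE_SWP_UFFD_WP;
        }

    do_swap_page() could then consult pte_swp_uffd_wp() instead of
    notifying unconditionally.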
    
    hugetlbfs wouldn't need to worry about swapouts, and tmpfs would be
    handled by a swap entry bit like anonymous memory.
    
    The main problem, with no easy solution for eliminating the false
    positives, will arise if/when userfaultfd is extended to real
    filesystem pagecache.  When the pagecache is freed by reclaim, we
    can't leave the radix tree pinned if the inode, and in turn the radix
    tree, is reclaimed as well.
    
    The estimation is that full accuracy and lack of false positives could
    easily be provided only for anonymous memory (as long as there's no
    fork, or as long as MADV_DONTFORK is used on the userfaultfd anonymous
    range), tmpfs and hugetlbfs; it's most certainly worth achieving, but
    in a later incremental patch.
    
    [peterx at redhat.com: don't conditionally drop FAULT_FLAG_WRITE in do_swap_page]
    Signed-off-by: Andrea Arcangeli <aarcange at redhat.com>
    Signed-off-by: Peter Xu <peterx at redhat.com>
    Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
    Reviewed-by: Mike Rapoport <rppt at linux.vnet.ibm.com>
    Reviewed-by: Jerome Glisse <jglisse at redhat.com>
    Cc: Shaohua Li <shli at fb.com>
    Cc: Bobby Powers <bobbypowers at gmail.com>
    Cc: Brian Geffon <bgeffon at google.com>
    Cc: David Hildenbrand <david at redhat.com>
    Cc: Denis Plotnikov <dplotnikov at virtuozzo.com>
    Cc: "Dr. David Alan Gilbert" <dgilbert at redhat.com>
    Cc: Hugh Dickins <hughd at google.com>
    Cc: Johannes Weiner <hannes at cmpxchg.org>
    Cc: "Kirill A. Shutemov" <kirill at shutemov.name>
    Cc: Martin Cracauer <cracauer at cons.org>
    Cc: Marty McFadden <mcfadden8 at llnl.gov>
    Cc: Maya Gokhale <gokhale2 at llnl.gov>
    Cc: Mel Gorman <mgorman at suse.de>
    Cc: Mike Kravetz <mike.kravetz at oracle.com>
    Cc: Pavel Emelyanov <xemul at openvz.org>
    Cc: Rik van Riel <riel at redhat.com>
    Link: http://lkml.kernel.org/r/20200220163112.11409-3-peterx@redhat.com
    Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
    
    https://jira.sw.ru/browse/PSBM-102938
    (cherry picked from commit 529b930b87d997c3d231c9d8638a0bf8db569d70)
    Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
---
 mm/memory.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 855d3109edd1..b261dec43bf4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2740,6 +2740,11 @@ static int do_wp_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 
+	if (userfaultfd_wp(vma)) {
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		return handle_userfault(vmf, VM_UFFD_WP);
+	}
+
 	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
 	if (!vmf->page) {
 		/*
@@ -3892,8 +3897,12 @@ static inline int create_huge_pmd(struct vm_fault *vmf)
 /* `inline' is required to avoid gcc 4.1.2 build error */
 static inline int wp_huge_pmd(struct vm_fault *vmf, pmd_t orig_pmd)
 {
-	if (vma_is_anonymous(vmf->vma))
+	if (vma_is_anonymous(vmf->vma)) {
+		if (userfaultfd_wp(vmf->vma))
+			return handle_userfault(vmf, VM_UFFD_WP);
 		return do_huge_pmd_wp_page(vmf, orig_pmd);
+	}
+
 	if (vmf->vma->vm_ops->huge_fault)
 		return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD);
 

