[Devel] [PATCH RHEL9 COMMIT] ms/mm: migrate high-order folios in swap cache correctly

Konstantin Khorenko khorenko at virtuozzo.com
Fri Mar 8 19:05:17 MSK 2024


The commit is pushed to "branch-rh9-5.14.0-362.8.1.vz9.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-362.8.1.vz9.35.14
------>
commit 15c99f6db9e9ea609fddf7b0af3e057edde56ef9
Author: Charan Teja Kalla <quic_charante at quicinc.com>
Date:   Wed Mar 6 16:36:03 2024 +0800

    ms/mm: migrate high-order folios in swap cache correctly
    
    Large folios occupy N consecutive entries in the swap cache instead of
    using multi-index entries like the page cache.  However, if a large folio
    is re-added to the LRU list, it can be migrated.  The migration code was
    not aware of the difference between the swap cache and the page cache and
    assumed that a single xas_store() would be sufficient.
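    
    For illustration, a minimal sketch of the insertion side (modeled on
    add_to_swap_cache() in mm/swap_state.c, with locking, shadow entries
    and error handling elided) shows why a large folio occupies N
    separate slots:
    
      struct address_space *address_space = swap_address_space(entry);
      pgoff_t idx = swp_offset(entry);
      XA_STATE_ORDER(xas, &address_space->i_pages, idx, folio_order(folio));
      long nr = folio_nr_pages(folio);
      long i;
    
      xas_create_range(&xas);
      for (i = 0; i < nr; i++) {
              /* one xarray entry per subpage, not one multi-index entry */
              xas_store(&xas, folio);
              xas_next(&xas);
      }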
    
    This leaves potentially many stale pointers to the now-migrated folio in
    the swap cache, which can lead to almost arbitrary data corruption in the
    future.  This can also manifest as infinite loops with the RCU read lock
    held.
    
    [willy at infradead.org: modifications to the changelog & tweaked the fix]
    Fixes: 3417013e0d18 ("mm/migrate: Add folio_migrate_mapping()")
    Link: https://lkml.kernel.org/r/20231214045841.961776-1-willy@infradead.org
    Signed-off-by: Charan Teja Kalla <quic_charante at quicinc.com>
    Signed-off-by: Matthew Wilcox (Oracle) <willy at infradead.org>
    Reported-by: Charan Teja Kalla <quic_charante at quicinc.com>
    Closes: https://lkml.kernel.org/r/1700569840-17327-1-git-send-email-quic_charante@quicinc.com
    Cc: David Hildenbrand <david at redhat.com>
    Cc: Johannes Weiner <hannes at cmpxchg.org>
    Cc: Kirill A. Shutemov <kirill.shutemov at linux.intel.com>
    Cc: Naoya Horiguchi <n-horiguchi at ah.jp.nec.com>
    Cc: Shakeel Butt <shakeelb at google.com>
    Cc: <stable at vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
    
    We have a check in do_swap_page() that the page returned by
    lookup_swap_cache() must have the PG_swapcache bit set, but these
    leftover stale pointers may be reused by a new folio that does not
    have the PG_swapcache bit, which leads to an infinite loop in the
    following call chain (an abridged sketch of the failing check is
    shown after it):
    
      +-> mmap_read_lock
        +-> __get_user_pages_locked
          +-> for-loop # taken once
            +-> __get_user_pages
              +-> retry-loop # constantly spinning
                +-> faultin_page # return 0 to trigger retry
                  +-> handle_mm_fault
                    +-> __handle_mm_fault
                      +-> handle_pte_fault
                        +-> do_swap_page
                          +-> lookup_swap_cache # returns non-NULL
                          +-> if (swapcache)
                            +-> if (!folio_test_swapcache(folio) || page_private(page) != entry.val)
                              +-> goto out_page
                                +-> return 0
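    
    The check that keeps failing (do_swap_page() in mm/memory.c,
    abridged here for illustration; the exact form varies by kernel
    version):
    
      if (swapcache) {
              /*
               * A stale swap cache pointer left behind by migration may
               * now point at an unrelated folio: PG_swapcache is clear
               * or the private field no longer matches the swap entry,
               * so the fault bails out, returns 0 and is retried
               * forever.
               */
              if (unlikely(!folio_test_swapcache(folio) ||
                           page_private(page) != entry.val))
                      goto out_page;
      }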
    
    (cherry picked from commit fc346d0a70a13d52fe1c4bc49516d83a42cd7c4c)
    https://virtuozzo.atlassian.net/browse/PSBM-153264
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    Feature: fix ms/mm
---
 mm/migrate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index d36d945cf716..d950f42c0708 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -387,6 +387,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 	int dirty;
 	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
 	long nr = folio_nr_pages(folio);
+	long entries, i;
 
 	if (!mapping) {
 		/* Anonymous page without mapping */
@@ -424,8 +425,10 @@ int folio_migrate_mapping(struct address_space *mapping,
 			folio_set_swapcache(newfolio);
 			newfolio->private = folio_get_private(folio);
 		}
+		entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
+		entries = 1;
 	}
 
 	/* Move dirty while page refs frozen and newpage not yet exposed */
@@ -435,7 +438,11 @@ int folio_migrate_mapping(struct address_space *mapping,
 		folio_set_dirty(newfolio);
 	}
 
-	xas_store(&xas, newfolio);
+	/* Swap cache still stores N entries instead of a high-order entry */
+	for (i = 0; i < entries; i++) {
+		xas_store(&xas, newfolio);
+		xas_next(&xas);
+	}
 
 	/*
 	 * Drop cache reference from old page by unfreezing

