[Devel] [PATCH VZ9 1/2] mm: migrate high-order folios in swap cache correctly

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Wed Mar 6 11:36:03 MSK 2024


From: Charan Teja Kalla <quic_charante at quicinc.com>

Large folios occupy N consecutive entries in the swap cache instead of
using multi-index entries like the page cache.  However, if a large folio
is re-added to the LRU list, it can be migrated.  The migration code was
not aware of the difference between the swap cache and the page cache and
assumed that a single xas_store() would be sufficient.

This leaves potentially many stale pointers to the now-migrated folio in
the swap cache, which can lead to almost arbitrary data corruption in the
future.  This can also manifest as infinite loops with the RCU read lock
held.
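
The effect described above can be illustrated with a small user-space model.
This is only a sketch: the toy_* names are hypothetical, the cache is a plain
array rather than an XArray, and none of this is kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 16

/* Hypothetical stand-in for a folio; not the kernel's struct folio. */
struct toy_folio {
	int id;
};

/* Toy swap cache: one slot per page-sized entry, no multi-index entries. */
struct toy_swap_cache {
	struct toy_folio *slot[TOY_CACHE_SLOTS];
};

/* Adding a folio of nr pages fills nr consecutive slots with one pointer. */
static void toy_cache_add(struct toy_swap_cache *cache, size_t index,
			  size_t nr, struct toy_folio *folio)
{
	size_t i;

	assert(index + nr <= TOY_CACHE_SLOTS);
	for (i = 0; i < nr; i++)
		cache->slot[index + i] = folio;
}

/*
 * Point `entries` consecutive slots at newfolio.
 * entries == 1 models the old single store; entries == nr models the fix.
 */
static void toy_cache_migrate(struct toy_swap_cache *cache, size_t index,
			      size_t entries, struct toy_folio *newfolio)
{
	size_t i;

	assert(index + entries <= TOY_CACHE_SLOTS);
	for (i = 0; i < entries; i++)
		cache->slot[index + i] = newfolio;
}

/* Count how many slots still reference the given folio. */
static size_t toy_cache_count(const struct toy_swap_cache *cache,
			      const struct toy_folio *folio)
{
	size_t i, count = 0;

	for (i = 0; i < TOY_CACHE_SLOTS; i++)
		if (cache->slot[i] == folio)
			count++;
	return count;
}
```

In this model, a 4-page folio added at index 4 and then migrated with
entries == 1 leaves three slots pointing at the old folio. Migrating with
entries == 4 leaves none.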

[willy at infradead.org: modifications to the changelog & tweaked the fix]
Fixes: 3417013e0d18 ("mm/migrate: Add folio_migrate_mapping()")
Link: https://lkml.kernel.org/r/20231214045841.961776-1-willy@infradead.org
Signed-off-by: Charan Teja Kalla <quic_charante at quicinc.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy at infradead.org>
Reported-by: Charan Teja Kalla <quic_charante at quicinc.com>
Closes: https://lkml.kernel.org/r/1700569840-17327-1-git-send-email-quic_charante@quicinc.com
Cc: David Hildenbrand <david at redhat.com>
Cc: Johannes Weiner <hannes at cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov at linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi at ah.jp.nec.com>
Cc: Shakeel Butt <shakeelb at google.com>
Cc: <stable at vger.kernel.org>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>

do_swap_page() checks that the page returned by lookup_swap_cache() has
the PG_swapcache bit set. However, a leftover stale pointer may end up
pointing at a new folio that does not have PG_swapcache set, and that
leads to an infinite loop in:

  +-> mmap_read_lock
    +-> __get_user_pages_locked
      +-> for-loop # taken once
        +-> __get_user_pages
          +-> retry-loop # constantly spinning
            +-> faultin_page # return 0 to trigger retry
              +-> handle_mm_fault
                +-> __handle_mm_fault
                  +-> handle_pte_fault
                    +-> do_swap_page
                      +-> lookup_swap_cache # returns non-NULL
                      +-> if (swapcache)
                        +-> if (!folio_test_swapcache || page_private(page) != entry.val)
                          +-> goto out_page
                            +-> return 0
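
The spin in the call chain above can be modelled in user space as follows.
This is a rough sketch under stated assumptions: the toy_* names are
hypothetical, the retry loop is given an artificial bound so that the example
terminates, and the check only mirrors the condition quoted in the chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for what the swap cache lookup returns. */
struct toy_cached_folio {
	bool swapcache;		/* models PG_swapcache */
	unsigned long private;	/* models the stored swap entry value */
};

/* Returns 0 for "nothing done, retry the fault", 1 for "fault handled". */
static int toy_do_swap_page(const struct toy_cached_folio *cached,
			    unsigned long entry_val)
{
	if (cached) {
		/* Same shape as the check in the call chain above. */
		if (!cached->swapcache || cached->private != entry_val)
			return 0;
	}
	return 1;
}

/*
 * Bounded model of the retry loop. The bound is an assumption made for
 * this sketch only. Returns true if the fault resolves within max_tries.
 */
static bool toy_faultin_resolves(const struct toy_cached_folio *cached,
				 unsigned long entry_val, size_t max_tries)
{
	size_t i;

	assert(max_tries > 0);
	for (i = 0; i < max_tries; i++)
		if (toy_do_swap_page(cached, entry_val))
			return true;
	return false;
}
```

A lookup result with swapcache == false never resolves, no matter how many
retries are allowed, because nothing in the loop changes what the stale slot
points at. A result with swapcache == true and a matching private value
resolves on the first attempt.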

(cherry picked from commit fc346d0a70a13d52fe1c4bc49516d83a42cd7c4c)
https://virtuozzo.atlassian.net/browse/PSBM-153264
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
---
 mm/migrate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index d36d945cf716..d950f42c0708 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -387,6 +387,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 	int dirty;
 	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
 	long nr = folio_nr_pages(folio);
+	long entries, i;
 
 	if (!mapping) {
 		/* Anonymous page without mapping */
@@ -424,8 +425,10 @@ int folio_migrate_mapping(struct address_space *mapping,
 			folio_set_swapcache(newfolio);
 			newfolio->private = folio_get_private(folio);
 		}
+		entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
+		entries = 1;
 	}
 
 	/* Move dirty while page refs frozen and newpage not yet exposed */
@@ -435,7 +438,11 @@ int folio_migrate_mapping(struct address_space *mapping,
 		folio_set_dirty(newfolio);
 	}
 
-	xas_store(&xas, newfolio);
+	/* Swap cache still stores N entries instead of a high-order entry */
+	for (i = 0; i < entries; i++) {
+		xas_store(&xas, newfolio);
+		xas_next(&xas);
+	}
 
 	/*
 	 * Drop cache reference from old page by unfreezing
-- 
2.43.0


