[Devel] [PATCH RHEL9 COMMIT] ms/mm: migrate high-order folios in swap cache correctly
Konstantin Khorenko
khorenko at virtuozzo.com
Fri Mar 8 19:05:17 MSK 2024
The commit is pushed to "branch-rh9-5.14.0-362.8.1.vz9.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-362.8.1.vz9.35.14
------>
commit 15c99f6db9e9ea609fddf7b0af3e057edde56ef9
Author: Charan Teja Kalla <quic_charante at quicinc.com>
Date: Wed Mar 6 16:36:03 2024 +0800
ms/mm: migrate high-order folios in swap cache correctly
Large folios occupy N consecutive entries in the swap cache instead of
using multi-index entries like the page cache. However, if a large folio
is re-added to the LRU list, it can be migrated. The migration code was
not aware of the difference between the swap cache and the page cache and
assumed that a single xas_store() would be sufficient.
This leaves potentially many stale pointers to the now-migrated folio in
the swap cache, which can lead to almost arbitrary data corruption in the
future. This can also manifest as infinite loops with the RCU read lock
held.
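To make the layout difference concrete, here is a minimal sketch (illustrative
only, not kernel source; swap_cache_layout_sketch() and its arguments are
made-up names) of how the swap cache ends up with one entry per subpage,
built on the same XArray API the real code uses:

#include <linux/xarray.h>

/*
 * Sketch: the page cache stores a single multi-index entry covering
 * all indices of a large folio, so one xas_store() replaces it.  The
 * swap cache stores folio_nr_pages() independent entries, one per
 * subpage, so one xas_store() replaces only the first entry and
 * leaves the rest pointing at the old (now migrated) folio.
 */
static void swap_cache_layout_sketch(struct xarray *swap_cache_xa,
				     struct folio *folio, pgoff_t first_idx)
{
	XA_STATE(xas, swap_cache_xa, first_idx);
	long i;

	xas_lock(&xas);
	for (i = 0; i < folio_nr_pages(folio); i++) {
		xas_store(&xas, folio);		/* one slot per subpage */
		xas_next(&xas);			/* advance to the next slot */
	}
	xas_unlock(&xas);
}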
[willy at infradead.org: modifications to the changelog & tweaked the fix]
Fixes: 3417013e0d18 ("mm/migrate: Add folio_migrate_mapping()")
Link: https://lkml.kernel.org/r/20231214045841.961776-1-willy@infradead.org
Signed-off-by: Charan Teja Kalla <quic_charante at quicinc.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy at infradead.org>
Reported-by: Charan Teja Kalla <quic_charante at quicinc.com>
Closes: https://lkml.kernel.org/r/1700569840-17327-1-git-send-email-quic_charante@quicinc.com
Cc: David Hildenbrand <david at redhat.com>
Cc: Johannes Weiner <hannes at cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov at linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi at ah.jp.nec.com>
Cc: Shakeel Butt <shakeelb at google.com>
Cc: <stable at vger.kernel.org>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
We have a check in do_swap_page() that a page returned by
lookup_swap_cache() must have the PG_swapcache bit set, but the leftover
stale pointers may be reused for a new folio without the PG_swapcache
bit, which leads to an infinite loop in:
+-> mmap_read_lock
+-> __get_user_pages_locked
  +-> for-loop  # taken once
    +-> __get_user_pages
      +-> retry-loop  # constantly spinning
        +-> faultin_page  # return 0 to trigger retry
          +-> handle_mm_fault
            +-> __handle_mm_fault
              +-> handle_pte_fault
                +-> do_swap_page
                  +-> lookup_swap_cache  # returns non-NULL
                  +-> if (swapcache)
                    +-> if (!folio_test_swapcache || page_private(page) != entry.val)
                      +-> goto out_page
                  +-> return 0
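The failing check looks roughly like this (a simplified excerpt of the
do_swap_page() logic; the exact code in mm/memory.c varies between kernel
versions):

	/*
	 * The stale swap-cache slot yields a folio that no longer has
	 * PG_swapcache set, so the test below fails, the fault returns
	 * 0, and __get_user_pages() retries the same address forever.
	 */
	if (swapcache) {
		if (unlikely(!folio_test_swapcache(folio) ||
			     page_private(page) != entry.val))
			goto out_page;
	}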
(cherry picked from commit fc346d0a70a13d52fe1c4bc49516d83a42cd7c4c)
https://virtuozzo.atlassian.net/browse/PSBM-153264
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
Feature: fix ms/mm
---
 mm/migrate.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index d36d945cf716..d950f42c0708 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -387,6 +387,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 	int dirty;
 	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
 	long nr = folio_nr_pages(folio);
+	long entries, i;
 
 	if (!mapping) {
 		/* Anonymous page without mapping */
@@ -424,8 +425,10 @@ int folio_migrate_mapping(struct address_space *mapping,
 			folio_set_swapcache(newfolio);
 			newfolio->private = folio_get_private(folio);
 		}
+		entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
+		entries = 1;
 	}
 
 	/* Move dirty while page refs frozen and newpage not yet exposed */
@@ -435,7 +438,11 @@ int folio_migrate_mapping(struct address_space *mapping,
 		folio_set_dirty(newfolio);
 	}
 
-	xas_store(&xas, newfolio);
+	/* Swap cache still stores N entries instead of a high-order entry */
+	for (i = 0; i < entries; i++) {
+		xas_store(&xas, newfolio);
+		xas_next(&xas);
+	}
 
 	/*
 	 * Drop cache reference from old page by unfreezing