[Devel] [PATCH RHEL7 COMMIT] mm/tcache: fix PCP list corruption from attach/detach race

Konstantin Khorenko khorenko at virtuozzo.com
Thu Apr 9 17:12:11 MSK 2026


The commit is pushed to "branch-rh7-3.10.0-1160.129.1.vz7.226.x-ovz" and will appear in git at bitbucket.org:openvz/vzkernel.git
after rh7-3.10.0-1160.129.1.vz7.226.2
------>
commit 424c3066c0e81212335e6ea669b9eb88ded5e8fa
Author: Konstantin Khorenko <khorenko at virtuozzo.com>
Date:   Mon Mar 16 22:40:38 2026 +0100

    mm/tcache: fix PCP list corruption from attach/detach race
    
    tcache_attach_page() inserts a page into the per-node radix tree under
    tree_lock, then releases the lock and calls tcache_lru_add().  Between
    releasing tree_lock and completing tcache_lru_add(), the page is visible
    in the radix tree but not yet on the tcache LRU.
    
    During this window a concurrent tcache_detach_page() on another CPU can:
      1. Find the page via radix_tree_lookup (RCU)
      2. page_cache_get_speculative(page): refcount 1 -> 2
      3. page_ref_freeze(page, 2): refcount 2 -> 0
      4. Remove the page from the radix tree
      5. tcache_lru_del(): page not on LRU yet, skipped
      6. tcache_put_page() -> free_hot_cold_page(): page freed to PCP list
    
    Now page->lru links into a PCP free list.  When the original CPU then
    executes tcache_lru_add() -> list_add_tail(&page->lru, &pni->lru), it
    overwrites page->lru, destroying the PCP list linkage.  Subsequent PCP
    allocations follow the stale pointer and hit a poisoned or cross-linked
    lru, causing "list_del corruption" warnings and eventually a hard lockup
    when free_pcppages_bulk() holds zone->lock forever.
    
    Fix by taking an extra page reference before releasing tree_lock.  This
    makes page_ref_freeze(page, 2) fail on any concurrent detach (refcount
    will be 3, not the expected 2), forcing the detach to retry after the
    page is fully set up (in tree AND on LRU).  The extra reference is
    dropped after tcache_lru_add() completes.
    
    Note: moving tcache_lru_add() inside the tree_lock critical section would
    cause a lock ordering inversion (tree_lock -> pni->lock vs the shrinker's
    pni->lock -> tree_lock path), so the extra-reference approach is used.
    
    https://virtuozzo.atlassian.net/browse/PSBM-161840
    
    Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
    Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
---
 mm/tcache.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/mm/tcache.c b/mm/tcache.c
index e8ba7ee26cbef..f95b5ed6cb0bc 100644
--- a/mm/tcache.c
+++ b/mm/tcache.c
@@ -810,9 +810,26 @@ tcache_attach_page(struct tcache_node *node, pgoff_t index, struct page *page)
 	 */
 	spin_lock_irqsave(&node->tree_lock, flags);
 	err = tcache_page_tree_insert(node, index, page);
+	if (!err) {
+		/*
+		 * Take an extra reference while the page is visible in
+		 * the tree but not yet on the LRU.  Without this,
+		 * a concurrent tcache_detach_page() on another CPU can
+		 * find the page via radix_tree_lookup, succeed with
+		 * page_ref_freeze(page, 2) and free the page to PCP.
+		 * When we then call tcache_lru_add() below, we overwrite
+		 * page->lru which now links into a PCP free list,
+		 * corrupting that list.  The extra reference makes the
+		 * freeze fail (refcount will be 3, not 2), so the
+		 * concurrent detach retries after we finish setup.
+		 */
+		get_page(page);
+	}
 	spin_unlock(&node->tree_lock);
-	if (!err)
+	if (!err) {
 		tcache_lru_add(node->pool, page);
+		put_page(page);
+	}
 	local_irq_restore(flags); /* Implies rcu_read_lock_sched() */
 	return err;
 }
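
The refcount arithmetic the fix relies on can be checked in isolation.  What
follows is a minimal user-space sketch, not kernel code: struct model_page,
model_ref_freeze() and model_detach_would_free() are hypothetical names
introduced only for illustration, and C11 atomics stand in for the kernel's
page reference helpers.  It assumes, as the commit message describes, that
page_ref_freeze(page, 2) succeeds only when the count is exactly 2.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the reference count is modeled. */
struct model_page {
	atomic_int refcount;
};

/* Models page_ref_freeze(): atomically replace an exact expected count with 0. */
static bool model_ref_freeze(struct model_page *p, int expected)
{
	return atomic_compare_exchange_strong(&p->refcount, &expected, 0);
}

/*
 * Models the detach steps that run inside the attach window, between the
 * tree insert and tcache_lru_add().  Returns true if detach would go on
 * to free the page.
 */
static bool model_detach_would_free(bool attach_holds_extra_ref)
{
	struct model_page page;

	atomic_init(&page.refcount, 1);              /* reference held for the tree */
	if (attach_holds_extra_ref)
		atomic_fetch_add(&page.refcount, 1); /* get_page() added by the fix */

	atomic_fetch_add(&page.refcount, 1);         /* speculative get by detach */
	return model_ref_freeze(&page, 2);           /* succeeds only if count == 2 */
}
```

In this model, model_detach_would_free(false) returns true (count 1 -> 2, freeze
succeeds, the page would be freed inside the window), while
model_detach_would_free(true) returns false (count reaches 3, the freeze fails,
and the detach has to retry).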

