[Devel] [PATCH vz7] mm/tcache: fix PCP list corruption from attach/detach race
Pavel Tikhomirov
ptikhomirov at virtuozzo.com
Tue Mar 17 12:52:26 MSK 2026
On 3/16/26 22:40, Konstantin Khorenko wrote:
> tcache_attach_page() inserts a page into the per-node radix tree under
> tree_lock, then releases the lock and calls tcache_lru_add(). Between
> releasing tree_lock and completing tcache_lru_add(), the page is visible
> in the radix tree but not yet on the tcache LRU.
>
> During this window a concurrent tcache_detach_page() on another CPU can:
> 1. Find the page via radix_tree_lookup (RCU)
> 2. page_cache_get_speculative(page): refcount 1 -> 2
> 3. page_ref_freeze(page, 2): refcount 2 -> 0
> 4. Remove the page from the radix tree
> 5. tcache_lru_del(): page not on LRU yet, skipped
> 6. tcache_put_page() -> free_hot_cold_page(): page freed to PCP list
>
> Now page->lru links into a PCP free list. When the original CPU then
> executes tcache_lru_add() -> list_add_tail(&page->lru, &pni->lru), it
> overwrites page->lru, destroying the PCP list linkage. Subsequent PCP
> allocations follow the stale pointer and hit a poisoned or cross-linked
> lru, causing "list_del corruption" warnings and eventually a hard lockup
> when free_pcppages_bulk() holds zone->lock forever.
>
> Fix by taking an extra page reference before releasing tree_lock. This
> makes page_ref_freeze(page, 2) fail on any concurrent detach (refcount
> will be 3, not the expected 2), forcing the detach to retry after the
> page is fully set up (in tree AND on LRU). The extra reference is
> dropped after tcache_lru_add() completes.
>
> Note: moving tcache_lru_add() inside the tree_lock critical section would
> cause a lock ordering inversion (tree_lock -> pni->lock vs the shrinker's
> pni->lock -> tree_lock path), so the extra-reference approach is used.
>
> https: //virtuozzo.atlassian.net/browse/PSBM-161840
nit: ^ excess space
>
Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
> Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
> ---
> mm/tcache.c | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/mm/tcache.c b/mm/tcache.c
> index e8ba7ee26cbef..f95b5ed6cb0bc 100644
> --- a/mm/tcache.c
> +++ b/mm/tcache.c
> @@ -810,9 +810,26 @@ tcache_attach_page(struct tcache_node *node, pgoff_t index, struct page *page)
> */
> spin_lock_irqsave(&node->tree_lock, flags);
> err = tcache_page_tree_insert(node, index, page);
> + if (!err) {
> + /*
> + * Take an extra reference while the page is visible in
> + * the tree but not yet on the LRU. Without this,
> + * a concurrent tcache_detach_page() on another CPU can
> + * find the page via radix_tree_lookup, succeed with
> + * page_ref_freeze(page, 2) and free the page to PCP.
> + * When we then call tcache_lru_add() below, we overwrite
> + * page->lru which now links into a PCP free list,
> + * corrupting that list. The extra reference makes the
> + * freeze fail (refcount will be 3, not 2), so the
> + * concurrent detach retries after we finish setup.
> + */
> + get_page(page);
> + }
> spin_unlock(&node->tree_lock);
> - if (!err)
> + if (!err) {
> tcache_lru_add(node->pool, page);
> + put_page(page);
> + }
> local_irq_restore(flags); /* Implies rcu_read_lock_sched() */
> return err;
> }
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.