[Devel] [PATCH vz7] mm/tcache: fix PCP list corruption from attach/detach race

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Tue Mar 17 12:52:26 MSK 2026


On 3/16/26 22:40, Konstantin Khorenko wrote:
> tcache_attach_page() inserts a page into the per-node radix tree under
> tree_lock, then releases the lock and calls tcache_lru_add().  Between
> releasing tree_lock and completing tcache_lru_add(), the page is visible
> in the radix tree but not yet on the tcache LRU.
> 
> During this window a concurrent tcache_detach_page() on another CPU can:
>   1. Find the page via radix_tree_lookup (RCU)
>   2. page_cache_get_speculative(page): refcount 1 -> 2
>   3. page_ref_freeze(page, 2): refcount 2 -> 0
>   4. Remove the page from the radix tree
>   5. tcache_lru_del(): page not on LRU yet, skipped
>   6. tcache_put_page() -> free_hot_cold_page(): page freed to PCP list
> 
> Now page->lru links into a PCP free list.  When the original CPU then
> executes tcache_lru_add() -> list_add_tail(&page->lru, &pni->lru), it
> overwrites page->lru destroying the PCP list linkage.  Subsequent PCP
> allocations follow the stale pointer and hit a poisoned or cross-linked
> lru, causing "list_del corruption" warnings and eventually a hard lockup
> when free_pcppages_bulk() holds zone->lock forever.
> 
> Fix by taking an extra page reference before releasing tree_lock.  This
> makes page_ref_freeze(page, 2) fail on any concurrent detach (refcount
> will be 3, not the expected 2), forcing the detach to retry after the
> page is fully set up (in tree AND on LRU).  The extra reference is
> dropped after tcache_lru_add() completes.
> 
> Note: moving tcache_lru_add() inside the tree_lock critical section would
> cause a lock ordering inversion (tree_lock -> pni->lock vs the shrinker's
> pni->lock -> tree_lock path), so the extra-reference approach is used.
> 
> https: //virtuozzo.atlassian.net/browse/PSBM-161840

nit:    ^ excess space

> 

Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>

> Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
> ---
>  mm/tcache.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/tcache.c b/mm/tcache.c
> index e8ba7ee26cbef..f95b5ed6cb0bc 100644
> --- a/mm/tcache.c
> +++ b/mm/tcache.c
> @@ -810,9 +810,26 @@ tcache_attach_page(struct tcache_node *node, pgoff_t index, struct page *page)
>  	 */
>  	spin_lock_irqsave(&node->tree_lock, flags);
>  	err = tcache_page_tree_insert(node, index, page);
> +	if (!err) {
> +		/*
> +		 * Take an extra reference while the page is visible in
> +		 * the tree but not yet on the LRU.  Without this,
> +		 * a concurrent tcache_detach_page() on another CPU can
> +		 * find the page via radix_tree_lookup, succeed with
> +		 * page_ref_freeze(page, 2) and free the page to PCP.
> +		 * When we then call tcache_lru_add() below, we overwrite
> +		 * page->lru which now links into a PCP free list,
> +		 * corrupting that list.  The extra reference makes the
> +		 * freeze fail (refcount will be 3, not 2), so the
> +		 * concurrent detach retries after we finish setup.
> +		 */
> +		get_page(page);
> +	}
>  	spin_unlock(&node->tree_lock);
> -	if (!err)
> +	if (!err) {
>  		tcache_lru_add(node->pool, page);
> +		put_page(page);
> +	}
>  	local_irq_restore(flags); /* Implies rcu_read_lock_sched() */
>  	return err;
>  }

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.


