[Devel] [PATCH vz10] cgroup-v2/freezer: avoid sleeping allocations under RCU in freeze-timeout warning

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Mon Jun 15 17:14:39 MSK 2026


Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>

On 6/4/26 11:44, Konstantin Khorenko wrote:
> warn_freeze_timeout() holds guard(rcu)() for its whole body (required by
> css_for_each_descendant_post()) and, inside that, both
> warn_freeze_timeout_task() and the "no unfreezable process" tail do
> kmalloc(PATH_MAX / stack-trace buffer, GFP_KERNEL). GFP_KERNEL may sleep,
> which is illegal inside an RCU read-side critical section ("sleeping
> function called from invalid context", caught by
> CONFIG_DEBUG_ATOMIC_SLEEP). The path is reachable from
> cgroup.events show -> check_freeze_timeout() -> warn_freeze_timeout() when
> a freeze exceeds sysctl_freeze_timeout and passes the ratelimit.
> 
> The non-allocating work here (cgroup_path(), stack_trace_save_tsk()) is
> atomic-safe; only the allocations sleep. Use GFP_ATOMIC for the three
> buffers - this is a rare, ratelimited diagnostic path with small (<=PATH_MAX)
> allocations, so the stricter allocation context is acceptable.

Looks good. Alternatively, if we ever need sleeping allocations there in the
future, we could take a reference on the task and drop the RCU read lock before
calling warn_freeze_timeout_task().
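
A minimal userspace sketch of that alternative, not kernel code. Every model_*
name is a hypothetical stand-in: model_get_task()/model_put_task() for
get_task_struct()/put_task_struct(), model_rcu_read_lock()/model_rcu_read_unlock()
for the RCU read lock, and model_sleeping_alloc() for kmalloc(..., GFP_KERNEL).
It only illustrates the ordering: pin the task while still under RCU, leave the
read-side section, then do the allocation that may sleep. In the real function
the guard(rcu)() scope and the descendant iteration would need restructuring,
which this does not attempt to show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Models RCU read-side nesting depth; 0 means "sleeping is allowed". */
static int rcu_depth;

static void model_rcu_read_lock(void)   { rcu_depth++; }
static void model_rcu_read_unlock(void) { rcu_depth--; }

/* Hypothetical stand-in for task_struct with its reference count. */
struct model_task {
	int usage;
};

static void model_get_task(struct model_task *t) { t->usage++; }
static void model_put_task(struct model_task *t) { t->usage--; }

/* Models a GFP_KERNEL allocation: it may sleep, so it asserts no RCU. */
static void *model_sleeping_alloc(size_t n)
{
	assert(rcu_depth == 0);
	return malloc(n);
}

/* Models warn_freeze_timeout_task(): allocates a buffer, reports, frees. */
static bool model_report_task(struct model_task *t)
{
	char *buf = model_sleeping_alloc(4096);

	(void)t;	/* the real function would print the task's path and stack */
	if (!buf)
		return false;
	free(buf);
	return true;
}

/* Models the caller: pin the task under RCU, unlock, then report. */
static bool model_warn(struct model_task *t)
{
	bool ok;

	model_rcu_read_lock();
	model_get_task(t);		/* task cannot go away after this */
	model_rcu_read_unlock();	/* now outside the read-side section */

	ok = model_report_task(t);	/* sleeping allocation is legal here */
	model_put_task(t);		/* drop the extra reference */
	return ok;
}
```

The extra reference is what keeps the task valid once the read lock is dropped;
without it the report would race with task exit.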

> 
> Fixes: b55c88db815e ("cgroup-v2/freezer: Print information about unfreezable process")
> https://virtuozzo.atlassian.net/browse/VSTOR-132310
> Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
> ---
>  kernel/cgroup/cgroup.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index e37df4ccb51e..14e93647e999 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -4250,7 +4250,13 @@ void warn_freeze_timeout_task(struct cgroup *cgrp, int timeout,
>  	unsigned long nr_entries, i;
>  	pid_t tgid;
>  
> -	buf = kmalloc(PATH_MAX, GFP_KERNEL);
> +	/*
> +	 * Called from warn_freeze_timeout() under rcu_read_lock() (the
> +	 * css_for_each_descendant_post() iteration), so allocations here must
> +	 * not sleep.  This is a rare, ratelimited diagnostic path with small
> +	 * buffers, so GFP_ATOMIC is fine.
> +	 */
> +	buf = kmalloc(PATH_MAX, GFP_ATOMIC);
>  	if (!buf)
>  		return;
>  
> @@ -4263,7 +4269,7 @@ void warn_freeze_timeout_task(struct cgroup *cgrp, int timeout,
>  	       buf, timeout/HZ, tgid, task->comm);
>  
>  	entries = kmalloc(MAX_STACK_TRACE_DEPTH * sizeof(*entries),
> -			  GFP_KERNEL);
> +			  GFP_ATOMIC);
>  	if (!entries)
>  		return;
>  	nr_entries = stack_trace_save_tsk(task, entries,
> @@ -4297,7 +4303,8 @@ static void warn_freeze_timeout(struct cgroup *cgrp, int timeout)
>  		css_task_iter_end(&it);
>  	}
>  
> -	buf = kmalloc(PATH_MAX, GFP_KERNEL);
> +	/* still under rcu_read_lock() from guard(rcu)() above */
> +	buf = kmalloc(PATH_MAX, GFP_ATOMIC);
>  	if (!buf)
>  		return;
>  

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.


