[Devel] [PATCH vz9] sched: Do not set LBF_NEED_BREAK flag if scanned all the tasks

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Fri Dec 15 14:19:37 MSK 2023



On 14/12/2023 06:01, Konstantin Khorenko wrote:
> After ms commit b0defa7ae03e ("sched/fair: Make sure to try to detach at
> least one movable task") detach_tasks() does not stop on the condition
> (env->loop > env->loop_max) in case no movable task found.
> 
> Instead of that (if there are no movable tasks in the rq) exits always
> happen on the loop_break check - thus with LBF_NEED_BREAK flag set.
> 
> It's not a problem for mainstream because load_balance() proceeds with
> balancing in case LBF_NEED_BREAK is set only in case (env.loop <
> busiest->nr_running), but in Virtuozzo kernel with CFS_CPULIMIT feature
> right before that we try to move a whole task group (object of the
> CFS_CPULIMIT feature) and resets env.loop to zero,
> so we get a livelock here.
> 
> Resetting env.loop makes sense in case we have successfully moved some
> tasks (in the scope of the task group), but if we failed to move any
> task, no progress is expected during further balancing attempts.
> 
> Ways to fix it:
>   1. In load_balance() restore old env.loop in case no tasks were moved
>      by move_task_groups()
>   2. Add a check in detach_tasks() to exit without LBF_NEED_BREAK flag in
>      case we have scanned all tasks and have not found movable tasks.
> 
> Current patch implements the second way.
> 
> Caused by ms commit: b0defa7ae03e ("sched/fair: Make sure to try to
> detach at least one movable task")
> 
> Fixes ms commit: bca010328248 ("sched: Port CONFIG_CFS_CPULIMIT feature")

this is not "ms" commit

> 
> Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
> 
> Feature: sched: ability to limit number of CPUs available to a CT
> ---
>   kernel/sched/fair.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9367d16a8d85..e068bb90f197 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8771,6 +8771,12 @@ static int detach_tasks(struct lb_env *env)
>   		if (env->loop > env->loop_max &&
>   		    !(env->flags & LBF_ALL_PINNED))
>   			break;
> +		/*
> +		 * Quit if we have scanned all tasks even in case we haven't
> +		 * found any movable task.
> +		 */
> +		if (env->loop > env->src_rq->nr_running)
> +			break;

The env->loop is clearly designed in load_balance function to be reused 
on next iteration of "goto more_balance". The fact that we clear it to 
zero in our code is really strange... Maybe we can have separate 
env.vzloop variable for our part of the code (or we can properly 
integrate with env.loop without zeroing it?)?

Do we also need same crazy LBF_ALL_PINNED change to move_task_groups's 
"if (env->loop > env->loop_max) break;" check? Similar to what 
b0defa7ae03ec ("sched/fair: Make sure to try to detach at least one 
movable task") does in detach_tasks to allow it to handle the case then 
lru tail is pinned?

>   
>   		/* take a breather every nr_migrate tasks */
>   		if (env->loop > env->loop_break) {

-- 
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.


More information about the Devel mailing list