[Devel] [PATCH v2 vz8] kernel/sched/fair.c: Add missing update_rq_clock() calls

Kirill Tkhai ktkhai at virtuozzo.com
Tue Sep 29 11:24:38 MSK 2020


On 28.09.2020 15:03, Andrey Ryabinin wrote:
> We've got a hard lockup that seems to be caused by the mgag200
> console printk code calling schedule_work() from the scheduler
> with rq->lock held:
>   #5 [ffffb79e034239a8] native_queued_spin_lock_slowpath at ffffffff8b50c6c6
>   #6 [ffffb79e034239a8] _raw_spin_lock at ffffffff8bc96e5c
>   #7 [ffffb79e034239b0] try_to_wake_up at ffffffff8b4e26ff
>   #8 [ffffb79e03423a10] __queue_work at ffffffff8b4ce3f3
>   #9 [ffffb79e03423a58] queue_work_on at ffffffff8b4ce714
>  #10 [ffffb79e03423a68] mga_imageblit at ffffffffc026d666 [mgag200]
>  #11 [ffffb79e03423a80] soft_cursor at ffffffff8b8a9d84
>  #12 [ffffb79e03423ad8] bit_cursor at ffffffff8b8a99b2
>  #13 [ffffb79e03423ba0] hide_cursor at ffffffff8b93bc7a
>  #14 [ffffb79e03423bb0] vt_console_print at ffffffff8b93e07d
>  #15 [ffffb79e03423c18] console_unlock at ffffffff8b518f0e
>  #16 [ffffb79e03423c68] vprintk_emit_log at ffffffff8b51acf7
>  #17 [ffffb79e03423cc0] vprintk_default at ffffffff8b51adcd
>  #18 [ffffb79e03423cd0] printk at ffffffff8b51b3d6
>  #19 [ffffb79e03423d30] __warn_printk at ffffffff8b4b13a0
>  #20 [ffffb79e03423d98] assert_clock_updated at ffffffff8b4dd293
>  #21 [ffffb79e03423da0] deactivate_task at ffffffff8b4e12d1
>  #22 [ffffb79e03423dc8] move_task_group at ffffffff8b4eaa5b
>  #23 [ffffb79e03423e00] cpulimit_balance_cpu_stop at ffffffff8b4f02f3
>  #24 [ffffb79e03423eb0] cpu_stopper_thread at ffffffff8b576b67
>  #25 [ffffb79e03423ee8] smpboot_thread_fn at ffffffff8b4d9125
>  #26 [ffffb79e03423f10] kthread at ffffffff8b4d4fc2
>  #27 [ffffb79e03423f50] ret_from_fork at ffffffff8be00255
> 
> The printk was called because assert_clock_updated() triggered
> 	SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);
> 
> This means that we are missing a necessary update_rq_clock() call.
> Add one to cpulimit_balance_cpu_stop() to fix the warning.
> Also add one in load_balance() before move_task_groups() call.
> It seems to be another place missing this call.
> 
> https://jira.sw.ru/browse/PSBM-108013
> Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
> ---
>  kernel/sched/fair.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5d3556b15e70..e6dc21d5fa03 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7816,6 +7816,7 @@ static int cpulimit_balance_cpu_stop(void *data)
>  
>  		schedstat_inc(sd->clb_count);
>  
> +		update_rq_clock(rq);

Shouldn't we also call update_rq_clock() for target_rq here, to avoid a WARN() coming from attach_task()?

>  		if (do_cpulimit_balance(&env))
>  			schedstat_inc(sd->clb_pushed);
>  		else
> @@ -9176,6 +9177,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  			env.loop = 0;
>  			local_irq_save(rf.flags);
>  			double_rq_lock(env.dst_rq, busiest);
> +			update_rq_clock(env.dst_rq);
>  			cur_ld_moved = ld_moved = move_task_groups(&env);
>  			double_rq_unlock(env.dst_rq, busiest);
>  			local_irq_restore(rf.flags);
> 
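
To illustrate the concern about target_rq, here is a rough userspace model
of the clock-update check, not the kernel code itself. The RQCF_* values are
the ones used in kernel/sched/sched.h and the check is the SCHED_WARN_ON
condition quoted above. Everything else (struct rq_model, the model_*()
helpers, and the idea that the attach side checks the destination clock) is
a hypothetical simplification made for this sketch.

```c
#include <assert.h>

/* Flag values as in kernel/sched/sched.h; the rest is a simplified model. */
#define RQCF_REQ_SKIP 0x01
#define RQCF_ACT_SKIP 0x02
#define RQCF_UPDATED  0x04

/* Hypothetical stand-in for struct rq: only the fields the check needs. */
struct rq_model {
	unsigned int clock_update_flags;
	unsigned long long clock;
	int nr_running;
	int warnings;	/* how many times the modelled SCHED_WARN_ON fired */
};

/* Models update_rq_clock(): refresh the clock and mark it as updated. */
void model_update_rq_clock(struct rq_model *rq, unsigned long long now)
{
	rq->clock = now;
	rq->clock_update_flags |= RQCF_UPDATED;
}

/* Models assert_clock_updated(): count a warning instead of printing. */
void model_assert_clock_updated(struct rq_model *rq)
{
	if (rq->clock_update_flags < RQCF_ACT_SKIP)
		rq->warnings++;
}

/* Models the detach side (deactivate_task() in the trace). */
void model_detach_task(struct rq_model *src)
{
	assert(src->nr_running > 0);
	model_assert_clock_updated(src);
	src->nr_running--;
}

/* Models the attach side; assumes it checks the destination clock too. */
void model_attach_task(struct rq_model *dst)
{
	model_assert_clock_updated(dst);
	dst->nr_running++;
}

/* Move one task and return the total number of warnings on both queues. */
int model_move_task(struct rq_model *src, struct rq_model *dst,
		    int update_src, int update_dst, unsigned long long now)
{
	if (update_src)
		model_update_rq_clock(src, now);
	if (update_dst)
		model_update_rq_clock(dst, now);
	model_detach_task(src);
	model_attach_task(dst);
	return src->warnings + dst->warnings;
}
```

In this model, updating only the source queue (update_src = 1,
update_dst = 0) still leaves one warning, raised on the attach side.
Updating both queues leaves none. Whether the real attach_task() path
behaves this way in the vz8 tree would need to be checked against the
actual code.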


