[Devel] [PATCH vz9 14/27] sched: Count loadavg under rq::lock in calc_load_nohz_start()

Nikita Yushchenko nikita.yushchenko at virtuozzo.com
Wed Oct 6 11:57:29 MSK 2021


From: Kirill Tkhai <ktkhai at virtuozzo.com>

calc_load_fold_active() reads two variables (nr_running and
nr_uninterruptible), so it may race with a parallel try_to_wake_up().
It must therefore be executed under rq::lock to prevent that.
This appears to be the reason for the negative calc_load_tasks values
observed on several machines.
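
For reference, the function that does the two reads looks roughly like
this in mainline (a sketch for illustration only; the vz9 code may differ
in details):

long calc_load_fold_active(struct rq *this_rq, long adjust)
{
	long nr_active, delta = 0;

	/*
	 * Two separate unsynchronized reads: a concurrent wakeup can move
	 * a task from nr_uninterruptible to nr_running in between, so the
	 * sum (and hence calc_load_tasks) can go wrong without rq::lock.
	 */
	nr_active = this_rq->nr_running - adjust;
	nr_active += (int)this_rq->nr_uninterruptible;

	if (nr_active != this_rq->calc_load_active) {
		delta = nr_active - this_rq->calc_load_active;
		this_rq->calc_load_active = nr_active;
	}

	return delta;
}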

https://jira.sw.ru/browse/PSBM-68052

Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>

(cherry-picked from vz7 commit 63138412c98b ("sched: Count loadavg under
rq::lock in calc_load_nohz_start()"))

Rebase to vz8:

- move calc_load_migrate under rq lock
- move the rq lock/unlock hunk from calc_load_enter_idle() to
  calc_load_nohz_start()
- use rq_(un)lock helpers

like in the patch sent to mainline: https://lkml.org/lkml/2017/7/7/472

This patch did not hit mainline for some reason, but it is likely still
valid; at least I don't see any explicit fix for this problem there.

https://jira.sw.ru/browse/PSBM-127780

Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>

khorenko@: we don't disable/enable irqs in calc_load_nohz_start() because
irqs are assumed to be already disabled when it is called.
Stack example:

cpuidle_idle_call
 tick_nohz_idle_stop_tick
  __tick_nohz_idle_stop_tick
   tick_nohz_stop_tick
    calc_load_nohz_start
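
This is why the hunk below uses the plain rq_lock()/rq_unlock() helpers
rather than the irqsave variants. Roughly (simplified from mainline
kernel/sched/sched.h; the exact vz9 definitions may differ), the two only
differ in whether they save/restore the irq flags themselves:

static inline void rq_lock(struct rq *rq, struct rq_flags *rf)
	__acquires(rq->lock)
{
	raw_spin_rq_lock(rq);		/* plain lock: caller must already have irqs off */
	rq_pin_lock(rq, rf);
}

static inline void rq_lock_irqsave(struct rq *rq, struct rq_flags *rf)
	__acquires(rq->lock)
{
	raw_spin_rq_lock_irqsave(rq, rf->flags);	/* saves irq state itself */
	rq_pin_lock(rq, rf);
}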

Cherry-picked from vz8 commit e30538246af4 ("sched: Count loadavg under
rq::lock in calc_load_nohz_start()").

Possible paths to reach calc_load_fold_active(), where rq->nr_running and
rq->nr_uninterruptible get used (the fold helper is sketched after the list):

(1) scheduler_tick() <- [code to take rq lock already existed]
      calc_global_load_tick()
        calc_load_fold_active()

(2) sched_tick_remote() <- [code to take rq lock already existed]
      calc_load_nohz_remote()
        calc_load_nohz_fold()
          calc_load_fold_active()

(3) tick_nohz_stop_tick()
      calc_load_nohz_start() <- [rq lock added here by this patch]
        calc_load_nohz_fold()
          calc_load_fold_active()

(4) sched_cpu_dying() <- [rq lock ensured here by this patch]
      calc_load_migrate()
        calc_load_fold_active()
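
For reference, in paths (2) and (3) the fold helper just moves the per-rq
delta into the global NO_HZ pending counter; roughly (a mainline sketch,
shown for illustration only):

static void calc_load_nohz_fold(struct rq *rq)
{
	long delta;

	delta = calc_load_fold_active(rq, 0);
	if (delta) {
		int idx = calc_load_write_idx();

		/* Fold the delta into the currently writable NO_HZ slot. */
		atomic_long_add(delta, &calc_load_nohz[idx]);
	}
}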

Signed-off-by: Nikita Yushchenko <nikita.yushchenko at virtuozzo.com>
---
 kernel/sched/core.c    | 2 +-
 kernel/sched/loadavg.c | 6 +++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f1689ac77af1..dad3dbd6acca 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9035,9 +9035,9 @@ int sched_cpu_dying(unsigned int cpu)
 		WARN(true, "Dying CPU not properly vacated!");
 		dump_rq_tasks(rq, KERN_WARNING);
 	}
+	calc_load_migrate(rq);
 	rq_unlock_irqrestore(rq, &rf);
 
-	calc_load_migrate(rq);
 	update_max_interval();
 	hrtick_clear(rq);
 	sched_core_cpu_dying(cpu);
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index 0174c40c7468..74cfc755c87d 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -312,11 +312,15 @@ static void calc_load_nohz_fold(struct rq *rq)
 
 void calc_load_nohz_start(void)
 {
+	struct rq *rq = this_rq();
+	struct rq_flags rf;
 	/*
 	 * We're going into NO_HZ mode, if there's any pending delta, fold it
 	 * into the pending NO_HZ delta.
 	 */
-	calc_load_nohz_fold(this_rq());
+	rq_lock(rq, &rf);
+	calc_load_nohz_fold(rq);
+	rq_unlock(rq, &rf);
 }
 
 /*
-- 
2.30.2


