[Devel] [PATCH RHEL9 COMMIT] loadavg: Fix handling of cfs_rq->nr_unint values

Konstantin Khorenko khorenko at virtuozzo.com
Mon Jan 16 20:19:01 MSK 2023


The commit is pushed to "branch-rh9-5.14.0-162.6.1.vz9.18.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-162.6.1.vz9.18.2
------>
commit 6555547a385aa5c76feddee263bd1f04e8228b89
Author: Nikolay Borisov <nikolay.borisov at virtuozzo.com>
Date:   Wed Nov 30 14:29:25 2022 +0200

    loadavg: Fix handling of cfs_rq->nr_unint values
    
    Our custom nr_unint logic follows the kernel's one about the global
    nr_uninterruptible one. One invariant that both share is that values
    of nr_unint can go negative when a task is migrated from a CPU.
    What's important is that when summed across all task groups they either result
    in a positive number (we have sleeping tasks in uninterruptible state) or 0
    (no uninterruptible processes).
    
    Current code is broken in that the math in calc_load_ve always deals
    with unsigned integers values which leads to enormous values of
    'nr_active' when ->nr_unint is negative on a tg. The proper way to work
    with nr_unint is to always cast it to an signed integer and also make
    nr_active a signed values as well to properly deal with negative values.
    
    https://jira.sw.ru/browse/PSBM-143355
    Fixes: e303dd8563b5 ("ve/sched/loadavg: Calculate avenrun for Containers root
    cpu cgroups")
    
    Signed-off-by: Nikolay Borisov <nikolay.borisov at virtuozzo.com>
    
    Feature: statistics: loadavg virtualization
---
 kernel/sched/loadavg.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index 3910ed5b8323..d6d3b8db8718 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -113,7 +113,7 @@ extern raw_spinlock_t load_ve_lock;
 
 void calc_load_ve(void)
 {
-	unsigned long nr_unint, nr_active;
+	long nr_unint, nr_active;
 	struct task_group *tg;
 	int i;
 
@@ -133,13 +133,22 @@ void calc_load_ve(void)
 			 * overhead for activate/deactivate operations. So, we
 			 * don't account child cgroup unint tasks here.
 			 */
-			nr_active += tg->cfs_rq[i]->nr_unint;
+			nr_active += (int)tg->cfs_rq[i]->nr_unint;
 #endif
 #ifdef CONFIG_RT_GROUP_SCHED
 			nr_active += tg->rt_rq[i]->rt_nr_running;
 #endif
 		}
-		nr_active *= FIXED_1;
+		/*
+		 * Unless we screwed somewhere, nr_active must always be >= 0,
+		 * because we can have no runnable tasks and no tasks in
+		 * uninterruptible sleep.
+		 */
+		if (WARN_ON_ONCE(nr_active < 0)) {
+			nr_active = 0;
+		} else {
+			nr_active *= FIXED_1;
+		}
 
 		tg->avenrun[0] = calc_load(tg->avenrun[0], EXP_1, nr_active);
 		tg->avenrun[1] = calc_load(tg->avenrun[1], EXP_5, nr_active);


More information about the Devel mailing list