[Devel] [PATCH RH8 1/4] sched: fix cfs_rq::nr_iowait accounting

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Fri Jun 18 14:32:17 MSK 2021


From: Jan Dakinevich <jan.dakinevich at virtuozzo.com>

After the recent Red Hat import (b6be9ae "rh7: import RHEL7
kernel-3.10.0-957.12.2.el7"), the following sequence:

  update_stats_dequeue()
    dequeue_sleeper()
      cfs_rq->nr_iowait++

is called conditionally, so cfs_rq::nr_iowait is incremented only if
schedstat_enabled() is true.

However, this counter is expected to be maintained independently of
other scheduler statistics gathering. To fix it, move the cfs_rq::nr_iowait
increment out of the schedstat_enabled() check.
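
The difference can be illustrated with a small userspace model. This is only a
sketch, not kernel code: struct model_cfs_rq, struct model_task, the two
model_dequeue_*() helpers and the schedstat_on parameter are hypothetical
stand-ins for cfs_rq, task_struct, the dequeue path and schedstat_enabled().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct model_cfs_rq {
	unsigned long nr_iowait;
};

struct model_task {
	bool in_iowait;
};

/* Before the fix: the increment sits on the schedstat-only path. */
static void model_dequeue_before(struct model_cfs_rq *cfs_rq,
				 const struct model_task *tsk,
				 bool sleeping, bool schedstat_on)
{
	if (!schedstat_on)
		return;	/* statistics disabled: the counter is skipped too */

	if (sleeping && tsk->in_iowait)
		cfs_rq->nr_iowait++;
}

/* After the fix: the increment no longer depends on schedstat_on. */
static void model_dequeue_after(struct model_cfs_rq *cfs_rq,
				const struct model_task *tsk,
				bool sleeping, bool schedstat_on)
{
	(void)schedstat_on;	/* would only gate statistics, not modelled here */

	if (sleeping && tsk->in_iowait)
		cfs_rq->nr_iowait++;
}
```

With schedstat_on false and a sleeping task that has in_iowait set, the
"before" helper leaves nr_iowait untouched, while the "after" helper
increments it.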

https://jira.sw.ru/browse/PSBM-93850
Signed-off-by: Jan Dakinevich <jan.dakinevich at virtuozzo.com>
Reviewed-by: Kirill Tkhai <ktkhai at virtuozzo.com>
Reviewed-by: Konstantin Khorenko <khorenko at virtuozzo.com>

khorenko@ note: after this patch "nr_iowait" should be accounted properly until
disk io limits are set for a Container and throttling is activated. Taking into
account that "nr_iowait" is currently always broken, let's apply the current
patch and rework "nr_iowait" accounting to honor the throttle code later.

At the moment throttle_cfs_rq() will inc nr_iowait (in dequeue_entity()) while
unthrottle_cfs_rq() won't decrement it in enqueue_entity().
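
The resulting drift can be sketched as follows. This models only what the note
above states (an increment on throttle, no decrement on unthrottle); struct
throttle_model_rq and the throttle_model_*() helpers are hypothetical names,
not the kernel's throttle_cfs_rq()/unthrottle_cfs_rq() implementations.

```c
/* Hypothetical stand-in for the per-cfs_rq counter; not the kernel struct. */
struct throttle_model_rq {
	unsigned long nr_iowait;
};

/* Per the note: throttling goes through the dequeue path, which increments. */
static void throttle_model_throttle(struct throttle_model_rq *rq)
{
	rq->nr_iowait++;
}

/* Per the note: unthrottling goes through the enqueue path, no decrement. */
static void throttle_model_unthrottle(struct throttle_model_rq *rq)
{
	(void)rq;
}

/* Returns the leftover count after the given number of throttle cycles. */
static unsigned long throttle_model_drift(unsigned int cycles)
{
	struct throttle_model_rq rq = { 0 };
	unsigned int i;

	for (i = 0; i < cycles; i++) {
		throttle_model_throttle(&rq);
		throttle_model_unthrottle(&rq);
	}

	return rq.nr_iowait;
}
```

Under these assumptions, each throttle/unthrottle cycle leaves one stale
increment behind, so the counter grows with the number of cycles.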

Changes when porting to VZ8:
- Drop hunk in try_to_wake_up_local() as old code path:
  schedule
    __schedule
      try_to_wake_up_local
        nr_iowait_dec
is now replaced by mainstream with:
  schedule
    sched_submit_work
      wq_worker_sleeping
        wake_up_process
          try_to_wake_up
            nr_iowait_dec
and there is no more try_to_wake_up_local().
- Replace the removal hunk in dequeue_sleeper() with the corresponding hunk in
update_stats_dequeue().

https://jira.sw.ru/browse/PSBM-127846
(cherry-picked from vz7 commit 0bf288fedba7 ("sched: fix
cfs_rq::nr_iowait accounting"))
Fixes: ebd33cb22f39 ("sched: Account cfs_rq::nr_iowait")
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
---
 kernel/sched/fair.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da2e976a6c12..e69d8453b278 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1099,9 +1099,6 @@ update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if ((flags & DEQUEUE_SLEEP) && entity_is_task(se)) {
 		struct task_struct *tsk = task_of(se);
 
-		if (tsk->in_iowait)
-			cfs_rq->nr_iowait++;
-
 		if (tsk->state & TASK_INTERRUPTIBLE)
 			__schedstat_set(se->statistics.sleep_start,
 				      rq_clock(rq_of(cfs_rq)));
@@ -4186,6 +4183,13 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 	update_stats_dequeue(cfs_rq, se, flags);
 
+	if ((flags & DEQUEUE_SLEEP) && entity_is_task(se)) {
+		struct task_struct *tsk = task_of(se);
+
+		if (tsk->in_iowait)
+			cfs_rq->nr_iowait++;
+	}
+
 	clear_buddies(cfs_rq, se);
 
 	if (cfs_rq->prev == se)
-- 
2.31.1


