[Devel] [PATCH RHEL7 COMMIT] ms/rq-qos: fix missed wake-ups in rq_qos_throttle try two

Konstantin Khorenko khorenko at virtuozzo.com
Fri Mar 29 20:54:09 MSK 2024


The commit is pushed to "branch-rh7-3.10.0-1160.108.1.vz7.221.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.108.1.vz7.221.2
------>
commit 83b166fdd5d4a8b07598f3ee0e632a32a25d82b8
Author: Jan Kara <jack at suse.cz>
Date:   Mon Jun 7 13:26:13 2021 +0200

    ms/rq-qos: fix missed wake-ups in rq_qos_throttle try two
    
    Commit 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle")
    tried to fix a problem that a process could be sleeping in rq_qos_wait()
    without anyone to wake it up. However the fix is not complete and the
    following can still happen:
    
    CPU1 (waiter1)          CPU2 (waiter2)          CPU3 (waker)
    rq_qos_wait()           rq_qos_wait()
      acquire_inflight_cb() -> fails
                              acquire_inflight_cb() -> fails
    
                                                    completes IOs, inflight
                                                      decreased
      prepare_to_wait_exclusive()
                              prepare_to_wait_exclusive()
      has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
                              has_sleeper = !wq_has_single_sleeper() -> true
      io_schedule()           io_schedule()
    
    Deadlock as now there's nobody to wakeup the two waiters. The logic
    automatically blocking when there are already sleepers is really subtle
    and the only way to make it work reliably is that we check whether there
    are some waiters in the queue when adding ourselves there. That way, we
    are guaranteed that at least the first process to enter the wait queue
    will recheck the waiting condition before going to sleep and thus
    guarantee forward progress.
    
    mFixes: 545fbd0775ba ("rq-qos: fix missed wake-ups in rq_qos_throttle")
    CC: stable at vger.kernel.org
    Signed-off-by: Jan Kara <jack at suse.cz>
    Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz
    Signed-off-by: Jens Axboe <axboe at kernel.dk>
    
    Porting to vz7 notices:
      - wait_queue_head in mainstream, but still wait_queue_head_t in vz7
      - use waitqueue_active() in vz7 instead of straight list_empty() check in ms
    
    https://virtuozzo.atlassian.net/browse/PSBM-153598
    (cherry picked from commit 11c7aa0ddea8611007768d3e6b58d45dc60a19e1)
    Ported-by: Konstantin Khorenko <khorenko at virtuozzo.com>
---
 block/blk-wbt.c      | 4 ++--
 include/linux/wait.h | 2 +-
 kernel/wait.c        | 9 +++++++--
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 099678cd0d04..b9d9e1d68a40 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -571,8 +571,8 @@ static void __wbt_wait(struct rq_wb *rwb, enum wbt_flags wb_acct,
 	if (!has_sleeper && rq_wait_inc_below(rqw, get_limit(rwb, rw)))
 		return;
 
-	prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
-	has_sleeper = !wq_has_single_sleeper(&rqw->wait);
+	has_sleeper = !prepare_to_wait_exclusive(&rqw->wait, &data.wq,
+						 TASK_UNINTERRUPTIBLE);
 	do {
 		/* The memory barrier in set_task_state saves us here. */
 		if (data.got_token)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 12075edebfd6..3583ceb21a78 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -1044,7 +1044,7 @@ extern long interruptible_sleep_on_timeout(wait_queue_head_t *q,
  * Waitqueues which are removed from the waitqueue_head at wakeup time
  */
 void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state);
-void prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state);
+bool prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state);
 void finish_wait(wait_queue_head_t *q, wait_queue_t *wait);
 void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
 			unsigned int mode, void *key);
diff --git a/kernel/wait.c b/kernel/wait.c
index 15c1d0892c71..5303bb6de60b 100644
--- a/kernel/wait.c
+++ b/kernel/wait.c
@@ -79,17 +79,22 @@ prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
 }
 EXPORT_SYMBOL(prepare_to_wait);
 
-void
+/* Returns true if we are the first waiter in the queue, false otherwise. */
+bool
 prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
 {
 	unsigned long flags;
+	bool was_empty = false;
 
 	wait->flags |= WQ_FLAG_EXCLUSIVE;
 	spin_lock_irqsave(&q->lock, flags);
-	if (list_empty(&wait->task_list))
+	if (list_empty(&wait->task_list)) {
+		was_empty = !waitqueue_active(q);
 		__add_wait_queue_tail(q, wait);
+	}
 	set_current_state(state);
 	spin_unlock_irqrestore(&q->lock, flags);
+	return was_empty;
 }
 EXPORT_SYMBOL(prepare_to_wait_exclusive);
 


More information about the Devel mailing list