[Devel] [PATCH RHEL7 COMMIT] ms/net: Generalise wq_has_sleeper helper
Konstantin Khorenko
khorenko at virtuozzo.com
Fri Oct 14 19:50:17 MSK 2022
The commit is pushed to "branch-rh7-3.10.0-1160.76.1.vz7.189.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1160.76.1.vz7.189.4
------>
commit 742bf453d82d93d52bf1effbbd8248e3dfe8f239
Author: Herbert Xu <herbert at gondor.apana.org.au>
Date: Thu Sep 29 14:30:03 2022 +0300
ms/net: Generalise wq_has_sleeper helper
The memory barrier in the helper wq_has_sleeper is needed by just
about every user of waitqueue_active. This patch generalises it
by making it take a wait_queue_head_t directly. The existing
helper is renamed to skwq_has_sleeper.
Signed-off-by: Herbert Xu <herbert at gondor.apana.org.au>
Signed-off-by: David S. Miller <davem at davemloft.net>
Changes when porting to vz7:
- skip crypto/algif_aead.c hunks
https://jira.sw.ru/browse/PSBM-141883
(cherry picked from commit 1ce0bf50ae2233c7115a18c0c623662d177b434c)
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
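For context (this sketch is not part of the patch), the lost-wakeup pattern that the smp_mb() in wq_has_sleeper() guards against can be illustrated in userspace with C11 atomics and pthreads. All names here (waiters, data_ready, run_demo) are illustrative stand-ins, not kernel APIs:

```c
/* Userspace analogue of the wq_has_sleeper() pattern. The two
 * seq_cst fences play the role of the paired smp_mb() calls. */
#include <pthread.h>
#include <stdatomic.h>

static atomic_int waiters;    /* stands in for the wait queue's task list */
static atomic_int data_ready; /* the condition the waiter sleeps on */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Waker side: set the condition, issue a full barrier (the smp_mb()
 * inside wq_has_sleeper), then check for sleepers before waking.
 * Without the barrier the condition store and the waiters load could
 * be reordered and a wakeup could be lost. */
static void produce(void)
{
	atomic_store_explicit(&data_ready, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() analogue */
	if (atomic_load_explicit(&waiters, memory_order_relaxed)) {
		pthread_mutex_lock(&lock);
		pthread_cond_broadcast(&cond);
		pthread_mutex_unlock(&lock);
	}
}

/* Waiter side: register as a sleeper, barrier, then re-check the
 * condition before actually blocking -- the pairing the comment in
 * wq_has_sleeper refers to. */
static void *consume(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	atomic_fetch_add_explicit(&waiters, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* paired barrier */
	while (!atomic_load_explicit(&data_ready, memory_order_relaxed))
		pthread_cond_wait(&cond, &lock);
	atomic_fetch_sub_explicit(&waiters, 1, memory_order_relaxed);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Run one producer/consumer round; returns 0 on success. */
static int run_demo(void)
{
	pthread_t t;

	if (pthread_create(&t, NULL, consume, NULL))
		return -1;
	produce();
	if (pthread_join(&t, NULL))
		return -1;
	return atomic_load(&data_ready) == 1 ? 0 : -1;
}
```

Because both sides do store-fence-load, at least one of them is guaranteed to observe the other's store, so the wakeup cannot be lost; this is exactly the property the generalised wq_has_sleeper() lets non-socket users (such as blk-wbt below) rely on.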
=================
Patchset description:
blk-wbt: Fix hardlockup in wbt_done()
We have a hard lockup detected in this stack:
#13 [ffff9103fe603af8] __enqueue_entity at ffffffffb8ce64c5
#14 [ffff9103fe603b00] enqueue_entity at ffffffffb8cee27a
#15 [ffff9103fe603b50] enqueue_task_fair at ffffffffb8ceea9c
#16 [ffff9103fe603ba0] activate_task at ffffffffb8cdd029
#17 [ffff9103fe603bc8] ttwu_do_activate at ffffffffb8cdd491
#18 [ffff9103fe603bf0] try_to_wake_up at ffffffffb8ce124a
#19 [ffff9103fe603c40] default_wake_function at ffffffffb8ce1552
#20 [ffff9103fe603c50] autoremove_wake_function at ffffffffb8ccb178
#21 [ffff9103fe603c78] __wake_up_common at ffffffffb8cd7752
#22 [ffff9103fe603cd0] __wake_up_common_lock at ffffffffb8cd7873
#23 [ffff9103fe603d40] __wake_up at ffffffffb8cd78c3
#24 [ffff9103fe603d50] __wbt_done at ffffffffb8fb6573
#25 [ffff9103fe603d60] wbt_done at ffffffffb8fb65f2
#26 [ffff9103fe603d80] __blk_mq_finish_request at ffffffffb8f8daa1
#27 [ffff9103fe603db8] blk_mq_finish_request at ffffffffb8f8db6a
#28 [ffff9103fe603dc8] blk_mq_sched_put_request at ffffffffb8f93ee0
#29 [ffff9103fe603de8] blk_mq_end_request at ffffffffb8f8d1a4
#30 [ffff9103fe603e08] nvme_complete_rq at ffffffffc033dcfc [nvme_core]
#31 [ffff9103fe603e18] nvme_pci_complete_rq at ffffffffc038be70 [nvme]
#32 [ffff9103fe603e40] __blk_mq_complete_request at ffffffffb8f8d316
#33 [ffff9103fe603e68] blk_mq_complete_request at ffffffffb8f8d3c7
#34 [ffff9103fe603e78] nvme_irq at ffffffffc038c0b2 [nvme]
#35 [ffff9103fe603eb0] __handle_irq_event_percpu at ffffffffb8d66bb4
#36 [ffff9103fe603ef8] handle_irq_event_percpu at ffffffffb8d66d62
#37 [ffff9103fe603f28] handle_irq_event at ffffffffb8d66dec
#38 [ffff9103fe603f50] handle_edge_irq at ffffffffb8d69c0f
#39 [ffff9103fe603f70] handle_irq at ffffffffb8c30524
#40 [ffff9103fe603fb8] do_IRQ at ffffffffb93d898d
which is exactly the same as the Ubuntu problem here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810998
This happens because we have writeback throttling ported, which does not
work well in some cases.
In the Launchpad bug it helped to port these patches from mainline:
* CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
- blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
- blk-wbt: move disable check into get_limit()
- blk-wbt: use wq_has_sleeper() for wq active check
- blk-wbt: fix has-sleeper queueing check
- blk-wbt: abstract out end IO completion handler
- blk-wbt: improve waking of tasks
which fix similar lockup issues in wbt.
Moreover, I've found some more small and useful patches which fix races
(missed wakeups) in this code, so I've also included them in the patchset.
Anchal Agarwal (1):
blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
Herbert Xu (1):
net: Generalise wq_has_sleeper helper
Jens Axboe (5):
blk-wbt: move disable check into get_limit()
blk-wbt: use wq_has_sleeper() for wq active check
blk-wbt: fix has-sleeper queueing check
blk-wbt: abstract out end IO completion handler
blk-wbt: improve waking of tasks
Josef Bacik (5):
wait: add wq_has_single_sleeper helper
rq-qos: fix missed wake-ups in rq_qos_throttle
rq-qos: don't reset has_sleepers on spurious wakeups
rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule
rq-qos: use a mb for got_token
https://jira.sw.ru/browse/PSBM-141883
Ported-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
---
crypto/algif_skcipher.c | 4 ++--
include/linux/wait.h | 21 +++++++++++++++++++++
include/net/sock.h | 15 +++++----------
net/atm/common.c | 4 ++--
net/core/sock.c | 8 ++++----
net/core/stream.c | 2 +-
net/dccp/output.c | 2 +-
net/iucv/af_iucv.c | 2 +-
net/rxrpc/af_rxrpc.c | 2 +-
net/sctp/socket.c | 2 +-
net/tipc/socket.c | 4 ++--
net/unix/af_unix.c | 2 +-
12 files changed, 42 insertions(+), 26 deletions(-)
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index 9a62fa9e02ec..ad4a88628f95 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -186,7 +186,7 @@ static void skcipher_wmem_wakeup(struct sock *sk)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_sync_poll(&wq->wait, POLLIN |
POLLRDNORM |
POLLRDBAND);
@@ -236,7 +236,7 @@ static void skcipher_data_wakeup(struct sock *sk)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_sync_poll(&wq->wait, POLLOUT |
POLLRDNORM |
POLLRDBAND);
diff --git a/include/linux/wait.h b/include/linux/wait.h
index b52741b1775a..2cd2201fc1e4 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -123,6 +123,27 @@ static inline int waitqueue_active(wait_queue_head_t *q)
return !list_empty(&q->task_list);
}
+/**
+ * wq_has_sleeper - check if there are any waiting processes
+ * @wq: wait queue head
+ *
+ * Returns true if wq has waiting processes
+ *
+ * Please refer to the comment for waitqueue_active.
+ */
+static inline bool wq_has_sleeper(wait_queue_head_t *wq)
+{
+ /*
+ * We need to be sure we are in sync with the
+ * add_wait_queue modifications to the wait queue.
+ *
+ * This memory barrier should be paired with one on the
+ * waiting side.
+ */
+ smp_mb();
+ return waitqueue_active(wq);
+}
+
extern void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait);
extern void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait);
extern void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait);
diff --git a/include/net/sock.h b/include/net/sock.h
index e67f4de07c6b..a8609a8c04a1 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -59,6 +59,7 @@
#include <linux/static_key.h>
#include <linux/aio.h>
#include <linux/sched.h>
+#include <linux/wait.h>
#include <linux/filter.h>
#include <linux/rculist_nulls.h>
@@ -2054,12 +2055,12 @@ static inline bool sk_has_allocations(const struct sock *sk)
}
/**
- * wq_has_sleeper - check if there are any waiting processes
+ * skwq_has_sleeper - check if there are any waiting processes
* @wq: struct socket_wq
*
* Returns true if socket_wq has waiting processes
*
- * The purpose of the wq_has_sleeper and sock_poll_wait is to wrap the memory
+ * The purpose of the skwq_has_sleeper and sock_poll_wait is to wrap the memory
* barrier call. They were added due to the race found within the tcp code.
*
* Consider following tcp code paths:
@@ -2085,15 +2086,9 @@ static inline bool sk_has_allocations(const struct sock *sk)
* data on the socket.
*
*/
-static inline bool wq_has_sleeper(struct socket_wq *wq)
+static inline bool skwq_has_sleeper(struct socket_wq *wq)
{
- /* We need to be sure we are in sync with the
- * add_wait_queue modifications to the wait queue.
- *
- * This memory barrier is paired in the sock_poll_wait.
- */
- smp_mb();
- return wq && waitqueue_active(&wq->wait);
+ return wq && wq_has_sleeper(&wq->wait);
}
/**
diff --git a/net/atm/common.c b/net/atm/common.c
index ecaface2878d..6325ab578401 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -96,7 +96,7 @@ static void vcc_def_wakeup(struct sock *sk)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up(&wq->wait);
rcu_read_unlock();
}
@@ -117,7 +117,7 @@ static void vcc_write_space(struct sock *sk)
if (vcc_writable(sk)) {
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible(&wq->wait);
sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
diff --git a/net/core/sock.c b/net/core/sock.c
index 937b705e2c82..130ac5ae3107 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2396,7 +2396,7 @@ static void sock_def_wakeup(struct sock *sk)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_all(&wq->wait);
rcu_read_unlock();
}
@@ -2407,7 +2407,7 @@ static void sock_def_error_report(struct sock *sk)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_poll(&wq->wait, POLLERR);
sk_wake_async(sk, SOCK_WAKE_IO, POLL_ERR);
rcu_read_unlock();
@@ -2419,7 +2419,7 @@ static void sock_def_readable(struct sock *sk, int len)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_sync_poll(&wq->wait, POLLIN | POLLPRI |
POLLRDNORM | POLLRDBAND);
sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
@@ -2437,7 +2437,7 @@ static void sock_def_write_space(struct sock *sk)
*/
if ((atomic_read(&sk->sk_wmem_alloc) << 1) <= sk->sk_sndbuf) {
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_sync_poll(&wq->wait, POLLOUT |
POLLWRNORM | POLLWRBAND);
diff --git a/net/core/stream.c b/net/core/stream.c
index d70f77a0c889..8ff9d63b4265 100644
--- a/net/core/stream.c
+++ b/net/core/stream.c
@@ -35,7 +35,7 @@ void sk_stream_write_space(struct sock *sk)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_poll(&wq->wait, POLLOUT |
POLLWRNORM | POLLWRBAND);
if (wq && wq->fasync_list && !(sk->sk_shutdown & SEND_SHUTDOWN))
diff --git a/net/dccp/output.c b/net/dccp/output.c
index 8876078859da..c60ddcdcc6c9 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -201,7 +201,7 @@ void dccp_write_space(struct sock *sk)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible(&wq->wait);
/* Should agree with poll, otherwise some programs break */
if (sock_writeable(sk))
diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index af877285f5e0..a7ba44d17c84 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -305,7 +305,7 @@ static void iucv_sock_wake_msglim(struct sock *sk)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_all(&wq->wait);
sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
rcu_read_unlock();
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index e61aa6001c65..73d2767d530a 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -67,7 +67,7 @@ static void rxrpc_write_space(struct sock *sk)
if (rxrpc_writable(sk)) {
struct socket_wq *wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible(&wq->wait);
sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);
}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 67f083895cdd..2b524f9939ce 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -7625,7 +7625,7 @@ void sctp_data_ready(struct sock *sk, int len)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_sync_poll(&wq->wait, POLLIN |
POLLRDNORM | POLLRDBAND);
sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 2b1d7c2d677d..2b1c75efba7f 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1116,7 +1116,7 @@ static void tipc_write_space(struct sock *sk)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_sync_poll(&wq->wait, POLLOUT |
POLLWRNORM | POLLWRBAND);
rcu_read_unlock();
@@ -1133,7 +1133,7 @@ static void tipc_data_ready(struct sock *sk, int len)
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_sync_poll(&wq->wait, POLLIN |
POLLRDNORM | POLLRDBAND);
rcu_read_unlock();
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2ecdf86ce707..44f7d932b23e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -441,7 +441,7 @@ static void unix_write_space(struct sock *sk)
rcu_read_lock();
if (unix_writable(sk)) {
wq = rcu_dereference(sk->sk_wq);
- if (wq_has_sleeper(wq))
+ if (skwq_has_sleeper(wq))
wake_up_interruptible_sync_poll(&wq->wait,
POLLOUT | POLLWRNORM | POLLWRBAND);
sk_wake_async(sk, SOCK_WAKE_SPACE, POLL_OUT);