<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div class="elementToProof" style="font-family: Aptos, Aptos_EmbeddedFont, Aptos_MSFontService, Calibri, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hello, Den.<br>
Who could help with preparing the RK for this patch?</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Alexey Kuznetsov <kuznet@virtuozzo.com><br>
<b>Sent:</b> May 8, 2025, 14:15<br>
<b>To:</b> Kui Liu <kui.liu@virtuozzo.com><br>
<b>Cc:</b> devel@openvz.org <devel@openvz.org>; Andrey Zaitsev <azaitsev@virtuozzo.com>; Konstantin Khorenko <khorenko@virtuozzo.com><br>
<b>Subject:</b> Re: [PATCH VZ9] fs/fuse kio: fix hang rpc over rdma io</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">Hello!<br>
<br>
Can we turn this into a ready-kernel patch for 5.14.0-427.33.1.vz9.72.5?<br>
I promised the customer in <a href="https://virtuozzo.atlassian.net/browse/ASUP-1425">
https://virtuozzo.atlassian.net/browse/ASUP-1425</a><br>
to fix this issue without a reboot.<br>
<br>
On Thu, May 8, 2025 at 7:11 PM Alexey Kuznetsov <kuznet@virtuozzo.com> wrote:<br>
><br>
> Ack<br>
><br>
> On Thu, May 8, 2025 at 12:26 PM Liu Kui <kui.liu@virtuozzo.com> wrote:<br>
> ><br>
> > When a large msg is being sent over rdma and is waiting for a read ack<br>
> > from the peer, it is moved from rio->write_queue to rio->active_txs.<br>
> > However, pcs_rdma_next_timeout() does not check the msgs in rio->active_txs<br>
> > when computing the timeout it returns to rpc, so the rpc timer is not<br>
> > started. If the peer becomes unresponsive, the msg in rio->active_txs<br>
> > can stay stuck waiting for the read ack forever, because the calendar timer<br>
> > cannot kill it while it is under network I/O. As a result, the rpc<br>
> > can hang forever without detecting the stuck msg in the underlying rdma io.<br>
> ><br>
> > pcs_rdma_next_timeout() should therefore return the next timeout based on<br>
> > the first msg in rio->active_txs.<br>
> ><br>
> > Fixes: #VSTOR-105982<br>
> > <a href="https://virtuozzo.atlassian.net/browse/VSTOR-105982">https://virtuozzo.atlassian.net/browse/VSTOR-105982</a><br>
> ><br>
> > Signed-off-by: Liu Kui <kui.liu@virtuozzo.com><br>
> > ---<br>
> > fs/fuse/kio/pcs/pcs_rdma_io.c | 15 +++++++++++----<br>
> > 1 file changed, 11 insertions(+), 4 deletions(-)<br>
> ><br>
> > diff --git a/fs/fuse/kio/pcs/pcs_rdma_io.c b/fs/fuse/kio/pcs/pcs_rdma_io.c<br>
> > index 2755b13fb8a5..6fa38338ad0c 100644<br>
> > --- a/fs/fuse/kio/pcs/pcs_rdma_io.c<br>
> > +++ b/fs/fuse/kio/pcs/pcs_rdma_io.c<br>
> > @@ -1668,14 +1668,21 @@ static unsigned long pcs_rdma_next_timeout(struct pcs_netio *netio)<br>
> > struct pcs_rdmaio *rio = rio_from_netio(netio);<br>
> > struct pcs_rpc *ep = netio->parent;<br>
> > struct pcs_msg *msg;<br>
> > + struct rio_tx *tx;<br>
> ><br>
> > BUG_ON(!mutex_is_locked(&ep->mutex));<br>
> ><br>
> > - if (list_empty(&rio->write_queue))<br>
> > - return 0;<br>
> > + if (!list_empty(&rio->active_txs)) {<br>
> > + tx = list_first_entry(&rio->active_txs, struct rio_tx, list);<br>
> > + return tx->msg->start_time + rio->send_timeout;<br>
> > + }<br>
> ><br>
> > - msg = list_first_entry(&rio->write_queue, struct pcs_msg, list);<br>
> > - return msg->start_time + rio->send_timeout;<br>
> > + if (!list_empty(&rio->write_queue)) {<br>
> > + msg = list_first_entry(&rio->write_queue, struct pcs_msg, list);<br>
> > + return msg->start_time + rio->send_timeout;<br>
> > + }<br>
> > +<br>
> > + return 0;<br>
> > }<br>
> ><br>
> > static int pcs_rdma_sync_send(struct pcs_netio *netio, struct pcs_msg *msg)<br>
> > --<br>
> > 2.39.5 (Apple Git-154)<br>
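<br>
For readers who want to see the behaviour outside the kernel, here is a minimal user-space sketch of the lookup order the patch introduces. The types struct msg, struct tx and struct rio and the function next_timeout() are hypothetical, simplified stand-ins (plain singly linked lists instead of the kernel list API, no locking), not the real pcs_rdma_io.c definitions. Only the order of the checks follows the diff above.<br>

```c
#include <stddef.h>

/* Hypothetical stand-in for struct pcs_msg: only the field the timeout needs. */
struct msg {
    unsigned long start_time;   /* time the message was queued, in ticks */
    struct msg *next;           /* next message in write_queue */
};

/* Hypothetical stand-in for struct rio_tx: an in-flight transfer owning a msg. */
struct tx {
    struct msg *msg;
    struct tx *next;            /* next transfer in active_txs */
};

/* Hypothetical stand-in for struct pcs_rdmaio. */
struct rio {
    struct tx *active_txs;      /* head: msgs sent, waiting for a read ack */
    struct msg *write_queue;    /* head: msgs not yet sent */
    unsigned long send_timeout; /* allowed time per message, in ticks */
};

/*
 * Return the absolute deadline of the oldest pending message,
 * or 0 when nothing is pending (meaning "no timer needed").
 * In-flight transfers are checked first, as in the patch.
 */
static unsigned long next_timeout(const struct rio *rio)
{
    if (rio->active_txs)
        return rio->active_txs->msg->start_time + rio->send_timeout;

    if (rio->write_queue)
        return rio->write_queue->start_time + rio->send_timeout;

    return 0;
}
```

The case that matters is an empty write_queue with a non-empty active_txs: a function that looks only at write_queue returns 0 there, so no timer is armed, while this sketch returns the deadline of the in-flight message.<br>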
</div>
</span></font></div>
</body>
</html>