[Devel] [PATCH] sunrpc: handle -ENETUNREACH error on connect as fatal if VE is dying

Stanislav Kinsburskiy skinsbursky at virtuozzo.com
Wed Feb 22 09:05:32 PST 2017


I have no idea why kernel_connect() in xs_tcp_setup_socket() returns -ENETUNREACH
with this call stack:

[root at uqvm098 ~]# cat /proc/14753/stack
[<ffffffffa029bde1>] rpc_wait_bit_killable+0x11/0x60 [sunrpc]
[<ffffffffa029bdcd>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
[<ffffffffa06720b2>] _nfs4_proc_delegreturn+0x222/0x280 [nfsv4]
[<ffffffffa0677f21>] nfs4_proc_delegreturn+0x81/0x110 [nfsv4]
[<ffffffffa068b1c9>] nfs_do_return_delegation+0x29/0x40 [nfsv4]
[<ffffffffa068bd77>] nfs_inode_return_delegation_noreclaim+0x27/0x30 [nfsv4]
[<ffffffffa068a389>] nfs4_evict_inode+0x29/0x70 [nfsv4]
[<ffffffff81232447>] evict+0xa7/0x170
[<ffffffff8123254e>] dispose_list+0x3e/0x50
[<ffffffff81233374>] evict_inodes+0x114/0x140
[<ffffffff81218658>] generic_shutdown_super+0x48/0xf0
[<ffffffff81218a62>] kill_anon_super+0x12/0x20
[<ffffffffa06125bb>] nfs_kill_super+0x1b/0x30 [nfs]
[<ffffffff81218ff9>] deactivate_locked_super+0x49/0x80
[<ffffffff81219076>] deactivate_super+0x46/0x60
[<ffffffff81236c75>] mntput_no_expire+0xc5/0x120
[<ffffffff81236cf4>] mntput+0x24/0x40
[<ffffffff81236e28>] namespace_unlock+0x118/0x130
[<ffffffff81239a5b>] put_mnt_ns+0x4b/0x60
[<ffffffff810b72eb>] free_nsproxy+0x1b/0x90
[<ffffffff811007be>] ve_drop_context+0x7e/0xd0
[<ffffffff811025d4>] ve_exit_ns+0x94/0xc0
[<ffffffff8112534c>] zap_pid_ns_processes+0x1ac/0x220
[<ffffffff8108cb4d>] do_exit+0xadd/0xb20
[<ffffffff8108cc0f>] do_group_exit+0x3f/0xa0
[<ffffffff8109e200>] get_signal_to_deliver+0x1d0/0x6d0
[<ffffffff8102a387>] do_signal+0x57/0x6b0
[<ffffffff8102aa3f>] do_notify_resume+0x5f/0xb0
[<ffffffff816911fd>] int_signal+0x12/0x17
[<ffffffffffffffff>] 0xffffffffffffffff

But this results in an infinite loop, and the process can't be killed since it
is already dead.
I don't see any better solution so far (other than investigating why
-ENETUNREACH is returned).
So let's simply break the loop when it's called from ve_drop_context().
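
For illustration only, here is a minimal user-space sketch of the intended
status handling. It is not the kernel code: model_connect_status() and its
ve_netns_gone flag are hypothetical names standing in for the switch in
xprt_connect_status() and the current->task_ve->ve_netns == NULL check, and
it assumes that an unchanged status in the listed group means the connect is
retried.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the patched switch statement.
 * Returns the status the task would carry after the switch.
 */
static int model_connect_status(int status, bool ve_netns_gone)
{
	switch (status) {
	case -ENETUNREACH:
		if (ve_netns_gone)
			return -EIO;	/* VE is dying: make the error fatal */
		/* fall through */
	case -ECONNREFUSED:
	case -ECONNRESET:
	case -ECONNABORTED:
	case -EHOSTUNREACH:
	case -EPIPE:
	case -EAGAIN:
		return status;	/* status left as is (assumed: retried) */
	default:
		return status;	/* other cases are outside this sketch */
	}
}
```

Only -ENETUNREACH combined with a missing network namespace is turned into
-EIO; every other error in the group keeps its current handling.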

https://jira.sw.ru/browse/PSBM-60905

Signed-off-by: Stanislav Kinsburskiy <skinsbursky at virtuozzo.com>
---
 net/sunrpc/xprt.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 774e351..9277800 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -777,10 +777,18 @@ static void xprt_connect_status(struct rpc_task *task)
 	}
 
 	switch (task->tk_status) {
+	case -ENETUNREACH:
+		if (current->task_ve->ve_netns == NULL) {
+			dprintk("RPC: %5u xprt_connect_status: error %d connecting to "
+					"server %s\n", task->tk_pid, -task->tk_status,
+					xprt->servername);
+			dprintk("RPC: %5u host ve is dying\n", task->tk_pid);
+			task->tk_status = -EIO;
+			break;
+		}
 	case -ECONNREFUSED:
 	case -ECONNRESET:
 	case -ECONNABORTED:
-	case -ENETUNREACH:
 	case -EHOSTUNREACH:
 	case -EPIPE:
 	case -EAGAIN:


