[Devel] [PATCH] sunrpc: handle -ENETUNREACH error on connect as fatal if VE is dying

Wed Feb 22 09:04:10 PST 2017

On 02/22/2017 08:05 PM, Stanislav Kinsburskiy wrote:
> I have no idea why kernel_connect() in xs_tcp_setup_socket() returns
> -ENETUNREACH with this call stack:
>
> [root@uqvm098 ~]# cat /proc/14753/stack
> [<ffffffffa029bde1>] rpc_wait_bit_killable+0x11/0x60 [sunrpc]
> [<ffffffffa029bdcd>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
> [<ffffffffa06720b2>] _nfs4_proc_delegreturn+0x222/0x280 [nfsv4]
> [<ffffffffa0677f21>] nfs4_proc_delegreturn+0x81/0x110 [nfsv4]
> [<ffffffffa068b1c9>] nfs_do_return_delegation+0x29/0x40 [nfsv4]
> [<ffffffffa068bd77>] nfs_inode_return_delegation_noreclaim+0x27/0x30 [nfsv4]
> [<ffffffffa068a389>] nfs4_evict_inode+0x29/0x70 [nfsv4]
> [<ffffffff81232447>] evict+0xa7/0x170
> [<ffffffff8123254e>] dispose_list+0x3e/0x50
> [<ffffffff81233374>] evict_inodes+0x114/0x140
> [<ffffffff81218658>] generic_shutdown_super+0x48/0xf0
> [<ffffffff81218a62>] kill_anon_super+0x12/0x20
> [<ffffffffa06125bb>] nfs_kill_super+0x1b/0x30 [nfs]
> [<ffffffff81218ff9>] deactivate_locked_super+0x49/0x80
> [<ffffffff81219076>] deactivate_super+0x46/0x60
> [<ffffffff81236c75>] mntput_no_expire+0xc5/0x120
> [<ffffffff81236cf4>] mntput+0x24/0x40
> [<ffffffff81236e28>] namespace_unlock+0x118/0x130
> [<ffffffff81239a5b>] put_mnt_ns+0x4b/0x60
> [<ffffffff810b72eb>] free_nsproxy+0x1b/0x90
> [<ffffffff811007be>] ve_drop_context+0x7e/0xd0
> [<ffffffff811025d4>] ve_exit_ns+0x94/0xc0
> [<ffffffff8112534c>] zap_pid_ns_processes+0x1ac/0x220
> [<ffffffff8108cb4d>] do_exit+0xadd/0xb20
> [<ffffffff8108cc0f>] do_group_exit+0x3f/0xa0
> [<ffffffff8109e200>] get_signal_to_deliver+0x1d0/0x6d0
> [<ffffffff8102a387>] do_signal+0x57/0x6b0
> [<ffffffff8102aa3f>] do_notify_resume+0x5f/0xb0
> [<ffffffff816911fd>] int_signal+0x12/0x17
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> But this results in an infinite loop, and the process can't be killed because
> it is already dead.
> I don't see a better solution so far (other than investigating why
> -ENETUNREACH is returned).
> So let's simply break the loop when it's called from ve_drop_context().
>
> https://jira.sw.ru/browse/PSBM-60905
>
> Signed-off-by: Stanislav Kinsburskiy <skinsbursky at virtuozzo.com>

I thought you had decided it was better to find the root cause (a blocked
port or whatnot)?

The patch LGTM,
Reviewed-by: Dmitry Safonov <dsafonov at virtuozzo.com>

> ---
>  net/sunrpc/xprt.c |   10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 774e351..9277800 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -777,10 +777,18 @@ static void xprt_connect_status(struct rpc_task *task)
>  	}
>
>  	switch (task->tk_status) {
> +	case -ENETUNREACH:
> +		if (current->task_ve->ve_netns == NULL) {
> +			dprintk("RPC: %5u xprt_connect_status: error %d connecting to "
> +					"server %s\n", task->tk_pid, -task->tk_status,
> +					xprt->servername);
> +			dprintk("RPC: %5u host ve is dying\n", task->tk_pid);
> +			task->tk_status = -EIO;
> +			break;
> +		}
>  	case -ECONNREFUSED:
>  	case -ECONNRESET:
>  	case -ECONNABORTED:
> -	case -ENETUNREACH:
>  	case -EHOSTUNREACH:
>  	case -EPIPE:
>  	case -EAGAIN:
>
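
For anyone following the fall-through in the hunk above, here is a minimal
user-space sketch of the control flow, not the kernel code itself. The enum,
the function names and the ve_netns_gone flag are hypothetical stand-ins (the
flag models "current->task_ve->ve_netns == NULL"). Treating the grouped cases
as "retry" is an assumption drawn from the commit message's description of the
loop; statuses not visible in the hunk are deliberately left unmodelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not kernel identifiers. */
enum connect_action {
	CONNECT_RETRY,	/* assumed: the caller attempts the connect again */
	CONNECT_FATAL,	/* status rewritten to -EIO, the task fails */
	CONNECT_OTHER,	/* status not shown in the hunk; not modelled here */
};

/*
 * Models the patched switch: -ENETUNREACH becomes -EIO only when the
 * VE's network namespace is already gone, otherwise it falls through
 * to the same group as the other connection errors.
 */
enum connect_action classify_connect_status(int *status, bool ve_netns_gone)
{
	switch (*status) {
	case -ENETUNREACH:
		if (ve_netns_gone) {
			*status = -EIO;
			return CONNECT_FATAL;
		}
		/* fall through */
	case -ECONNREFUSED:
	case -ECONNRESET:
	case -ECONNABORTED:
	case -EHOSTUNREACH:
	case -EPIPE:
	case -EAGAIN:
		return CONNECT_RETRY;
	default:
		return CONNECT_OTHER;
	}
}

/* Small self-check showing the two -ENETUNREACH outcomes. */
void connect_status_examples(void)
{
	int status = -ENETUNREACH;

	/* Dying VE: the error is turned into -EIO and the loop ends. */
	assert(classify_connect_status(&status, true) == CONNECT_FATAL);
	assert(status == -EIO);

	/* Live VE: the status is left alone and grouped with the rest. */
	status = -ENETUNREACH;
	assert(classify_connect_status(&status, false) == CONNECT_RETRY);
	assert(status == -ENETUNREACH);
}
```

Note that only -ENETUNREACH is special-cased: the other errors in the group
keep their existing handling even when the VE is dying.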

-- 
              Dmitry