[Devel] [PATCH] sunrpc: handle -ENETUNREACH error on connect as fatal if VE is dying

Stanislav Kinsburskiy skinsbursky at virtuozzo.com
Wed Feb 22 09:11:02 PST 2017



22.02.2017 18:04, Dmitry Safonov wrote:
> On 02/22/2017 08:05 PM, Stanislav Kinsburskiy wrote:
>> I have no idea why kernel_connect() in xs_tcp_setup_socket() returns
>> -ENETUNREACH with this call stack:
>>
>> [root at uqvm098 ~]# cat /proc/14753/stack
>> [<ffffffffa029bde1>] rpc_wait_bit_killable+0x11/0x60 [sunrpc]
>> [<ffffffffa029bdcd>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc]
>> [<ffffffffa06720b2>] _nfs4_proc_delegreturn+0x222/0x280 [nfsv4]
>> [<ffffffffa0677f21>] nfs4_proc_delegreturn+0x81/0x110 [nfsv4]
>> [<ffffffffa068b1c9>] nfs_do_return_delegation+0x29/0x40 [nfsv4]
>> [<ffffffffa068bd77>] nfs_inode_return_delegation_noreclaim+0x27/0x30 [nfsv4]
>> [<ffffffffa068a389>] nfs4_evict_inode+0x29/0x70 [nfsv4]
>> [<ffffffff81232447>] evict+0xa7/0x170
>> [<ffffffff8123254e>] dispose_list+0x3e/0x50
>> [<ffffffff81233374>] evict_inodes+0x114/0x140
>> [<ffffffff81218658>] generic_shutdown_super+0x48/0xf0
>> [<ffffffff81218a62>] kill_anon_super+0x12/0x20
>> [<ffffffffa06125bb>] nfs_kill_super+0x1b/0x30 [nfs]
>> [<ffffffff81218ff9>] deactivate_locked_super+0x49/0x80
>> [<ffffffff81219076>] deactivate_super+0x46/0x60
>> [<ffffffff81236c75>] mntput_no_expire+0xc5/0x120
>> [<ffffffff81236cf4>] mntput+0x24/0x40
>> [<ffffffff81236e28>] namespace_unlock+0x118/0x130
>> [<ffffffff81239a5b>] put_mnt_ns+0x4b/0x60
>> [<ffffffff810b72eb>] free_nsproxy+0x1b/0x90
>> [<ffffffff811007be>] ve_drop_context+0x7e/0xd0
>> [<ffffffff811025d4>] ve_exit_ns+0x94/0xc0
>> [<ffffffff8112534c>] zap_pid_ns_processes+0x1ac/0x220
>> [<ffffffff8108cb4d>] do_exit+0xadd/0xb20
>> [<ffffffff8108cc0f>] do_group_exit+0x3f/0xa0
>> [<ffffffff8109e200>] get_signal_to_deliver+0x1d0/0x6d0
>> [<ffffffff8102a387>] do_signal+0x57/0x6b0
>> [<ffffffff8102aa3f>] do_notify_resume+0x5f/0xb0
>> [<ffffffff816911fd>] int_signal+0x12/0x17
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> But this makes an infinite loop, and the process can't be killed since
>> it's already dead.
>> I don't see any better solution so far (other than investigating why
>> -ENETUNREACH is returned).
>> So let's simply break the loop when it's called from ve_drop_context().
>>
>> https://jira.sw.ru/browse/PSBM-60905
>>
>> Signed-off-by: Stanislav Kinsburskiy <skinsbursky at virtuozzo.com>
>
> I thought you had decided that it's better to find the reason (blocked
> port or whatnot)?
>
> The patch LGTM,
> Reviewed-by: Dmitry Safonov <dsafonov at virtuozzo.com>
>

You can have a look at the issue.
It turned out that the address-port pair SUNRPC tries to connect to is
unmasked.
The issue is somewhere in the network layer (it returns -ENETUNREACH for
whatever reason) on container shutdown.
I suspect that the issue is global and not related to the vz kernel, but I
don't have time to dig further.

>> ---
>>  net/sunrpc/xprt.c |   10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
>> index 774e351..9277800 100644
>> --- a/net/sunrpc/xprt.c
>> +++ b/net/sunrpc/xprt.c
>> @@ -777,10 +777,18 @@ static void xprt_connect_status(struct rpc_task *task)
>>      }
>>
>>      switch (task->tk_status) {
>> +    case -ENETUNREACH:
>> +        if (current->task_ve->ve_netns == NULL) {
>> +            dprintk("RPC: %5u xprt_connect_status: error %d connecting to "
>> +                    "server %s\n", task->tk_pid, -task->tk_status,
>> +                    xprt->servername);
>> +            dprintk("RPC: %5u host ve is dying\n", task->tk_pid);
>> +            task->tk_status = -EIO;
>> +            break;
>> +        }
>>      case -ECONNREFUSED:
>>      case -ECONNRESET:
>>      case -ECONNABORTED:
>> -    case -ENETUNREACH:
>>      case -EHOSTUNREACH:
>>      case -EPIPE:
>>      case -EAGAIN:
>>
>
>
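
For illustration, the decision the patch adds can be modelled in user space
roughly as below. This is only a sketch under stated assumptions, not the
kernel code: classify_connect_status(), enum connect_action and the ve_dying
flag are hypothetical names, ve_dying stands in for the
current->task_ve->ve_netns == NULL check, and the handling of status 0 and of
errors outside the patched case list is assumed rather than taken from the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of a connect attempt; not a kernel type. */
enum connect_action {
    CONNECT_OK,     /* connection established */
    CONNECT_RETRY,  /* transient error, the caller may try again */
    CONNECT_FATAL   /* give up and report an I/O error */
};

/*
 * Simplified model of the switch touched by the patch.
 * ve_dying stands in for current->task_ve->ve_netns == NULL.
 * On return, *status holds the status the task would carry on with.
 */
static enum connect_action classify_connect_status(int *status, bool ve_dying)
{
    switch (*status) {
    case 0:
        /* Assumption: zero means the connect succeeded. */
        return CONNECT_OK;
    case -ENETUNREACH:
        if (ve_dying) {
            /* The container is going away: stop retrying. */
            *status = -EIO;
            return CONNECT_FATAL;
        }
        /* fall through */
    case -ECONNREFUSED:
    case -ECONNRESET:
    case -ECONNABORTED:
    case -EHOSTUNREACH:
    case -EPIPE:
    case -EAGAIN:
        /* Transient errors keep their status and are retried. */
        return CONNECT_RETRY;
    default:
        /* Assumption: any other error is treated as fatal in this model. */
        *status = -EIO;
        return CONNECT_FATAL;
    }
}

/* Small self-check of the model above. */
static void check_model(void)
{
    int status = -ENETUNREACH;

    /* Dying container: the error becomes fatal and is mapped to -EIO. */
    assert(classify_connect_status(&status, true) == CONNECT_FATAL);
    assert(status == -EIO);

    /* Live container: the same error stays retryable and unchanged. */
    status = -ENETUNREACH;
    assert(classify_connect_status(&status, false) == CONNECT_RETRY);
    assert(status == -ENETUNREACH);
}
```

Only the -ENETUNREACH case depends on the container state; the other
retryable errors behave the same whether or not the VE is dying, which is why
the loop is broken only on this one path.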


