[Devel] [PATCH v3] nfs: protect callback execution against per-net callback thread shutdown

Kirill Tkhai ktkhai at virtuozzo.com
Tue Nov 7 14:30:37 MSK 2017


On 07.11.2017 13:39, Stanislav Kinsburskiy wrote:
> From: Stanislav Kinsburskiy <skinsbursky at parallels.com>
> 
> Here is the race:
> 
> CPU #0				CPU#1
> 
> cleanup_mnt			nfs41_callback_svc (get xprt from the list)
> nfs_callback_down		...
> ...				...
> svc_close_net			...
> ...				...
> svc_xprt_free			...
> svc_bc_sock_free		bc_svc_process
> kfree(xprt)			svc_process_common
> 				rqstp->rq_xprt->xpt_ops (use after free)
> 
> The problem is that the per-net SUNRPC transports shutdown is done regardless
> of any callback currently executing. This race leads to a transport
> use-after-free in the callback handler.
> This patch fixes it in a straightforward way: it protects callback
> execution with the same mutex used for per-net data creation and destruction.
> Hopefully, it won't slow down the NFS client significantly.
> 
> https://jira.sw.ru/browse/PSBM-75751
> 
> v3: Fix a mutex deadlock, where the shutdown callback waits (with the mutex
> taken) for the thread to exit, while the thread waits to take the mutex.
> The idea is simply to check whether the thread has to exit whenever taking
> the mutex fails. This is a busy loop, but it shouldn't happen often or for long.
> 
> Signed-off-by: Stanislav Kinsburskiy <skinsbursky at parallels.com>
> ---
>  fs/nfs/callback.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> index 0beb275..bbb07e4 100644
> --- a/fs/nfs/callback.c
> +++ b/fs/nfs/callback.c
> @@ -117,7 +117,11 @@ nfs41_callback_svc(void *vrqstp)
>  		if (try_to_freeze())
>  			continue;
>  
> +		if (!mutex_trylock(&nfs_callback_mutex))
> +		       continue;

This looks like a busy loop (which is especially bad because the mutex owner
may sleep at the moment we are looping).

It seems the solution may be to change the nfs_callback_down() function.
Can't we flush pending requests in nfs_callback_down()? Or just delete
them from sv_cb_list without handling them (does the NFS protocol allow that)?

Also, one more argument for flushing is the suspicion that there may be more
than one pending request in the list. Can that happen?

> +
>  		prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
> +
>  		spin_lock_bh(&serv->sv_cb_lock);
>  		if (!list_empty(&serv->sv_cb_list)) {
>  			req = list_first_entry(&serv->sv_cb_list,
> @@ -129,8 +133,10 @@ nfs41_callback_svc(void *vrqstp)
>  			error = bc_svc_process(serv, req, rqstp);
>  			dprintk("bc_svc_process() returned w/ error code= %d\n",
>  				error);
> +			mutex_unlock(&nfs_callback_mutex);
>  		} else {
>  			spin_unlock_bh(&serv->sv_cb_lock);
> +			mutex_unlock(&nfs_callback_mutex);
>  			schedule();
>  			finish_wait(&serv->sv_cb_waitq, &wq);
>  		}
> 

