[Devel] [PATCH RHEL7 COMMIT] ms/netlink: do not enter direct reclaim from netlink_dump()

Konstantin Khorenko khorenko at virtuozzo.com
Mon May 18 22:35:43 MSK 2020


The commit is pushed to "branch-rh7-3.10.0-1127.8.2.vz7.151.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.8.2.vz7.151.1
------>
commit 978945e17507c29ee959262af5ca64cb34ad26d5
Author: Vasily Averin <vvs at virtuozzo.com>
Date:   Mon May 18 22:35:43 2020 +0300

    ms/netlink: do not enter direct reclaim from netlink_dump()
    
        [ Upstream commit d35c99ff77ecb2eb239731b799386f3b3637a31e ]
    
        Since linux-3.15, netlink_dump() can use up to 16384 bytes skb
        allocations.
    
        Due to struct skb_shared_info ~320 bytes overhead, we end up using
        order-3 (on x86) page allocations, that might trigger direct reclaim and
        add stress.
    
        The intent was really to attempt a large allocation but immediately
        fall back to a smaller one (order-1 on x86) in case of memory stress.
    
        On recent kernels (linux-4.4), we can remove __GFP_DIRECT_RECLAIM to
        meet the goal. Old kernels would need to remove __GFP_WAIT instead.
    
        While we are at it, since we do an order-3 allocation, allow using
        all of the allocated bytes instead of only 16384, to reduce the number
        of syscalls during large dumps.
    
        iproute2 already uses 32KB recvmsg() buffer sizes.
    
        Alexei provided an initial patch downsizing to SKB_WITH_OVERHEAD(16384)
    
        Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
        Signed-off-by: Eric Dumazet <edumazet at google.com>
        Reported-by: Alexei Starovoitov <ast at kernel.org>
        Cc: Greg Thelen <gthelen at google.com>
        Reviewed-by: Greg Rose <grose at lightfleet.com>
        Acked-by: Alexei Starovoitov <ast at kernel.org>
        Signed-off-by: David S. Miller <davem at davemloft.net>
        Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
    
    [vvs@: taken from stable 3.19]
    https://jira.sw.ru/browse/PSBM-104086
    Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
---
 net/netlink/af_netlink.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
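
The size arithmetic behind the commit message can be checked with a small userspace sketch. It is not kernel code: the 4 KiB page size and the 320-byte shared-info size are assumptions taken from the "x86" and "~320 bytes" figures above, and skb_with_overhead() and order_for() are hypothetical stand-ins for the kernel's SKB_WITH_OVERHEAD() and page-order computation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages (x86) and a 320-byte
 * struct skb_shared_info, per the "~320 bytes" figure in the commit text. */
#define PAGE_SIZE_ASSUMED   4096u
#define SHINFO_SIZE_ASSUMED 320u

/* Hypothetical stand-in for SKB_WITH_OVERHEAD(): bytes left for payload
 * once the shared info is carved out of a buffer of x bytes. */
static size_t skb_with_overhead(size_t x)
{
	assert(x > SHINFO_SIZE_ASSUMED);
	return x - SHINFO_SIZE_ASSUMED;
}

/* Smallest order such that (page size << order) covers size bytes. */
static unsigned int order_for(size_t size)
{
	unsigned int order = 0;

	while (((size_t)PAGE_SIZE_ASSUMED << order) < size)
		order++;
	return order;
}

/* Total buffer needed for a given payload: payload plus shared info. */
static size_t total_for_payload(size_t payload)
{
	return payload + SHINFO_SIZE_ASSUMED;
}
```

Under these assumptions a 16384-byte payload needs 16704 bytes in total, which no longer fits in four pages and therefore lands in an order-3 (32768-byte) allocation. Capping the payload at skb_with_overhead(32768) = 32448 bytes fills that same order-3 allocation exactly, which is the point of the new limit in netlink_recvmsg().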

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 382141c8a0d71..c36d6c354dfc5 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1802,7 +1802,7 @@ static int netlink_recvmsg(struct kiocb *kiocb, struct socket *sock,
 	/* Record the max length of recvmsg() calls for future allocations */
 	nlk->max_recvmsg_len = max(nlk->max_recvmsg_len, len);
 	nlk->max_recvmsg_len = min_t(size_t, nlk->max_recvmsg_len,
-				     16384);
+				     SKB_WITH_OVERHEAD(32768));
 
 	copied = data_skb->len - skip;
 	if (len < copied) {
@@ -2082,9 +2082,8 @@ static int netlink_dump(struct sock *sk)
 		skb = netlink_alloc_skb(sk,
 					nlk->max_recvmsg_len,
 					nlk->portid,
-					GFP_KERNEL |
-					__GFP_NOWARN |
-					__GFP_NORETRY);
+					(GFP_KERNEL & ~__GFP_WAIT) |
+					__GFP_NOWARN | __GFP_NORETRY);
 		/* available room should be exact amount to avoid MSG_TRUNC */
 		if (skb)
 			skb_reserve(skb, skb_tailroom(skb) -
@@ -2092,7 +2091,7 @@ static int netlink_dump(struct sock *sk)
 	}
 	if (!skb)
 		skb = netlink_alloc_skb(sk, alloc_size, nlk->portid,
-					GFP_KERNEL);
+					(GFP_KERNEL & ~__GFP_WAIT));
 	if (!skb)
 		goto errout_skb;
 	netlink_skb_set_owner_r(skb, sk);
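
For readers less familiar with GFP masks, the following userspace sketch mirrors the shape of the two netlink_alloc_skb() calls in the hunk above: try the large size without waiting, then fall back to the small size, also without waiting. Everything in it is hypothetical. The X_GFP_* bit values are illustrative only (the real definitions live in include/linux/gfp.h), fake_alloc() merely simulates memory pressure through a size limit, and dump_alloc() is not a kernel function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative bit values only; see include/linux/gfp.h for the real ones. */
#define X_GFP_WAIT    0x10u
#define X_GFP_IO      0x40u
#define X_GFP_FS      0x80u
#define X_GFP_NOWARN  0x200u
#define X_GFP_NORETRY 0x1000u
/* Assumed composition of GFP_KERNEL on kernels of this era. */
#define X_GFP_KERNEL  (X_GFP_WAIT | X_GFP_IO | X_GFP_FS)

/* An allocation may block and enter direct reclaim only if the wait bit is set. */
static bool may_direct_reclaim(unsigned int gfp)
{
	return (gfp & X_GFP_WAIT) != 0;
}

/* Hypothetical allocator: requests above limit fail at once when the caller
 * may not wait. A waiting caller would reclaim first (not modeled here). */
static void *fake_alloc(size_t size, unsigned int gfp, size_t limit)
{
	if (size > limit && !may_direct_reclaim(gfp))
		return NULL;
	return malloc(size);
}

/* Try the large buffer first, then the small one, never waiting. */
static void *dump_alloc(size_t large, size_t small, size_t limit, size_t *got)
{
	unsigned int nowait = X_GFP_KERNEL & ~X_GFP_WAIT;
	void *buf;

	buf = fake_alloc(large, nowait | X_GFP_NOWARN | X_GFP_NORETRY, limit);
	if (buf) {
		*got = large;
		return buf;
	}
	buf = fake_alloc(small, nowait, limit);
	*got = buf ? small : 0;
	return buf;
}
```

With a simulated limit below the large size, dump_alloc() returns the small buffer immediately instead of stalling, which is the behavior the commit message describes as the original intent.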

