[Devel] [PATCH RHEL7 COMMIT] ms/netfilter: ipset: Fix suspicious rcu_dereference_protected()

Konstantin Khorenko khorenko at virtuozzo.com
Mon Sep 30 15:21:54 MSK 2024


The commit is pushed to "branch-rh7-3.10.0-1160.119.1.vz7.224.x-ovz" and will appear at git at bitbucket.org:openvz/vzkernel.git
after rh7-3.10.0-1160.119.1.vz7.224.3
------>
commit b16d522c51180a964c22415a612322192ebcd492
Author: Jozsef Kadlecsik <kadlec at netfilter.org>
Date:   Wed Sep 25 17:35:46 2024 +0800

    ms/netfilter: ipset: Fix suspicious rcu_dereference_protected()
    
    When destroying all sets, we are either in pernet exit phase or
    are executing a "destroy all sets command" from userspace. The latter
    was taken into account in ip_set_dereference() (nfnetlink mutex is held),
    but the former was not. The patch adds the required check to
    rcu_dereference_protected() in ip_set_dereference().
    
    mFixes: 4e7aaa6b82d6 ("netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type")
    Reported-by: syzbot+b62c37cdd58103293a5a at syzkaller.appspotmail.com
    Reported-by: syzbot+cfbe1da5fdfc39efc293 at syzkaller.appspotmail.com
    Reported-by: kernel test robot <oliver.sang at intel.com>
    Closes: https://lore.kernel.org/oe-lkp/202406141556.e0b6f17e-lkp@intel.com
    Signed-off-by: Jozsef Kadlecsik <kadlec at netfilter.org>
    Signed-off-by: Pablo Neira Ayuso <pablo at netfilter.org>
    
    https://virtuozzo.atlassian.net/browse/PSBM-155867
    (cherry picked from commit 8ecd06277a7664f4ef018abae3abd3451d64e7a6)
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    =================
    Patchset description:
    netfilter: ipset: Fix possible cause of memory corruption
    
    Patch [1] fixes possible race between swap/destroy and add/del/test.
    
    Here is is possible order of events when this race can lead to double
    free with kfree_rcu on already freed hash bucket:
    
     # Thread 1
      +-> ip_set_add
        +-> set = ip_set_rcu_get(xt_net(par), index)
        < pause >
    
     # Thread 2
    ipset swap
    ipset destroy
      +-> mtype_destroy
        +-> mtype_ahash_destroy
          +-> n = __ipset_dereference(hbucket(t, i))
          +-> kfree(n)
    
     # Thread 1
        < unpause >
        +-> ip_set_lock(set)
        +-> hash_net4_kadt
          +-> mtype_variant->adt[adt]
            +-> net4_kadt_add (mtype_add)
              +-> n = rcu_dereference_bh(hbucket(t, key))
              +-> if (n->pos >= n->size)
                +-> old = n
              +-> if (old != ERR_PTR(-ENOENT))
                +-> kfree_rcu(old, rcu)
    
    That can in it's turn lead to possible rcu free list corruption if this
    double fried memory is reused just after rcu_free and before actual rcu
    callback.
    
    note1: The patch [1] has a reproducer but, sadly, I was unable to
    reproduce the situation, even adding mdelay(100) to ip_set_add/del/test.
    
    note2: All other patches are fixups to the original fixing patch.
    
    note3: We don't have proof that this a fix to original issue, we only
    know that ipset hash buckets were several times seen double freed just
    before crash, which might indirectly indicate that the original problem
    is related to ipset hash buckets.
    
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    https://virtuozzo.atlassian.net/browse/PSBM-155867
    
    Alexander Maltsev (1):
      netfilter: ipset: Add list flush to cancel_gc
    
    Eric Dumazet (1):
      netns: add pre_exit method to struct pernet_operations
    
    Jozsef Kadlecsik (5):
      netfilter: ipset: fix race condition between swap/destroy and kernel
        side add/del/test [1]
      netfilter: ipset: fix performance regression in swap operation
      netfilter: ipset: Missing gc cancellations fixed
      netfilter: ipset: Fix race between namespace cleanup and gc in the
        list:set type
      netfilter: ipset: Fix suspicious rcu_dereference_protected()
---
 net/netfilter/ipset/ip_set_core.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 61d457961708..d7a4a42a2031 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -56,12 +56,13 @@ MODULE_DESCRIPTION("core IP set support");
 MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_IPSET);
 
 /* When the nfnl mutex or ip_set_ref_lock is held: */
-#define ip_set_dereference(p)		\
-	rcu_dereference_protected(p,	\
+#define ip_set_dereference(inst)	\
+	rcu_dereference_protected((inst)->ip_set_list,	\
 		lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET) || \
-		lockdep_is_held(&ip_set_ref_lock))
+		lockdep_is_held(&ip_set_ref_lock) || \
+		(inst)->is_deleted)
 #define ip_set(inst, id)		\
-	ip_set_dereference((inst)->ip_set_list)[id]
+	ip_set_dereference(inst)[id]
 #define ip_set_ref_netlink(inst,id)	\
 	rcu_dereference_raw((inst)->ip_set_list)[id]
 #define ip_set_dereference_nfnl(p)	\
@@ -989,7 +990,7 @@ ip_set_create(struct sock *ctnl, struct sk_buff *skb,
 		if (!list)
 			goto cleanup;
 		/* nfnl mutex is held, both lists are valid */
-		tmp = ip_set_dereference(inst->ip_set_list);
+		tmp = ip_set_dereference(inst);
 		memcpy(list, tmp, sizeof(struct ip_set *) * inst->ip_set_max);
 		rcu_assign_pointer(inst->ip_set_list, list);
 		/* Make sure all current packets have passed through */


More information about the Devel mailing list