[Devel] [PATCH RHEL8 COMMIT] net: Primitives to enable conntrack allocation

Konstantin Khorenko khorenko at virtuozzo.com
Mon May 24 16:20:35 MSK 2021


The commit is pushed to "branch-rh8-4.18.0-240.1.1.vz8.5.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-240.1.1.vz8.5.32
------>
commit c91d7ce836f6382071096067b59f60b97658a368
Author: Stanislav Kinsburskiy <skinsbursky at virtuozzo.com>
Date:   Mon May 24 16:20:35 2021 +0300

    net: Primitives to enable conntrack allocation
    
    Patchset description:
    
    Create conntrack structures only if they are really needed
    
    Allocate conntracks only after there is a rule which uses them.
    
    v2: Allow allocation once there is a rule and never prohibit it afterwards.
    
    khorenko@: the idea behind all of this:
    we want to let Containers use iptables rules which require conntracks.
    At the same time we'd like to avoid the problems we currently have if we
    just enable conntrack allocation for all Containers and the Hardware Node
    by default:
    1) if conntracks are not actually used by a CT, the structures are still
       allocated, decreasing performance
    2) the number of conntracks in the system is limited => DDoS is possible
    
    So we decided to implement a feature:
    do not allocate conntracks until there are rules in the net namespace which
    require them.
    
    Disadvantage: if a user on a live system loads an iptables rule which
    requires conntracks, connections which are already alive may be handled
    less precisely. I believe this is OK.
    
    Once conntrack allocation is enabled, it cannot be disabled until reboot/CT
    restart. This is done in order to:
    a) simplify the code
    b) make it possible to unconditionally enable conntracks, for example for
       userspace conntrack users (http://conntrack-tools.netfilter.org/manual.html)
    c) avoid a race: adding a new iptables rule is implemented in the following
       way:
       - all rules are unloaded
       - the new rule is added to the set of rules
       - all rules (including the new one) are uploaded to the kernel
       => if disabling were allowed, each added rule would disable and then
       re-enable conntrack allocation => race window for unhandled connections
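
A minimal userspace sketch of point c), assuming a simplified model of a ruleset reload. None of these names (`demo_state`, `demo_load_rules`, and so on) exist in the kernel; they are hypothetical and only illustrate why a one-way flag stays set across the unload/upload sequence while a flag that follows the current rule count would briefly drop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one net namespace during a ruleset reload. */
struct demo_state {
	int  ct_rules;	/* currently loaded rules that need conntrack */
	bool latched;	/* one-way flag: set on first such rule, never cleared */
};

/* Upload a full ruleset containing n_ct_rules conntrack-using rules. */
static void demo_load_rules(struct demo_state *s, int n_ct_rules)
{
	assert(s != NULL);
	s->ct_rules = n_ct_rules;
	if (n_ct_rules > 0)
		s->latched = true;	/* enable only, never disable */
}

/* Unload the whole ruleset (first step of adding a rule). */
static void demo_unload_rules(struct demo_state *s)
{
	assert(s != NULL);
	s->ct_rules = 0;		/* the latch is deliberately left alone */
}

/* Rejected policy: allocation follows the current rule count. */
static bool demo_can_alloc_toggle(const struct demo_state *s)
{
	return s->ct_rules > 0;
}

/* Policy described above: allocation follows the one-way latch. */
static bool demo_can_alloc_latch(const struct demo_state *s)
{
	return s->latched;
}
```

Between `demo_unload_rules()` and the next `demo_load_rules()`, the toggle policy reports that allocation is off, which is the window described in c); the latch policy does not.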
    
    =======================
    This patch description:
    
    Allocations are allowed only when there are conntrack users.
    By default they are prohibited.
    
    https://jira.sw.ru/browse/PSBM-51050
    
    Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
    
    Reviewed-by: Andrei Vagin <avagin at virtuozzo.com>
    
    +++
    ve/net: Move net->ct.can_alloc check up to resolve_normal_ct()
    
    Move the check up the call stack to abort creation of a CT earlier.
    This avoids searching the CT hashes and speeds things up.
    
    So now nf_conntrack_alloc() certainly creates a CT,
    __nf_conntrack_alloc() doesn't return NULL, and it does not
    need to be external.
    
    Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
    
    Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    To be merged to commit 874e7b5c6eb9
    "net: Primitives to enable conntrack allocation"
    
    https://jira.sw.ru/browse/PSBM-54823
    
    Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
    
    +++
    ve/net: Do not initialize netns_ct::can_alloc twice
    
    It's already initialized to zero during net creation
    in net_alloc(), so do not do that twice.
    
    Also, some modules that allow conntrack allocation do not depend
    on nf_conntrack.ko, so nf_conntrack.ko resets can_alloc to zero
    if it is loaded later.
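
The ordering problem above can be sketched in userspace C. This is an illustration under assumptions, not kernel code: `demo_net`, `demo_net_alloc()` and the three init functions are hypothetical stand-ins for `struct net`, `net_alloc()`, a rule module's init, and the old and new nf_conntrack per-net init.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-netns state. */
struct demo_net {
	bool can_alloc;
};

/* Like net_alloc(): the structure starts out zeroed. */
static struct demo_net *demo_net_alloc(void)
{
	return calloc(1, sizeof(struct demo_net));
}

/* A module that allows conntrack but does not depend on nf_conntrack. */
static void demo_rule_module_init(struct demo_net *net)
{
	assert(net != NULL);
	net->can_alloc = true;
}

/* Old behaviour: conntrack init re-initializes the flag. */
static void demo_conntrack_init_old(struct demo_net *net)
{
	assert(net != NULL);
	net->can_alloc = false;	/* clobbers an earlier enable */
}

/* New behaviour: rely on the zeroing done at allocation time. */
static void demo_conntrack_init_new(struct demo_net *net)
{
	assert(net != NULL);
	/* can_alloc is intentionally not touched here */
}
```

Calling `demo_rule_module_init()` and then `demo_conntrack_init_old()` leaves `can_alloc` false even though a user asked for conntracks; with `demo_conntrack_init_new()` the flag survives.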
    
    (This may be merged with commit af2b974e4755 "net: Primitives to enable conntrack allocation")
    
    https://jira.sw.ru/browse/PSBM-56500
    
    Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
    
    =======================
    
    net: Do not allow conntrack if netlink conntrack is requested
    
    The scheme for allowing conntracks is to allow conntrack allocation
    only after a rule is inserted. But this place does not insert
    a rule; it is a manual conntrack creation.
    
    Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
    Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    (cherry picked from vz7 commit ("550b98d291cb net: Primitives to enable
    conntrack allocation"))
    
    VZ 8 rebase part https://jira.sw.ru/browse/PSBM-127783
    
    Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
---
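
For readers without a kernel tree at hand, here is a userspace analogue of the gate added in the diff below. It is a sketch under assumptions: it uses C11 release/acquire atomics rather than the kernel's smp_wmb()/smp_rmb() pair, and `demo_netns_ct`, `demo_ct_init()`, `demo_allow_conntrack_allocation()` and `demo_resolve()` are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of struct netns_ct. */
struct demo_netns_ct {
	atomic_bool  can_alloc;	/* false until a conntrack user appears */
	unsigned int count;	/* entries "allocated"; single-threaded demo */
};

static void demo_ct_init(struct demo_netns_ct *ct)
{
	assert(ct != NULL);
	atomic_init(&ct->can_alloc, false);	/* prohibited by default */
	ct->count = 0;
}

/* Analogue of allow_conntrack_allocation(): a one-way enable. */
static void demo_allow_conntrack_allocation(struct demo_netns_ct *ct)
{
	assert(ct != NULL);
	/* Release store, paired with the acquire load in demo_resolve(). */
	atomic_store_explicit(&ct->can_alloc, true, memory_order_release);
}

/* Analogue of the early check in resolve_normal_ct().
 * Returns true if the packet would get a conntrack entry. */
static bool demo_resolve(struct demo_netns_ct *ct)
{
	assert(ct != NULL);
	if (!atomic_load_explicit(&ct->can_alloc, memory_order_acquire))
		return false;	/* no users yet: skip lookup and allocation */
	ct->count++;		/* stands in for the real tuple lookup/alloc */
	return true;
}
```

Before `demo_allow_conntrack_allocation()` is called, `demo_resolve()` returns false and `count` stays at zero; afterwards every call is tracked. There is intentionally no function that clears the flag.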
 include/net/net_namespace.h       | 10 ++++++++++
 include/net/netns/conntrack.h     |  1 +
 net/netfilter/nf_conntrack_core.c |  6 ++++++
 net/netfilter/nf_synproxy_core.c  |  1 +
 4 files changed, 18 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 93838c430818..634d107dff8b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -344,6 +344,16 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
 #define __net_initconst	__initconst
 #endif
 
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+static inline void allow_conntrack_allocation(struct net *net)
+{
+	net->ct.can_alloc = true;
+	smp_wmb(); /* Pairs with rmb in resolve_normal_ct() */
+}
+#else
+static inline void allow_conntrack_allocation(struct net *net) { }
+#endif
+
 int peernet2id_alloc(struct net *net, struct net *peer, gfp_t gfp);
 int peernet2id(struct net *net, struct net *peer);
 bool peernet_has_id(struct net *net, struct net *peer);
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 19bcf4173ccb..1094ad116224 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -106,6 +106,7 @@ struct ct_pcpu {
 
 struct netns_ct {
 	atomic_t		count;
+	bool			can_alloc; /* Initialized in 0 by net_alloc */
 	unsigned int		max;
 	unsigned int		expect_count;
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 7deb88926a8c..91eb2f3f0c7d 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1659,6 +1659,12 @@ resolve_normal_ct(struct nf_conn *tmpl,
 	struct nf_conn *ct;
 	u32 hash;
 
+	if (!state->net->ct.can_alloc) {
+		/* No rules loaded */
+		return 0;
+	}
+	smp_rmb(); /* Pairs with wmb in allow_conntrack_allocation() */
+
 	if (!nf_ct_get_tuple(skb, skb_network_offset(skb),
 			     dataoff, state->pf, protonum, state->net,
 			     &tuple)) {
diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
index 3996ca086ec2..eae42e67af47 100644
--- a/net/netfilter/nf_synproxy_core.c
+++ b/net/netfilter/nf_synproxy_core.c
@@ -340,6 +340,7 @@ static int __net_init synproxy_net_init(struct net *net)
 	struct nf_conn *ct;
 	int err = -ENOMEM;
 
+	allow_conntrack_allocation(net);
 	ct = nf_ct_tmpl_alloc(net, &nf_ct_zone_dflt, GFP_KERNEL);
 	if (!ct)
 		goto err1;

