[Devel] [PATCH 11/14] net: Primitives to enable conntrack allocation

Alexander Mikhalitsyn alexander.mikhalitsyn at virtuozzo.com
Fri Apr 30 15:45:39 MSK 2021


From: Stanislav Kinsburskiy <skinsbursky at virtuozzo.com>

Patchset description:

Create conntrack structures only if they are really needed

Allocate conntracks only after there is a rule which uses them.

v2: Allow allocation once there is a rule and never prohibit it afterwards.

khorenko@: the idea behind all of this:
we want to let Containers use iptables rules which require conntracks. At the
same time we'd like to avoid the problems we currently have if we just enable
conntrack allocation for all Containers and the Hardware Node by default:
1) if conntracks are not actually used by a CT, the structures are still
   allocated, which decreases performance
2) the number of conntracks in the system is limited => DDoS is possible

So we decided to implement a feature:
not to allocate conntracks until there are rules in the netspace which require
them.

Disadvantage: if a user on a live system loads an iptables rule which requires
conntracks, connections which are already alive may be handled less
precisely. I believe this is OK.

Once conntrack allocation is enabled, it cannot be disabled until reboot/CT
restart. The reasons are:
a) it simplifies the code
b) it makes it possible to enable conntracks unconditionally, for example for
   userspace conntrack users (http://conntrack-tools.netfilter.org/manual.html)
c) adding a new iptables rule is implemented in the following way:
   - all rules are unloaded
   - the new rule is added to the set of rules
   - all rules (including the new one) are uploaded to the kernel
   => if disabling were supported, each added rule would toggle conntrack
   allocation off and on => race window for unhandled connections
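
A minimal userspace sketch of the one-way enable described above, written
with C11 atomics rather than the kernel barriers used in the patch. All names
here (struct ns_model, model_allow_alloc, model_may_alloc, model_reload_rules)
are hypothetical and exist only for illustration; release/acquire ordering is
an assumption standing in for the smp_wmb()/smp_rmb() pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-namespace flag (net->ct.can_alloc). */
struct ns_model {
	atomic_bool can_alloc;	/* starts false: allocation prohibited */
};

/* One-way enable: there is deliberately no matching "disable" helper. */
static void model_allow_alloc(struct ns_model *ns)
{
	assert(ns);
	atomic_store_explicit(&ns->can_alloc, true, memory_order_release);
}

static bool model_may_alloc(struct ns_model *ns)
{
	assert(ns);
	return atomic_load_explicit(&ns->can_alloc, memory_order_acquire);
}

/*
 * Models a rule reload: all rules are unloaded, then the new set is
 * uploaded. Returns whether allocation was still permitted in the window
 * between the two steps.
 */
static bool model_reload_rules(struct ns_model *ns)
{
	/* Rules unloaded: the flag is not touched. */
	bool during_unload = model_may_alloc(ns);

	/* Rules uploaded again: enabling a second time is harmless. */
	model_allow_alloc(ns);
	return during_unload;
}
```

Because nothing ever clears the flag, a reload in this model leaves no
window in which allocation is prohibited again.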

=======================
This patch description:

Allocations are allowed only when there are conntrack users.
By default they are prohibited.

https://jira.sw.ru/browse/PSBM-51050

Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
Reviewed-by: Andrei Vagin <avagin at virtuozzo.com>

+++
ve/net: Move net->ct.can_alloc check up to resolve_normal_ct()

Move the check up the call stack to stop the creation of a CT earlier.
This avoids searching the CT hashes and speeds things up.

So now nf_conntrack_alloc() always creates a CT,
__nf_conntrack_alloc() doesn't return NULL, and it does not
need to be external.
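
A minimal userspace sketch of the effect of testing the flag at the entry
point, before any lookup is attempted. The names (struct model_stats,
model_lookup_or_alloc, model_resolve) and the counters are hypothetical and
only make the skipped work visible; this is not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters, used only to make the saved work visible. */
struct model_stats {
	unsigned long hash_lookups;
	unsigned long allocations;
};

/* Stand-in for the hash search plus allocation for an untracked packet. */
static void model_lookup_or_alloc(struct model_stats *st)
{
	st->hash_lookups++;
	st->allocations++;	/* once reached, allocation is not refused */
}

/* Model of the entry point: the flag is tested before any other work. */
static int model_resolve(bool can_alloc, struct model_stats *st)
{
	assert(st);
	if (!can_alloc)
		return 0;	/* no rules loaded: packet passes untracked */
	model_lookup_or_alloc(st);
	return 0;
}
```

With the flag clear, the model returns without touching either counter,
which corresponds to skipping the hash search entirely.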

Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>

To be merged to commit 874e7b5c6eb9
"net: Primitives to enable conntrack allocation"

https://jira.sw.ru/browse/PSBM-54823

Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>

+++
ve/net: Do not initialize netns_ct::can_alloc twice

It's already initialized to zero during net creation
in net_alloc(), so do not do that twice.

Also, some modules that allow conntrack allocation do not depend
on nf_conntrack.ko, so nf_conntrack.ko would rewrite can_alloc to zero
if it were loaded later.

(This may be merged with commit af2b974e4755 "net: Primitives to enable conntrack allocation")

https://jira.sw.ru/browse/PSBM-56500

Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>

=======================

net: Do not allow conntrack if netlink conntrack is requested

The scheme for allowing conntracks is meant to allow conntrack
only after a rule is inserted. But this place does not insert
a rule, it is a manual conntrack creation.

Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
(cherry picked from commit 550b98d291cb0fb0b0270ab83dfc0fb6f48aadfe)

VZ 8 rebase part https://jira.sw.ru/browse/PSBM-127783

Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
---
 include/net/net_namespace.h       | 10 ++++++++++
 include/net/netns/conntrack.h     |  1 +
 net/netfilter/nf_conntrack_core.c |  6 ++++++
 net/netfilter/nf_synproxy_core.c  |  1 +
 4 files changed, 18 insertions(+)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 93838c430818..634d107dff8b 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -344,6 +344,16 @@ static inline struct net *read_pnet(const possible_net_t *pnet)
 #define __net_initconst	__initconst
 #endif
 
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+static inline void allow_conntrack_allocation(struct net *net)
+{
+	net->ct.can_alloc = true;
+	smp_wmb(); /* Pairs with rmb in resolve_normal_ct() */
+}
+#else
+static inline void allow_conntrack_allocation(struct net *net) { }
+#endif
+
 int peernet2id_alloc(struct net *net, struct net *peer, gfp_t gfp);
 int peernet2id(struct net *net, struct net *peer);
 bool peernet_has_id(struct net *net, struct net *peer);
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 19bcf4173ccb..1094ad116224 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -106,6 +106,7 @@ struct ct_pcpu {
 
 struct netns_ct {
 	atomic_t		count;
+	bool			can_alloc; /* Initialized to 0 by net_alloc() */
 	unsigned int		max;
 	unsigned int		expect_count;
 #ifdef CONFIG_NF_CONNTRACK_EVENTS
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 6ac5168d6c84..3a1057d8c368 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1660,6 +1660,12 @@ resolve_normal_ct(struct nf_conn *tmpl,
 	struct nf_conn *ct;
 	u32 hash;
 
+	if (!state->net->ct.can_alloc) {
+		/* No rules loaded */
+		return 0;
+	}
+	smp_rmb(); /* Pairs with wmb in allow_conntrack_allocation() */
+
 	if (!nf_ct_get_tuple(skb, skb_network_offset(skb),
 			     dataoff, state->pf, protonum, state->net,
 			     &tuple)) {
diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
index 3996ca086ec2..eae42e67af47 100644
--- a/net/netfilter/nf_synproxy_core.c
+++ b/net/netfilter/nf_synproxy_core.c
@@ -340,6 +340,7 @@ static int __net_init synproxy_net_init(struct net *net)
 	struct nf_conn *ct;
 	int err = -ENOMEM;
 
+	allow_conntrack_allocation(net);
 	ct = nf_ct_tmpl_alloc(net, &nf_ct_zone_dflt, GFP_KERNEL);
 	if (!ct)
 		goto err1;
-- 
2.28.0


