[Devel] [PATCH RHEL10 COMMIT] ve/unshare: allow CLONE_NEWVE with other namespace flags
Konstantin Khorenko
khorenko at virtuozzo.com
Thu May 14 18:52:40 MSK 2026
The commit is pushed to "branch-rh10-6.12.0-55.52.1.5.x.vz10-ovz" and will appear at git@bitbucket.org:openvz/vzkernel.git
after rh10-6.12.0-55.52.1.5.24.vz10
------>
commit fb916bdf14247ca3dff814fbf42ce63c425f2216
Author: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
Date: Wed Apr 29 15:41:40 2026 +0200
ve/unshare: allow CLONE_NEWVE with other namespace flags
check_unshare_flags() previously rejected CLONE_NEWVE combined with any
flag other than CLONE_NEWUSER. The justification was that get_exec_env()
still returned the previous ve while unshare was creating the new mount
and network namespaces, so their ->owner_ve / ve_owner links would point
at the wrong ve. The previous patch fixes that by threading the freshly
allocated ve_namespace from unshare_ve_namespace() through
unshare_nsproxy_namespaces() down to copy_mnt_ns() and copy_net_ns(), so
the guard is no longer needed.
Drop it. unshare(CLONE_NEWUSER | CLONE_NEWVE | CLONE_NEWNS |
CLONE_NEWNET | ...) now works in a single syscall and the resulting
namespaces are owned by the new ve.
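The behaviour change can be modelled in userspace by comparing the CLONE_NEWVE branch that this patch removes with what remains. This is a minimal sketch, not kernel code. The MODEL_* names are hypothetical. The upstream CLONE_NEWNS, CLONE_NEWUSER and CLONE_NEWNET values are real. The CLONE_NEWVE value is only a placeholder, because the real one comes from the vzkernel headers. The rest of check_unshare_flags() is left out: the unknown-flag mask and the thread, sighand and mm checks.

```c
#include <assert.h>
#include <errno.h>

/* Upstream values from include/uapi/linux/sched.h. */
#define MODEL_CLONE_NEWNS   0x00020000UL
#define MODEL_CLONE_NEWUSER 0x10000000UL
#define MODEL_CLONE_NEWNET  0x40000000UL
/* Placeholder only: the real CLONE_NEWVE value is defined by vzkernel. */
#define MODEL_CLONE_NEWVE   0x00000040UL

/* Before the patch: CLONE_NEWVE may only be paired with CLONE_NEWUSER. */
static int check_ve_flags_old(unsigned long unshare_flags)
{
	if (unshare_flags & MODEL_CLONE_NEWVE) {
		unsigned long allowed_with_ve = MODEL_CLONE_NEWVE | MODEL_CLONE_NEWUSER;

		if (unshare_flags & ~allowed_with_ve)
			return -EINVAL;
	}
	return 0;
}

/* After the patch: the CLONE_NEWVE-specific guard no longer exists. */
static int check_ve_flags_new(unsigned long unshare_flags)
{
	(void)unshare_flags; /* no VE-specific restriction left to apply */
	return 0;
}
```

With the old check, a request such as CLONE_NEWVE | CLONE_NEWNS gets -EINVAL. With the new check, the same request passes. Flag sets without CLONE_NEWVE behave the same under both checks.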
https://virtuozzo.atlassian.net/browse/VSTOR-129744
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
Reviewed-by: Vasileios Almpanis <vasileios.almpanis at virtuozzo.com>
Feature: ve: ve generic structures
======
Patchset description:
ve: fix owner_ve of net/mnt namespaces created together with CLONE_NEWVE
When CLONE_NEWVE is combined with CLONE_NEWNET and/or CLONE_NEWNS in a
single clone3() or unshare(), copy_net_ns() and copy_mnt_ns() resolve
the owning ve via get_exec_env(), which still points at the parent ve
at that point. The freshly created net/mnt namespaces end up wired to
the wrong ve, and unshare(CLONE_NEWVE | CLONE_NEW{NS,NET}) is rejected
outright by check_unshare_flags().
Fix it by threading the new ve from copy_namespaces() and
unshare_nsproxy_namespaces() down into copy_net_ns() and copy_mnt_ns(),
so the correct ve is charged for the new netns and for every mount in
the new mntns.
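The ownership bug and its fix can be illustrated with a small userspace model. All struct and function names below are hypothetical simplifications. They are not the vzkernel signatures, which this cover letter does not show. The model also assumes that a NULL ve argument, meaning no CLONE_NEWVE was requested, falls back to the caller's current ve.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel objects. */
struct ve     { const char *name; int mnt_nr; };
struct net    { struct ve *owner_ve; };
struct mount  { struct ve *ve_owner; };
struct mnt_ns { struct ve *ve_owner; struct mount root; };

/* Models get_exec_env(): still the parent ve while namespaces are built. */
static struct ve *exec_env;

/* Old behaviour: the owner comes from the exec env, i.e. the parent ve. */
static void copy_net_ns_old(struct net *net)
{
	net->owner_ve = exec_env;
}

/* Assumption: no new ve passed means "use the current ve". */
static struct ve *pick_ve(struct ve *new_ve)
{
	return new_ve ? new_ve : exec_env;
}

/* New behaviour: the freshly created ve is passed down explicitly. */
static void copy_net_ns_new(struct net *net, struct ve *new_ve)
{
	net->owner_ve = pick_ve(new_ve);
}

static void copy_mnt_ns_new(struct mnt_ns *ns, int nr_mounts, struct ve *new_ve)
{
	struct ve *owner = pick_ve(new_ve);

	ns->ve_owner = owner;
	ns->root.ve_owner = owner;
	/* every mount copied into the new mntns is charged to its owner */
	owner->mnt_nr += nr_mounts;
}
```

In the old path the new netns inherits the parent ve. Passing the new ve explicitly makes the netns, the mntns, its root mount and the mount charge all resolve to the child ve.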
Patches 1-4 are pure plumbing (signature changes, no behaviour change).
Patch 5 is the actual fix that forwards the new ve. Patch 6 drops the
now-redundant check_unshare_flags() restriction that allowed CLONE_NEWVE
only together with CLONE_NEWUSER.
Patch 7 exposes ve.mnt_nr via cgroupfs to make per-ve mount accounting
observable from userspace. Patch 8 adds a selftest covering both the
clone3() and unshare() paths.
Verified with the crash utility on a container started by vzctl: task_ve,
nsproxy->net_ns->owner_ve, nsproxy->mnt_ns->ve_owner and
nsproxy->mnt_ns->root.ve_owner all resolve to the new ve.
The new selftest passes both cases.
---
kernel/fork.c | 14 --------------
1 file changed, 14 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index a1b9fec275799..dfd074a794b5a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -3237,20 +3237,6 @@ static int check_unshare_flags(unsigned long unshare_flags)
 			return -EINVAL;
 	}
 
-	/*
-	 * Unshare creates all namespaces first and only then switches to them.
-	 * So get_exec_env() yet returns previous VE while we are creating
-	 * other namespaces. That leads to network and mount namespace
-	 * initialized incorrectly, having ->owner_ve links set to previous VE.
-	 * To avoid confusion, only allow CLONE_NEWVE together with CLONE_NEWUSER.
-	 * CLONE_NEWUSER is allowed as it should own VE namespace, not vice versa.
-	 */
-	if (unshare_flags & CLONE_NEWVE) {
-		unsigned long allowed_with_ve = CLONE_NEWVE | CLONE_NEWUSER;
-		if (unshare_flags & ~allowed_with_ve)
-			return -EINVAL;
-	}
-
 	return 0;
 }
 