[Devel] [PATCH VZ10 v2 11/12] ve_namespace: Reorder VE namespace creation before other namespaces

Konstantin Khorenko khorenko at virtuozzo.com
Fri Dec 12 19:46:00 MSK 2025


On 12/10/25 11:34, Pavel Tikhomirov wrote:
> On clone3() we first create and switch to user namespace, when ve
> namespace owned by this user namespace and switch to it, and only then
> create other namespaces, this way we can make sure that all the
> Container namespaces created together with VE namespace have
> get_exec_env() pointing to new VE namespace.
> 
> On unshare() we can't easily enforce it, as this code path requires
> creating all namespaces first and only then joins them (basically
> namespace change either happens for all requested namespaces or it does
> not happen atomically). So while in this patch we reorder it for code
> consistency, it does not really work, so we prohibit using unshare()
> with CLONE_NEWVE together with non CLONE_NEWUSER namespaces.
> 
> Fixes: 8a771a3d6bea ("ve: Introduce VE namespace")
> https://virtuozzo.atlassian.net/browse/VSTOR-118289
> Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
> 
> Feature: ve: ve generic structures
> ---
> v2: new path
> ---
>   kernel/fork.c | 46 +++++++++++++++++++++++++++++-----------------
>   1 file changed, 29 insertions(+), 17 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 5ce4c2ac91c1..b4e09af18288 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2389,15 +2389,15 @@ __latent_entropy struct task_struct *copy_process(
>   	retval = copy_mm(clone_flags, p);
>   	if (retval)
>   		goto bad_fork_cleanup_signal;
> -	retval = copy_namespaces(clone_flags, p);
> +	retval = copy_ve_ns(clone_flags, p);
>   	if (retval)
>   		goto bad_fork_cleanup_mm;
> -	retval = copy_ve_ns(clone_flags, p);
> +	retval = copy_namespaces(clone_flags, p);
>   	if (retval)
> -		goto bad_fork_cleanup_namespaces;
> +		goto bad_fork_cleanup_ve_namespace;
>   	retval = copy_io(clone_flags, p);
>   	if (retval)
> -		goto bad_fork_cleanup_ve_namespace;
> +		goto bad_fork_cleanup_namespaces;
>   	retval = copy_thread(p, args);
>   	if (retval)
>   		goto bad_fork_cleanup_io;
> @@ -2652,10 +2652,10 @@ __latent_entropy struct task_struct *copy_process(
>   bad_fork_cleanup_io:
>   	if (p->io_context)
>   		exit_io_context(p);
> -bad_fork_cleanup_ve_namespace:
> -	exit_ve_namespace(p);
>   bad_fork_cleanup_namespaces:
>   	exit_task_namespaces(p);
> +bad_fork_cleanup_ve_namespace:
> +	exit_ve_namespace(p);
>   bad_fork_cleanup_mm:
>   	if (p->mm) {
>   		mm_clear_owner(p->mm, p);
> @@ -3237,6 +3237,18 @@ static int check_unshare_flags(unsigned long unshare_flags)
>   			return -EINVAL;
>   	}
>   
> +	/*
> +	 * Unhsare creates all namespaces first and only then switches to them.
             ^^^
Unshare (typo)

> +	 * So get_exec_env() yet returns previous VE while we are creating
> +	 * other namespaces. That leads to network and mount namespace
> +	 * initialized incorrectly, having ->owner_ve links set to previous VE.
> +	 * To avoid confusion, forbid unshare of new VE together with other
> +	 * namespaces. CLONE_NEWUSER is allowed as it should own VE namespace,
> +	 * not vice versa.
> +	 */
> +	if (unshare_flags & CLONE_NEWVE && unshare_flags & ~CLONE_NEWUSER)
> +		return -EINVAL;

1. as far as i see you never can pass this check in case you have CLONE_NEWVE set:
if unshare_flags has even only CLONE_NEWVE, then (unshare_flags & ~CLONE_NEWUSER) will already be 
true. Even with no other namespaces bits.

2. Not sure, but do we really need to prohibit ALL flags or only namespace flags?
CLONE_FS ? CLONE_VFORK ?

May be

      1 │if (unshare_flags & CLONE_NEWVE) {
      2 │    unsigned long ns_flags = unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS |
      3 │                                               CLONE_NEWIPC | CLONE_NEWNET |
      4 │                                               CLONE_NEWPID | CLONE_NEWCGROUP |
      5 │                                               CLONE_NEWTIME);
      6 │    if (ns_flags)
      7 │        return -EINVAL;
      8 │}


> +
>   	return 0;
>   }
>   
> @@ -3342,25 +3354,20 @@ int ksys_unshare(unsigned long unshare_flags)
>   	err = unshare_userns(unshare_flags, &new_cred);
>   	if (err)
>   		goto bad_unshare_cleanup_fd;
> +	err = unshare_ve_namespace(unshare_flags, &new_ve_ns, new_cred);
> +	if (err)
> +		goto bad_unshare_cleanup_cred;
>   	err = unshare_nsproxy_namespaces(unshare_flags, &new_nsproxy,
>   					 new_cred, new_fs);
>   	if (err)
> -		goto bad_unshare_cleanup_cred;
> -	err = unshare_ve_namespace(unshare_flags, &new_ve_ns, new_cred);
> -	if (err) {
> -		if (new_nsproxy)
> -			free_nsproxy(new_nsproxy);
> -		goto bad_unshare_cleanup_cred;
> -	}
> +		goto bad_unshare_cleanup_ve_namespace;
>   
>   	if (new_cred) {
>   		err = set_cred_ucounts(new_cred);
>   		if (err) {
>   			if (new_nsproxy)
>   				free_nsproxy(new_nsproxy);
> -			if (new_ve_ns)
> -				free_ve_ns(new_ve_ns);
> -			goto bad_unshare_cleanup_cred;
> +			goto bad_unshare_cleanup_ve_namespace;
>   		}
>   	}
>   
> @@ -3380,8 +3387,10 @@ int ksys_unshare(unsigned long unshare_flags)
>   		if (new_nsproxy)
>   			switch_task_namespaces(current, new_nsproxy);
>   
> -		if (new_ve_ns)
> +		if (new_ve_ns) {
>   			switch_ve_namespace(current, new_ve_ns);
> +			new_ve_ns = NULL;
> +		}
>   
>   		task_lock(current);
>   
> @@ -3410,6 +3419,9 @@ int ksys_unshare(unsigned long unshare_flags)
>   
>   	perf_event_namespaces(current);
>   
> +bad_unshare_cleanup_ve_namespace:
> +	if (new_ve_ns)
> +		free_ve_ns(new_ve_ns);
>   bad_unshare_cleanup_cred:
>   	if (new_cred)
>   		put_cred(new_cred);



More information about the Devel mailing list