[Devel] Re: How much of a mess does OpenVZ make? ; ) Was: What can OpenVZ do?

Serge E. Hallyn serue at us.ibm.com
Tue Mar 10 16:28:19 PDT 2009


Quoting Alexey Dobriyan (adobriyan at gmail.com):
> On Thu, Feb 26, 2009 at 06:57:55PM +0300, Alexey Dobriyan wrote:
> > On Thu, Feb 12, 2009 at 03:04:05PM -0800, Dave Hansen wrote:
> > > dave@nimitz:~/kernels/linux-2.6-openvz$ git diff v2.6.27.10... kernel/cpt/ | diffstat
> > >  47 files changed, 20702 insertions(+)
> > > 
> > > One important thing that leaves out is the interaction that this code
> > > has with the rest of the kernel.  That's critically important when
> > > considering long-term maintenance, and I'd be curious how the OpenVZ
> > > folks view it. 
> > 
> > OpenVZ as-is in some cases wants some functions made global (and,
> > if the C/R code is modular, exported), or perhaps several iterators
> > added.
> > 
> > But it's a negligible amount of changes compared to the main code.
> 
> Here is what the C/R code wants from the pid allocator.

Yup.  Agreed.  That is exactly what I would have thought it would look
like.  We may have found the first bit of helper code we can all agree
on for c/r?  :)

Eric may disagree as he wanted to play games with
/proc/sys/kernel/pid_max, but that seems hard to pull off for nested
pid namespaces.


thanks,
-serge

> With the introduction of hierarchical PID namespaces, struct pid can
> have not one but many numbers -- a tuple (pid_0, pid_1, ..., pid_N),
> where pid_i is the pid number in the pid_ns at level i.
> 
> Now, if the root pid_ns of a container has level n, the numbers from
> level n to N inclusive should be dumped and restored.
> 
> During struct pid creation the numbers at levels 0 through n-1 can be
> anything, because they're outside the container's pid_ns, but the rest
> should be the same.
> 
> The code will be ifdeffed and commented, but anyhow, this is an example
> of the kind of change C/R will require from the rest of the kernel.
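
[Illustration, not part of the original mail.] The level rule described
above can be sketched in userspace C. This is only a model under stated
assumptions: struct model_ns, model_alloc_any() and model_assign() are
hypothetical names, the counter allocator stands in for alloc_pidmap(),
and reading cr_level as N - n is an interpretation of the patch, not
something the mail states.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_LEVELS 8

/* Hypothetical stand-in for a pid namespace: just a counter allocator. */
struct model_ns {
	int next_free;
};

/* Stand-in for alloc_pidmap(): hand out the next unused number. */
static int model_alloc_any(struct model_ns *ns)
{
	return ns->next_free++;
}

/*
 * chain[i] is the namespace at level i, for i = 0..top_level.
 * cr_nr[0] is the number wanted in the deepest namespace, cr_nr[1] the
 * number one level up, and so on, mirroring the cr_nr[ns->level - i]
 * indexing in the patch.  The deepest cr_level + 1 levels are restored;
 * every level above them gets whatever the allocator returns.
 */
static void model_assign(struct model_ns *chain, int top_level,
			 const int *cr_nr, unsigned int cr_level, int *out)
{
	int i;

	assert(top_level >= 0 && top_level < MODEL_MAX_LEVELS);
	for (i = top_level; i >= 0; i--) {
		unsigned int dist = (unsigned int)(top_level - i);

		if (cr_nr != NULL && dist <= cr_level)
			out[i] = cr_nr[dist];	/* restored number */
		else
			out[i] = model_alloc_any(&chain[i]); /* any number */
	}
}
```

With N = 3 and a container whose root pid_ns is at level n = 2, calling
model_assign(chain, 3, cr_nr, 1, out) pins the numbers at levels 2 and 3
to cr_nr[1] and cr_nr[0], while levels 0 and 1 take whatever their
allocators hand out.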
> 
> 
> 
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -182,6 +182,34 @@ static int alloc_pidmap(struct pid_namespace *pid_ns)
>  	return -1;
>  }
> 
> +static int set_pidmap(struct pid_namespace *pid_ns, pid_t pid)
> +{
> +	int offset;
> +	struct pidmap *map;
> +
> +	offset = pid & BITS_PER_PAGE_MASK;
> +	map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
> +	if (unlikely(!map->page)) {
> +		void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);
> +		/*
> +		 * Free the page if someone raced with us
> +		 * installing it:
> +		 */
> +		spin_lock_irq(&pidmap_lock);
> +		if (map->page)
> +			kfree(page);
> +		else
> +			map->page = page;
> +		spin_unlock_irq(&pidmap_lock);
> +		if (unlikely(!map->page))
> +			return -ENOMEM;
> +	}
> +	if (test_and_set_bit(offset, map->page))
> +		return -EBUSY;
> +	atomic_dec(&map->nr_free);
> +	return pid;
> +}
> +
>  int next_pidmap(struct pid_namespace *pid_ns, int last)
>  {
>  	int offset;
> @@ -239,7 +267,7 @@ void free_pid(struct pid *pid)
>  	call_rcu(&pid->rcu, delayed_put_pid);
>  }
> 
> -struct pid *alloc_pid(struct pid_namespace *ns)
> +struct pid *alloc_pid(struct pid_namespace *ns, int *cr_nr, unsigned int cr_level)
>  {
>  	struct pid *pid;
>  	enum pid_type type;
> @@ -253,7 +281,10 @@ struct pid *alloc_pid(struct pid_namespace *ns)
> 
>  	tmp = ns;
>  	for (i = ns->level; i >= 0; i--) {
> -		nr = alloc_pidmap(tmp);
> +		if (cr_nr && ns->level - i <= cr_level)
> +			nr = set_pidmap(tmp, cr_nr[ns->level - i]);
> +		else
> +			nr = alloc_pidmap(tmp);
>  		if (nr < 0)
>  			goto out_free;
> 
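
[Illustration, not part of the original mail.] The contract of
set_pidmap() in the patch above (claim one specific number, fail with
-EBUSY if it is already taken, decrement the free count on success) can
be modelled in userspace roughly as follows. Everything named model_* or
MODEL_* is hypothetical. The sketch uses a plain, non-atomic bitmap
instead of the kernel's test_and_set_bit() and lazily allocated pages,
and the range check is an addition that the patch itself does not have.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

#define MODEL_PID_MAX	1024
#define MODEL_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, single-threaded stand-in for struct pidmap. */
struct model_pidmap {
	unsigned long bits[MODEL_PID_MAX / MODEL_WORD_BITS];
	int nr_free;
};

/*
 * Reserve exactly the number `pid`.
 * Returns pid on success, -EBUSY if the number is already in use,
 * and -EINVAL if it is out of range (a check added for this model).
 */
static int model_set_pid(struct model_pidmap *map, int pid)
{
	unsigned long mask;
	unsigned int word;

	if (pid < 0 || pid >= MODEL_PID_MAX)
		return -EINVAL;

	word = (unsigned int)pid / MODEL_WORD_BITS;
	mask = 1UL << ((unsigned int)pid % MODEL_WORD_BITS);

	if (map->bits[word] & mask)
		return -EBUSY;		/* someone already holds this number */

	map->bits[word] |= mask;
	map->nr_free--;
	return pid;
}
```

The point of the sketch is the difference from alloc_pidmap(): the
caller names the number, so a collision is reported as an error rather
than being resolved by picking another free number.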
> _______________________________________________
> Containers mailing list
> Containers at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers