[Devel] Re: The issues for agreeing on a virtualization/namespaces implementation.

Eric W. Biederman ebiederm at xmission.com
Thu Feb 9 14:25:35 PST 2006


ebiederm at xmission.com (Eric W. Biederman) writes:

> I think I can boil the discussion down into some of the fundamental
> questions that we are facing.
>
> Currently everyone seems to agree that we need something like
> my namespace concept that isolates multiple resources.
>
> We need these for 
> PIDS
> UIDS
> SYSVIPC
> NETWORK
> UTSNAME
> FILESYSTEM
> etc.
>
> The questions seem to break down into:
> 1) Where do we put the references to the different namespaces?
>    - Do we put the references in a struct container that we reference from
> struct task_struct?
>    - Do we put the references directly in struct task_struct?

Answer in the task_struct.  It is the simplest and most flexible
route and the other implementations are still possible.

> 2) What is the syscall interface to create these namespaces?
>    - Do we add clone flags?  
>      (Plan 9 style)
>    - Do we add a syscall (similar to setsid) per namespace?
>      (Traditional unix style)?
>    - Do we in addition add syscalls to manipulate containers generically?

The answer seems to be we decide on a per namespace basis with
additional syscalls being mandatory if we have any additional data to
pass.

> 3) How do we refer to namespaces and containers when we are not members?

I have seen no arguments against referring to namespaces or containers
by global ids.  So it seems we do not need a container id.

> 4) How do we implement each of these namespaces?
>    Besides being maintainable are there other constraints?

Largely quite.  But I have not heard additional constraints.

> 5) How do we control the resource inside a namespace starting
>    from a process that is outside of that namespace?
>    - The filesystem mount namespace gave an interesting answer.
>      So it is quite possible other namespaces will give
>      equally interesting and surprising answers.

Not yet resolved, but a bit of speculation.

> 6) How do we do all of this efficiently without a noticeable impact on
>    performance?
>    - I have already heard concerns that I might be introducing cache
>      line bounces and thus increasing tasklist_lock hold time.
>      Which on big way systems can be a problem.

A little discussion.  At the level of the last few cache line I
think this needs to be addressed when we merge.  Simply not
messing up existing optimizations sounds like a good initial
target.  Basically at this stage trying hard would be a 
premature optimization.

> 7) How do we allow a process inside a container to create containers
>    for it's children?
>    - In general this is trivial but there are a few ugly issues
>      here.

This look mostly like something to be discussed when we merge
namespaces.  But as long as we keep it in mind it is easy.

Eric




More information about the Devel mailing list