[CRIU] Introspecting userns relationships to other namespaces?

Fri Jul 8 20:05:18 PDT 2016

James Bottomley <James.Bottomley at HansenPartnership.com> writes:

> On Fri, 2016-07-08 at 18:52 -0500, Eric W. Biederman wrote:
>> James Bottomley <James.Bottomley at Hansenpartnership.com> writes:
>> 
>> > On July 8, 2016 1:38:19 PM PDT, Andrew Vagin <avagin at virtuozzo.com>
>> > wrote:
>> 
>> > > What do you think about the idea to mount nsfs and be able to 
>> > > look up any alive namespace by inum:
>> > 
>> > I think I like it.  It will give us a way to enter any extant
>> > namespace.  It will work for Eric's fs namespaces as well.  Perhaps 
>> > a /process/ns/<inum> Directory?
>
> As you understood, I meant /proc/ns/<inum> (damn mobile phone
> completions).
>
>> *Shivers*
>> 
>> That makes it very easy to bypass any existing controls that exist 
>> for getting at namespaces.  It is true that everything of that kind 
>> is directory based but still.
>> 
>> Plus I think it would serve as information leak to information 
>> outside of the container.
>> 
>> An operation to get a user namespace file descriptor from some kernel
>> object sounds reasonably sane.
>> 
>> A great big list of things sounds about as scary as it can get.  This 
>> is not the time to be making it easier to escape from containers.
>
> To be honest, I think this argument is rubbish.  If we're afraid of
> giving out a list of all the namespaces, it means we're afraid there's
> some security bug and we're trying to obscure it by making the list
> hard to get.  All we've done is allayed fears about the bug but the
> hackers still know the portals to get through.
>
> If such a bug exists, it will be possible to exploit it by simply
> reconstructing the information from the individual process directories,
> so obscurity doesn't protect us and all it does is give us a false
> sense of security.   If such a bug doesn't exist, then all the security
> mechanisms currently in place (like no re-entry to prior namespace)
> should protect us and we can give out the list.
>
> Let's deal with the world as we'd like it to be (no obscure namespace
> bugs) and accept the consequences and the responsibility for fixing
> them if we turn out to be slightly incorrect.  We'll end up in a far
> better place than security by obscurity would land us.

No.  That is not the fear.  The permission checks on /proc/self/ns/xxx
are different than if the namespace is bind mounted somewhere.

That was done deliberately and with a reasonable amount of forethought.
You are asking to throw those permission checks out.   The answer is no.

Furthermore, there is a much clearer reason not to go with a list of all
namespaces: a list of all namespaces breaks CRIU.  As you have described
it, the list will change depending upon which machine you restore a
checkpoint on.  I honestly don't know what kind of havoc that will cause,
but it is certainly something we won't be able to checkpoint no matter
how hard we try.

A global list of namespaces, especially of the kind that you can open
to get a handle to the namespace, is just not appropriate.

I know inode numbers come darn close to names, but they aren't really
names, and if it comes to it we can figure out how to preserve an
application's view of it all across a checkpoint/restart.  So far it
hasn't proven necessary to preserve any inode numbers across
checkpoint/restart, but again it is theoretically possible if it becomes
necessary.
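
To make the "close to names but not names" point concrete, here is a
minimal sketch of how a process sees those inode numbers today, one
namespace at a time, through its own /proc/<pid>/ns directory.  It
assumes a Linux /proc where each ns entry is a symlink whose target
looks like "user:[4026531837]".  The helper names parse_ns_link and
namespace_inums are made up for illustration and are not part of any
existing tool.

```python
import os
import re

# Symlink targets under /proc/<pid>/ns look like "user:[4026531837]".
_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace symlink target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError("not a namespace link: %r" % (target,))
    return match.group(1), int(match.group(2))


def namespace_inums(pid="self"):
    """Map each entry in /proc/<pid>/ns to the inode number it points at.

    Only namespaces of a process the caller is allowed to inspect are
    visible; nothing here enumerates namespaces system-wide.
    """
    ns_dir = "/proc/%s/ns" % pid
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            target = os.readlink(os.path.join(ns_dir, name))
            _, inum = parse_ns_link(target)
        except (OSError, ValueError):
            # Entry vanished, is not readable, or has an unexpected format.
            continue
        result[name] = inum
    return result
```

Calling namespace_inums() returns the numbers for the calling process
only.  Those values are whatever the running kernel happened to assign,
which is why an application that stores them would need extra work to
see the same view after a restore on another machine.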

Throwing away checkpoint/restart support for the sake of
checkpoint/restart is a no-go.

Containers fundamentally imply you don't have global visibility,
and that is a good thing.

Eric