[Devel] [PATCH v4 1/7] user namespace support for upstream containers

Fri May 17 13:14:45 PDT 2013

On 05/17/2013 04:11 AM, Kir Kolyshkin wrote:
>> +    if ((arg->userns_p != -1) && (read(arg->userns_p, &ret,
>> sizeof(ret)) != sizeof(ret))) {
>> +        logger(-1, errno, "Cannot read from user namespace pipe");
> 
> We don't close arg->userns_p in case of error here.
> And it seems we do not close the other end of it here.
> I had that concern before, you said we're closing all fds somewhere,
> I'd rather close it explicitly here to be on the safe side
> (and pass both fds to the child to be able to do so).
> 

No need. We also don't pass the other side of the pipe to the child for
the other 2 pipes that we use. We don't close all fds "somewhere".

We close all fds in close_fds right before we call exec_container_init.

As for the error here, I changed it, even though it is not strictly
needed, since we always exit on errors here.

>> +        }
>>       }
>> +    close(userns_p[0]);
>> +    close(userns_p[1]);
>>         snprintf(procpath, STR_SIZE, "/proc/%d/ns/net", ret);
>>       ret = symlink(procpath, ctpath);
>>       if (ret) {
>>           logger(-1, errno, "Can't symlink into netns file %s", ctpath);
>> -        destroy_container(arg->veid);
>> -        return VZ_RESOURCE_ERROR;
>> +        goto err_noclose;
>>       }
>>         return 0;
>> +err:
>> +    close(userns_p[0]);
>> +    close(userns_p[1]);
>> +    /* FIXME: remove ourselves from container first */
>> +err_noclose:
>> +    destroy_container(arg->veid);
>> +    return VZ_RESOURCE_ERROR;
>>   }
> 
> I don't really enjoy this mess of close()'s, goto's and returns,
> it can be done a bit simple IMHO.
> 
> But looks like there are no logic errors in here, so OK.
>
Well, I changed this to avoid the goto's. We have more duplicate code,
but less confusion.

>>       char path[STR_SIZE];
>> +    char upath[STR_SIZE];
>>       struct stat st;
>> +    unsigned long *local_uid = param->res.misc.local_uid;
>>         ret = container_init();
>>       if (ret) {
>> @@ -592,6 +813,9 @@ int ct_do_open(vps_handler *h, vps_param *param)
>>       if (snprintf(path, sizeof(path), "/proc/%d/ns/pid", getpid()) < 0)
>>           return VZ_RESOURCE_ERROR;
>>   +    if (snprintf(upath, sizeof(upath), "/proc/%d/ns/user",
>> getpid()) < 0)
>> +        return VZ_RESOURCE_ERROR;
>> +
> 
> Same, I fail to see how snprintf() can ever return negative value here.
> 
I fail as well, but in case it does, we will return an error. I don't
expect this branch to be ever taken, but it is there as a safeguard. I
will remove it if you want me to. I removed the other.

>>       ret = mkdir(NETNS_RUN_DIR,
>> S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH);
>>         if (ret && (errno != EEXIST)) {
>> @@ -601,6 +825,20 @@ int ct_do_open(vps_handler *h, vps_param *param)
>>       }
>>         h->can_join_pidns = !stat(path, &st);
>> +    /*
>> +     * Being able to join the user namespace is a good indication
>> that the
>> +     * user namespace is complete. For a long time, the user namespace
>> +     * existed, but were far away from being feature complete.  When
>> +     * running in such a kernel, joining the user namespace will just
>> +     * cripple our container, since we won't be able to do anything.
>> It is
>> +     * only good for people who are okay running containers as root.
>> +     *
>> +     * It is not enough, however, for user namespaces to be present
>> in the
>> +     * kernel. The container must have been setup to use it (we need the
>> +     * mapped user to own the files, etc. So we also need to find
>> suitable
>> +     * configuration in the config files.
>> +     */
>> +    h->can_join_userns = !stat(upath, &st) && local_uid;
> 
> Maybe && local_gid as well?
> 
Here we are just testing if it is enable or disabled. It will be clear
later on when we document it, but local_uid is the only thing we need to
test. If it is disabled, we should not even look at local_gid, since we
are, already, essentially root.

local_uid enabled and local_gid disabled is a valid, though weird,
configuration.