[Devel] [PATCH v4 1/7] user namespace support for upstream containers
Glauber Costa
glommer at parallels.com
Fri May 17 13:14:45 PDT 2013
On 05/17/2013 04:11 AM, Kir Kolyshkin wrote:
>> + if ((arg->userns_p != -1) && (read(arg->userns_p, &ret,
>> sizeof(ret)) != sizeof(ret))) {
>> + logger(-1, errno, "Cannot read from user namespace pipe");
>
> We don't close arg->userns_p in case of error here.
> And it seems we do not close the other end of it here.
> I had that concern before, you said we're closing all fds somewhere,
> I'd rather close it explicitly here to be on the safe side
> (and pass both fds to the child to be able to do so).
>
No need. We also don't pass the other side of the pipe to the child for
the other 2 pipes that we use. We don't close all fds "somewhere".
We close all fds in close_fds right before we call exec_container_init.
As for the error here, I changed it, even though it is not strictly
needed, since we always exit on errors here.
>> + }
>> }
>> + close(userns_p[0]);
>> + close(userns_p[1]);
>> snprintf(procpath, STR_SIZE, "/proc/%d/ns/net", ret);
>> ret = symlink(procpath, ctpath);
>> if (ret) {
>> logger(-1, errno, "Can't symlink into netns file %s", ctpath);
>> - destroy_container(arg->veid);
>> - return VZ_RESOURCE_ERROR;
>> + goto err_noclose;
>> }
>> return 0;
>> +err:
>> + close(userns_p[0]);
>> + close(userns_p[1]);
>> + /* FIXME: remove ourselves from container first */
>> +err_noclose:
>> + destroy_container(arg->veid);
>> + return VZ_RESOURCE_ERROR;
>> }
>
> I don't really enjoy this mess of close()'s, goto's and returns,
> it can be done a bit simple IMHO.
>
> But looks like there are no logic errors in here, so OK.
>
Well, I changed this to avoid the goto's. We have more duplicate code,
but less confusion.
>> char path[STR_SIZE];
>> + char upath[STR_SIZE];
>> struct stat st;
>> + unsigned long *local_uid = param->res.misc.local_uid;
>> ret = container_init();
>> if (ret) {
>> @@ -592,6 +813,9 @@ int ct_do_open(vps_handler *h, vps_param *param)
>> if (snprintf(path, sizeof(path), "/proc/%d/ns/pid", getpid()) < 0)
>> return VZ_RESOURCE_ERROR;
>> + if (snprintf(upath, sizeof(upath), "/proc/%d/ns/user",
>> getpid()) < 0)
>> + return VZ_RESOURCE_ERROR;
>> +
>
> Same, I fail to see how snprintf() can ever return negative value here.
>
I fail as well, but in case it does, we will return an error. I don't
expect this branch to be ever taken, but it is there as a safeguard. I
will remove it if you want me to. I removed the other.
>> ret = mkdir(NETNS_RUN_DIR,
>> S_IRWXU|S_IRGRP|S_IXGRP|S_IROTH|S_IXOTH);
>> if (ret && (errno != EEXIST)) {
>> @@ -601,6 +825,20 @@ int ct_do_open(vps_handler *h, vps_param *param)
>> }
>> h->can_join_pidns = !stat(path, &st);
>> + /*
>> + * Being able to join the user namespace is a good indication
>> that the
>> + * user namespace is complete. For a long time, the user namespace
>> + * existed, but were far away from being feature complete. When
>> + * running in such a kernel, joining the user namespace will just
>> + * cripple our container, since we won't be able to do anything.
>> It is
>> + * only good for people who are okay running containers as root.
>> + *
>> + * It is not enough, however, for user namespaces to be present
>> in the
>> + * kernel. The container must have been setup to use it (we need the
>> + * mapped user to own the files, etc. So we also need to find
>> suitable
>> + * configuration in the config files.
>> + */
>> + h->can_join_userns = !stat(upath, &st) && local_uid;
>
> Maybe && local_gid as well?
>
Here we are just testing if it is enable or disabled. It will be clear
later on when we document it, but local_uid is the only thing we need to
test. If it is disabled, we should not even look at local_gid, since we
are, already, essentially root.
local_uid enabled and local_gid disabled is a valid, though weird,
configuration.
More information about the Devel
mailing list