[CRIU] [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces

Eric W. Biederman ebiederm at xmission.com
Mon Jul 25 06:18:42 PDT 2016


"Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:

> Hi Andrey,
>
> On 07/22/2016 08:25 PM, Andrey Vagin wrote:
>> On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages)
>> <mtk.manpages at gmail.com> wrote:
>>> Hi Andrey,
>>>
>>>
>>> On 07/21/2016 11:06 PM, Andrew Vagin wrote:
>>>>
>>>> On Thu, Jul 21, 2016 at 04:41:12PM +0200, Michael Kerrisk (man-pages)
>>>> wrote:
>>>>>
>>>>> Hi Andrey,
>>>>>
>>>>> On 07/14/2016 08:20 PM, Andrey Vagin wrote:
>>>>
>>>>
>>>> <snip>
>>>>
>>>>>
>>>>> Could you add here an explanation of the API in detail: what do these
>>>>> FDs refer to, and how do you use them to solve the use case? And could
>>>>> you add that info to the commit messages please.
>>>>
>>>>
>>>> Hi Michael,
>>>>
>>>> A patch for man-pages is attached. It adds the following text to
>>>> namespaces(7).
>>>>
>>>> Since Linux 4.X, the following ioctl(2) calls are supported for
>>>> namespace file descriptors.  The correct syntax is:
>>>>
>>>>       fd = ioctl(ns_fd, ioctl_type);
>>>>
>>>> where ioctl_type is one of the following:
>>>>
>>>> NS_GET_USERNS
>>>>       Returns a file descriptor that refers to an owning user
>>>>       namespace.
>>>>
>>>> NS_GET_PARENT
>>>>       Returns a file descriptor that refers to a parent namespace.
>>>>       This ioctl(2) can be used for pid and user namespaces. For user
>>>>       namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
>>>>       meaning.
>
> For each of the above, I think it is worth mentioning that the
> close-on-exec flag is set for the returned file descriptor.

Hmm.  That is an odd default.

>>>>
>>>> In addition to generic ioctl(2) errors, the following specific ones can
>>>> occur:
>>>>
>>>> EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
>>>>
>>>> EPERM  The requested namespace is outside of the current namespace
>>>>       scope.
>
> Perhaps add "and the caller does not have CAP_SYS_ADMIN in the initial
> user namespace"?

Having looked at that bit of code I don't think capabilities really
have a role to play.

>>>> ENOENT ns_fd refers to the init namespace.
>>>
>>>
>>> Thanks for this. But still part of the question remains unanswered.
>>> How do we (in user-space) use the file descriptors to answer any of
>>> the questions that this patch series was designed to solve? (This
>>> info should be in the commit message and the man-pages patch.)
>>
>> I'm sorry, but I am not sure that I understand what you ask.
>>
>> Here are the original questions:
>> Someone else then asked me a question that led me to wonder about
>> generally introspecting on the parental relationships between user
>> namespaces and the association of other namespaces types with user
>> namespaces. One use would be visualization, in order to understand the
>> running system. Another would be to answer the question I already
>> mentioned: what capability does process X have to perform operations
>> on a resource governed by namespace Y?
>>
>> Here is an example which shows how we can get the owning namespace
>> inode number by using these ioctl-s.
>>
>> $ ls -l /proc/13929/ns/pid
>> lrwxrwxrwx 1 root root 0 Jul 22 21:03 /proc/13929/ns/pid -> 'pid:[4026532228]'
>>
>> $ ./nsowner /proc/13929/ns/pid
>> user:[4026532227]
>>
>> The owning user namespace for pid:[4026532228] is user:[4026532227].
>>
>> The nsowner tool is compiled from this code:
>>
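>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <sys/ioctl.h>
>> #include <unistd.h>
>> /* NS_GET_USERNS: assumed to come from the uapi header added by this series */
>> #include <linux/nsfs.h>
>>
>> int main(int argc, char *argv[])
>> {
>>         char buf[128], path[] = "/proc/self/fd/0123456789";
>>         int ns, uns, ret;
>>
>>         /* Expect exactly one argument: a path such as /proc/PID/ns/pid */
>>         if (argc != 2)
>>                 return 1;
>>
>>         ns = open(argv[1], O_RDONLY);
>>         if (ns < 0)
>>                 return 1;
>>
>>         /* Get a file descriptor for the owning user namespace */
>>         uns = ioctl(ns, NS_GET_USERNS);
>>         if (uns < 0)
>>                 return 1;
>>
>>         /* The /proc/self/fd symlink target has the form "user:[inode]" */
>>         snprintf(path, sizeof(path), "/proc/self/fd/%d", uns);
>>         ret = readlink(path, buf, sizeof(buf) - 1);
>>         if (ret < 0)
>>                 return 1;
>>         buf[ret] = 0;
>>
>>         printf("%s\n", buf);
>>
>>         return 0;
>> }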
>
> So, from my point of view, the important piece that was missing from
> your commit message was the note to use readlink("/proc/self/fd/%d")
> on the returned FDs. I think that detail needs to be part of the
> commit message (and also the man page text). I think it would even be
> helpful to include the above program as part of the commit message:
> it helps people more quickly grasp the API.

Please, please make the standard way to compare these things fstat.
That is much less magic than a symlink, and a little more future proof.
Possibly even kcmp.

At some point we will care about migrating a sub-container and we
may have to have some minor changes.

Eric


