[CRIU] [PATCH 1/2] usernsd: The way to restore privileged stuff in userns

Andrew Vagin avagin at parallels.com
Thu Feb 12 14:02:01 PST 2015


On Thu, Feb 12, 2015 at 01:39:15PM +0300, Pavel Emelyanov wrote:
> We have collected a good set of calls that cannot be done inside
> user namespaces, but that we need to make [1]. Some of them have
> already been addressed, like prctl mm bits restore, but some have not.
> 
> I'm pretty sceptical about the ability to relax the security
> checks on quite a lot of them (e.g. open-by-handle is indeed a
> very dangerous operation if allowed to an unprivileged user), so
> we need some way to call those things even in user namespaces.
> 
> The good news is that all the calls I've found operate on file
> descriptors one way or another. So if we had a process that lived
> outside of the user namespace, we could ask it to do the
> privileged operation we need and exchange the affected file
> descriptor via a unix socket.
> 
> The usernsd does exactly this. It starts before we create the
> user namespace and accepts requests via a unix socket.
> Clients (the processes we restore) send it the function they
> want to call, the descriptor they want to operate on and the
> arguments blob. Optionally, they can request some file descriptor
> back after the call.
> 
> In the non-user-namespace case the daemon is not started and the
> calls are done right in the requestor's process environment.
> 
> In the next patch there's an example of how to use this daemon
> to do the privileged SO_SNDBUFFORCE/_RCVBUFFORCE sockopt on
> a socket.
> 
> [1] http://criu.org/UserNamespace
> 
> Signed-off-by: Pavel Emelyanov <xemul at parallels.com>

....

> +static inline void unsc_msg_init(struct unsc_msg *m, uns_call_t *c,
> +		int *x, void *arg, size_t asize, int fd)
> +{
> +	m->h.msg_iov = m->iov;
> +	m->h.msg_iovlen = 2;
> +
> +	m->iov[0].iov_base = c;
> +	m->iov[0].iov_len = sizeof(*c);
> +	m->iov[1].iov_base = x;
> +	m->iov[1].iov_len = sizeof(*x);
> +
> +	if (arg) {
> +		m->iov[2].iov_base = arg;
> +		m->iov[2].iov_len = asize;
> +		m->h.msg_iovlen++;
> +	}
> +
> +	m->h.msg_name = NULL;
> +	m->h.msg_namelen = 0;
> +	m->h.msg_flags = 0;
> +
> +	if (fd == -1) {

We also save a return code in fd, so it can hold a negative value other
than -1. I think it's better to check for fd < 0 here instead of fd == -1.
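
To illustrate the check I mean, here is a minimal sketch, not the patch's
code. struct fd_msg and fd_msg_init() are hypothetical, cut-down stand-ins
for struct unsc_msg and unsc_msg_init() with a single iovec; the only point
is that any negative fd means "no descriptor to pass".

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical, simplified stand-in for struct unsc_msg from the patch. */
struct fd_msg {
	struct msghdr h;
	struct iovec iov[1];
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces cmsghdr alignment of buf */
	} c;
};

/* Point the message at *val and attach fd only if it is a valid descriptor. */
static void fd_msg_init(struct fd_msg *m, int *val, int fd)
{
	memset(m, 0, sizeof(*m));

	m->iov[0].iov_base = val;
	m->iov[0].iov_len = sizeof(*val);
	m->h.msg_iov = m->iov;
	m->h.msg_iovlen = 1;

	if (fd < 0) {
		/* Covers -1 and any negative error code stored in fd. */
		m->h.msg_control = NULL;
		m->h.msg_controllen = 0;
	} else {
		struct cmsghdr *ch;

		m->h.msg_control = m->c.buf;
		m->h.msg_controllen = sizeof(m->c.buf);
		ch = CMSG_FIRSTHDR(&m->h);
		ch->cmsg_len = CMSG_LEN(sizeof(int));
		ch->cmsg_level = SOL_SOCKET;
		ch->cmsg_type = SCM_RIGHTS;
		memcpy(CMSG_DATA(ch), &fd, sizeof(fd));
	}
}
```

With fd == -1 only, a value such as -EPERM would end up in the SCM_RIGHTS
payload, which is why I would prefer the broader test.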

> +		m->h.msg_control = NULL;
> +		m->h.msg_controllen = 0;
> +	} else {
> +		struct cmsghdr *ch;
> +
> +		m->h.msg_control = &m->c;
> +		m->h.msg_controllen = sizeof(m->c);
> +		ch = CMSG_FIRSTHDR(&m->h);
> +		ch->cmsg_len = CMSG_LEN(sizeof(int));
> +		ch->cmsg_level = SOL_SOCKET;
> +		ch->cmsg_type = SCM_RIGHTS;
> +		*((int *)CMSG_DATA(ch)) = fd;
> +	}
> +}
...

> +int start_usernsd(void)
> +{
> +	int sk[2];
> +
> +	if (!(root_ns_mask & CLONE_NEWUSER))
> +		return 0;
> +
> +	/*
> +	 * Seqpacket to
> +	 *
> +	 * a) Help the daemon distinguish individual requests from
> +	 *    each other easily. Stream sockets require manual
> +	 *    message boundaries.
> +	 *
> +	 * b) Make callers notice the daemon's death by seeing the
> +	 *    disconnected socket. In case of a dgram socket
> +	 *    callers would just get stuck in receiving the
> +	 *    response.
> +	 */
> +
> +	if (socketpair(PF_UNIX, SOCK_SEQPACKET, 0, sk)) {
> +		pr_perror("Can't make usernsd socket");
> +		return -1;
> +	}
> +
> +	usernsd_pid = fork();

We need to handle fork() errors here. If it returns -1, we fall through,
close sk[1] and return 0 as if the daemon had started.
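
Something along these lines would do. This is only a sketch under my own
assumptions: spawn_daemon() and echo_once() are hypothetical names, not
functions from the patch, and the child uses _exit() instead of exit() so
that it does not flush stdio buffers inherited from the parent.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical daemon body: read one int and send it back. */
static int echo_once(int sk)
{
	int v;

	if (recv(sk, &v, sizeof(v), 0) != (ssize_t)sizeof(v))
		return 1;
	if (send(sk, &v, sizeof(v), 0) != (ssize_t)sizeof(v))
		return 1;
	return 0;
}

/*
 * Hypothetical helper: fork a child running fn(sk[1]) and return the
 * parent's end of the socket pair, or -1 if anything failed.
 */
static int spawn_daemon(int (*fn)(int sk), pid_t *pid)
{
	int sk[2];

	if (socketpair(PF_UNIX, SOCK_SEQPACKET, 0, sk))
		return -1;

	*pid = fork();
	if (*pid < 0) {
		/* fork() failed: there is no child, so release both ends. */
		close(sk[0]);
		close(sk[1]);
		return -1;
	}

	if (*pid == 0) {
		close(sk[0]);
		_exit(fn(sk[1]));
	}

	close(sk[1]);
	return sk[0];
}
```

The caller then sees -1 both for a socketpair() failure and for a fork()
failure, and never talks to a daemon that does not exist.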

> +	if (usernsd_pid == 0) {
> +		int ret;
> +
> +		close(sk[0]);
> +		ret = usernsd(sk[1]);
> +		exit(ret);
> +	}
> +
> +	close(sk[1]);
> +	install_service_fd(USERNSD_SK, sk[0]);

And here: the return value of install_service_fd() is ignored.

> +	close(sk[0]);
> +
> +	return 0;
> +}
> +
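
For reference, the two properties the comment in start_usernsd() relies on
can be checked with a small standalone program. This is a sketch and not
part of the patch; seqpacket_demo() is a hypothetical name. It sends two
messages back to back on a SOCK_SEQPACKET pair, closes the sending end and
records what three consecutive recv() calls return on the other end.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fill lens[0..2] with the results of three recv() calls made after the
 * peer has sent two messages and closed its end. Returns 0 on success,
 * -1 if the socket pair could not be set up or a send failed.
 */
static int seqpacket_demo(ssize_t lens[3])
{
	int sk[2];
	char buf[64];
	int i;

	if (socketpair(PF_UNIX, SOCK_SEQPACKET, 0, sk))
		return -1;

	if (send(sk[0], "abc", 3, 0) != 3 ||
	    send(sk[0], "defgh", 5, 0) != 5) {
		close(sk[0]);
		close(sk[1]);
		return -1;
	}

	/* Simulate the daemon dying after it has queued its messages. */
	close(sk[0]);

	/* Each recv() returns one whole message; the last one sees EOF. */
	for (i = 0; i < 3; i++)
		lens[i] = recv(sk[1], buf, sizeof(buf), 0);

	close(sk[1]);
	return 0;
}
```

On Linux the three results are 3, 5 and 0: the messages are not merged
even though the buffer could hold both, and the closed peer shows up as a
zero-length read instead of a blocked caller.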

