[CRIU] [PATCH] kernel: reduce required permission for prctl_set_mm

Andrew Vagin avagin at parallels.com
Wed Feb 12 15:08:57 PST 2014


On Wed, Feb 12, 2014 at 01:50:35PM -0800, Kees Cook wrote:
> On Wed, Feb 12, 2014 at 1:32 PM, Andrew Morton
> <akpm at linux-foundation.org> wrote:
> > On Wed, 12 Feb 2014 19:40:11 +0400 Andrey Vagin <avagin at openvz.org> wrote:
> >
> >> Currently prctl_set_mm requires the global CAP_SYS_RESOURCE;
> >> this patch reduces the requirement to CAP_SYS_RESOURCE in the current
> >> namespace.
> >>
> >> When we restore a task we need to set up text, data and data heap sizes
> >> from userspace to the values a task had at checkpoint time.
> >>
> >> Currently we cannot restore these parameters if a task lives in
> >> a non-root user namespace, because it has no capabilities in the
> >> parent namespace.
> >>
> >> prctl_set_mm() changes parameters of the current task and doesn't affect
> >> other tasks.
> >>
> >> This patch affects the RLIMIT_DATA limit, because consumption is
> >> calculated relative to mm->end_data, mm->start_data, mm->start_brk.
> >
> > I can't for the life of me work out what you were trying to say here.
> > Please fix and resend this paragraph?
> >
> >> rlim = rlimit(RLIMIT_DATA);
> >> if (rlim < RLIM_INFINITY && (brk - mm->start_brk) +
> >>               (mm->end_data - mm->start_data) > rlim)
> >>       goto out;
> >>
> >> This limit affects calls to brk() and sbrk(), but it doesn't affect
> >> mmap. So I think requiring CAP_SYS_RESOURCE in the current
> >> namespace is enough for this limit.
> >>
> >> ...
> >>
> >> Cc: security at kernel.org
> >
> > That list is for reporting kernel security bugs.
> >
> >>
> >> --- a/kernel/sys.c
> >> +++ b/kernel/sys.c
> >> @@ -1701,7 +1701,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> >>       if (arg5 || (arg4 && opt != PR_SET_MM_AUXV))
> >>               return -EINVAL;
> >>
> >> -     if (!capable(CAP_SYS_RESOURCE))
> >> +     if (!ns_capable(current_user_ns(), CAP_SYS_RESOURCE))
> >>               return -EPERM;
> >>
> >>       if (opt == PR_SET_MM_EXE_FILE)
> >
> > This looks harmless.
> 
> I want to be convinced of this, but weakening this cap check seems
> like an easy way for a process to hide itself trivially from the real
> root user. It can change its exe file link, and dodge RLIMIT_DATA by
> changing the brk addresses. The whole reason this cap check was there
> was to stop that kind of thing. Limiting it to a namespace isn't great
> since USER_NS means unprivileged processes can enter a new NS as the
> NS root user.

Everything you describe here is what we do when restoring tasks. We
need a way to restore these parameters. One of our goals is to be
able to dump and restore Linux Containers. All processes of a container
live in a separate set of namespaces.
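
For illustration, a minimal sketch of what the restore side needs to do.
The helper name restore_mm_field is hypothetical (it is not CRIU code);
only the prctl(PR_SET_MM, ...) call itself is the existing interface:

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Hypothetical restore helper: set one mm field of the current task.
 * Returns 0 on success or a negative errno value on failure.
 */
static int restore_mm_field(int opt, unsigned long addr)
{
    /* arg4 and arg5 must be zero for the non-AUXV options */
    if (prctl(PR_SET_MM, (unsigned long)opt, addr, 0UL, 0UL) == -1)
        return -errno;
    return 0;
}
```

At restore time this would be called with options such as
PR_SET_MM_START_DATA, PR_SET_MM_END_DATA, PR_SET_MM_START_BRK and
PR_SET_MM_BRK, using the values saved at checkpoint time. Without
CAP_SYS_RESOURCE the call fails with EPERM, which is the case this patch
is about.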

I considered restoring these parameters before entering the userns,
but that idea failed: a process can't enter a pidns, and the pidns
must be created in the userns...


>> It can change its exe file link
We can change memory contents with the help of ptrace. So if we want to
hide a process, we can execute another process and inject our code into it.

This is effectively equivalent to changing the exe file link. Yes, it's
a bit harder, but we can do that even without this patch.

>> dodge RLIMIT_DATA

This limit affects calls to brk(2) and sbrk(2). But a task can use mmap() to
allocate memory. How is this limit used?
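
To make the question concrete, here is a minimal userspace model of the
check quoted in the patch description. The helper name is hypothetical
and this is only a sketch of the arithmetic, not kernel code:

```c
#include <stdbool.h>
#include <sys/resource.h>

/*
 * Userspace model of the RLIMIT_DATA check done on brk().
 * Hypothetical helper, mirrors the condition quoted from the kernel.
 */
static bool brk_exceeds_data_limit(rlim_t rlim, unsigned long new_brk,
                                   unsigned long start_brk,
                                   unsigned long start_data,
                                   unsigned long end_data)
{
    /* heap size plus data segment size is compared against the limit */
    return rlim < RLIM_INFINITY &&
           (new_brk - start_brk) + (end_data - start_data) > rlim;
}
```

Only the brk range and the data segment enter the sum, so memory
obtained through mmap() is not counted by this check.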

Sorry if I am missing something.

> 
> -Kees
> 
> -- 
> Kees Cook
> Chrome OS Security

