[CRIU] [PATCH] kernel: reduce required permission for prctl_set_mm
Andrew Vagin
avagin at parallels.com
Wed Feb 12 15:08:57 PST 2014
On Wed, Feb 12, 2014 at 01:50:35PM -0800, Kees Cook wrote:
> On Wed, Feb 12, 2014 at 1:32 PM, Andrew Morton
> <akpm at linux-foundation.org> wrote:
> > On Wed, 12 Feb 2014 19:40:11 +0400 Andrey Vagin <avagin at openvz.org> wrote:
> >
> >> Currently prctl_set_mm requires the global CAP_SYS_RESOURCE;
> >> this patch reduces the requirement to CAP_SYS_RESOURCE in the current
> >> user namespace.
> >>
> >> When we restore a task, we need to set its text, data and heap boundaries
> >> from userspace to the values the task had at checkpoint time.
> >>
> >> Currently we cannot restore these parameters if a task lives in
> >> a non-root user namespace, because it has no capabilities in the
> >> parent namespace.
> >>
> >> prctl_set_mm() changes parameters of the current task and doesn't affect
> >> other tasks.
> >>
> >> This patch affects the RLIMIT_DATA limit, because data consumption is
> >> calculated relative to mm->end_data, mm->start_data and mm->start_brk.
> >
> > I can't for the life of me work out what you were trying to say here.
> > Please fix and resend this paragraph?
> >
> >> 	rlim = rlimit(RLIMIT_DATA);
> >> 	if (rlim < RLIM_INFINITY && (brk - mm->start_brk) +
> >> 			(mm->end_data - mm->start_data) > rlim)
> >> 		goto out;
> >>
> >> This limit affects calls to brk() and sbrk(), but it doesn't affect
> >> mmap(). So I think requiring CAP_SYS_RESOURCE in the current
> >> namespace is enough for this limit.
> >>
> >> ...
> >>
> >> Cc: security at kernel.org
> >
> > That list is for reporting kernel security bugs.
> >
> >>
> >> --- a/kernel/sys.c
> >> +++ b/kernel/sys.c
> >> @@ -1701,7 +1701,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
> >>  	if (arg5 || (arg4 && opt != PR_SET_MM_AUXV))
> >>  		return -EINVAL;
> >>
> >> -	if (!capable(CAP_SYS_RESOURCE))
> >> +	if (!ns_capable(current_user_ns(), CAP_SYS_RESOURCE))
> >>  		return -EPERM;
> >>
> >>  	if (opt == PR_SET_MM_EXE_FILE)
> >
> > This looks harmless.
>
> I want to be convinced of this, but weakening this cap check seems
> like an easy way for a process to hide itself trivially from the real
> root user. It can change its exe file link and dodge RLIMIT_DATA by
> changing the brk addresses. The whole reason this cap check was there
> was to stop that kind of thing. Limiting it to a namespace isn't great
> since USER_NS means unprivileged processes can enter a new NS as the
> NS root user.
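Kees's point that an unprivileged user can become root of a new user namespace can be illustrated with a toy model of the walk that ns_capable() performs (loosely modelled on cap_capable() in security/commoncap.c). The structs and the model_* helpers below are hypothetical simplifications, not kernel API; the real kernel also tracks full capability sets and uid mappings, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct user_namespace. */
struct user_ns {
    const struct user_ns *parent; /* NULL for the initial namespace */
    int level;                    /* nesting depth, 0 for the initial ns */
    unsigned owner_euid;          /* euid of the task that created it */
};

/* Hypothetical, simplified stand-in for struct cred (one capability). */
struct cred {
    const struct user_ns *user_ns; /* namespace the task lives in */
    unsigned euid;                 /* euid as seen from the initial ns */
    bool cap_effective;            /* does the task hold the capability? */
};

/* Model of ns_capable(target, cap): walk from target towards the root. */
bool model_ns_capable(const struct cred *cred, const struct user_ns *target)
{
    assert(cred != NULL && cred->user_ns != NULL);
    for (const struct user_ns *ns = target; ns != NULL; ns = ns->parent) {
        /* The task's own namespace: its effective set decides. */
        if (ns == cred->user_ns)
            return cred->cap_effective;
        /* Climbed to the task's depth without meeting its namespace,
         * so target is not a descendant of it. */
        if (ns->level <= cred->user_ns->level)
            return false;
        /* The creator of a child namespace has every capability in it. */
        if (ns->parent == cred->user_ns && ns->owner_euid == cred->euid)
            return true;
    }
    return false;
}

/* Example topology: child_ns was created by an unprivileged euid 1000. */
static const struct user_ns init_ns  = { NULL, 0, 0 };
static const struct user_ns child_ns = { &init_ns, 1, 1000 };

/* Model of capable(cap), i.e. ns_capable(&init_user_ns, cap). */
bool model_capable(const struct cred *cred)
{
    return model_ns_capable(cred, &init_ns);
}

/* A task inside child_ns: "root" there, with a full effective set. */
static const struct cred ns_root    = { &child_ns, 1000, true };
/* Another process of the same user, left in the initial namespace. */
static const struct cred plain_user = { &init_ns, 1000, false };
```

In this model the patched check corresponds to model_ns_capable(cred, cred->user_ns), which ns_root passes, while the original capable() check corresponds to model_capable(), which it fails.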

Everything you are describing here is what we do when restoring tasks.
We need a way to restore these parameters. One of our goals is to be
able to dump and restore Linux Containers, and all processes of a
container live in a separate set of namespaces.
I considered restoring these parameters before entering the userns,
but this idea failed, because a process can't itself enter a pidns,
and the pidns must be created inside the userns...
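The restore step can be sketched with the real prctl(PR_SET_MM, ...) interface from <sys/prctl.h>. The struct mm_layout type and the parse_stat_layout(), read_self_layout() and restore_mm_layout() helpers are hypothetical, not CRIU's code; the dump side reads fields 26, 27 and 45 to 47 of /proc/<pid>/stat as listed in proc(5), and the call order on the restore side is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical record of the boundaries saved at checkpoint time. */
struct mm_layout {
    unsigned long start_code, end_code;             /* stat fields 26, 27 */
    unsigned long start_data, end_data, start_brk;  /* stat fields 45-47 */
};

/* Dump side: parse one line of /proc/<pid>/stat. Returns 0 or -1. */
int parse_stat_layout(const char *stat, struct mm_layout *out)
{
    /* comm (field 2) may contain spaces, so start after the last ')'. */
    const char *p = strrchr(stat, ')');
    if (p == NULL)
        return -1;
    p++;
    memset(out, 0, sizeof *out);
    for (int field = 3; field <= 47; field++) {
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            return -1;                      /* too few fields */
        char *end;
        unsigned long v = strtoul(p, &end, 10);
        if (end == p) {                     /* non-numeric, e.g. state 'R' */
            while (*end != ' ' && *end != '\0')
                end++;
        }
        switch (field) {
        case 26: out->start_code = v; break;
        case 27: out->end_code = v; break;
        case 45: out->start_data = v; break;
        case 46: out->end_data = v; break;
        case 47: out->start_brk = v; break;
        }
        p = end;
    }
    return 0;
}

/* Dump side for the calling process itself. */
int read_self_layout(struct mm_layout *out)
{
    char buf[2048];
    FILE *f = fopen("/proc/self/stat", "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line != NULL ? parse_stat_layout(line, out) : -1;
}

/* Restore side: returns 0, or -errno from the first failing prctl().
 * The kernel validates each call on its own, so the order can matter;
 * the order used here is an assumption. */
int restore_mm_layout(const struct mm_layout *l)
{
    assert(l != NULL);
    const struct { int opt; unsigned long val; } steps[] = {
        { PR_SET_MM_START_CODE, l->start_code },
        { PR_SET_MM_END_CODE,   l->end_code },
        { PR_SET_MM_START_DATA, l->start_data },
        { PR_SET_MM_END_DATA,   l->end_data },
        { PR_SET_MM_START_BRK,  l->start_brk },
    };
    for (size_t i = 0; i < sizeof steps / sizeof steps[0]; i++) {
        /* arg4 and arg5 must be zero for these options. */
        if (prctl(PR_SET_MM, (unsigned long)steps[i].opt, steps[i].val,
                  0UL, 0UL) != 0)
            return -errno;
    }
    return 0;
}
```

Without CAP_SYS_RESOURCE in the initial user namespace, restore_mm_layout() is expected to fail with -EPERM on its first call, which is the restriction under discussion.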
> It can change its exe file link
We can change memory contents with the help of ptrace. So if we want to
hide a process, we can execute another process and inject our code into
it, which can be equivalent to changing the exe file link. Yes, it's a bit
harder, but we can do that even without this patch.
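The ptrace argument can be illustrated with a minimal sketch: a parent traces its own child and overwrites a variable in the child's memory with PTRACE_POKEDATA. poke_child_and_report() and target are hypothetical names; the sketch shows only the memory-write primitive, not code injection, and it assumes ptrace is not blocked by Yama (ptrace_scope 2 or 3) or by a seccomp policy.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* After fork() this variable lives at the same address in parent and
 * child, so &target is also valid in the child's address space. */
static volatile long target = 1;

/* Forks a child that stops itself, rewrites the child's copy of
 * `target` with PTRACE_POKEDATA, lets it run, and returns the low byte
 * of the value the child saw (via its exit status), or -1 on error. */
int poke_child_and_report(long value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: become a tracee of the parent, then stop. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
            _exit(255);
        raise(SIGSTOP);
        _exit((int)(target & 0xff));
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;                 /* child exited early (TRACEME failed) */

    /* Overwrite one word in the child, then resume it without a signal. */
    if (ptrace(PTRACE_POKEDATA, pid, (void *)&target, (void *)value) != 0 ||
        ptrace(PTRACE_CONT, pid, NULL, NULL) != 0) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The parent's own copy of target stays 1, so only the child's memory changed, and no CAP_SYS_RESOURCE was involved.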
> dodge RLIMIT_DATA
This limit affects only brk(2) and sbrk(2); a task can still use mmap()
to allocate memory. So how is this limit meant to be used?
Sorry if I'm missing something.
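The brk() check quoted earlier can be restated as a pure function to show what it measures. brk_exceeds_data_limit() is a hypothetical helper, not kernel code: it counts only the heap (brk - start_brk) and the data segment (end_data - start_data), so memory obtained with mmap() never enters it, and moving start_brk or start_data changes the result. Whether the kernel accepts a particular new value through PR_SET_MM is a separate check that this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/resource.h>

/* Hypothetical restatement of the RLIMIT_DATA test quoted from
 * sys_brk(): returns true when growing the heap to `brk` is refused. */
bool brk_exceeds_data_limit(rlim_t rlim, unsigned long brk,
                            unsigned long start_brk,
                            unsigned long start_data,
                            unsigned long end_data)
{
    /* The model assumes well-ordered boundaries. */
    assert(brk >= start_brk && end_data >= start_data);
    unsigned long heap = brk - start_brk;        /* bytes above start_brk */
    unsigned long data = end_data - start_data;  /* data segment size */
    return rlim < RLIM_INFINITY && heap + data > rlim;
}
```

With a 1 MiB limit, a 64 KiB data segment and a 4 MiB heap are refused; keeping the same brk but moving start_brk to 16 KiB below it makes the check see only 16 KiB of heap, which is the kind of shift Kees describes.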
>
> -Kees
>
> --
> Kees Cook
> Chrome OS Security