[Devel] [PATCH RH9 6/7] ve/prctl_set_mm: allow setting exe link while unprivileged for spfs

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Tue Oct 5 15:55:52 MSK 2021


In criu we do:

  +-> restore_one_alive_task
    +-> set_user_ns #1

  +-> restore_one_alive_task
    +-> sigreturn_restore #2
      +-> arch_export_restore_task
        +-> __export_restore_task
          +-> sys_prctl(PR_SET_MM, PR_SET_MM_MAP,...)

So we call PR_SET_MM after we've switched to an unprivileged userns, but
PR_SET_MM_MAP is already available in an unprivileged context. If criu had
to fall back because PR_SET_MM_MAP were unavailable, there would be a
problem, but our kernel has it, so criu should just work.

In spfs we call PR_SET_MM with PR_SET_MM_EXE_FILE from the parasite, which
may run in an unprivileged userns. PR_SET_MM_EXE_FILE is not available
there in mainline.

Here are descriptions of patches which allowed PR_SET_MM_EXE_FILE
everywhere and all other PR_SET_MM flags in ve:

+++

ve/prctl_set_mm: allow to change mm content in ve

This is required to change /proc/pid/exe of a process whose binary resides
on NFS.
The SPFS manager, which makes this change, is a child of the criu process,
which is started inside the container from the very beginning.

https://jira.sw.ru/browse/PSBM-26967

Signed-off-by: Stanislav Kinsburskiy <skinsbursky at virtuozzo.com>

(cherry picked from vz8 commit 850d71b3cebc0796b87d45659c832d44234328d6)
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>

+++

prctl: reduce requirements to exe link change

Do not require CAP_SYS_RESOURCE to change the exe link anymore.
This is needed to allow the spfs manager to change it in an unprivileged process.
For CRIU this restriction wasn't a problem, since CRIU is a privileged
process and drops capabilities _after_ the exe link change.
But the spfs manager is not able to do the same thing for an unprivileged
process.
We are not going to push NFS to upstream anymore, and thus can relax the
requirements in our kernel.
Note: this limitation is somewhat strange, because the exe link can be changed
by an execve system call.

https://jira.sw.ru/browse/PSBM-50867

Signed-off-by: Stanislav Kinsburskiy <skinsbursky at virtuozzo.com>
Acked-by: Konstantin Khorenko <khorenko at virtuozzo.com>

khorenko@: this allows online migration of unprivileged processes whose
binaries reside on an NFS volume.

(cherry picked from commit 4737d188f94f05eb58e770c040f64f1fa49efbce)
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>

Let's make it more restrictive and only allow PR_SET_MM_EXE_FILE, which
seems to be the only thing we actually need here.
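
The resulting order of checks can be modelled in userspace roughly as
follows. This is a simplified sketch, not kernel code: the function
model_prctl_set_mm() and its boolean parameter are made up for
illustration, and the earlier argument validation and the PR_SET_MM_MAP
path are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/prctl.h>

/*
 * Userspace model only: mirrors the order of the two checks after this
 * patch. Returns 0 where the kernel would go on to the option's handler,
 * or -EPERM where it would refuse.
 */
static int model_prctl_set_mm(int opt, bool has_cap_sys_resource)
{
	/* The exe link change is dispatched before the capability check. */
	if (opt == PR_SET_MM_EXE_FILE)
		return 0;

	/* Every other option still needs CAP_SYS_RESOURCE. */
	if (!has_cap_sys_resource)
		return -EPERM;

	return 0;
}
```

In this model only PR_SET_MM_EXE_FILE passes without the capability;
an option such as PR_SET_MM_AUXV is still refused.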

https://jira.sw.ru/browse/PSBM-133993

Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
---
 kernel/sys.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index b0caeed760bd..fedc4d14b1af 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2159,12 +2159,12 @@ static int prctl_set_mm(int opt, unsigned long addr,
 		return prctl_set_mm_map(opt, (const void __user *)addr, arg4);
 #endif
 
-	if (!capable(CAP_SYS_RESOURCE))
-		return -EPERM;
-
 	if (opt == PR_SET_MM_EXE_FILE)
 		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
 
+	if (!capable(CAP_SYS_RESOURCE))
+		return -EPERM;
+
 	if (opt == PR_SET_MM_AUXV)
 		return prctl_set_auxv(mm, addr, arg4);
 
-- 
2.31.1


