[Devel] [PATCH vz10 0/7] per-VE ve.proc_permissions and sysfs permission fixes
Pavel Tikhomirov
ptikhomirov at virtuozzo.com
Tue Jun 30 15:50:22 MSK 2026
We definitely need a kselftest for it.
On 6/28/26 11:25, Mirian Shilakadze wrote:
> This series adds ve.proc_permissions, the procfs counterpart of
> ve.sysfs_permissions. It is a per-VE allowlist of /proc paths that the host
> exposes to a container, keyed per VE so the single shared proc tree gives
> per-VE answers.
>
> The motivation is GPU support in containers. A containerized GPU workload
> needs a few host /proc files visible (the nvidia entries it probes), and
> ve.proc_permissions exposes them through a generic per-VE allowlist rather
> than an nvidia-specific passthrough.
>
> While implementing it I found several pre-existing defects in the shared
> sysfs and kernfs per-VE permission path, so the series is fix-first. Reading
> ve.sysfs_permissions under load already panicked the host on a stock kernel
> (NULL deref in kmapset_lookup), which the early patches fix before the procfs
> work builds on the same code. Of these defects, the ve_perms_map
> use-after-free in patch 5 (the __rcu annotation) was found by code analysis.
> The rest surfaced through testing: the NULL deref from the runtime crash,
> the wrong rwsem from lockdep, and the rcu-list walks from PROVE_RCU_LIST.
>
> Layout:
> 1: lib/kmapset annotates the kmapset_lookup rcu-list walk with the held
> lock so CONFIG_PROVE_RCU_LIST can verify it.
> 2 to 5: fix the kernfs seq read and the VFS readers: skip a NULL map, lock
> the tree that is actually walked, take rcu_read_lock around the kmapset
> lookup, and mark ve_perms_map __rcu to close a use-after-free.
> 6: factors the filesystem-agnostic core into fs/ve_perms.c with no
> functional change beyond an rcu_assign_pointer publish.
> 7: adds the procfs feature on top.
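Patches 5 and 6 lean on the usual RCU publish/subscribe pattern: a writer fully initialises an object before publishing its pointer, and a reader must cope with the pointer being NULL. A minimal userspace analogue, assuming C11 atomics as a stand-in (a release store for rcu_assign_pointer, an acquire load for rcu_dereference); `struct perms_map`, `demo_map`, `publish_map` and `read_nr_entries` are hypothetical names, not code from the series, and grace periods and deferred freeing are not modelled:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a per-VE permissions map. */
struct perms_map {
	int nr_entries;
};

/* Static atomic objects are zero-initialised, so this starts out NULL. */
static _Atomic(struct perms_map *) demo_map;

/* Writer: initialise the map first, then publish it with release
 * ordering (userspace analogue of rcu_assign_pointer). */
static void publish_map(struct perms_map *m, int nr)
{
	m->nr_entries = nr;
	atomic_store_explicit(&demo_map, m, memory_order_release);
}

/* Reader: load with acquire ordering (a stronger stand-in for
 * rcu_dereference) and treat an unpublished (NULL) map as empty. */
static int read_nr_entries(void)
{
	struct perms_map *m =
		atomic_load_explicit(&demo_map, memory_order_acquire);

	if (!m)
		return 0;
	return m->nr_entries;
}
```

The release/acquire pair guarantees a reader that sees the pointer also sees `nr_entries`. Freeing a previously published map safely would additionally need an RCU grace period, which this sketch does not attempt to show.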
>
> Testing: Built and booted a debug kernel with KASAN, kmemleak, lockdep and
> PROVE_RCU_LIST. Ran concurrent reader, writer and teardown stress on both
> ve.sysfs_permissions and ve.proc_permissions, including in-container /proc
> and sysfs access and container start and stop. The original NULL deref
> reproduces on a stock kernel and no longer crashes with this series. No KASAN
> use-after-free, no kmemleak leak, and no rcu-list or lockdep splat in the
> ve_perms paths. gcov line coverage of the four touched files reached 93 to
> 99 percent (fs/kernfs/ve.c 99, fs/proc/ve.c 97, fs/ve_perms.c 95,
> lib/kmapset.c 93), the remainder being inlined fortify checks, error paths
> and boot-only init paths. Per-VE correctness was checked separately on both
> filesystems: a path becomes visible and readable inside a container only
> after it is added to that VE's allowlist, access is revoked when it is
> removed, the host is unaffected, and the entry never leaks to another VE.
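Those four properties are what a kselftest should pin down. As a minimal userspace model of them (hypothetical: it does not use the real ve.proc_permissions interface, it assumes VE 0 is the host, and `ve_allow`, `ve_revoke` and `ve_can_see` are illustrative names, not functions from the series):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of per-VE allowlists; VE 0 is the host. */
#define MAX_VE    4
#define MAX_PATHS 8
#define PATH_LEN  64

struct ve_allowlist {
	char paths[MAX_PATHS][PATH_LEN];
	int count;
};

static struct ve_allowlist lists[MAX_VE];

/* Return the slot holding path in l, or -1 if absent. */
static int find_path(const struct ve_allowlist *l, const char *path)
{
	for (int i = 0; i < l->count; i++)
		if (strcmp(l->paths[i], path) == 0)
			return i;
	return -1;
}

/* Add path to one container VE's allowlist only. */
static bool ve_allow(int ve, const char *path)
{
	struct ve_allowlist *l;

	if (ve <= 0 || ve >= MAX_VE || strlen(path) >= PATH_LEN)
		return false;
	l = &lists[ve];
	if (find_path(l, path) >= 0)
		return true;
	if (l->count == MAX_PATHS)
		return false;
	strcpy(l->paths[l->count++], path);
	return true;
}

/* Remove path from one VE's allowlist, filling the hole with the last entry. */
static void ve_revoke(int ve, const char *path)
{
	struct ve_allowlist *l;
	int i;

	if (ve <= 0 || ve >= MAX_VE)
		return;
	l = &lists[ve];
	i = find_path(l, path);
	if (i < 0)
		return;
	l->count--;
	if (i != l->count)
		strcpy(l->paths[i], l->paths[l->count]);
}

/* The host sees everything; a container sees only its own entries. */
static bool ve_can_see(int ve, const char *path)
{
	if (ve == 0)
		return true;
	if (ve < 0 || ve >= MAX_VE)
		return false;
	return find_path(&lists[ve], path) >= 0;
}
```

A real test would drive the same sequence through the cgroup file and in-container reads: check invisible, allow, check visible in that VE only, revoke, check invisible again, with the host unaffected throughout.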
>
> Mirian Shilakadze (7):
> lib/kmapset: annotate the kmapset_lookup rcu-list walk with the held
> lock
> fs/kernfs, ve: skip NULL ve_perms_map in kernfs_perms_shown
> fs/kernfs, ve: lock the walked tree rwsem in kernfs_perms_start
> fs/kernfs, ve: take rcu_read_lock around the ve_perms kmapset lookup
> fs/kernfs, ve: fix ve_perms_map use-after-free, annotate it __rcu
> fs: factor per-VE permission core into ve_perms helpers
> fs/proc, ve: add per-VE ve.proc_permissions
>
> fs/Makefile | 1 +
> fs/kernfs/ve.c | 167 ++++++++----------
> fs/proc/Makefile | 1 +
> fs/proc/generic.c | 48 +++++-
> fs/proc/inode.c | 2 +
> fs/proc/internal.h | 25 +++
> fs/proc/root.c | 1 +
> fs/proc/ve.c | 345 ++++++++++++++++++++++++++++++++++++++
> fs/sysfs/ve.c | 2 +-
> fs/ve_perms.c | 136 +++++++++++++++
> include/linux/kernfs-ve.h | 2 +-
> include/linux/kernfs.h | 2 +-
> include/linux/ve-perms.h | 28 ++++
> include/linux/ve.h | 1 +
> kernel/ve/ve.c | 7 +
> lib/kmapset.c | 3 +-
> 16 files changed, 665 insertions(+), 106 deletions(-)
> create mode 100644 fs/proc/ve.c
> create mode 100644 fs/ve_perms.c
> create mode 100644 include/linux/ve-perms.h
>
> --
> 2.43.0
>
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.