[Devel] [PATCH vz10 0/7] per-VE ve.proc_permissions and sysfs permission fixes

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Tue Jun 30 15:50:22 MSK 2026


We definitely need a kselftest for it.

On 6/28/26 11:25, Mirian Shilakadze wrote:
> This series adds ve.proc_permissions, the procfs counterpart of
> ve.sysfs_permissions. It is a per-VE allowlist of /proc paths that the host
> exposes to a container, keyed per VE so the single shared proc tree gives
> per-VE answers.
> 
> The motivation is GPU support in containers. A containerized GPU workload
> needs a few host /proc files visible (the nvidia entries it probes), and
> ve.proc_permissions exposes them through a generic per-VE allowlist rather
> than an nvidia-specific passthrough.
> 
> While implementing it I found several pre-existing defects in the shared
> sysfs and kernfs per-VE permission path, so the series is fix-first. Reading
> ve.sysfs_permissions under load already panicked the host on a stock kernel
> (NULL deref in kmapset_lookup), which the early patches fix before the procfs
> work builds on the same code. Of these defects, the ve_perms_map
> use-after-free in patch 5 (the __rcu annotation) was found by code analysis.
> The rest surfaced through testing: the NULL deref and the wrong rwsem through
> the runtime crash and lockdep, and the rcu-list walks through PROVE_RCU_LIST.
> 
> Layout:
>   1: lib/kmapset annotates the kmapset_lookup rcu-list walk with the held
>      lock, so CONFIG_PROVE_RCU_LIST checks it correctly.
>   2 to 5: fix the kernfs seq read and the VFS readers: skip a NULL map, lock
>      the tree that is actually walked, take rcu_read_lock around the kmapset
>      lookup, and mark ve_perms_map __rcu to close a use-after-free.
>   6: factors the filesystem-agnostic core into fs/ve_perms.c with no
>      functional change beyond an rcu_assign_pointer publish.
>   7: adds the procfs feature on top.
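A minimal userspace model of the publish/read pattern behind patches 2 and 4 to 6, assuming C11 atomics as a stand-in for the kernel primitives: a release exchange plays the role of rcu_assign_pointer(), an acquire load plays the role of rcu_dereference(), and the reader treats a NULL map as empty instead of dereferencing it. The struct perms_map type and the publish_map()/map_allows() helpers are hypothetical and are not the series' code. The model has no grace periods: in the kernel the reader must stay inside rcu_read_lock() for as long as it uses the map, and the writer may free the old map only after synchronize_rcu() or via kfree_rcu().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a per-VE permission map. */
struct perms_map {
	int nr;
	const char *paths[4];
};

/* Shared pointer, NULL until the first map is published
 * (models an __rcu-annotated pointer). */
static _Atomic(struct perms_map *) ve_map = NULL;

/* Writer: the caller fully initialises new_map first; the release half
 * of the exchange orders that initialisation before the publish
 * (analogue of rcu_assign_pointer). Returns the old map, which the
 * kernel could free only after a grace period. */
struct perms_map *publish_map(struct perms_map *new_map)
{
	return atomic_exchange_explicit(&ve_map, new_map, memory_order_acq_rel);
}

/* Reader: acquire-load the pointer (analogue of rcu_dereference) and
 * skip a NULL map instead of dereferencing it. */
bool map_allows(const char *path)
{
	struct perms_map *m = atomic_load_explicit(&ve_map, memory_order_acquire);

	if (!m)
		return false;
	for (int i = 0; i < m->nr; i++)
		if (strcmp(m->paths[i], path) == 0)
			return true;
	return false;
}
```

In this single-threaded model the map returned by publish_map() can be reused or freed at once only because no concurrent reader exists; that shortcut is exactly what the kernel cannot take.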
> 
> Testing: Built and booted a debug kernel with KASAN, kmemleak, lockdep and
> PROVE_RCU_LIST. Ran concurrent reader, writer and teardown stress on both
> ve.sysfs_permissions and ve.proc_permissions, including in-container /proc
> and sysfs access and container start and stop. The original NULL deref
> reproduces on a stock kernel and no longer occurs with this series. No KASAN
> use-after-free, kmemleak leak, rcu-list splat or lockdep splat was seen in the
> ve_perms paths. gcov line coverage of the four main touched files reached 93
> to 99 percent (fs/kernfs/ve.c 99, fs/proc/ve.c 97, fs/ve_perms.c 95,
> lib/kmapset.c 93); the uncovered remainder is inlined fortify checks, error
> paths and boot-only init paths. Per-VE correctness was checked separately on
> both filesystems: a path becomes visible and readable inside a container only
> after it is added to that VE's allowlist, access is revoked when it is
> removed, the host is unaffected, and the entry never leaks to another VE.
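The per-VE properties listed above (hidden until allowed, revoked on removal, host unaffected, no leak to another VE) are the kind of checks the requested kselftest could assert. Below is a minimal userspace model of them, assuming a fixed-size allowlist per VE index with VE 0 as the host. The names ve_allow(), ve_revoke(), ve_visible() and the path driver/example/info are hypothetical; they do not reflect the ve.proc_permissions interface or the kmapset-based structures the series actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_VES     4 /* hypothetical: VE ids 0..3, 0 is the host */
#define MAX_ENTRIES 8 /* hypothetical per-VE allowlist capacity */
#define HOST_VE     0

/* Hypothetical per-VE allowlist: a small array of path strings per VE. */
struct ve_allowlist {
	const char *paths[MAX_ENTRIES];
	int n;
};

struct ve_allowlist ve_lists[MAX_VES];

/* Add path to one VE's allowlist; other VEs are untouched.
 * Fails for the host (it needs no allowlist) or when the list is full. */
bool ve_allow(int ve, const char *path)
{
	assert(ve >= 0 && ve < MAX_VES);
	if (ve == HOST_VE || ve_lists[ve].n == MAX_ENTRIES)
		return false;
	ve_lists[ve].paths[ve_lists[ve].n++] = path;
	return true;
}

/* Remove path from one VE's allowlist by moving the last entry into its slot. */
void ve_revoke(int ve, const char *path)
{
	struct ve_allowlist *l;

	assert(ve >= 0 && ve < MAX_VES);
	l = &ve_lists[ve];
	for (int i = 0; i < l->n; i++) {
		if (strcmp(l->paths[i], path) == 0) {
			l->paths[i] = l->paths[--l->n];
			return;
		}
	}
}

/* One shared namespace, per-VE answers: the host sees every path
 * (model simplification), a container only what its own list allows. */
bool ve_visible(int ve, const char *path)
{
	const struct ve_allowlist *l;

	assert(ve >= 0 && ve < MAX_VES);
	if (ve == HOST_VE)
		return true;
	l = &ve_lists[ve];
	for (int i = 0; i < l->n; i++)
		if (strcmp(l->paths[i], path) == 0)
			return true;
	return false;
}
```

A real kselftest would instead write to the VE's ve.proc_permissions file and check open() or stat() results from inside and outside the container; that plumbing is not sketched here because the cover letter does not describe the file's write format.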
> 
> Mirian Shilakadze (7):
>   lib/kmapset: annotate the kmapset_lookup rcu-list walk with the held
>     lock
>   fs/kernfs, ve: skip NULL ve_perms_map in kernfs_perms_shown
>   fs/kernfs, ve: lock the walked tree rwsem in kernfs_perms_start
>   fs/kernfs, ve: take rcu_read_lock around the ve_perms kmapset lookup
>   fs/kernfs, ve: fix ve_perms_map use-after-free, annotate it __rcu
>   fs: factor per-VE permission core into ve_perms helpers
>   fs/proc, ve: add per-VE ve.proc_permissions
> 
>  fs/Makefile               |   1 +
>  fs/kernfs/ve.c            | 167 ++++++++----------
>  fs/proc/Makefile          |   1 +
>  fs/proc/generic.c         |  48 +++++-
>  fs/proc/inode.c           |   2 +
>  fs/proc/internal.h        |  25 +++
>  fs/proc/root.c            |   1 +
>  fs/proc/ve.c              | 345 ++++++++++++++++++++++++++++++++++++++
>  fs/sysfs/ve.c             |   2 +-
>  fs/ve_perms.c             | 136 +++++++++++++++
>  include/linux/kernfs-ve.h |   2 +-
>  include/linux/kernfs.h    |   2 +-
>  include/linux/ve-perms.h  |  28 ++++
>  include/linux/ve.h        |   1 +
>  kernel/ve/ve.c            |   7 +
>  lib/kmapset.c             |   3 +-
>  16 files changed, 665 insertions(+), 106 deletions(-)
>  create mode 100644 fs/proc/ve.c
>  create mode 100644 fs/ve_perms.c
>  create mode 100644 include/linux/ve-perms.h
> 
> --
> 2.43.0
> 

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.
