[Devel] [PATCH vz10 0/7] per-VE ve.proc_permissions and sysfs permission fixes
Pavel Tikhomirov
ptikhomirov at virtuozzo.com
Tue Jun 30 15:58:03 MSK 2026
On 6/30/26 14:50, Pavel Tikhomirov wrote:
> We definitely need a kselftest for it.
Though in general the series looks good.
>
> On 6/28/26 11:25, Mirian Shilakadze wrote:
>> This series adds ve.proc_permissions, the procfs counterpart of
>> ve.sysfs_permissions. It is a per-VE allowlist of /proc paths that the host
>> exposes to a container, keyed per VE so the single shared proc tree gives
>> per-VE answers.
>>
>> The motivation is GPU support in containers. A containerized GPU workload
>> needs a few host /proc files visible (the nvidia entries it probes), and
>> ve.proc_permissions exposes them through a generic per-VE allowlist rather
>> than an nvidia-specific passthrough.
>>
>> While implementing it I found several pre-existing defects in the shared
>> sysfs and kernfs per-VE permission path, so the series is fix-first. Reading
>> ve.sysfs_permissions under load already panicked the host on a stock kernel
>> (NULL deref in kmapset_lookup), which the early patches fix before the procfs
>> work builds on the same code. Of these defects, the ve_perms_map
>> use-after-free in patch 5 (the __rcu annotation) was found by code analysis.
>> The rest surfaced through testing: the NULL deref and the wrong rwsem from
>> the runtime crash and lockdep, and the rcu-list walks from PROVE_RCU_LIST.
>>
>> Layout:
>> 1: lib/kmapset annotates the kmapset_lookup rcu-list walk so it is honest
>>    under CONFIG_PROVE_RCU_LIST.
>> 2 to 5: fix the kernfs seq read and the VFS readers: skip a NULL map, lock
>>    the tree that is actually walked, take rcu_read_lock around the kmapset
>>    lookup, and mark ve_perms_map __rcu to close a use-after-free.
>> 6: factors the filesystem-agnostic core into fs/ve_perms.c with no
>>    functional change beyond an rcu_assign_pointer publish.
>> 7: adds the procfs feature on top.
>>
>> Testing: Built and booted a debug kernel with KASAN, kmemleak, lockdep and
>> PROVE_RCU_LIST. Ran concurrent reader, writer and teardown stress on both
>> ve.sysfs_permissions and ve.proc_permissions, including in-container /proc
>> and sysfs access and container start and stop. The original NULL deref
>> reproduces on a stock kernel and no longer crashes with this series. No KASAN
>> use-after-free, no kmemleak leak, and no rcu-list or lockdep splat in the
>> ve_perms paths. gcov line coverage of the four touched files reached 93 to
>> 99 percent (fs/kernfs/ve.c 99, fs/proc/ve.c 97, fs/ve_perms.c 95,
>> lib/kmapset.c 93), the remainder being inlined fortify checks, error paths
>> and boot-only init paths. Per-VE correctness was checked separately on both
>> filesystems: a path becomes visible and readable inside a container only
>> after it is added to that VE's allowlist, access is revoked when it is
>> removed, the host is unaffected, and the entry never leaks to another VE.
>>
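The per-VE semantics checked above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not code from the series: the names ve_allow, ve_revoke and ve_path_visible, the fixed-size tables, and the convention that VE 0 stands for the host are all hypothetical, and the real implementation lives in the kernel on top of kmapset and RCU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_VE       4   /* hypothetical: VE 0 models the host */
#define MAX_PATHS    8
#define MAX_PATH_LEN 64

/* Hypothetical model: one allowlist of /proc paths per VE. */
struct ve_allowlist {
	char paths[MAX_PATHS][MAX_PATH_LEN];
	size_t count;
};

static struct ve_allowlist allowlists[MAX_VE];

/* The host (VE 0) is never filtered; a container sees only listed paths. */
static bool ve_path_visible(int ve, const char *path)
{
	if (ve == 0)
		return true;
	if (ve < 0 || ve >= MAX_VE)
		return false;
	for (size_t i = 0; i < allowlists[ve].count; i++)
		if (strcmp(allowlists[ve].paths[i], path) == 0)
			return true;
	return false;
}

/* Add a path to one VE's allowlist; other VEs are not touched. */
static bool ve_allow(int ve, const char *path)
{
	struct ve_allowlist *al;

	if (ve <= 0 || ve >= MAX_VE || strlen(path) >= MAX_PATH_LEN)
		return false;
	if (ve_path_visible(ve, path))
		return true;		/* already listed */
	al = &allowlists[ve];
	if (al->count == MAX_PATHS)
		return false;
	strcpy(al->paths[al->count++], path);
	return true;
}

/* Remove a path from one VE's allowlist; returns false if it was absent. */
static bool ve_revoke(int ve, const char *path)
{
	struct ve_allowlist *al;

	if (ve <= 0 || ve >= MAX_VE)
		return false;
	al = &allowlists[ve];
	for (size_t i = 0; i < al->count; i++) {
		if (strcmp(al->paths[i], path) != 0)
			continue;
		/* Fill the hole with the last entry, then shrink the list. */
		if (i != al->count - 1)
			strcpy(al->paths[i], al->paths[al->count - 1]);
		al->count--;
		return true;
	}
	return false;
}
```

In this model, allowing a path for VE 1 makes it visible to VE 1 only, revoking it hides it again, and VE 0 always sees it, which mirrors the four properties listed in the testing notes.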
>> Mirian Shilakadze (7):
>>   lib/kmapset: annotate the kmapset_lookup rcu-list walk with the held
>>     lock
>>   fs/kernfs, ve: skip NULL ve_perms_map in kernfs_perms_shown
>>   fs/kernfs, ve: lock the walked tree rwsem in kernfs_perms_start
>>   fs/kernfs, ve: take rcu_read_lock around the ve_perms kmapset lookup
>>   fs/kernfs, ve: fix ve_perms_map use-after-free, annotate it __rcu
>>   fs: factor per-VE permission core into ve_perms helpers
>>   fs/proc, ve: add per-VE ve.proc_permissions
>>
>>  fs/Makefile               |   1 +
>>  fs/kernfs/ve.c            | 167 ++++++++----------
>>  fs/proc/Makefile          |   1 +
>>  fs/proc/generic.c         |  48 +++++-
>>  fs/proc/inode.c           |   2 +
>>  fs/proc/internal.h        |  25 +++
>>  fs/proc/root.c            |   1 +
>>  fs/proc/ve.c              | 345 ++++++++++++++++++++++++++++++++++++++
>>  fs/sysfs/ve.c             |   2 +-
>>  fs/ve_perms.c             | 136 +++++++++++++++
>>  include/linux/kernfs-ve.h |   2 +-
>>  include/linux/kernfs.h    |   2 +-
>>  include/linux/ve-perms.h  |  28 ++++
>>  include/linux/ve.h        |   1 +
>>  kernel/ve/ve.c            |   7 +
>>  lib/kmapset.c             |   3 +-
>>  16 files changed, 665 insertions(+), 106 deletions(-)
>>  create mode 100644 fs/proc/ve.c
>>  create mode 100644 fs/ve_perms.c
>>  create mode 100644 include/linux/ve-perms.h
>>
>> --
>> 2.43.0
>>
>
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.