[Devel] [PATCH vz10 0/7] per-VE ve.proc_permissions and sysfs permission fixes

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Tue Jun 30 15:58:03 MSK 2026


On 6/30/26 14:50, Pavel Tikhomirov wrote:
> We definitely need a kselftest for it.

Though in general the series looks good.
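
As a starting point for that kselftest, the per-VE properties listed in the
Testing section of the cover letter (visible only after the add, revoked on
removal, host unaffected, no leak to another VE) can be written down as
assertions against a tiny user-space model. This is only a sketch under stated
assumptions: the model_* names, the limits and the "driver/example" path are
hypothetical and are not taken from the series, and the model does not touch
the kernel or the real ve.proc_permissions interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical limits, chosen only to keep the model small. */
#define MODEL_MAX_VES   4
#define MODEL_MAX_PATHS 8
#define MODEL_MAX_LEN   64

/* One allowlist per VE; index 0 stands for the host (ve0). */
struct model_ve {
	char paths[MODEL_MAX_PATHS][MODEL_MAX_LEN];
	int nr;
};

static struct model_ve model_ves[MODEL_MAX_VES];

static bool model_is_container(int veid)
{
	return veid > 0 && veid < MODEL_MAX_VES;
}

/* Return the slot holding path in this VE's allowlist, or -1. */
static int model_find(const struct model_ve *ve, const char *path)
{
	for (int i = 0; i < ve->nr; i++)
		if (strcmp(ve->paths[i], path) == 0)
			return i;
	return -1;
}

/* Add path to one VE's allowlist; false on bad veid, long path or full list. */
static bool model_allow(int veid, const char *path)
{
	struct model_ve *ve;

	if (!model_is_container(veid) || strlen(path) >= MODEL_MAX_LEN)
		return false;
	ve = &model_ves[veid];
	if (model_find(ve, path) >= 0)
		return true;	/* already allowed */
	if (ve->nr == MODEL_MAX_PATHS)
		return false;
	strcpy(ve->paths[ve->nr++], path);
	return true;
}

/* Remove path from one VE's allowlist; false if it was not there. */
static bool model_revoke(int veid, const char *path)
{
	struct model_ve *ve;
	int i;

	if (!model_is_container(veid))
		return false;
	ve = &model_ves[veid];
	i = model_find(ve, path);
	if (i < 0)
		return false;
	/* Fill the hole with the last entry; order does not matter here. */
	ve->nr--;
	if (i != ve->nr)
		strcpy(ve->paths[i], ve->paths[ve->nr]);
	return true;
}

/* The host always sees the path; a container only if it is allowlisted. */
static bool model_visible(int veid, const char *path)
{
	if (veid == 0)
		return true;
	return model_is_container(veid) &&
	       model_find(&model_ves[veid], path) >= 0;
}

/* The four properties from the cover letter, as assertions on the model. */
static void model_selfcheck(void)
{
	const char *p = "driver/example";	/* hypothetical /proc path */

	assert(!model_visible(1, p));	/* hidden before it is allowed */
	assert(model_allow(1, p));
	assert(model_visible(1, p));	/* visible after the add */
	assert(!model_visible(2, p));	/* no leak to another VE */
	assert(model_visible(0, p));	/* host unaffected */
	assert(model_revoke(1, p));
	assert(!model_visible(1, p));	/* revoked on removal */
}
```

A real kselftest would replace the model_* calls with writes to the cgroup
file and reads of /proc from inside two containers, keeping the same sequence
of checks as model_selfcheck().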

> 
> On 6/28/26 11:25, Mirian Shilakadze wrote:
>> This series adds ve.proc_permissions, the procfs counterpart of
>> ve.sysfs_permissions. It is a per-VE allowlist of /proc paths that the host
>> exposes to a container, keyed per VE so the single shared proc tree gives
>> per-VE answers.
>>
>> The motivation is GPU support in containers. A containerized GPU workload
>> needs a few host /proc files visible (the nvidia entries it probes), and
>> ve.proc_permissions exposes them through a generic per-VE allowlist rather
>> than an nvidia-specific passthrough.
>>
>> While implementing it I found several pre-existing defects in the shared
>> sysfs and kernfs per-VE permission path, so the series is fix-first. Reading
>> ve.sysfs_permissions under load already panicked the host on a stock kernel
>> (NULL deref in kmapset_lookup), which the early patches fix before the procfs
>> work builds on the same code. Of these defects, the ve_perms_map
>> use-after-free in patch 5 (the __rcu annotation) was found by code analysis.
>> The rest surfaced through testing: the NULL deref and the wrong rwsem from the
>> runtime crash and lockdep, and the rcu-list walks from PROVE_RCU_LIST.
>>
>> Layout:
>>   1: lib/kmapset annotates the kmapset_lookup rcu-list walk so it is honest
>>      under CONFIG_PROVE_RCU_LIST.
>>   2 to 5: fix the kernfs seq read and the VFS readers: skip a NULL map, lock
>>      the tree that is actually walked, take rcu_read_lock around the kmapset
>>      lookup, and mark ve_perms_map __rcu to close a use-after-free.
>>   6: factors the filesystem-agnostic core into fs/ve_perms.c with no
>>      functional change beyond an rcu_assign_pointer publish.
>>   7: adds the procfs feature on top.
>>
>> Testing: Built and booted a debug kernel with KASAN, kmemleak, lockdep and
>> PROVE_RCU_LIST. Ran concurrent reader, writer and teardown stress on both
>> ve.sysfs_permissions and ve.proc_permissions, including in-container /proc
>> and sysfs access and container start and stop. The original NULL deref
>> reproduces on a stock kernel and no longer crashes with this series. No KASAN
>> use-after-free, no kmemleak leak, and no rcu-list or lockdep splat in the
>> ve_perms paths. gcov line coverage of the four touched files reached 93 to
>> 99 percent (fs/kernfs/ve.c 99, fs/proc/ve.c 97, fs/ve_perms.c 95,
>> lib/kmapset.c 93), the remainder being inlined fortify checks, error paths
>> and boot-only init paths. Per-VE correctness was checked separately on both
>> filesystems: a path becomes visible and readable inside a container only
>> after it is added to that VE's allowlist, access is revoked when it is
>> removed, the host is unaffected, and the entry never leaks to another VE.
>>
>> Mirian Shilakadze (7):
>>   lib/kmapset: annotate the kmapset_lookup rcu-list walk with the held
>>     lock
>>   fs/kernfs, ve: skip NULL ve_perms_map in kernfs_perms_shown
>>   fs/kernfs, ve: lock the walked tree rwsem in kernfs_perms_start
>>   fs/kernfs, ve: take rcu_read_lock around the ve_perms kmapset lookup
>>   fs/kernfs, ve: fix ve_perms_map use-after-free, annotate it __rcu
>>   fs: factor per-VE permission core into ve_perms helpers
>>   fs/proc, ve: add per-VE ve.proc_permissions
>>
>>  fs/Makefile               |   1 +
>>  fs/kernfs/ve.c            | 167 ++++++++----------
>>  fs/proc/Makefile          |   1 +
>>  fs/proc/generic.c         |  48 +++++-
>>  fs/proc/inode.c           |   2 +
>>  fs/proc/internal.h        |  25 +++
>>  fs/proc/root.c            |   1 +
>>  fs/proc/ve.c              | 345 ++++++++++++++++++++++++++++++++++++++
>>  fs/sysfs/ve.c             |   2 +-
>>  fs/ve_perms.c             | 136 +++++++++++++++
>>  include/linux/kernfs-ve.h |   2 +-
>>  include/linux/kernfs.h    |   2 +-
>>  include/linux/ve-perms.h  |  28 ++++
>>  include/linux/ve.h        |   1 +
>>  kernel/ve/ve.c            |   7 +
>>  lib/kmapset.c             |   3 +-
>>  16 files changed, 665 insertions(+), 106 deletions(-)
>>  create mode 100644 fs/proc/ve.c
>>  create mode 100644 fs/ve_perms.c
>>  create mode 100644 include/linux/ve-perms.h
>>
>> --
>> 2.43.0
>>
> 

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.


