[Devel] [PATCH vz10 1/2] lib/kmapset: fix undefined behavior in kmapset_cmp()
Pavel Tikhomirov
ptikhomirov at virtuozzo.com
Tue Jun 3 08:34:50 MSK 2025
On 6/3/25 13:24, Pavel Tikhomirov wrote:
>
>
> On 6/2/25 19:55, Konstantin Khorenko wrote:
>> From: Pavel Butsykin <pbutsykin at virtuozzo.com>
>>
>> Here we are dealing with undefined behavior when taking the address
>> from lvalue
>> which doesn't designate an object. The expression '&a->b' is undefined
>> behavior
>> in the C language in that case if 'a' is NULL pointer. But in most
>> cases this
>> causes no problems, until compiler starts using an aggressive
>> optimization
>> policy.
>>
>> That's what I got after compiling the kernel with gcc9:
>>
>> [ 0.422039] BUG: unable to handle kernel paging request at
>> fffffffffffffff0
>> [ 0.422557] IP: [<ffffffff9134bd9c>] kmapset_hash+0x37c/0x730
>> [ 0.422971] PGD 6a69e067 PUD 6a6a0067 PMD 0
>> [ 0.423276] Oops: 0000 [#1] SMP KASAN
>> [ 0.423546] Modules linked in:
>> [ 0.423782] CPU: 0 PID: 0 Comm: swapper/0 ve: 0 Not tainted
>> 3.10.0-862.20.2.ovz.73.6 #67 73.6
>> [ 0.424341] Hardware name: Virtuozzo KVM, BIOS rel-1.10.2-0-
>> g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>> [ 0.425044] task: ffffffff926a53c0 ti: ffffffff92688000 task.ti:
>> ffffffff92688000
>> [ 0.425528] RIP: 0010:[<ffffffff9134bd9c>] [<ffffffff9134bd9c>]
>> kmapset_hash+0x37c/0x730
>> [ 0.426072] RSP: 0000:ffffffff9268fae8 EFLAGS: 00010a02
>> [ 0.426416] RAX: 0000000000000000 RBX: dffffc0000000000 RCX:
>> 1ffffffffffffffe
>> [ 0.426876] RDX: 0000000000000000 RSI: 1ffffffffffffffe RDI:
>> fffffffffffffff0
>> [ 0.427339] RBP: ffffffff9268fb48 R08: 0000000000000000 R09:
>> 0000000000000000
>> [ 0.427947] R10: 0000000000000000 R11: 0000000000000000 R12:
>> ffff880067191508
>> [ 0.428466] R13: ffffffffffffffe8 R14: ffff8800671915a0 R15:
>> ffffffffffffffe8
>> [ 0.428937] FS: 0000000000000000(0000) GS:ffff880067600000(0000)
>> knlGS:0000000000000000
>> [ 0.429455] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 0.429826] CR2: fffffffffffffff0 CR3: 000000006a69a000 CR4:
>> 00000000000606b0
>> [ 0.430296] Call Trace:
>> [ 0.430464] [<ffffffff9134c529>] kmapset_commit+0x99/0x160
>> [ 0.430828] [<ffffffff9134c490>] ? kmapset_put+0xf0/0xf0
>> [ 0.431193] [<ffffffff910bf830>] kernfs_get_ve_perms+0xc0/0x120
>> [ 0.431587] [<ffffffff910b9b69>] kernfs_add_one+0x289/0x4b0
>> [ 0.431965] [<ffffffff910b9e87>] kernfs_create_dir_ns+0xf7/0x150
>> [ 0.432358] [<ffffffff910c2e43>] sysfs_create_dir_ns+0xc3/0x1f0
>> [ 0.432746] [<ffffffff912ff4dc>] kobject_add_internal+0x1ec/0xa20
>> [ 0.433160] [<ffffffff91300164>] kobject_add+0x124/0x190
>> ...
>>
>> It's an attempt to obtain link_a->key when map_a->links.first is NULL.
>> Disassembly listing showed that this happened because the compiler
>> optimized
>> condition - "while (&link_a->map_link)". Looks like gcc noticed the
>> check of
>> map->links.first pointer to NULL in kmapset_hash() above, and decided
>> that if
>> the pointer isn't NULL then 'while' condition is always true, but if the
>> pointer is NULL then it's undefined behavior and gcc doesn’t care
>> about it.
>>
>> The second undefined behavior is pointer overflow, the compiler could
>> assumes
>> &link_a->map_link cannot be NULL if the member's offset is greater
>> than 0. In
>> the case when map_a->links.first is NULL, then link_a will be equal to
>> (NULL - 24) and expression &((struct kmapset_link *)(NULL - 24))-
>> >map_link will
>> refer to NULL. But the problem is that POINTER_PLUS_EXPR in gcc is
>> unsigned
>> sizetype, and negative pointer (NULL - 24) appear as very large
>> positive, so for
>> this reason the compiler can also optimize this 'while' condition.
>>
>> But gcc with -fno-delete-null-pointer-checks and -fno-strict-overflow
>> flags
>> should not optimize such NULL pointer checks. The kernel uses these
>> flags, so it
>> turned out to be gcc9 bug - https://gcc.gnu.org/bugzilla/show_bug.cgi?
>> id=88367
>
> I don't think that explanation about undefined behaviour is correct:
>
> 1) offsetof uses this construct "&((TYPE *)0)->MEMBER"
> 2) delete-null-pointer-checks talks about NULL pointer dereferencing and
> not accessing member by NULL pointer and then getting it's address.
>
> I think that original problem was in "while (&link_a->map_link)" being
> an infinite loop and thus we got negative pointer from list_entry on
> last iteration and crashed on it.
>
> We might consider using hlist_entry_safe which won't give a negative
> pointer even if map_b's hlist ends unexpectedly, and check for link_b
> being not NULL. (Just to make code more robust, current code should also
> work as we have size check before the loop.)
Ok maybe I'm partially wrong, "The second undefined behavior is pointer
overflow, the compiler could assumes &link_a->map_link cannot be NULL if
the member's offset is greater than 0." this part looks valid, else it
would be no crash on negative pointer, but just successfully finished loop.
>
>>
>> Let's fix this UB stuff by using hlist_for_each_entry() that safely
>> iterates
>> hlist with hlist_entry_safe().
>>
>> Signed-off-by: Pavel Butsykin <pbutsykin at virtuozzo.com>
>> Reviewed-by: Kirill Tkhai <ktkhai at virtuozzo.com>
>>
>> Rebase to RHEL10 6.12.0-55.13.1.el10_0 kernel notes:
>>
>> * lib/kmapset.c: In function ‘kmapset_cmp’:
>> lib/kmapset.c:49:16: warning: the comparison will always evaluate
>> as ‘true’ for the address of ‘map_link’ will never be NULL [-Waddress]
>> 49 | while (&link_a->map_link) {
>> | ^
>>
>> (cherry picked from vz7 commit bbe5b9a053ec3898c3fb61b0be8e6dc1df634639)
>> Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
>> ---
>> lib/kmapset.c | 8 ++------
>> 1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/kmapset.c b/lib/kmapset.c
>> index 5fc692149014b..03af368adeb11 100644
>> --- a/lib/kmapset.c
>> +++ b/lib/kmapset.c
>> @@ -42,18 +42,14 @@ static long kmapset_cmp(struct kmapset_map *map_a,
>> struct kmapset_map *map_b)
>> if (map_a->size != map_b->size)
>> return map_a->size - map_b->size;
>> - link_a = hlist_entry(map_a->links.first,
>> - struct kmapset_link, map_link);
>> link_b = hlist_entry(map_b->links.first,
>> struct kmapset_link, map_link);
>> - while (&link_a->map_link) {
>> + hlist_for_each_entry(link_a, &map_a->links, map_link) {
>> if (link_a->key != link_b->key)
>> return (long)link_a->key - (long)link_b->key;
>> if (link_a->value != link_b->value)
>> return link_a->value - link_b->value;
>> - link_a = list_entry(link_a->map_link.next,
>> - struct kmapset_link, map_link);
>> - link_b = list_entry(link_b->map_link.next,
>> + link_b = hlist_entry(link_b->map_link.next,
>> struct kmapset_link, map_link);
>> }
>
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.
More information about the Devel
mailing list