[Devel] [PATCH vz10 1/2] lib/kmapset: fix undefined behavior in kmapset_cmp()

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Tue Jun 3 08:34:50 MSK 2025



On 6/3/25 13:24, Pavel Tikhomirov wrote:
> 
> 
> On 6/2/25 19:55, Konstantin Khorenko wrote:
>> From: Pavel Butsykin <pbutsykin at virtuozzo.com>
>>
>> Here we are dealing with undefined behavior when taking the address 
>> from lvalue
>> which doesn't designate an object. The expression '&a->b' is undefined 
>> behavior
>> in the C language in that case if 'a' is NULL pointer. But in most 
>> cases this
>> causes no problems, until compiler starts using an aggressive 
>> optimization
>> policy.
>>
>> That's what I got after compiling the kernel with gcc9:
>>
>> [    0.422039] BUG: unable to handle kernel paging request at 
>> fffffffffffffff0
>> [    0.422557] IP: [<ffffffff9134bd9c>] kmapset_hash+0x37c/0x730
>> [    0.422971] PGD 6a69e067 PUD 6a6a0067 PMD 0
>> [    0.423276] Oops: 0000 [#1] SMP KASAN
>> [    0.423546] Modules linked in:
>> [    0.423782] CPU: 0 PID: 0 Comm: swapper/0 ve: 0 Not tainted 
>> 3.10.0-862.20.2.ovz.73.6 #67 73.6
>> [    0.424341] Hardware name: Virtuozzo KVM, BIOS rel-1.10.2-0- 
>> g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
>> [    0.425044] task: ffffffff926a53c0 ti: ffffffff92688000 task.ti: 
>> ffffffff92688000
>> [    0.425528] RIP: 0010:[<ffffffff9134bd9c>]  [<ffffffff9134bd9c>] 
>> kmapset_hash+0x37c/0x730
>> [    0.426072] RSP: 0000:ffffffff9268fae8  EFLAGS: 00010a02
>> [    0.426416] RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 
>> 1ffffffffffffffe
>> [    0.426876] RDX: 0000000000000000 RSI: 1ffffffffffffffe RDI: 
>> fffffffffffffff0
>> [    0.427339] RBP: ffffffff9268fb48 R08: 0000000000000000 R09: 
>> 0000000000000000
>> [    0.427947] R10: 0000000000000000 R11: 0000000000000000 R12: 
>> ffff880067191508
>> [    0.428466] R13: ffffffffffffffe8 R14: ffff8800671915a0 R15: 
>> ffffffffffffffe8
>> [    0.428937] FS:  0000000000000000(0000) GS:ffff880067600000(0000) 
>> knlGS:0000000000000000
>> [    0.429455] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    0.429826] CR2: fffffffffffffff0 CR3: 000000006a69a000 CR4: 
>> 00000000000606b0
>> [    0.430296] Call Trace:
>> [    0.430464]  [<ffffffff9134c529>] kmapset_commit+0x99/0x160
>> [    0.430828]  [<ffffffff9134c490>] ? kmapset_put+0xf0/0xf0
>> [    0.431193]  [<ffffffff910bf830>] kernfs_get_ve_perms+0xc0/0x120
>> [    0.431587]  [<ffffffff910b9b69>] kernfs_add_one+0x289/0x4b0
>> [    0.431965]  [<ffffffff910b9e87>] kernfs_create_dir_ns+0xf7/0x150
>> [    0.432358]  [<ffffffff910c2e43>] sysfs_create_dir_ns+0xc3/0x1f0
>> [    0.432746]  [<ffffffff912ff4dc>] kobject_add_internal+0x1ec/0xa20
>> [    0.433160]  [<ffffffff91300164>] kobject_add+0x124/0x190
>> ...
>>
>> It's an attempt to obtain link_a->key when map_a->links.first is NULL.
>> Disassembly listing showed that this happened because the compiler 
>> optimized
>> condition - "while (&link_a->map_link)". Looks like gcc noticed the 
>> check of
>> map->links.first pointer to NULL in kmapset_hash() above, and decided 
>> that if
>> the pointer isn't NULL then 'while' condition is always true, but if the
>> pointer is NULL then it's undefined behavior and gcc doesn’t care 
>> about it.
>>
>> The second undefined behavior is pointer overflow, the compiler could 
>> assumes
>> &link_a->map_link cannot be NULL if the member's offset is greater 
>> than 0. In
>> the case when map_a->links.first is NULL, then link_a will be equal to
>> (NULL - 24) and expression &((struct kmapset_link *)(NULL - 24))- 
>> >map_link will
>> refer to NULL. But the problem is that POINTER_PLUS_EXPR in gcc is 
>> unsigned
>> sizetype, and negative pointer (NULL - 24) appear as very large 
>> positive, so for
>> this reason the compiler can also optimize this 'while' condition.
>>
>> But gcc with -fno-delete-null-pointer-checks and -fno-strict-overflow 
>> flags
>> should not optimize such NULL pointer checks. The kernel uses these 
>> flags, so it
>> turned out to be gcc9 bug - https://gcc.gnu.org/bugzilla/show_bug.cgi? 
>> id=88367
> 
> I don't think that explanation about undefined behaviour is correct:
> 
> 1) offsetof uses this construct "&((TYPE *)0)->MEMBER"
> 2) delete-null-pointer-checks talks about NULL pointer dereferencing and 
> not accessing member by NULL pointer and then getting it's address.
> 
> I think that original problem was in "while (&link_a->map_link)" being 
> an infinite loop and thus we got negative pointer from list_entry on 
> last iteration and crashed on it.
> 
> We might consider using hlist_entry_safe which won't give a negative 
> pointer even if map_b's hlist ends unexpectedly, and check for link_b 
> being not NULL. (Just to make code more robust, current code should also 
> work as we have size check before the loop.)

Ok maybe I'm partially wrong, "The second undefined behavior is pointer 
overflow, the compiler could assumes &link_a->map_link cannot be NULL if 
the member's offset is greater than 0." this part looks valid, else it 
would be no crash on negative pointer, but just successfully finished loop.

> 
>>
>> Let's fix this UB stuff by using hlist_for_each_entry() that safely 
>> iterates
>> hlist with hlist_entry_safe().
>>
>> Signed-off-by: Pavel Butsykin <pbutsykin at virtuozzo.com>
>> Reviewed-by: Kirill Tkhai <ktkhai at virtuozzo.com>
>>
>> Rebase to RHEL10 6.12.0-55.13.1.el10_0 kernel notes:
>>
>>   * lib/kmapset.c: In function ‘kmapset_cmp’:
>>     lib/kmapset.c:49:16: warning: the comparison will always evaluate 
>> as ‘true’ for the address of ‘map_link’ will never be NULL [-Waddress]
>>        49 |         while (&link_a->map_link) {
>>           |          ^
>>
>> (cherry picked from vz7 commit bbe5b9a053ec3898c3fb61b0be8e6dc1df634639)
>> Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
>> ---
>>   lib/kmapset.c | 8 ++------
>>   1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/kmapset.c b/lib/kmapset.c
>> index 5fc692149014b..03af368adeb11 100644
>> --- a/lib/kmapset.c
>> +++ b/lib/kmapset.c
>> @@ -42,18 +42,14 @@ static long kmapset_cmp(struct kmapset_map *map_a, 
>> struct kmapset_map *map_b)
>>       if (map_a->size != map_b->size)
>>           return map_a->size - map_b->size;
>> -    link_a = hlist_entry(map_a->links.first,
>> -            struct kmapset_link, map_link);
>>       link_b = hlist_entry(map_b->links.first,
>>               struct kmapset_link, map_link);
>> -    while (&link_a->map_link) {
>> +    hlist_for_each_entry(link_a, &map_a->links, map_link) {
>>           if (link_a->key != link_b->key)
>>               return (long)link_a->key - (long)link_b->key;
>>           if (link_a->value != link_b->value)
>>               return link_a->value - link_b->value;
>> -        link_a = list_entry(link_a->map_link.next,
>> -                struct kmapset_link, map_link);
>> -        link_b = list_entry(link_b->map_link.next,
>> +        link_b = hlist_entry(link_b->map_link.next,
>>                   struct kmapset_link, map_link);
>>       }
> 

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.



More information about the Devel mailing list