[Devel] [PATCH vz10 1/2] lib/kmapset: fix undefined behavior in kmapset_cmp()

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Tue Jun 3 08:24:47 MSK 2025



On 6/2/25 19:55, Konstantin Khorenko wrote:
> From: Pavel Butsykin <pbutsykin at virtuozzo.com>
> 
> Here we are dealing with undefined behavior when taking the address from lvalue
> which doesn't designate an object. The expression '&a->b' is undefined behavior
> in the C language in that case if 'a' is NULL pointer. But in most cases this
> causes no problems, until compiler starts using an aggressive optimization
> policy.
> 
> That's what I got after compiling the kernel with gcc9:
> 
> [    0.422039] BUG: unable to handle kernel paging request at fffffffffffffff0
> [    0.422557] IP: [<ffffffff9134bd9c>] kmapset_hash+0x37c/0x730
> [    0.422971] PGD 6a69e067 PUD 6a6a0067 PMD 0
> [    0.423276] Oops: 0000 [#1] SMP KASAN
> [    0.423546] Modules linked in:
> [    0.423782] CPU: 0 PID: 0 Comm: swapper/0 ve: 0 Not tainted 3.10.0-862.20.2.ovz.73.6 #67 73.6
> [    0.424341] Hardware name: Virtuozzo KVM, BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> [    0.425044] task: ffffffff926a53c0 ti: ffffffff92688000 task.ti: ffffffff92688000
> [    0.425528] RIP: 0010:[<ffffffff9134bd9c>]  [<ffffffff9134bd9c>] kmapset_hash+0x37c/0x730
> [    0.426072] RSP: 0000:ffffffff9268fae8  EFLAGS: 00010a02
> [    0.426416] RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 1ffffffffffffffe
> [    0.426876] RDX: 0000000000000000 RSI: 1ffffffffffffffe RDI: fffffffffffffff0
> [    0.427339] RBP: ffffffff9268fb48 R08: 0000000000000000 R09: 0000000000000000
> [    0.427947] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880067191508
> [    0.428466] R13: ffffffffffffffe8 R14: ffff8800671915a0 R15: ffffffffffffffe8
> [    0.428937] FS:  0000000000000000(0000) GS:ffff880067600000(0000) knlGS:0000000000000000
> [    0.429455] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.429826] CR2: fffffffffffffff0 CR3: 000000006a69a000 CR4: 00000000000606b0
> [    0.430296] Call Trace:
> [    0.430464]  [<ffffffff9134c529>] kmapset_commit+0x99/0x160
> [    0.430828]  [<ffffffff9134c490>] ? kmapset_put+0xf0/0xf0
> [    0.431193]  [<ffffffff910bf830>] kernfs_get_ve_perms+0xc0/0x120
> [    0.431587]  [<ffffffff910b9b69>] kernfs_add_one+0x289/0x4b0
> [    0.431965]  [<ffffffff910b9e87>] kernfs_create_dir_ns+0xf7/0x150
> [    0.432358]  [<ffffffff910c2e43>] sysfs_create_dir_ns+0xc3/0x1f0
> [    0.432746]  [<ffffffff912ff4dc>] kobject_add_internal+0x1ec/0xa20
> [    0.433160]  [<ffffffff91300164>] kobject_add+0x124/0x190
> ...
> 
> It's an attempt to obtain link_a->key when map_a->links.first is NULL.
> Disassembly listing showed that this happened because the compiler optimized
> condition - "while (&link_a->map_link)". Looks like gcc noticed the check of
> map->links.first pointer to NULL in kmapset_hash() above, and decided that if
> the pointer isn't NULL then 'while' condition is always true, but if the
> pointer is NULL then it's undefined behavior and gcc doesn’t care about it.
> 
> The second undefined behavior is pointer overflow, the compiler could assumes
> &link_a->map_link cannot be NULL if the member's offset is greater than 0. In
> the case when map_a->links.first is NULL, then link_a will be equal to
> (NULL - 24) and expression &((struct kmapset_link *)(NULL - 24))->map_link will
> refer to NULL. But the problem is that POINTER_PLUS_EXPR in gcc is unsigned
> sizetype, and negative pointer (NULL - 24) appear as very large positive, so for
> this reason the compiler can also optimize this 'while' condition.
> 
> But gcc with -fno-delete-null-pointer-checks and -fno-strict-overflow flags
> should not optimize such NULL pointer checks. The kernel uses these flags, so it
> turned out to be gcc9 bug - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88367

I don't think that explanation about undefined behaviour is correct:

1) offsetof uses this construct "&((TYPE *)0)->MEMBER"
2) delete-null-pointer-checks talks about NULL pointer dereferencing and 
not accessing member by NULL pointer and then getting it's address.

I think that original problem was in "while (&link_a->map_link)" being 
an infinite loop and thus we got negative pointer from list_entry on 
last iteration and crashed on it.

We might consider using hlist_entry_safe which won't give a negative 
pointer even if map_b's hlist ends unexpectedly, and check for link_b 
being not NULL. (Just to make code more robust, current code should also 
work as we have size check before the loop.)

> 
> Let's fix this UB stuff by using hlist_for_each_entry() that safely iterates
> hlist with hlist_entry_safe().
> 
> Signed-off-by: Pavel Butsykin <pbutsykin at virtuozzo.com>
> Reviewed-by: Kirill Tkhai <ktkhai at virtuozzo.com>
> 
> Rebase to RHEL10 6.12.0-55.13.1.el10_0 kernel notes:
> 
>   * lib/kmapset.c: In function ‘kmapset_cmp’:
>     lib/kmapset.c:49:16: warning: the comparison will always evaluate as ‘true’ for the address of ‘map_link’ will never be NULL [-Waddress]
>        49 |         while (&link_a->map_link) {
>           |		  ^
> 
> (cherry picked from vz7 commit bbe5b9a053ec3898c3fb61b0be8e6dc1df634639)
> Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
> ---
>   lib/kmapset.c | 8 ++------
>   1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/kmapset.c b/lib/kmapset.c
> index 5fc692149014b..03af368adeb11 100644
> --- a/lib/kmapset.c
> +++ b/lib/kmapset.c
> @@ -42,18 +42,14 @@ static long kmapset_cmp(struct kmapset_map *map_a, struct kmapset_map *map_b)
>   	if (map_a->size != map_b->size)
>   		return map_a->size - map_b->size;
>   
> -	link_a = hlist_entry(map_a->links.first,
> -			struct kmapset_link, map_link);
>   	link_b = hlist_entry(map_b->links.first,
>   			struct kmapset_link, map_link);
> -	while (&link_a->map_link) {
> +	hlist_for_each_entry(link_a, &map_a->links, map_link) {
>   		if (link_a->key != link_b->key)
>   			return (long)link_a->key - (long)link_b->key;
>   		if (link_a->value != link_b->value)
>   			return link_a->value - link_b->value;
> -		link_a = list_entry(link_a->map_link.next,
> -				struct kmapset_link, map_link);
> -		link_b = list_entry(link_b->map_link.next,
> +		link_b = hlist_entry(link_b->map_link.next,
>   				struct kmapset_link, map_link);
>   	}
>   

-- 
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.



More information about the Devel mailing list