[Devel] double faults in Virtuozzo KVM

Denis Kirjanov dkirjanov at cloudlinux.com
Fri Sep 29 11:25:20 MSK 2017


>> > > _Some_ of them are related to the fact that during the faults RSP
>> > > points to userspace, which leads to a double-fault scenario.
>> >
>> > The postmortem you quote doesn't support that.
>>
>>
>> I'll post a relevant trace

Here it is:

[32065.459255] double fault: 0000 [#1] SMP
[32065.459975] Modules linked in: dm_mod hcpdriver(POE) kmodlve(OE)
vzdev ppdev pcspkr sg i2c_piix4 parport_pc parport ip_tables ext4
mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_common
virtio_console virtio_scsi virtio_net sr_mod cdrom ata_generic
pata_acpi bochs_drm drm_kms_helper ttm drm serio_raw virtio_pci
ata_piix virtio_ring virtio i2c_core libata floppy
[32065.460041] CPU: 0 PID: 22951 Comm: cdp-2-6 ve: 0 Tainted: P
   OE  ------------   3.10.0-714.10.2.lve1.4.61.el7.x86_64 #1 29.2
[32065.460041] Hardware name: Virtuozzo KVM, BIOS 1.9.1-5.3.2.vz7.6 04/01/2014
[32065.460041] task: ffff8801e9ab8ff0 ti: ffff8800ab598000 task.ti:
ffff8800ab598000
[32065.460041] RIP: 0010:[<ffffffff816a1bdd>]  [<ffffffff816a1bdd>]
async_page_fault+0xd/0x30
[32065.460041] RSP: 002b:00007f1a1290afe8  EFLAGS: 00010016
[32065.460041] RAX: 00000000816a192c RBX: 0000000000000001 RCX: ffffffff816a192c
[32065.460041] RDX: 0000000000000008 RSI: 0000000000000000 RDI: 00007f1a1290b0a8
[32065.460041] RBP: 00007f1a1290b098 R08: 0000000000000001 R09: 0000000000000000
[32065.460041] R10: 0000000000000000 R11: 0000000000000000 R12: 00007f1a12919960
[32065.460041] R13: 0000000000000028 R14: 0000000000000000 R15: 00000000011f3f20
[32065.460041] FS:  00007f1a1291e700(0000) GS:ffff88023fc00000(0000)
knlGS:0000000000000000
[32065.460041] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[32065.460041] CR2: 00007f1a1290afd8 CR3: 0000000036794000 CR4: 00000000000007f0
[32065.460041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[32065.460041] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[32065.460041] Stack:
[32065.460041] BUG: unable to handle kernel paging request at 00007f1a1290afe8
[32065.460041] IP: [<ffffffff8102d6c9>] show_stack_log_lvl+0x109/0x180
[32065.460041] PGD 36799067 PUD 3679a067 PMD 220382067 PTE 0
[32065.460041] Oops: 0000 [#2] SMP
[32065.460041] Modules linked in: dm_mod hcpdriver(POE) kmodlve(OE)
vzdev ppdev pcspkr sg i2c_piix4 parport_pc parport ip_tables ext4
mbcache jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_common
virtio_console virtio_scsi virtio_net sr_mod cdrom ata_generic
pata_acpi bochs_drm drm_kms_helper ttm drm serio_raw virtio_pci
ata_piix virtio_ring virtio i2c_core libata floppy
[32065.460041] CPU: 0 PID: 22951 Comm: cdp-2-6 ve: 0 Tainted: P
   OE  ------------   3.10.0-714.10.2.lve1.4.61.el7.x86_64 #1 29.2
[32065.460041] Hardware name: Virtuozzo KVM, BIOS 1.9.1-5.3.2.vz7.6 04/01/2014
[32065.460041] task: ffff8801e9ab8ff0 ti: ffff8800ab598000 task.ti:
ffff8800ab598000
[32065.460041] RIP: 0010:[<ffffffff8102d6c9>]  [<ffffffff8102d6c9>]
show_stack_log_lvl+0x109/0x180
[32065.460041] RSP: 002b:ffff88023fc04e18  EFLAGS: 00010046
[32065.460041] RAX: 00007f1a1290aff0 RBX: 00007f1a1290afe8 RCX: 0000000000000000
[32065.460041] RDX: ffff88023fc03fc0 RSI: ffff88023fc04f58 RDI: 0000000000000000
[32065.460041] RBP: ffff88023fc04e68 R08: ffff88023fbfffc0 R09: ffff8800369f5900
[32065.460041] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88023fc04f58
[32065.460041] R13: 0000000000000000 R14: ffffffff818d31b8 R15: 0000000000000000
[32065.460041] FS:  00007f1a1291e700(0000) GS:ffff88023fc00000(0000)
knlGS:0000000000000000
[32065.460041] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[32065.460041] CR2: 00007f1a1290afe8 CR3: 0000000036794000 CR4: 00000000000007f0
[32065.460041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[32065.460041] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[32065.460041] Stack:
[32065.460041]  ffff880200000008 ffff88023fc04e78 ffff88023fc04e38 000000007ddc4235
[32065.460041]  00007f1a1290afe8 ffff88023fc04f58 00007f1a1290afe8 ffff88023fc04f58
[32065.460041]  000000000000002b 00000000011f3f20 ffff88023fc04ec8 ffffffff8102d7f6
[32065.460041] Call Trace:
[32065.460041]  <#DF>
[32065.460041]
[32065.460041]  [<ffffffff8102d7f6>] show_regs+0xb6/0x240
[32065.460041]  [<ffffffff816a2b0f>] __die+0x9f/0xf0
[32065.460041]  [<ffffffff8102e878>] die+0x38/0x70
[32065.460041]  [<ffffffff8102b5f2>] do_double_fault+0x72/0x80
[32065.460041]  [<ffffffff816aba88>] double_fault+0x28/0x30
[32065.460041]  [<ffffffff816a192c>] ? restore_args+0x30/0x30
[32065.460041]  [<ffffffff816a1bdd>] ? async_page_fault+0xd/0x30
[32065.460041]  <<EOE>>
[32065.460041] Code:
[32065.460041] 4d b8 4c 89 45 c0 48 89 55 c8 48 8b 5b f8 e8 37 6b 66
00 48 8b 55 c8 4c 8b 45 c0 8b 4d b8 85 c9 74 05 f6 c1 03 74 4c 48 8d
43 08 <48> 8b 33 48 c7 c7 b0 31 8d 81 89 4d b4 4c 89 45 b8 48 89 45 c8
[32065.460041] RIP  [<ffffffff8102d6c9>] show_stack_log_lvl+0x109/0x180
[32065.460041]  RSP <ffff88023fc04e18>
[32065.460041] CR2: 00007f1a1290afe8


>> >
>> > > Is it a known problem?
>> >
>> > There used to be a bug in the async pagefault machinery which caused
>> > the L0 hypervisor to inject async pagefaults into the L2 guest instead
>> > of L1.  This
>> > must've been fixed in sufficiently recent
>>
>>
>> Yep, I saw the patch, and IMHO it's about a different thing. The patch
>> fixes a wrong #PF injected into an unrelated guest, where the guest ends
>> up with 'CPU stuck' messages since it can't get the requested page.
>
> Not quite.
>
> The idea of async_pf is that when a guest task hits a page which is
> present in the guest page tables but absent in the hypervisor ones, the
> hypervisor, instead of descheduling the whole vCPU thread until the
> fault is resolved, injects a specially crafted #PF into the guest so
> that the guest can deschedule that task and put it on a waiting list,
> but otherwise continue working.  Once the fault is resolved in the
> hypervisor, it injects another #PF matching the first one, and the guest
> looks up the task and resumes it.  The bug was that those special #PFs
> were occasionally injected into the L2 guest instead of L1.  If the guest
> received the first kind of async_pf but not the second, the task would
> remain stuck forever.  If, vice versa, the first one was missing, the
> second one wouldn't match any suspended task and would be considered a
> regular #PF by the guest kernel, so an arbitrary task would receive a
> bogus #PF.
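
To make the flow described above concrete, here is a minimal, self-contained
C model of the guest-side bookkeeping. It is not the actual kernel code (the
real implementation lives in arch/x86/kernel/kvm.c); the structure and
function names below are invented purely for illustration.

/*
 * Conceptual model of the guest side of the async_pf protocol.
 * NOT the real kernel code; names are invented for illustration.
 */
#include <stdio.h>
#include <stdbool.h>

#define APF_SLOTS 8

/* One entry per outstanding "page not present" notification. */
struct apf_waiter {
	unsigned int token;	/* token chosen by the hypervisor */
	bool in_use;
};

static struct apf_waiter wait_list[APF_SLOTS];

/*
 * First async #PF: the hypervisor says the page is not present yet.
 * The guest parks the faulting task keyed on the token and keeps
 * running other tasks on this vCPU.
 */
static void apf_page_not_present(unsigned int token)
{
	for (int i = 0; i < APF_SLOTS; i++) {
		if (!wait_list[i].in_use) {
			wait_list[i].token = token;
			wait_list[i].in_use = true;
			printf("task parked, waiting on token %u\n", token);
			return;
		}
	}
}

/*
 * Second async #PF: the hypervisor says the page is ready.  The guest
 * looks up the token and resumes the matching task.  If no waiter
 * matches (the missing-first-notification case described above), the
 * event is indistinguishable from a regular #PF hitting whatever task
 * happens to be running.
 */
static void apf_page_ready(unsigned int token)
{
	for (int i = 0; i < APF_SLOTS; i++) {
		if (wait_list[i].in_use && wait_list[i].token == token) {
			wait_list[i].in_use = false;
			printf("task resumed for token %u\n", token);
			return;
		}
	}
	printf("no waiter for token %u: looks like a bogus regular #PF\n", token);
}

int main(void)
{
	apf_page_not_present(42);	/* fault hits, task parked       */
	apf_page_ready(42);		/* page arrives, task woken      */
	apf_page_ready(7);		/* unmatched: the bogus-#PF case */
	return 0;
}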
>
> Anyway, every #PF in Linux guests, including genuine guest ones, goes
> through async_page_fault, so its presence in the stacktraces is
> expected.
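
For reference, a simplified sketch of how the guest's #PF entry point
distinguishes an ordinary fault from the two async notifications. It is
loosely modeled on the dispatch done in arch/x86/kernel/kvm.c, but the enum
values and helper names below are stand-ins, not the real kernel symbols.

/*
 * Simplified, self-contained sketch of the #PF dispatch in a KVM
 * paravirt guest.  Names and values are stand-ins for illustration.
 */
#include <stdio.h>

enum apf_reason {
	APF_REASON_NONE = 0,		/* genuine guest #PF            */
	APF_REASON_PAGE_NOT_PRESENT,	/* host page missing: park task */
	APF_REASON_PAGE_READY,		/* host page arrived: wake task */
};

/* Stubs standing in for the real handlers. */
static void regular_page_fault(unsigned long error_code)
{
	printf("ordinary #PF, error_code=%#lx\n", error_code);
}

static void apf_task_wait(unsigned int token)
{
	printf("park current task on token %u\n", token);
}

static void apf_task_wake(unsigned int token)
{
	printf("wake task waiting on token %u\n", token);
}

/*
 * Every #PF taken by the guest lands here first, which is why
 * async_page_fault shows up in guest backtraces even for ordinary
 * faults: only when the hypervisor has set a reason code does one of
 * the async paths get taken.
 */
static void async_page_fault_sketch(enum apf_reason reason,
				    unsigned int token,
				    unsigned long error_code)
{
	switch (reason) {
	case APF_REASON_NONE:
		regular_page_fault(error_code);
		break;
	case APF_REASON_PAGE_NOT_PRESENT:
		apf_task_wait(token);
		break;
	case APF_REASON_PAGE_READY:
		apf_task_wake(token);
		break;
	}
}

int main(void)
{
	async_page_fault_sketch(APF_REASON_NONE, 0, 0x14);
	async_page_fault_sketch(APF_REASON_PAGE_NOT_PRESENT, 42, 0);
	async_page_fault_sketch(APF_REASON_PAGE_READY, 42, 0);
	return 0;
}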
>
> And that bug is only relevant in the presence of nested KVM, and it was
> fixed in vzlinux.
>
>> > I'd guess the problem is with your kernel.  Doesn't it reproduce on
>> > bare metal?
>
> I still stand by this.  What guest kernel are you using?  What are your
> reasons to blame the hypervisor and not the kernel?

Nope, we don't have reports from bare-metal hosts.

>
> Roman.
>

