<br><br>On Thursday, September 28, 2017, Roman Kagan <<a href="mailto:rkagan@virtuozzo.com">rkagan@virtuozzo.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Thu, Sep 28, 2017 at 05:55:51PM +0300, Denis Kirjanov wrote:<br>
> Hi, we're seeing double faults in async_page_fault.<br>
<br>
async_page_fault is the #PF handler in KVM guests. It filters out<br>
specially crafted #PFs from the host; the rest fall through to the<br>
regular #PF handler. So most likely you're seeing genuine #PFs,<br>
unrelated to virtualization.<br>
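A simplified user-space model of that filtering, loosely based on do_async_page_fault() in arch/x86/kernel/kvm.c of kernels from this era. The reason codes 1 and 2 are the KVM paravirt ABI values; apf_reason, host_set_reason(), async_pf_dispatch() and enum apf_action are hypothetical names used only for this sketch, and the "host writes only after the guest consumed the previous reason" rule is an assumption of the model:<br>

```c
#include <assert.h>
#include <stdint.h>

/* Reason codes from the KVM paravirt ABI (asm/kvm_para.h). */
#define KVM_PV_REASON_PAGE_NOT_PRESENT 1
#define KVM_PV_REASON_PAGE_READY       2

/* Hypothetical labels for what the guest does with each fault. */
enum apf_action {
	APF_REGULAR_PF, /* genuine #PF: hand over to the normal #PF handler */
	APF_TASK_WAIT,  /* host is fetching the page: put the task to sleep */
	APF_TASK_WAKE   /* host has the page ready: wake the waiting task */
};

/* Model of the per-CPU "reason" word the host fills in before injecting. */
static uint32_t apf_reason;

/* Model of the host side. Assumption of this sketch: the host only
 * writes a new reason after the guest has consumed the previous one. */
static void host_set_reason(uint32_t reason)
{
	assert(apf_reason == 0);
	apf_reason = reason;
}

/* Read the reason and clear it, so the next #PF is not misclassified. */
static uint32_t read_and_reset_pf_reason(void)
{
	uint32_t reason = apf_reason;
	apf_reason = 0;
	return reason;
}

/* Filter out host-crafted #PFs; everything else falls through. */
static enum apf_action async_pf_dispatch(void)
{
	switch (read_and_reset_pf_reason()) {
	case KVM_PV_REASON_PAGE_NOT_PRESENT:
		return APF_TASK_WAIT;
	case KVM_PV_REASON_PAGE_READY:
		return APF_TASK_WAKE;
	default:
		return APF_REGULAR_PF;
	}
}
```

After host_set_reason(KVM_PV_REASON_PAGE_NOT_PRESENT), the first async_pf_dispatch() returns APF_TASK_WAIT and the next one, with the reason already cleared, returns APF_REGULAR_PF. Anything without a paravirt reason goes down the regular path, which is why genuine faults in a KVM guest are still entered through async_page_fault.<br>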
<br>
> _Some_ of them are related to the fact that during the faults RSP points<br>
> to userspace, which leads to a double-fault scenario.<br>
<br>
The postmortem you quote doesn't support that.</blockquote><div><br></div><div>I'll post a relevant trace.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
> Is it a known problem?<br>
<br>
There used to be a bug in the async pagefault machinery which caused the L0<br>
hypervisor to inject async pagefaults into the L2 guest instead of L1. This<br>
must've been fixed in sufficiently recent </blockquote><div><br></div><div>Yep, I saw the patch, and IMHO it's about a different thing. The patch fixes a PF being injected into an unrelated guest; as a result, a guest ends up with 'CPU stuck' messages since it can't get the requested page.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
I'd guess the problem is with your kernel. Doesn't it reproduce on bare<br>
metal?<br>
<br>
<br>
> [11587.895394] Hardware name: Virtuozzo KVM, BIOS 1.9.1-5.3.2.vz7.6 04/01/2014<br>
> [11587.895394] task: ffff88020bee0000 ti: ffff880204b60000 task.ti: ffff880204b60000<br>
> [11587.895394] RIP: 0010:[<ffffffff816a1bdd>] [<ffffffff816a1bdd>] async_page_fault+0xd/0x30<br>
> [11587.895394] RSP: 002b:ffff880234f61fd8 EFLAGS: 00010096<br>
> [11587.895394] RAX: 00000000816a192c RBX: 0000000000000001 RCX: ffffffff816a192c<br>
> [11587.895394] RDX: ffff88023fc03fc0 RSI: 0000000000000000 RDI: ffff880234f62098<br>
> [11587.895394] RBP: ffff880234f62088 R08: ffff88023fbfffc0 R09: ffff88003642af00<br>
> [11587.895394] R10: 0000000000008000 R11: 0000000000000000 R12: ffff88023fc04f58<br>
> [11587.895394] R13: 0000000000000028 R14: 0000000000000000 R15: 0000000000000000<br>
> [11587.895394] FS: 00007ff80ffc1880(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000<br>
> [11587.895394] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b<br>
> [11587.895394] CR2: ffff880234f61fc8 CR3: 00000000b9436000 CR4: 00000000000007f0<br>
> [11587.895394] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000<br>
> [11587.895394] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400<br>
> [11587.895394] Stack:<br>
> [11587.895394] 0000c7e9e11c7f44 0000270f05836600 906666906666fb02 be00000001b9d231<br>
> [11587.895394] e8df8948ffffffff 0000000231a6fba8 0001000000000008 0000000000000000<br>
> [11587.895394] 0002000000000000 0000000000000000 0003000000000000 0000000000000000<br>
> [11587.895394] Call Trace:<br>
> [11587.895394] Code: 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff ff e8 78 3d 00 00 e9 33 02 00 00 0f 1f 00 66 66 90 66 66 90 66 66 90 48 83 ec 78 <e8> 7e 01 00 00 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff<br>
> [11587.895394] RIP [<ffffffff816a1bdd>] async_page_fault+0xd/0x30<br>
> [11587.895394] RSP <ffff880234f61fd8><br>
<br>
Roman.<br>
</blockquote>
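<div>To check the quoted oops against the "RSP points to userspace" claim, a small sketch that classifies the RSP and CR2 values from the trace by x86-64 address half, assuming 4-level paging (user space below 0x0000800000000000, kernel space from 0xffff800000000000) and 4 KiB pages. The helper names are hypothetical:</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB pages */

/* x86-64 with 4-level paging: canonical user and kernel halves. */
#define USER_SPACE_END     0x0000800000000000ULL
#define KERNEL_SPACE_START 0xffff800000000000ULL

/* Register values copied from the oops quoted above. */
static const uint64_t oops_rsp = 0xffff880234f61fd8ULL;
static const uint64_t oops_cr2 = 0xffff880234f61fc8ULL;

static bool is_user_address(uint64_t addr)
{
	return addr < USER_SPACE_END;
}

static bool is_kernel_address(uint64_t addr)
{
	return addr >= KERNEL_SPACE_START;
}

/* Signed distance from the stack pointer down to the faulting address. */
static int64_t bytes_below_rsp(uint64_t rsp, uint64_t cr2)
{
	return (int64_t)(rsp - cr2);
}

/* True if both addresses fall in the same 4 KiB page. */
static bool same_page(uint64_t a, uint64_t b)
{
	return (a >> PAGE_SHIFT) == (b >> PAGE_SHIFT);
}
```

<div>With these values, is_kernel_address() holds for both RSP and CR2, is_user_address(oops_rsp) is false, and bytes_below_rsp() returns 16 with both addresses in the same page. So this particular trace shows a fault just below a kernel-half RSP rather than an RSP pointing to userspace.</div>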