[Devel] double faults in Virtuozzo KVM
Roman Kagan
rkagan at virtuozzo.com
Fri Sep 29 10:04:43 MSK 2017
On Fri, Sep 29, 2017 at 12:02:37AM +0300, Denis Kirjanov wrote:
> On Thursday, September 28, 2017, Roman Kagan <rkagan at virtuozzo.com> wrote:
> > On Thu, Sep 28, 2017 at 05:55:51PM +0300, Denis Kirjanov wrote:
> > > Hi, we're seeing double faults in async_page_fault.
> >
> > async_page_fault is the #PF handler in KVM guests. It filters out
> > specially crafted #PF's from the host; the rest fall through to the
> > regular #PF handler. So most likely you're seeing genuine #PFs,
> > unrelated to virtualization.
> >
> > > _Some_ of them related to the fact that during the faults RSP points
> > > to userspace and it leads to double-fault scenario.
> >
> > The postmortem you quote doesn't support that.
>
>
> I'll post a relevant trace
>
> >
> > > Is it known problem?
> >
> > There used to be a bug in the async pagefault machinery which caused the L0
> > hypervisor to inject async pagefaults into the L2 guest instead of L1. This
> > must've been fixed in sufficiently recent kernels.
>
>
> Yep, I saw the patch, but imho it's about a different thing. The patch
> fixes a wrong #PF injected into an unrelated guest, which leaves that guest
> stuck with 'CPU stuck' messages since it can never get the requested page
Not quite.
The idea of async_pf is that when a guest task hits a page which is
present in the guest page tables but absent in the hypervisor ones, the
hypervisor, instead of descheduling the whole vCPU thread until the
fault is resolved, injects a specially crafted #PF into the guest so
that the guest can deschedule that task and put it on a waiting list,
but otherwise continue working. Once the fault is resolved in the
hypervisor, it injects another #PF matching the first one, and the guest
looks up the task and resumes it. The bug was that those special #PF's
were occasionally injected into the L2 guest instead of L1. If the guest
received the first kind of async_pf but not the second, the task would
remain stuck forever. If, vice versa, the first one was missing, the
second one wouldn't match any suspended task and would be treated as a
regular #PF by the guest kernel, so an arbitrary task would receive a
bogus #PF.
Anyway, every #PF in Linux guests, including genuine guest ones, goes
through async_page_fault, so its presence in the stack traces is
expected.
And that bug is only relevant in the presence of nested KVM, and it has
been fixed in vzlinux.
> > I'd guess the problem is with your kernel. Doesn't it reproduce on bare
> > metal?
I still stand by this. What guest kernel are you using? What are your
reasons to blame the hypervisor rather than the kernel?
Roman.