[CRIU] checkpoint/restore: Adding more "Getters" to the KVM API

David Woodhouse dwmw2 at infradead.org
Fri Jan 13 14:14:27 MSK 2023


On Fri, 2022-12-30 at 08:25 +0000, scalingtree wrote:
> Hi lists,
> 
> (Re-sending as plain text.)
> 
> We are in the process of using an external tool (CRIU) to
> checkpoint/restore a KVM-enabled virtual machine. Initially we target
> the hypervisor kvmtool, but the extension, if done well, should allow
> checkpointing of any hypervisor, such as QEMU or Firecracker.
> 
> CRIU can checkpoint and restore most of the application's (or, in our
> case, the VMM's) state, except the state held by the KVM kernel
> module. To overcome this limitation, we need more getters in the KVM
> API to extract the state of the VM.
> 
> One example of a missing getter is the one for the guest memory.
> There is a KVM_SET_USER_MEMORY_REGION API call, but there is no
> equivalent getter such as a KVM_GET_MEMORY call.
> 
> Can we add such getters to the KVM API? Any idea of the difficulty? I
> think one of the difficulties will be extracting the
> architecture-specific state of KVM: for now, we are targeting the
> Intel x86_64 architecture (VT-x).


I'm not really sure I understand the use case here. Can't the VMM be
restarted and restore this?
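
(As a rough illustration of why guest memory in particular needs no
getter: a memory slot is registered with KVM_SET_USER_MEMORY_REGION and
is backed by memory which the VMM itself maps, so the VMM can already
read every byte of guest RAM through its own pointer. Sizes and error
handling below are trimmed; it's only a sketch.)

#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    int vm  = ioctl(kvm, KVM_CREATE_VM, 0);

    size_t ram_size = 64 << 20;                 /* 64 MiB of guest RAM */
    void *ram = mmap(NULL, ram_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct kvm_userspace_memory_region slot = {
        .slot            = 0,
        .guest_phys_addr = 0,
        .memory_size     = ram_size,
        .userspace_addr  = (unsigned long)ram,  /* VMM owns this mapping */
    };
    ioctl(vm, KVM_SET_USER_MEMORY_REGION, &slot);

    /*
     * There is nothing to "get" back from the kernel: the guest's RAM
     * contents are already visible to the VMM at 'ram', so a checkpoint
     * tool dumping the VMM's address space sees them directly.
     */
    return 0;
}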

Live update is a barely-special case of live migration. You kexec the
underlying kernel and start a *new* VMM (which may have its own fixes
too), from the preserved "migration" state.

Any VMM which supports live migration surely doesn't need the kernel to
help it with checkpoint/restore?
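
(Concretely, this is roughly what a migration-capable VMM already does
today for each vCPU, using only getters that exist; 'vcpu_fd' is
assumed to be the vCPU file descriptor the VMM created, and error
handling is trimmed:)

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int save_vcpu_state(int vcpu_fd,
                           struct kvm_regs *regs,
                           struct kvm_sregs *sregs,
                           struct kvm_fpu *fpu,
                           struct kvm_mp_state *mp,
                           struct kvm_vcpu_events *events)
{
    if (ioctl(vcpu_fd, KVM_GET_REGS, regs) < 0)
        return -1;
    if (ioctl(vcpu_fd, KVM_GET_SREGS, sregs) < 0)
        return -1;
    if (ioctl(vcpu_fd, KVM_GET_FPU, fpu) < 0)
        return -1;
    if (ioctl(vcpu_fd, KVM_GET_MP_STATE, mp) < 0)
        return -1;
    if (ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, events) < 0)
        return -1;
    /* MSRs, XSAVE, LAPIC state, etc. have matching KVM_GET_* ioctls too. */
    return 0;
}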

Now... if you wanted to talk about leaving some of the physical CPUs in
guest mode *while* the kernel uses one of them to actually do the
kexec, *that* would be interesting.

It starts with virtual address space isolation, putting that kvm_run
loop into its own address space separate from the kernel. And then why
*can't* we leave it running? If it ever needs to take a vmexit (and
with interrupt posting and all the stuff that we now accelerate in
hardware, how often is that anyway?), then it might need to wait for a
Linux kernel to come back before it thunks back into it.
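
(For reference, a minimal userspace view of that kvm_run loop: the vCPU
thread only has to come back out to the VMM, and thus needs a kernel to
thunk through, when it takes an exit that isn't handled in hardware.
'run' is assumed to be the mmap()ed struct kvm_run for this vCPU:)

#include <sys/ioctl.h>
#include <linux/kvm.h>

static void vcpu_loop(int vcpu_fd, struct kvm_run *run)
{
    for (;;) {
        ioctl(vcpu_fd, KVM_RUN, 0);    /* guest runs until it exits */

        switch (run->exit_reason) {
        case KVM_EXIT_IO:              /* port I/O emulated by the VMM */
        case KVM_EXIT_MMIO:            /* MMIO emulated by the VMM */
            /* ... device emulation ... */
            break;
        case KVM_EXIT_HLT:
        case KVM_EXIT_SHUTDOWN:
            return;
        default:
            return;
        }
    }
}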

That's the naïve starting point.... lots of fun with reconstituting
state and reconciling "newly-created" KVMs in the new kernel with the
vCPUs which are already actually running. But there's an amazing win to
be had there, letting VMs continue to actually *run* while the whole
hypervisor restarts with basically zero downtime.
