[CRIU] CRIU LXC Container Live Migration Concerns

Deepak Vij (A) deepak.vij at huawei.com
Mon Apr 28 10:13:20 PDT 2014


Thanks Pavel for your prompt response. This makes sense, as your approach seems to rely solely on Kernel APIs from user space instead of a raw checkpoint/restore mechanism. My concern was what happens if the address space of the Kernel changes after an emergency Kernel patch (a security or performance related patch, etc.). This may not be an issue, though, as the Kernel uses virtual addresses instead of physical addresses. Also, I saw that you folks added some new Kernel level APIs to the mainline Linux Kernel to make end-to-end container based checkpoint/restore possible.

I also saw that you folks initially tried an in-kernel implementation of checkpoint/restore and later abandoned this approach because of the additional complexity it added to the kernel.

However, the container based virtualization environment still has the fundamental problem of Kernel mismatch or OS mismatch, as the source machine environment where the LXC/Docker container was created and tested may not be the same as the destination machine environment. The source and destination environments need to be controlled and coordinated in order to ensure container portability across machines, datacenters and clouds. Hypervisor based virtualization, on the other hand, does not have this problem, as the VM comes packaged with the source operating system within it. From a mobility perspective, though, moving a hypervisor based VM is like moving an elephant compared with moving lightweight containers.

Lastly, I am also chair of the IEEE Intercloud P2302 standards initiative. Workload portability is one of the main issues we are grappling with. Would it be possible for me to quickly chat with you on the phone in this regard? Please let me know; my cell is 408-806-6182. I would really appreciate that. Thanks.

Regards,
Deepak K. Vij

-----Original Message-----
From: Pavel Emelyanov [mailto:xemul at parallels.com] 
Sent: Monday, April 28, 2014 4:23 AM
To: Deepak Vij (A)
Cc: criu at openvz.org
Subject: Re: CRIU LXC Container Live Migration Concerns

On 04/26/2014 04:51 AM, Deepak Vij (A) wrote:
> Hi Pavel, let me quickly introduce myself. I am a researcher at FutureWei research lab based in Santa Clara. 

Nice to meet you.

> We have been looking at LXC containers/Docker as a viable lightweight virtualization unit of work versus
> traditional hypervisor based virtualization. From a virtual machine portability perspective, this is a no
> brainer. However, one of our concerns is Kernel incompatibility at the destination machine at
> the time of live migration of an LXC container, possibly using CRIU. For example, while doing the live
> "application process" migration embedded within the container, what if someone applies a security patch on
> the destination machine that changes the underlying Kernel? This could break the restore step of
> the overall checkpoint/restore process.

This is one of the main CRIU use cases -- evacuating containers/applications from one
node to another whose kernel may carry security/stability/performance updates. So I
would say that if restore breaks in this case, it is a BUG rather than expected behavior.

CRIU uses only public kernel APIs to dump and restore a process's state. Thus,
if we find a patch that, once applied to the kernel, breaks application behavior -- we
should report this to the kernel people as a "compatibility breakage". Moreover, what CRIU
dumps (and restores) is the state of kernel objects as seen via these
APIs, so it cannot change from kernel to kernel either.
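
To illustrate what "state seen via public kernel APIs" can look like, here is a
minimal sketch in Python. It is not CRIU code and does not use CRIU's dump
format; it only parses the memory mappings that the kernel exposes through
/proc/<pid>/maps, which is one documented user-space interface. The function
names parse_maps_line and read_mappings are hypothetical, chosen for this example.

```python
from pathlib import Path


def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps into a dict (hypothetical helper)."""
    # Fields: address range, perms, offset, dev, inode, optional pathname.
    fields = line.split(maxsplit=5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        "offset": int(fields[2], 16),
        # Anonymous mappings have no pathname field.
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


def read_mappings(pid="self"):
    """Read all mappings of a process through the public /proc interface."""
    maps_file = Path("/proc") / str(pid) / "maps"
    text = maps_file.read_text()
    return [parse_maps_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Print the first few mappings of the current process (Linux only).
    for mapping in read_mappings()[:5]:
        print(hex(mapping["start"]), hex(mapping["end"]),
              mapping["perms"], mapping["path"])
```

Because the description is expressed in terms of what the interface reports
(address ranges, permissions, backing files) rather than kernel-internal
structures, the same description stays meaningful on a patched kernel as long
as the interface itself is kept compatible.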


I can also share two practical experiences I have.

The first is Parallels' implementation of checkpoint/restore in 2.6.9, 2.6.18 and 2.6.32
kernels. We've been able to live migrate containers from 2.6.9 even to 2.6.32 without
any problems.

The second is about CRIU. I manually tried to live migrate apps from 3.11 to 3.13 and
things went smoothly every time.

> In contrast to this, hypervisor based virtualization does not have this problem as the whole OS is included
> in the VM itself.
> 
>  
> 
> The scenario I mentioned above is quite common as applying security patches etc. is a common practice. This
> seems to be our biggest concern.
> 
> I would really appreciate it if you could shed some light on this. Thanks in advance.

You're welcome. Feel free to ask more if the above answer is not complete.

Thanks,
Pavel


