[CRIU] CRIU LXC Container Live Migration Concerns
Deepak Vij (A)
deepak.vij at huawei.com
Tue Apr 29 10:45:39 PDT 2014
Thanks Pavel & James for your detailed responses to my earlier email. I appreciate it.
Based on our back-and-forth discussion so far, I am fully convinced that you folks, as part of the CRIU live-migration effort, are definitely on the right track. The live-migration design you have put forward using kernel-level APIs should work.
The concern I had, however, was at the higher LXC/container level itself. You concur, as the following response from Pavel shows:
====================================
Hm... Good question -- what if, after migration, all the system libraries have changed? Yes, this _is_ a problem. E.g., I've seen applications crash after migration between two nodes with glibcs of equal versions but "fixed" by prelink in two different ways.
====================================
Now, first and foremost, I am also a big proponent of LXC containers and do want them to succeed, as they are the right unit of work from a portability standpoint versus the traditional heavyweight hypervisor-based VM model. In that regard, the Docker folks are trying to solve this problem by enabling LXC-container capabilities across various flavors of Linux-based operating systems, handling dependency issues, and so on. However, more work needs to be done before a Docker-like abstraction can provide OVF-like (Open Virtualization Format, for hypervisor-based VMs) seamless compatibility across LXC-container environments, supporting all kinds of dependencies, injections, etc.
This is something very near and dear to my heart from the IEEE Intercloud standards-development perspective. As part of the IEEE Intercloud standardization effort, we have so far made good progress on interoperability standards across clouds (using ontology definitions, trust management, etc.). The portability issue, however, we have somewhat swept under the rug, as traditional VM-mobility-style portability is not really viable across geographically dispersed cloud computing environments.
This is a great opportunity for us to define a standard around a lightweight, container-based unit of work, something like OVF, to make this whole thing work. I am not sure whom to contact in this regard. Maybe we should loop the Docker folks into all this as well, or maybe they are already working toward it; I am not sure.
I hope this all makes sense. Thanks all.
Regards,
Deepak K. Vij
-----Original Message-----
From: Pavel Emelyanov [mailto:xemul at parallels.com]
Sent: Tuesday, April 29, 2014 4:13 AM
To: James Bottomley; Deepak Vij (A)
Cc: criu at openvz.org
Subject: Re: [CRIU] CRIU LXC Container Live Migration Concerns
On 04/29/2014 03:47 AM, James Bottomley wrote:
> On Mon, 2014-04-28 at 17:13 +0000, Deepak Vij (A) wrote:
>> Thanks Pavel for your prompt response. This makes sense, as your
>> approach seems to solely leverage kernel APIs from user space
>> instead of a raw checkpoint/restore mechanism. My concern was what
>> happens if the address space of the kernel changes after an emergency
>> kernel patch (a security or performance related patch, etc.). Although,
>> this may not be an issue, as the kernel uses virtual addresses instead
>> of physical addresses.
>
> The kernel address space doesn't matter. We only use user APIs for this
> and kernel addresses are never exported to users. A restore image has
> process image layout information from the userspace perspective plus
> kernel state, but the kernel state doesn't have any kernel addresses.
Yes, the kernel can be at any address. Moreover, the kernel does move its
objects, ones that applications can use (via system calls), from one address
to another, so this sort of change is absolutely expected.
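As a quick illustration of this point, here is a minimal, illustrative sketch (not CRIU code): it prints a process's own userspace memory layout from /proc/self/maps. Virtual address ranges, permissions, and backing files are exactly the kind of information a checkpoint image records, and no kernel addresses appear anywhere in it.

/* Print this process's own userspace memory layout.  Each line shows
 * start-end addresses, permissions, offset, device, inode, and path --
 * all userspace virtual addresses, no kernel ones. */
#include <stdio.h>

int main(void)
{
	FILE *maps = fopen("/proc/self/maps", "r");
	char line[512];

	if (!maps) {
		perror("fopen /proc/self/maps");
		return 1;
	}
	while (fgets(line, sizeof(line), maps))
		fputs(line, stdout);
	fclose(maps);
	return 0;
}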
>> Also, I saw that you folks added some new kernel-level APIs to
>> the mainline Linux kernel to make end-to-end container-based
>> checkpoint/restore possible.
>>
>> I also saw that you folks initially tried an in-kernel implementation of
>> checkpoint/restore and later abandoned this approach due to the
>> additional complexity it added to the kernel.
>>
>> The container-based virtualization environment still has the
>> fundamental problem of a kernel or OS mismatch, however, as the
>
> Kernel mismatch isn't a problem; as Pavel previously told you, we can
> migrate between different kernel versions on source and target.
>
> As far as OpenVZ is concerned, OS mismatch doesn't matter either because
> the migrated container contains all the OS pieces it needs.
Agreed. This is, by the way, one of the OpenVZ and Docker "features" -- they
both create an environment for the applications running inside it, one that
doesn't depend on anything outside it, including the kernel.
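To make the "self-contained environment" idea concrete, here is a minimal, illustrative sketch (the path /srv/ct-root is a placeholder; real container runtimes add namespaces, cgroups, and so on, on top of this). Once the process enters a directory tree that carries its own dynamic linker and libraries, nothing outside that tree -- host libraries included -- is consulted.

/* Enter a self-contained root and run a shell from inside it.
 * Requires CAP_SYS_CHROOT (typically root). */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	if (chroot("/srv/ct-root") != 0 || chdir("/") != 0) {
		perror("chroot");
		return 1;
	}
	/* From here on, every path -- including the dynamic linker and
	 * system libraries -- resolves inside the container's own tree. */
	execl("/bin/sh", "sh", (char *)NULL);
	perror("execl");
	return 1;
}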
>> LXC/Docker container created and tested in the source machine
>> environment may not match the destination machine environment.
>> The source and destination environments need to be controlled and
>> coordinated in order to ensure overall container-based
>> virtualization portability across machines, datacenters, and clouds.
>> Hypervisor-based virtualization, on the other hand, does not have this
>> problem, as the VM comes packaged with the source operating system
>> within it. From a mobility perspective, though, moving a hypervisor-based
>> VM is like moving an elephant versus moving lightweight
>> containers.
>
> OK, so reading this I think you're not really clear on what a container
> is; containers can be built in many different ways, so if you're asking
> about their properties, you need to define what sort of container first.
> To clarify, the question I think you're asking, which can be asked
> without even resorting to containers, is "if we migrate a single
> process, which depends on a dynamic object, from one system to another,
> what happens if the version of the dynamic object is different on one
> system from the other?" The answer is a bit speculative, because that's
> not a use case we're currently testing.
>
> The answer depends on how the dynamic relocation was done. If it was
> done all at once (LD_BIND_NOW) then all the relocations are done and
> point to mapped pages in the dynamic object; when we migrate, we pull
> across all the mapped pages so we pull across all the pages needed for
> the binary to access the library and everything works fine.
>
> If the relocation uses lazy binding (the default), then there may be
> unresolved symbols sitting in the PLT and these may need resolving as
the binary runs. On Linux, we use versioned symbols, so resolution will
> always work if the process is migrated to a newer version of the dynamic
> shared object. However, if the process is migrated to an older version
> and needs to resolve a symbol that doesn't exist, it will take a bus
> fault and die.
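(One can see this failure mode without migrating anything. The minimal, illustrative sketch below probes for a versioned glibc symbol with dlvsym() -- the memcpy@GLIBC_2.14 symbol/version pair is only an example -- so on a node whose libc lacks the requested version, the probe fails cleanly instead of the process faulting later in the PLT. Build with "gcc -o probe probe.c -ldl".)

/* Probe for a versioned libc symbol before relying on it. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
	void *libc = dlopen("libc.so.6", RTLD_NOW);
	void *sym;

	if (!libc) {
		fprintf(stderr, "dlopen: %s\n", dlerror());
		return 1;
	}
	/* memcpy@GLIBC_2.14 is just an example; substitute whatever
	 * versioned symbol your application actually depends on. */
	sym = dlvsym(libc, "memcpy", "GLIBC_2.14");
	printf("memcpy@GLIBC_2.14: %s\n", sym ? "present" : "missing");
	dlclose(libc);
	return 0;
}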
>
> What this means is if you can't guarantee the migration environment, you
> need to start a process to be migrated with LD_BIND_NOW=1.
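For example, a tiny launcher (an illustrative sketch, not an existing tool) can force eager binding for any program you intend to migrate, so every PLT entry is resolved at startup and no unresolved symbols survive into the checkpoint image:

/* Launch a program with LD_BIND_NOW=1 so the dynamic linker resolves
 * all symbols up front instead of lazily via the PLT. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	if (argc < 2) {
		fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
		return 1;
	}
	if (setenv("LD_BIND_NOW", "1", 1) != 0) {
		perror("setenv");
		return 1;
	}
	execvp(argv[1], &argv[1]);
	perror("execvp");	/* reached only if exec fails */
	return 1;
}

Usage would be, e.g., "./bindnow myapp arg1 arg2".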
Hm... Good question -- what if, after migration, all the system libraries have
changed? Yes, this _is_ a problem. E.g., I've seen applications crash after
migration between two nodes with glibcs of equal versions but "fixed" by
prelink in two different ways.
And we don't have (yet) any protection against this.
But then again -- this is a natural live-migration limitation. Files (including
those with libraries) should be the same on the source and destination nodes.
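Since prelink rewrites library files on disk, two glibcs of the same version can differ byte-for-byte. A migration harness could catch this before restoring by fingerprinting the libraries on both nodes; below is a minimal, illustrative sketch (FNV-1a is used only for brevity -- a real harness would more likely compare ELF build IDs or a strong digest):

/* Print a simple FNV-1a fingerprint of a file, e.g. a shared library,
 * so source and destination copies can be compared before restore. */
#include <stdio.h>
#include <stdint.h>

int main(int argc, char *argv[])
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */
	FILE *f;
	int c;

	if (argc != 2) {
		fprintf(stderr, "usage: %s /path/to/library.so\n", argv[0]);
		return 1;
	}
	f = fopen(argv[1], "rb");
	if (!f) {
		perror("fopen");
		return 1;
	}
	while ((c = fgetc(f)) != EOF) {
		h ^= (uint64_t)(unsigned char)c;
		h *= 0x100000001b3ULL;		/* FNV-1a prime */
	}
	fclose(f);
	printf("%016llx  %s\n", (unsigned long long)h, argv[1]);
	return 0;
}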
>> Lastly, I am also chair of the IEEE Intercloud P2302 standards
>> initiative. Workload portability is one of the main issues we are
>> grappling with. Would it be possible for me to quickly chat with you
>> on the phone in this regard? Please let me know; my cell is
>> 408-806-6182. I would really appreciate that. Thanks.
>
> Pavel is in Russia, so time zones are a problem for phone calls, and email
> would be much better. Plus, all the CRIU experts see the questions on the
> mailing list, so anyone can answer.
Yes, e-mails are preferred, as more people would be involved. But if a phone
or Skype conversation is really required, I think we can arrange a time slot
suitable for everyone.
> Just to set expectations, though, CRIU is a checkpoint and restore
> project. Resolving environmental expectations (including IP addresses
> and the like on migration) is the responsibility of whatever harness is
> making use of CRIU for migration.
Exactly. By the way, this is what we try to address with the P.Haul project [1],
which does live migration using CRIU.
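For the curious, the division of labor looks roughly like this minimal, illustrative sketch, which drives the criu command-line tool via system(3) (P.Haul itself talks to CRIU through its RPC interface rather than the CLI):

/* Checkpoint a shell-spawned task tree into an image directory.
 * Shipping the images to the destination and running
 *     criu restore -D <image-dir> --shell-job
 * there is the harness's job, as is fixing up IPs and the like. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	char cmd[256];

	if (argc != 3) {
		fprintf(stderr, "usage: %s <pid> <image-dir>\n", argv[0]);
		return 1;
	}
	snprintf(cmd, sizeof(cmd), "criu dump -t %s -D %s --shell-job",
		 argv[1], argv[2]);
	if (system(cmd) != 0) {
		fprintf(stderr, "criu dump failed\n");
		return 1;
	}
	return 0;
}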
Thanks,
Pavel
[1] https://github.com/xemul/p.haul