[CRIU] CRIU LXC Container Live Migration Concerns

Pavel Emelyanov xemul at parallels.com
Tue Apr 29 04:12:59 PDT 2014


On 04/29/2014 03:47 AM, James Bottomley wrote:
> On Mon, 2014-04-28 at 17:13 +0000, Deepak Vij (A) wrote:
>> Thanks Pavel for your prompt response. This makes sense, as your
>> approach seems to solely leverage kernel APIs from user space
>> instead of a raw checkpoint/restore mechanism. My concern was what
>> happens if the address space of the kernel changes after an
>> emergency kernel patch (a security or performance-related patch,
>> etc.). Although, this may not be an issue, as the kernel uses
>> virtual addresses instead of physical addresses.
> 
> The kernel address space doesn't matter.  We only use user APIs for this
> and kernel addresses are never exported to users.  A restore image has
> process image layout information from the userspace perspective plus
> kernel state, but the kernel state doesn't have any kernel addresses.

Yes, the kernel can be at any address. Moreover, the kernel does move the
objects that applications use (via system calls) from one address to
another, so this sort of change is absolutely expected.
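
To illustrate (a minimal Python sketch, not CRIU code): all that user
space holds onto a kernel object is a handle such as a file descriptor,
and a handle plus its user-visible properties is all an image needs to
record:

import os

# The kernel object behind this descriptor lives at some kernel
# address, but user space only ever sees the small integer handle.
# An image records that number plus what it refers to (path, offset,
# flags) -- never a kernel pointer.
fd = os.open("/etc/hostname", os.O_RDONLY)
print("fd =", fd, "offset =", os.lseek(fd, 0, os.SEEK_CUR))
os.close(fd)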

>>  Also, I saw that you folks included some new kernel-level APIs in
>> the mainline Linux kernel to make end-to-end container-based
>> checkpoint/restore possible.
>>
>> I also saw that you folks initially tried an in-kernel
>> implementation of checkpoint/restore and later abandoned this
>> approach due to the additional complexity it added to the kernel.
>>
>> Although the container-based virtualization environment still has
>> the fundamental problem of kernel mismatch or OS mismatch, as the
> 
> Kernel mismatch isn't a problem; as Pavel previously told you, we can
> migrate between different kernel versions on source and target.
> 
> As far as OpenVZ is concerned, OS mismatch doesn't matter either because
> the migrated container contains all the OS pieces it needs.

Agreed. This is, by the way, one of the OpenVZ and Docker "features" -- they
both create an environment for the applications running inside that doesn't
depend on anything outside it, including the kernel.

>>  LXC/Docker container created and tested in the source machine
>> environment may not be the same as the destination machine
>> environment. The source and destination environments need to be
>> controlled and coordinated in order to ensure overall container-based
>> virtualization portability across machines, datacenters and clouds.
>> Hypervisor-based virtualization, on the other hand, does not have this
>> problem, as the VM comes packaged with the source operating system
>> inside it. Although, from a mobility perspective, moving a
>> hypervisor-based VM is like moving an elephant compared with moving
>> lightweight containers.
> 
> OK, so reading this I think you're not really clear on what a container
> is; containers can be built in many different ways, so if you're asking
> about their properties, you need to define what sort of container first.
> To clarify, the question I think you're asking, which can be asked
> without even resorting to containers, is "if we migrate a single
> process, which depends on a dynamic object, from one system to another,
> what happens if the version of the dynamic object is different on one
> system from the other?"  The answer is a bit speculative, because that's
> not currently a use case we're testing.
> 
> The answer depends on how the dynamic relocation was done.  If it was
> done all at once (LD_BIND_NOW) then all the relocations are done and
> point to mapped pages in the dynamic object; when we migrate, we pull
> across all the mapped pages so we pull across all the pages needed for
> the binary to access the library and everything works fine.
> 
> If the relocation uses lazy binding (the default), then there may be
> unresolved symbols sitting in the PLT and these may need resolving as
> the binary runs.  On Linux, we use versioned symbols, so resolution will
> always work if the process is migrated to a newer version of the dynamic
> shared object.  However, if the process is migrated to an older version
> and needs to resolve a symbol that doesn't exist, it will take a bus
> fault and die.
> 
> What this means is if you can't guarantee the migration environment, you
> need to start a process to be migrated with LD_BIND_NOW=1.

Hm... Good question -- what if all the system libraries have changed after
migration? Yes, this _is_ a problem. E.g. I've seen applications crash after
migration between two nodes with glibcs of the same version, but "fixed" by
prelink in two different ways.

And we don't (yet) have any protection against this.

But again -- this is a natural live-migration limitation. Files (including
those with libraries) should be the same on the source and destination nodes.
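
For illustration, here is a minimal sketch of the binding-mode difference
James describes, in plain Python/ctypes (nothing CRIU-specific; with
LD_BIND_NOW=1 the dynamic loader behaves like the eager case for the
whole process):

import ctypes
import os

# Eager binding (the dlopen() analogue of LD_BIND_NOW=1): every
# undefined symbol is resolved right here, so a missing symbol fails
# up front, before the process accumulates state worth migrating.
libm = ctypes.CDLL("libm.so.6", mode=os.RTLD_NOW)

libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
print("cos(0) =", libm.cos(0.0))

# With lazy binding (mode=os.RTLD_LAZY, the run-time linker's default
# for PLT entries), resolution is deferred until the first call --
# which, after migration to an older library, may be the moment the
# process faults and dies.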

>> Lastly, I am also chair of the IEEE Intercloud P2302 standards
>> initiative. Workload portability is one of the main issues we are
>> grappling with. Would it be possible for me to quickly chat with you
>> on the phone in this regard? Please let me know; my cell is
>> 408-806-6182. I would really appreciate that. Thanks.
> 
> Pavel is in Russia, so time zones are a problem for phone, and email
> would be much better.  Plus, all the CRIU experts see the questions on
> the mailing list, so anyone can answer.

Yes, e-mail is preferred, as more people can be involved. But if a phone or
Skype conversation is really required, I think we can arrange a time slot
suitable for everyone.

> Just to set expectations, though, CRIU is a checkpoint and restore
> project.  Resolving environmental expectations (including IP addresses
> and the like on migration) is the responsibility of whatever harness is
> making use of CRIU for migration.

Exactly. By the way, this is what we are trying to address with the P.Haul
project [1], which does live migration using CRIU.
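
To give an idea of the division of labour, here is a rough sketch of such
a harness (this is not P.Haul code; the image directory, destination host
and helper name are made up for illustration):

import subprocess

IMAGES = "/tmp/ckpt"        # hypothetical image directory
DEST = "dst.example.com"    # hypothetical destination node

def migrate(pid):
    # 1. Freeze the process tree and dump its state into image files.
    subprocess.check_call(["criu", "dump", "-t", str(pid),
                           "-D", IMAGES, "--shell-job"])

    # 2. Ship the images across. CRIU itself never moves data between
    #    nodes -- that, like fixing up IPs and mounts, is the
    #    harness's job.
    subprocess.check_call(["rsync", "-a", IMAGES + "/",
                           DEST + ":" + IMAGES + "/"])

    # 3. Recreate the process tree on the destination, detached.
    subprocess.check_call(["ssh", DEST, "criu", "restore",
                           "-D", IMAGES, "--shell-job", "-d"])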

Thanks,
Pavel


[1] https://github.com/xemul/p.haul

