[CRIU] CRIU LXC Container Live Migration Concerns

Mon Apr 28 16:47:33 PDT 2014

On Mon, 2014-04-28 at 17:13 +0000, Deepak Vij (A) wrote:
> Thanks, Pavel, for your prompt response. This makes sense, as your
> approach seems to rely solely on kernel APIs from user space
> instead of a raw checkpoint/restore mechanism. My concern was what
> happens if the kernel address space changes after an emergency kernel
> patch (a security or performance patch, etc.). Then again, this may
> not be an issue, as the kernel uses virtual addresses rather than
> physical addresses.

The kernel address space doesn't matter.  We only use user APIs for this
and kernel addresses are never exported to users.  A restore image has
process image layout information from the userspace perspective plus
kernel state, but the kernel state doesn't have any kernel addresses.
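To make "layout information from the userspace perspective" concrete: on Linux, any process can read its own memory layout through the user-visible /proc/<pid>/maps interface. The Python sketch below parses that text format into records. It only illustrates the kind of information userspace can see; it is not CRIU's image format, and the `Mapping` and `parse_maps` names are made up for this example. Every address it reports is one the process itself sees.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mapping:
    start: int           # first virtual address of the mapping, as the process sees it
    end: int             # one past the last virtual address
    perms: str           # e.g. "r-xp"
    offset: int          # offset into the backing file
    path: Optional[str]  # file path, pseudo-name like "[stack]", or None if anonymous


def parse_maps(text: str) -> List[Mapping]:
    """Parse the text format of /proc/<pid>/maps into Mapping records."""
    mappings = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Fields: address perms offset dev inode [pathname]; the pathname may
        # contain spaces, so split at most five times and keep the remainder.
        parts = line.split(None, 5)
        start_s, end_s = parts[0].split("-")
        path = parts[5].strip() if len(parts) > 5 else None
        mappings.append(Mapping(int(start_s, 16), int(end_s, 16),
                                parts[1], int(parts[2], 16), path))
    return mappings
```

On Linux, `parse_maps(open("/proc/self/maps").read())` lists the calling process's own mappings.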

>  Also, I saw that you folks included some new Kernel level APIs within
> the mainline Linux Kernel to make the end-to-end container based
> checkpoint/restore possible.
> 
> I also saw that you folks initially tried an in-kernel implementation of
> checkpoint/restore and later abandoned this approach due to the
> additional overall complexity it added to the kernel.
> 
> However, the container-based virtualization environment still has the
> fundamental problem of kernel mismatch or OS mismatch, as the

Kernel mismatch isn't a problem; as Pavel previously told you, we can
migrate between different kernel versions on the source and target.

As far as OpenVZ is concerned, OS mismatch doesn't matter either because
the migrated container contains all the OS pieces it needs.

> LXC/Docker container created and tested on the source machine
> environment may not be the same as the destination machine environment.
> The source and destination environments need to be controlled and
> coordinated in order to ensure overall container-based
> virtualization portability across machines, datacenters and clouds.
> Hypervisor-based virtualization, on the other hand, does not have this
> problem, as the VM comes packaged with the source operating system
> within it. That said, from a mobility perspective, moving a
> hypervisor-based VM is like moving an elephant compared with moving
> lightweight containers.

OK, so reading this I think you're not really clear on what a container
is; containers can be built in many different ways, so if you're asking
about their properties, you need to define what sort of container first.
To clarify, the question I think you're asking, which can be asked
without even resorting to containers, is "if we migrate a single
process, which depends on a dynamic object, from one system to another,
what happens if the version of the dynamic object is different on one
system from the other?"  The answer is a bit speculative, because that's
not a use case we're currently testing.

The answer depends on how the dynamic relocation was done.  If it was
done all at once (LD_BIND_NOW), then all the relocations are already
resolved and point to mapped pages in the dynamic object; when we
migrate, we pull across all the mapped pages, including every page the
binary needs to access the library, and everything works fine.

If the relocation uses lazy binding (the default), then there may be
unresolved symbols sitting in the PLT, and these may need resolving as
the binary runs.  On Linux, we use versioned symbols, so resolution will
always work if the process is migrated to a newer version of the dynamic
shared object.  However, if the process is migrated to an older version
and needs to resolve a symbol that doesn't exist there, the dynamic
linker will fail with a symbol lookup error and the process will die.
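The lazy-versus-eager argument above can be illustrated with a toy model in Python. It is not how ld.so works internally: `SharedObject`, `Process`, and the `name@VERSION` strings are hypothetical stand-ins, and migration is modeled, as in the explanation above, as keeping already-resolved PLT slots while any still-unresolved slot is looked up in the target machine's library.

```python
from typing import Callable, Dict, List, Optional

Func = Callable[[], str]


class SharedObject:
    """Toy shared library: maps versioned names like 'alloc@LIB_1.0' to code."""

    def __init__(self, symbols: Dict[str, Func]):
        self.symbols = symbols

    def lookup(self, versioned_name: str) -> Func:
        if versioned_name not in self.symbols:
            # Stands in for the dynamic linker failing to resolve a symbol.
            raise LookupError(f"symbol lookup error: undefined symbol: {versioned_name}")
        return self.symbols[versioned_name]


class Process:
    """Toy process whose PLT slots are filled at startup or on first call."""

    def __init__(self, lib: SharedObject, imports: List[str], bind_now: bool):
        self.lib = lib
        # None marks a PLT slot that has not been resolved yet.
        self.plt: Dict[str, Optional[Func]] = {name: None for name in imports}
        if bind_now:
            # LD_BIND_NOW-style: resolve every import before the program runs.
            for name in imports:
                self.plt[name] = lib.lookup(name)

    def call(self, name: str) -> str:
        target = self.plt[name]
        if target is None:
            # Lazy binding: resolve against whatever library is present *now*.
            target = self.lib.lookup(name)
            self.plt[name] = target
        return target()

    def migrate(self, target_lib: SharedObject) -> None:
        # Resolved slots keep the code that travelled with the process;
        # unresolved slots will be looked up in the target's library.
        self.lib = target_lib
```

Built against a newer library that exports `resize@LIB_2.0` and then migrated to an older one that lacks it, a lazily bound process raises `LookupError` on its first call to `resize@LIB_2.0`, while a process created with `bind_now=True` keeps working because both slots were filled before migration. Migrating in the other direction (older to newer) succeeds either way, since the newer library still exports the old versioned symbol.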

What this means is that if you can't guarantee the migration
environment, you need to start any process that is to be migrated with
LD_BIND_NOW=1.
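One way to follow that advice is to set the variable in whatever launches the workload. The sketch below is a hypothetical Python launcher (`migratable_env` and `launch_for_migration` are illustrative names, not part of CRIU). The only real mechanism it relies on is glibc's dynamic linker resolving all symbols at program startup when `LD_BIND_NOW` is set to a non-empty value.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def migratable_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with eager binding forced."""
    env = dict(os.environ if base is None else base)
    # Any non-empty value makes glibc's ld.so resolve all symbols at startup,
    # so no PLT slot is left to be resolved after a migration.
    env["LD_BIND_NOW"] = "1"
    return env


def launch_for_migration(argv: Sequence[str]) -> subprocess.Popen:
    """Start argv (hypothetical workload) with LD_BIND_NOW=1 in its environment."""
    return subprocess.Popen(list(argv), env=migratable_env())
```

From a shell, `LD_BIND_NOW=1 ./server` does the same for a single command. Linking with `-Wl,-z,now` instead bakes eager binding into the binary itself, which avoids depending on the environment at launch time.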

> Lastly, I am also chair of the IEEE Intercloud P2302 standards
> initiative. Workload portability is one of the main issues we are
> grappling with. Would it be possible for me to quickly chat with you
> on the phone in this regard? Please let me know; my cell is
> 408-806-6182. I would really appreciate that. Thanks.

Pavel is in Russia, so time zones make the phone difficult, and email
would be much better.  Plus, all the CRIU experts see the questions on
the mailing list, so anyone can answer.

Just to set expectations, though, CRIU is a checkpoint and restore
project.  Resolving environmental expectations (including IP addresses
and the like on migration) is the responsibility of whatever harness is
making use of CRIU for migration.
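To illustrate that division of labor, here is a minimal Python sketch of a migration harness. The `criu dump -t <pid> -D <dir>` and `criu restore -D <dir>` invocations are real CRIU command lines. The function names, the `transfer_images` and `prepare_target` hooks, and the split between source and target runners are assumptions for illustration. A real harness would typically need additional CRIU options and would run the restore on the target host (for example over ssh).

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], None]


def run_local(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True)


def criu_dump_cmd(pid: int, images_dir: str) -> List[str]:
    # -t names the root of the process tree; -D is where image files go.
    return ["criu", "dump", "-t", str(pid), "-D", images_dir]


def criu_restore_cmd(images_dir: str) -> List[str]:
    return ["criu", "restore", "-D", images_dir]


def migrate(pid: int, images_dir: str,
            transfer_images: Callable[[str], None],
            prepare_target: Callable[[], None],
            run_source: Runner = run_local,
            run_target: Runner = run_local) -> None:
    """Hypothetical harness: CRIU does checkpoint/restore, the rest is ours."""
    run_source(criu_dump_cmd(pid, images_dir))
    # Hypothetical hooks; CRIU itself does not copy images between hosts
    # or reconcile IP addresses, mounts, or library versions on the target.
    transfer_images(images_dir)
    prepare_target()
    run_target(criu_restore_cmd(images_dir))
```

Keeping the environment fix-ups in `prepare_target` mirrors the point above: everything outside checkpoint and restore belongs to the harness, not to CRIU.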

James