[CRIU] Attempt for process migration

Tue Mar 6 07:51:04 EST 2012

On Tue, Mar 06, 2012 at 04:00:41PM +0400, Pavel Emelyanov wrote:
> > I have now looked into migrating one
> > process from one machine to another. I have written some code, but before
> > going further I wanted to ask: has there already been any
> > discussion in that direction?
> 
> Not really. The plan is to do it similarly to how we do it with openvz -- we
> just put the dump files locally and then copy them to the destination node
> with plain scp (or use NFS for this).
> 
> I actually planned to use a similar approach with criu, but I'm open to discussion.

We are interested in migrating processes in an HPC environment. We have
shared storage on all our nodes (lustre, nfs), but we really want to
avoid putting more pressure on the storage backend. Therefore we want to
do it directly over sockets without touching any disk.

> The other thing is that image files in criu are currently assumed to be
> seekable at restore time, which is not the case for sockets, so your
> approach will require more fixes in the restore code.
> 
> The other problem with this set is that for every single image file we have
> to set up a new TCP connection. This is not very good, I suppose, since
> image files are typically only several bytes long, so the handshake latency
> will kill all the performance.

That is why I am currently using only one connection.
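With a single connection the receiver has to find where one image ends and the next one begins. A minimal sketch of the simplest scheme, the <type><length><payload> packets mentioned later in the thread (which Pavel expects the real protocol to outgrow), assuming a 4-byte type and a 4-byte big-endian length; frame_encode() and frame_decode() are hypothetical helpers, not criu code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HDR 8 /* 4-byte type + 4-byte length */

static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Append one <type><length><payload> record to out.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t frame_encode(uint32_t type, const void *payload, uint32_t len,
                    unsigned char *out, size_t outlen)
{
    if (outlen < FRAME_HDR || outlen - FRAME_HDR < len)
        return 0;
    put_be32(out, type);
    put_be32(out + 4, len);
    if (len > 0)
        memcpy(out + FRAME_HDR, payload, len);
    return FRAME_HDR + (size_t)len;
}

/* Parse one record from the front of in without copying the payload.
 * Returns the bytes consumed, or 0 if the record is not complete yet. */
size_t frame_decode(const unsigned char *in, size_t inlen, uint32_t *type,
                    const unsigned char **payload, uint32_t *len)
{
    assert(type && payload && len);
    if (inlen < FRAME_HDR)
        return 0;
    uint32_t l = get_be32(in + 4);
    if (inlen - FRAME_HDR < l)
        return 0; /* wait for more bytes from the socket */
    *type = get_be32(in);
    *len = l;
    *payload = in + FRAME_HDR;
    return FRAME_HDR + (size_t)l;
}
```

Because every record carries its own length, the receiver can split the stream without seeking; the type field would tell it which image the payload belongs to.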

> And one more thing about all this -- we haven't yet thought about (but will
> certainly need to) what to do with the filesystem used by the tasks we're
> migrating. IOW, you should make sure the filesystem at restore time is the
> same as it was at dump time. For openvz we use rsync for this, but it's not
> strictly necessary -- we can assume the tasks are on NFS or that some sort of
> mirroring like drbd is used.

In our case we only have shared file systems.

> Plus :) we have plans to implement preliminary working-set migration for criu.
> That is -- you take the app's memory, push it to the destination host, then
> freeze the tasks and push only the part of memory that has changed since last
> time (the kernel support is not up to this yet, but we'll patch it). If we're
> going to write images right into the opened connection, then we should think
> about how to mix them with the pre-migrated data as well.
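The pre-copy scheme described above can be modelled as a loop. A toy sketch, assuming a fixed 64-page address space and a simulated dirty bitmap standing in for the kernel dirty-page tracking that, as noted, does not exist yet; precopy_migrate(), the threshold and the per-round workload are all hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 64 /* toy address space */

/* Stand-in for kernel dirty-page tracking. */
struct dirty_map {
    bool dirty[NPAGES];
};

struct precopy_stats {
    size_t total_sent;  /* pages pushed over all passes */
    size_t frozen_sent; /* pages pushed while the task was frozen (downtime) */
    int rounds;         /* times the task was allowed to run between passes */
};

/* Simulated workload: the running task writes to its first `touched` pages. */
static void run_task(struct dirty_map *m, size_t touched)
{
    for (size_t i = 0; i < touched && i < NPAGES; i++)
        m->dirty[i] = true;
}

static size_t count_dirty(const struct dirty_map *m)
{
    size_t n = 0;
    for (size_t i = 0; i < NPAGES; i++)
        n += m->dirty[i];
    return n;
}

/* "Push" every dirty page to the destination and clear its dirty bit. */
static size_t send_dirty(struct dirty_map *m)
{
    size_t sent = 0;
    for (size_t i = 0; i < NPAGES; i++) {
        if (m->dirty[i]) {
            sent++;
            m->dirty[i] = false;
        }
    }
    return sent;
}

/* touch_per_round must hold at least max_rounds entries. */
struct precopy_stats precopy_migrate(const size_t *touch_per_round,
                                     int max_rounds, size_t threshold)
{
    struct dirty_map m;
    struct precopy_stats st = {0, 0, 0};

    assert(max_rounds > 0);
    /* Pass 0: push all memory while the task keeps running. */
    for (size_t i = 0; i < NPAGES; i++)
        m.dirty[i] = true;
    st.total_sent = send_dirty(&m);

    for (;;) {
        run_task(&m, touch_per_round[st.rounds]);
        st.rounds++;
        if (count_dirty(&m) <= threshold || st.rounds >= max_rounds)
            break;
        st.total_sent += send_dirty(&m); /* re-send what changed */
    }

    /* Freeze the task and push only what changed since the last pass. */
    st.frozen_sent = send_dirty(&m);
    st.total_sent += st.frozen_sent;
    return st;
}
```

With a workload that touches 32, 16 and then 4 pages per round and a threshold of 8 pages, the task is frozen after the third round and only 4 pages are pushed during downtime, at the cost of re-sending 48 pages while it was still running.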

That also sounds nice.

> > Also, the current code makes it hard to
> > implement migration over TCP, so I wanted some feedback before
> > continuing, just to make sure I am not doing it completely wrong.
> > Attached is a patch which I am using to write the image to another machine.
> > 
> > It basically works like this:
> > 
> >  * the user supplies -r [host:port] to name the machine on which a
> >    server is listening to receive the checkpoint image
> > 
> >  * I changed cr_fdset_open() to open a socket, over which the
> >    checkpoint can be transferred, instead of a file
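The -r [host:port] option from the patch boils down to splitting the argument and opening one TCP connection. A sketch assuming plain POSIX getaddrinfo()/socket()/connect(); parse_host_port() and connect_image_server() are hypothetical names, not part of criu or the attached patch:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Split "host:port" at the last ':' into caller-supplied buffers.
 * Returns 0 on success, -1 on a malformed argument or too-small buffers. */
int parse_host_port(const char *arg, char *host, size_t hostlen,
                    char *port, size_t portlen)
{
    assert(arg && host && port);
    const char *colon = strrchr(arg, ':');
    if (!colon || colon == arg || colon[1] == '\0')
        return -1;
    size_t hl = (size_t)(colon - arg);
    size_t pl = strlen(colon + 1);
    if (hl >= hostlen || pl >= portlen)
        return -1;
    memcpy(host, arg, hl);
    host[hl] = '\0';
    memcpy(port, colon + 1, pl + 1); /* includes the terminating NUL */
    return 0;
}

/* Open one TCP connection to the receiving server; returns an fd or -1. */
int connect_image_server(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break; /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

Splitting at the last ':' keeps the parsing trivial but would need bracket handling for IPv6 literals.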
> > 
> > So far it still seems correct. Unfortunately I had to add the opts
> > structure as a parameter to many functions to make it available in
> > cr_fdset_open(). The problem I have with cr_fdset_open() is
> > that it writes the magic string right after the file has been opened,
> > which I cannot do with the socket if I want to use only
> > one TCP connection for the complete migration. Would it be
> > possible to move the writing of the magic string to the low-level
> > write_img_buf() function? Is it possible at all with the current code
> > to use one socket to write all images, serialized over the network,
> > to another machine?
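Moving the magic into the write path can be sketched with a per-image flag: opening an image no longer writes anything, and the magic goes out together with the first real data. This assumes the magic is a 32-bit value written in host byte order, and uses a hypothetical struct img_out and img_out_write() that stand in for (but do not reproduce) cr_fdset and write_img_buf():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-image output state. */
struct img_out {
    int fd;             /* file, or the single shared socket */
    uint32_t magic;     /* per-image magic, host byte order */
    bool magic_written; /* set once the magic has gone out */
};

/* Write all of buf, retrying on short writes (sockets may accept less). */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Emit the magic lazily on the first write, then the payload. */
int img_out_write(struct img_out *img, const void *buf, size_t len)
{
    assert(img != NULL);
    if (!img->magic_written) {
        if (write_all(img->fd, &img->magic, sizeof(img->magic)) < 0)
            return -1;
        img->magic_written = true;
    }
    return write_all(img->fd, buf, len);
}
```

On a shared socket this alone still lets bytes of different images interleave, so it only helps if images are written strictly one after another or are framed.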
> 
> Frankly speaking, the whole cr_fdset machinery is rather raw at the moment. It
> was adopted at the very beginning for simplicity and has been cleaned up several
> times since, but we're open to changing it one way or another :)

Sounds promising.

> But the biggest problem with it is not where to write the magics, but,
> as I said, that we assume the images to be seekable; this is the biggest
> obstacle to live migration implemented this way.

Okay.
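One way to keep restore working when images arrive over a socket is to read each image sequentially into memory and let the restore code seek in that buffer instead of calling lseek(), which fails with ESPIPE on pipes and sockets. A minimal sketch, assuming an image fits in memory; struct mem_img and its helpers are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* In-memory copy of one image that restore code can "seek" in freely. */
struct mem_img {
    unsigned char *data;
    size_t len;
    size_t pos;
};

/* Read everything from fd (file, pipe or socket) until EOF; no lseek() needed. */
int mem_img_slurp(int fd, struct mem_img *img)
{
    size_t cap = 4096;
    img->data = malloc(cap);
    img->len = img->pos = 0;
    if (!img->data)
        return -1;
    for (;;) {
        if (img->len == cap) { /* grow the buffer geometrically */
            unsigned char *p = realloc(img->data, cap * 2);
            if (!p) {
                free(img->data);
                img->data = NULL;
                return -1;
            }
            img->data = p;
            cap *= 2;
        }
        ssize_t n = read(fd, img->data + img->len, cap - img->len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            free(img->data);
            img->data = NULL;
            return -1;
        }
        if (n == 0)
            return 0; /* EOF: the whole image is buffered */
        img->len += (size_t)n;
    }
}

/* Replacement for lseek(fd, off, SEEK_SET) on the buffered image. */
int mem_img_seek(struct mem_img *img, size_t off)
{
    if (off > img->len)
        return -1;
    img->pos = off;
    return 0;
}

/* Replacement for read(fd, buf, len); returns the bytes copied. */
size_t mem_img_read(struct mem_img *img, void *buf, size_t len)
{
    assert(img->pos <= img->len);
    size_t avail = img->len - img->pos;
    if (len > avail)
        len = avail;
    memcpy(buf, img->data + img->pos, len);
    img->pos += len;
    return len;
}
```

The trade-off is memory: each image is held in full until restore is done with it, which is fine for small image files but would not be for large memory dumps.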

> > Does the way I started with this make any sense?
> > 
> > Are there any ideas on how to implement this correctly?
> 
> As I said, the existing openvz scheme looks very sane to me (put the files locally
> and scp them), but if you see any problems with it, please share.
> 
> If we implement the ability to write images into a socket, then, first of all, we
> need to fix the restore not to rely on seek. Then, I assume, we should take care
> of developing some "migration protocol" which will allow us to push the data we
> want over the socket to the remote box. And it will most likely differ from a
> simple <type><length><payload> set of packets.
> 
> Cyrill's proposal about adding one more abstraction level looks sane; we'll also
> have to integrate criu with openvz live migration, which has its own image format.
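The details of Cyrill's proposal are not spelled out in this thread; one common way to add such a level is a small table of write callbacks, so the dump code no longer cares whether it writes to a file, a socket or a buffer. A sketch under that assumption, with hypothetical names (struct img_sink, img_emit()):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical backend-neutral image sink. */
struct img_sink {
    ssize_t (*write)(struct img_sink *s, const void *buf, size_t len);
    void *priv; /* backend-specific state */
};

/* Backend 1: plain file descriptor (local file or socket). */
static ssize_t fd_sink_write(struct img_sink *s, const void *buf, size_t len)
{
    int fd = *(const int *)s->priv;
    return write(fd, buf, len);
}

/* Backend 2: growable memory buffer, e.g. staging data for a protocol. */
struct membuf {
    unsigned char *data;
    size_t len, cap;
};

static ssize_t mem_sink_write(struct img_sink *s, const void *buf, size_t len)
{
    struct membuf *mb = s->priv;
    if (len == 0)
        return 0;
    if (mb->len + len > mb->cap) {
        size_t cap = mb->cap ? mb->cap : 64;
        while (cap < mb->len + len)
            cap *= 2;
        unsigned char *p = realloc(mb->data, cap);
        if (!p)
            return -1;
        mb->data = p;
        mb->cap = cap;
    }
    memcpy(mb->data + mb->len, buf, len);
    mb->len += len;
    return (ssize_t)len;
}

/* Dump code calls only this; a short write is treated as an error. */
int img_emit(struct img_sink *s, const void *buf, size_t len)
{
    assert(s && s->write);
    return s->write(s, buf, len) == (ssize_t)len ? 0 : -1;
}
```

A socket backend could wrap the fd backend and add framing or magic handling without touching the dump code, and an openvz-format backend could plug in the same way.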

It sounds like it is okay if I continue implementing it. I will look at
implementing the abstraction level mentioned and at how to handle
restore without seeking. I will post my progress here, and we can then
decide whether that is the right direction or whether it needs to be done
another way.

		Adrian