[CRIU] [Process Migration using Sockets v4 - Patch 1/3]

Rodrigo Bruno rbruno at gsd.inesc-id.pt
Thu Dec 17 03:30:26 PST 2015


Hi,

the commit messages are below.

The code I sent in the previous emails (v4) is a patch against a version of CRIU
from 2 months ago. If you need, I can pull all the updates from CRIU and apply
the patch.

I will now start working on improving the tests.

Here goes the commit message for patch #1:

The current patch allows CRIU to perform in-memory remote live migrations. 
Snapshot images are stored in memory and sent to a remote node, where they are 
stored in memory until the process is restored. This can be activated by using
'--remote' in the command line arguments for launching CRIU 
Dump/Pre-Dump/Restore.

Existing CRIU code is changed to open socket connections instead of real disk
files. The rest of the code works with file descriptors and does not notice the
difference between local files and sockets.

When a file read or write request is performed by CRIU Dump/Pre-Dump, a local
connection is opened to the image proxy. This component keeps images in memory
and also forwards them to the destination node, where the image cache receives
them. The file descriptor of the local connection to the image proxy is 
returned to CRIU and is used as a regular file descriptor.

When a file read request is performed by CRIU Restore, a local connection is 
opened to the image cache. This component keeps images (received from the image
proxy) in memory and sends them to CRIU Restore when asked. The file descriptor
of the local connection to the image cache is returned to CRIU and is used as
a regular file descriptor.

The overall communication is performed as follows:

CRIU Dump/Pre-Dump <--> image proxy --> image cache --> CRIU Restore

Note 1: CRIU Dump/Pre-dump and the image proxy are at the source node, where 
the process to migrate is running;
Note 2: image cache and CRIU Restore are at the destination node, where the
process will be migrated to;
Note 3: the implementation of the image proxy and image cache, as well as the
communication between them, are not described in this patch;
Note 4: communication between CRIU Dump/Pre-Dump and image proxy, and CRIU 
Restore and image cache is performed using UNIX sockets. Each file open request 
is served with a new, independent connection.
Note 5: both the image proxy and the image cache have a UNIX server socket 
open. The path to this socket can be used to open read, write, and append 
connections. The path where the server socket will be listening for connections
can be redefined by the command line arguments given by the user.

Read Operations from CRIU Dump/Pre-Dump/Restore use the following protocol:
1 - open UNIX socket to local image proxy (for CRIU Dump/Pre-Dump) or image
cache (for CRIU Restore)
2 - write request header (protobuf object) containing the image identifier, and 
the open mode;
3 - read request reply (protobuf object) containing an error code;
4 - return the file descriptor of the UNIX socket connection if the error code
does not report an error. Otherwise fail;
5 - the sender side (image proxy or image cache) will close the UNIX socket
when all bytes of the image have been sent.
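
As a rough illustration of the read steps above, here is a sketch in Python
(not the C code from the patch). The header and reply are stand-ins packed with
struct instead of the real protobuf objects, and all names (pack_header,
open_image_read, fake_proxy, MODE_READ) are hypothetical. A socketpair stands
in for connecting to the proxy's or cache's UNIX server socket.

```python
import errno
import socket
import struct
import threading

MODE_READ = 0  # hypothetical open-mode value, not taken from the patch


def pack_header(snapshot_id, name, mode):
    """Stand-in for the protobuf request header (image identifier + open mode)."""
    sid, nm = snapshot_id.encode(), name.encode()
    return struct.pack("!BHH", mode, len(sid), len(nm)) + sid + nm


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf


def open_image_read(sock, snapshot_id, name):
    """Steps 2-4: send the header, read the reply, return the socket or fail."""
    sock.sendall(pack_header(snapshot_id, name, MODE_READ))
    (err,) = struct.unpack("!i", recv_exact(sock, 4))
    if err != 0:
        sock.close()
        raise OSError(err, "image not available")
    return sock  # the caller reads from it until EOF, like a regular file


def fake_proxy(sock, images):
    """Toy sender side: reply with an error code, stream the image, close (step 5)."""
    _mode, sid_len, nm_len = struct.unpack("!BHH", recv_exact(sock, 5))
    key = (recv_exact(sock, sid_len).decode(), recv_exact(sock, nm_len).decode())
    data = images.get(key)
    sock.sendall(struct.pack("!i", 0 if data is not None else errno.ENOENT))
    if data is not None:
        sock.sendall(data)
    sock.close()


def read_image(images, snapshot_id, name):
    """Step 1 (simulated with a socketpair) followed by a full read until EOF."""
    client, server = socket.socketpair()
    t = threading.Thread(target=fake_proxy, args=(server, images))
    t.start()
    try:
        fd = open_image_read(client, snapshot_id, name)
        data = b""
        while chunk := fd.recv(4096):
            data += chunk
        fd.close()
        return data
    finally:
        t.join()
```

The point of the design is that, once open_image_read returns, the caller only
sees a descriptor that yields bytes and then EOF, which is why the rest of the
code does not need to know it is talking to a socket.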

Write Operations from CRIU Dump/Pre-Dump use the following protocol:
1 - open UNIX socket to local image proxy;
2 - write request header (protobuf object) containing the image identifier, and
the open mode;
3 - return the file descriptor of the UNIX socket connection;
4 - the sender side (CRIU Dump/Pre-Dump) will close the UNIX socket when there
are no more bytes to write (this is done transparently by CRIU).
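
The write steps can be sketched the same way. Again this is Python with a
struct-packed stand-in for the protobuf header, and the names (open_image_write,
fake_proxy, write_image, MODE_WRITE) are hypothetical. Note that no reply is
read on the write path, and the end of the image is signalled only by closing
the socket.

```python
import socket
import struct
import threading

MODE_WRITE = 1  # hypothetical open-mode value, not taken from the patch


def pack_header(snapshot_id, name, mode):
    """Stand-in for the protobuf request header (image identifier + open mode)."""
    sid, nm = snapshot_id.encode(), name.encode()
    return struct.pack("!BHH", mode, len(sid), len(nm)) + sid + nm


def open_image_write(sock, snapshot_id, name):
    """Steps 2-3: send the header and return the socket; no reply is read."""
    sock.sendall(pack_header(snapshot_id, name, MODE_WRITE))
    return sock


def fake_proxy(sock, store):
    """Toy proxy: parse the header, then buffer bytes until the writer closes."""
    f = sock.makefile("rb")
    _mode, sid_len, nm_len = struct.unpack("!BHH", f.read(5))
    key = (f.read(sid_len).decode(), f.read(nm_len).decode())
    store[key] = f.read()  # EOF marks the end of the image (step 4)
    f.close()
    sock.close()


def write_image(store, snapshot_id, name, payload):
    """Step 1 (simulated with a socketpair), then write the payload and close."""
    client, server = socket.socketpair()
    t = threading.Thread(target=fake_proxy, args=(server, store))
    t.start()
    fd = open_image_write(client, snapshot_id, name)
    fd.sendall(payload)
    fd.close()  # closing tells the proxy the image is complete
    t.join()
```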

The image identifier is composed of the image name and the snapshot_id. The 
snapshot_id represents the identifier of the snapshot that created that image.
For example, a CRIU Pre-Dump and a CRIU Dump might create two images with the
same name. Therefore, we need to distinguish both images. We use the image 
directory given by the user to provide a snapshot_id. CRIU code is changed to 
be able to open images from parent snapshots. These images are read from the
image proxy (for CRIU Dump/Pre-Dump) and image cache (for CRIU Restore).
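
To illustrate why the snapshot_id is needed, here is a small Python sketch. How
exactly the snapshot_id is derived from the image directory is not spelled out
above; taking the last path component is only an assumption made for this
example, and the function names and paths are hypothetical.

```python
import os


def snapshot_id_from_dir(images_dir):
    """Assumption: the last component of the image directory is the snapshot_id."""
    return os.path.basename(os.path.normpath(images_dir))


def image_key(images_dir, name):
    """Image identifier: (snapshot_id, image name)."""
    return (snapshot_id_from_dir(images_dir), name)


# A pre-dump and a dump both produce an image with the same name.
cache = {}
cache[image_key("/tmp/mig/predump1", "pagemap-42.img")] = b"pre-dump pages"
cache[image_key("/tmp/mig/dump", "pagemap-42.img")] = b"final pages"
```

With the name alone as the key, the second entry would overwrite the first;
with the pair, both images stay available, including the parent snapshot's.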


And now the commit message for patch #2:

The current patch brings the implementation of the image proxy and image cache.
These components are necessary to perform in-memory live migration of processes
using CRIU. The image proxy receives images from CRIU Dump/Pre-Dump (through
UNIX sockets) and forwards them to the image cache (through a TCP socket). The
image cache caches images in memory and sends them to CRIU Restore (through
UNIX sockets) when requested.

The communication between the image proxy and the image cache is
unidirectional: the image proxy sends and the image cache receives. All images
sent to the image proxy are forwarded to the image cache through a single TCP
connection (that can be inherited). Images are forwarded (from proxy to cache)
only after being successfully received from the CRIU Dump/Pre-Dump.

The forwarding (proxy to cache) protocol goes as follows:
1 - write remote header (protobuf object) with snapshot_id, image path, and
image size (note that we know the size of the image because we received it
locally and all its content is in memory);
2 - send the image content;

Note 1: the connection used to send images from proxy to cache is not closed
because it might be used for future images. This connection is closed when
the CRIU Dump is finished and we are sure that no more images are coming;

Note 2: the image cache knows when each image ends because it receives the 
size of the image in advance. Therefore, after reading 'size' bytes, it will
expect a new remote header.
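
The framing described above (header with size, then exactly 'size' bytes, then
the next header) can be sketched in Python as follows. The header layout is a
struct-packed stand-in for the protobuf remote header, an in-memory byte stream
stands in for the TCP connection, and the function names are hypothetical.

```python
import io
import struct

FIXED = struct.calcsize("!HHQ")  # two string lengths + 64-bit image size


def pack_remote_header(snapshot_id, path, size):
    """Stand-in for the protobuf remote header: snapshot_id, image path, size."""
    sid, p = snapshot_id.encode(), path.encode()
    return struct.pack("!HHQ", len(sid), len(p), size) + sid + p


def forward_image(stream, snapshot_id, path, content):
    """Proxy side: header first, then the content; the stream stays open."""
    stream.write(pack_remote_header(snapshot_id, path, len(content)))
    stream.write(content)


def receive_images(stream):
    """Cache side: read a header, read exactly 'size' bytes, repeat until EOF."""
    images = {}
    while True:
        fixed = stream.read(FIXED)
        if not fixed:
            return images  # connection closed: no more images are coming
        if len(fixed) < FIXED:
            raise EOFError("truncated remote header")
        sid_len, path_len, size = struct.unpack("!HHQ", fixed)
        sid = stream.read(sid_len).decode()
        path = stream.read(path_len).decode()
        images[(sid, path)] = stream.read(size)


# Two images sent back to back over the same "connection".
link = io.BytesIO()
forward_image(link, "dump", "core-1.img", b"abc")
forward_image(link, "dump", "pages-1.img", b"")
link.seek(0)
received = receive_images(link)
```

Because the size travels in the header, the connection does not have to be
closed between images, which is what allows a single TCP connection to carry
all of them.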

The image proxy and the image cache might receive read requests for images that
are still not in memory or that will never exist. The image proxy and the image
cache handle these requests differently.

The image proxy simply returns an ENOENT error if the image is not cached. This
can happen when a previous subprocess was reported by a CRIU Pre-Dump but no
longer exists when the CRIU Dump takes place.

The image cache performs additional checks. If the connection between proxy and
cache is closed and all images have been received, the cache returns an ENOENT
error. Otherwise, it waits for the image to be received. If the image is never
received, the request receives an ENOENT reply once the connection between proxy
and cache is closed.
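
The cache's lookup rule can be modelled with a condition variable. This is only
a toy model in Python, not the data structures used in the patch; the class and
method names are hypothetical. The proxy's rule would be the same lookup
without the wait: return ENOENT immediately if the image is not cached.

```python
import errno
import threading


class ImageCache:
    """Toy model: wait for an image until it arrives or the proxy link closes."""

    def __init__(self):
        self._images = {}
        self._closed = False  # set once the proxy-to-cache connection is closed
        self._cond = threading.Condition()

    def put(self, key, data):
        """Called when an image has been fully received from the proxy."""
        with self._cond:
            self._images[key] = data
            self._cond.notify_all()

    def close(self):
        """Called when the connection between proxy and cache is closed."""
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self, key):
        """Return the image, blocking if needed; ENOENT once none can arrive."""
        with self._cond:
            self._cond.wait_for(lambda: key in self._images or self._closed)
            if key in self._images:
                return self._images[key]
            raise OSError(errno.ENOENT, "image never received")
```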

On Tue, 15 Dec 2015 13:56:02 +0300
Pavel Emelyanov <xemul at parallels.com> wrote:

> On 12/12/2015 01:18 AM, Rodrigo Bruno wrote:
> > Hi, 
> > 
> > I think I fixed all the comments you added in the last version of the patch.
> > 
> > It took me a little longer to send the new version because I found a bug when
> > a subprocess that was snapshotted by a predump no longer exists when the dump takes
> > place. I was not considering that situation (but now it is perfectly okay).
> > 
> > I also took some time to use the checkpatch.pl from linux. I fixed all the code.
> > 
> > Here is the first part of the patch:
> 
> OK, great :) I think this version is ready for merging. To do that I need
> from you a patch commit message -- a good description of what's going on,
> protocol details, all that you wrote before, but what was left in previous
> e-mails. Please, send me one as a plain text, I'll paste it in the patch
> before committing.
> 
> The need for patch commit message also exists for patch #2, so please do
> it as well.
> 
> Meanwhile I'll split the patch #1 and commit some pieces that make sense
> as themselves (next time, please, do it yourself in advance).
> 
> As far as patch #3 is concerned -- such small test for such a huge change
> is too little :( Please, look at test/zdtm.py launcher, there's an ability
> to run tests using page-server, change this launcher so that there appears a
> mode to dump via page cache and proxy.
> 
> -- Pavel

-- 
Rodrigo Bruno <rbruno at gsd.inesc-id.pt>

