[CRIU] Process Migration Using Sockets - PATCH

Pavel Emelyanov xemul at parallels.com
Mon Sep 7 03:34:54 PDT 2015


On 09/02/2015 11:24 PM, Rodrigo Bruno wrote:

Rodrigo, thanks for the patch. Sorry for the late response, I was busy with the
1.7 release. Now it's finished and we have some time for cool new features :)

Find my comments inline.

> The patch is listed below. The idea is to migrate processes without using disk-backed
> images. Files used by these processes still need to be shared (NFS for example) to 
> enable full live migration. In future these files could also be transferred using 
> sockets.
> 
> Two new entities are introduced: the image-proxy, and the image-cache. The image-proxy
> receives the image files from the dump process and forwards them to the image-cache. 
> The image-cache waits for requests from  the restore process.

Can you shed a bit more light on this: how does image-proxy get the images from
criu dump, and how does criu restore get them from image-cache? Are these just
unix sockets?

> Example:
> 
> Target Node:
> criu image-cache -vvv -o /tmp/image-cache.log --port <cache port> < /dev/null &
> sudo criu restore -D /tmp/dump -d -vvvv -o /tmp/restore.log  --remote && echo OK
> 
> Source Node:
> criu image-proxy -vvv -o /tmp/image-proxy.log --port <cache port> --address <target node> < /dev/null &
> sudo criu pre-dump -D /tmp/pre-dump -d -vvvv -o /tmp/pre-dump.log -t $pid --remote
> sudo criu dump -D /tmp/dump -d -vvvv -o /tmp/dump.log -t $pid --remote  --prev-images-dir /tmp/pre-dump --track-mem
> 
> The code is also available at https://github.com/rodrigo-bruno/criu (forked from CRIU).
> 
> You can also test it locally. I have been using this to migrate OpenJDK processes.
> If you ever decide to use this code, I would be glad to help, provide bug fixes, etc.

I'd also appreciate it if you split the patch into a set. First should come the
changes to the existing criu code that prepare it for the later patches, then the
new stuff, component by component. E.g. first image-cache, then image-proxy, then
the changes in the dump code to support the proxy, then the changes in the restore
code to support the cache.

> Signed-off-by: Rodrigo Bruno <rbruno at gsd.inesc-id.pt>
> 
> diff -uprN criu-source/cr-dedup.c criu-patch/cr-dedup.c
> --- criu-source/cr-dedup.c	2015-09-01 20:34:37.042773339 +0100
> +++ criu-patch/cr-dedup.c	2015-09-02 02:22:45.725920125 +0100
> @@ -11,6 +11,7 @@
>  
>  static int cr_dedup_one_pagemap(int pid);
>  
> +// TODO - Eventually patch this for remote usage?

Please, use /* */ style comments.

>  int cr_dedup(void)
>  {
>  	int close_ret, ret = 0;
> diff -uprN criu-source/cr-dump.c criu-patch/cr-dump.c
> --- criu-source/cr-dump.c	2015-09-01 20:34:37.050773528 +0100
> +++ criu-patch/cr-dump.c	2015-09-02 02:37:15.993970004 +0100
> @@ -1550,6 +1552,10 @@ err:
>  	if (disconnect_from_page_server())
>  		ret = -1;
>  
> +        if (opts.remote) {

Something went wrong with the tab indentation here.

> +            finish_remote_dump();
> +        }
> +
>  	close_cr_imgset(&glob_imgset);
>  
>  	if (bfd_flush_images())

> diff: criu-source/crtools: No such file or directory
> diff: criu-patch/crtools: No such file or directory
> diff -uprN criu-source/crtools.c criu-patch/crtools.c
> --- criu-source/crtools.c	2015-09-01 20:34:37.054773617 +0100
> +++ criu-patch/crtools.c	2015-09-02 03:05:47.229581153 +0100
> @@ -42,6 +42,8 @@
>  
>  #include "setproctitle.h"
>  
> +#include "image-remote.h"
> +
>  struct cr_options opts;
>  
>  void init_opts(void)
> @@ -60,6 +62,8 @@ void init_opts(void)
>  	opts.cpu_cap = CPU_CAP_DEFAULT;
>  	opts.manage_cgroups = CG_MODE_DEFAULT;
>  	opts.ps_socket = -1;
> +	opts.addr = PROXY_FWD_HOST;
> +	opts.ps_port = CACHE_PUT_PORT;

You reuse the existing opts fields. How does this interact with the --page-server
code?

>  	opts.ghost_limit = DEFAULT_GHOST_LIMIT;
>  }
>  

> diff -uprN criu-source/image.c criu-patch/image.c
> --- criu-source/image.c	2015-09-01 20:34:37.058773708 +0100
> +++ criu-patch/image.c	2015-09-02 02:57:48.502419478 +0100
> @@ -336,6 +347,72 @@ static int do_open_image(struct cr_img *
>  	if (imgset_template[type].magic == RAW_IMAGE_MAGIC)
>  		goto skip_magic;
>  
> +	if (flags == O_RDONLY) {
> +		ret = img_check_magic(img, oflags, type, path);
> +        }
> +	else {
> +		ret = img_write_magic(img, oflags, type);
> +        }
> +	if (ret)
> +		goto err;
> +
> +skip_magic:
> +	return 0;
> +
> +err:
> +	return -1;
> +}
> +
> +static int do_open_remote_image(struct cr_img *img, int dfd, int type, unsigned long oflags, char *path)
> +{
> +	int ret, flags;
> +
> +	flags = oflags & ~(O_NOBUF | O_SERVICE);
> +        
> +        if(dfd == get_service_fd(IMG_FD_OFF) || dfd == -1)
> +            dfd = get_current_namespace_fd();

I didn't quite get the idea of namespaces. Can you describe it in more detail, please?

> +        
> +        // TODO - fix this. Find out what is the purpose of this file.
> +        if(!strcmp("irmap-cache", path)) {
> +            ret = -1;
> +        }
> +        else if(get_namespace(dfd) == NULL) {
> +            ret = -1;
> +        }
> +        else if (flags == O_RDONLY) {
> +            pr_info("do_open_remote_image RDONLY path=%s namespace=%s\n", 
> +                    path, get_namespace(dfd));
> +            ret = get_remote_image_connection(get_namespace(dfd), path);
> +        }
> +        else {
> +            pr_info("do_open_remote_image WDONLY path=%s namespace=%s\n", 
> +                    path, get_namespace(dfd));
> +            ret = open_remote_image_connection(get_namespace(dfd), path);
> +        }
> +        
> +        if (ret < 0) {
> +            pr_info("No %s (dfd=%d) image\n", path, dfd);
> +            img->_x.fd = EMPTY_IMG_FD;
> +            goto skip_magic;
> +	}
> +        
> +
> +	img->_x.fd = ret;
> +	if (oflags & O_NOBUF)
> +		bfd_setraw(&img->_x);
> +	else {
> +		if (flags == O_RDONLY)
> +			ret = bfdopenr(&img->_x);
> +		else
> +			ret = bfdopenw(&img->_x);
> +
> +		if (ret)
> +			goto err;
> +	}
> +
> +	if (imgset_template[type].magic == RAW_IMAGE_MAGIC)
> +		goto skip_magic;
> +
>  	if (flags == O_RDONLY)
>  		ret = img_check_magic(img, oflags, type, path);
>  	else

> diff -uprN criu-source/image-remote.c criu-patch/image-remote.c
> --- criu-source/image-remote.c	1970-01-01 01:00:00.000000000 +0100
> +++ criu-patch/image-remote.c	2015-09-02 02:18:33.548099686 +0100
> @@ -0,0 +1,281 @@
> +#include <unistd.h>
> +#include <stdlib.h>
> +#include <sys/types.h> 
> +#include <sys/socket.h>
> +#include <netinet/in.h>
> +#include <netdb.h>
> +
> +#include <pthread.h>
> +#include <semaphore.h>
> +
> +#include "criu-log.h"
> +#include "image-remote.h"
> +
> +// TODO - fix space limitation
> +static char parents[PATHLEN][PATHLEN]; 
> +static int  parents_occ = 0;
> +static char* namespace = NULL;
> +// TODO - not used for now. It will be used if we implement a shared cache and proxy.
> +static char* parent = NULL; 
> +
> +int setup_local_client_connection(int port) 
> +{
> +        int sockfd;
> +        struct sockaddr_in serv_addr;
> +        struct hostent *server;
> +
> +        sockfd = socket(AF_INET, SOCK_STREAM, 0);
> +        if (sockfd < 0) {
> +                pr_perror("Unable to open remote image socket to img cache");
> +                return -1;
> +        }
> +
> +        server = gethostbyname(DEFAULT_HOST);
> +        if (server == NULL) {
> +                pr_perror("Unable to get host by name (%s)", DEFAULT_HOST);
> +                return -1;
> +        }
> +
> +        bzero((char *) &serv_addr, sizeof (serv_addr));
> +        serv_addr.sin_family = AF_INET;
> +        bcopy((char *) server->h_addr,
> +              (char *) &serv_addr.sin_addr.s_addr,
> +              server->h_length);
> +        serv_addr.sin_port = htons(port);
> +
> +        if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) {
> +                pr_perror("Unable to connect to remote restore host %s", DEFAULT_HOST);
> +                return -1;
> +        }
> +
> +        return sockfd;
> +}
> +
> +int write_header(int fd, char* namespace, char* path)

You seem to be using a text-based protocol for image transfers, don't you?

> +{
> +        if (write(fd, path, PATHLEN) < 1) {
> +                pr_perror("Unable to send path to remote image connection");
> +                return -1;
> +        }
> +
> +        if (write(fd, namespace, PATHLEN) < 1) {
> +                pr_perror("Unable to send namespace to remote image connection");
> +                return -1;
> +        } 
> +        return 0;
> +}
> +
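
For illustration only, and not taken from the patch: a fixed-size header like
the one write_header() sends could be sketched as below. The names img_hdr,
HDR_FIELD_LEN, write_all() and send_hdr() are hypothetical (HDR_FIELD_LEN
stands in for PATHLEN). The sketch assumes path and namespace may be shorter
than the field, so it pads them into a zeroed struct instead of writing
PATHLEN bytes straight from the string, and it loops on short writes.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

#define HDR_FIELD_LEN 64	/* hypothetical; stands in for the patch's PATHLEN */

/* Hypothetical fixed-size header: both fields are NUL-padded. */
struct img_hdr {
	char path[HDR_FIELD_LEN];
	char ns[HDR_FIELD_LEN];
};

/* write() may send fewer bytes than asked for, so loop until done. */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Returns -1 if a name does not fit in its field or the write fails. */
static int send_hdr(int fd, const char *ns, const char *path)
{
	struct img_hdr h;

	if (strlen(path) >= sizeof(h.path) || strlen(ns) >= sizeof(h.ns))
		return -1;

	memset(&h, 0, sizeof(h));
	strcpy(h.path, path);
	strcpy(h.ns, ns);
	return write_all(fd, &h, sizeof(h));
}
```

The receiving side can then read exactly sizeof(struct img_hdr) bytes and
treat both fields as NUL-terminated strings.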

> diff -uprN criu-source/page-read.c criu-patch/page-read.c
> --- criu-source/page-read.c	2015-09-01 20:34:37.082774260 +0100
> +++ criu-patch/page-read.c	2015-09-02 02:21:29.616164017 +0100
> @@ -10,6 +10,8 @@
>  #include "protobuf.h"
>  #include "protobuf/pagemap.pb-c.h"
>  
> +#include "image-remote.h"
> +
>  #ifndef SEEK_DATA
>  #define SEEK_DATA	3
>  #define SEEK_HOLE	4
> @@ -90,8 +92,17 @@ static void skip_pagemap_pages(struct pa
>  		return;
>  
>  	pr_debug("\tpr%u Skip %lx bytes from page-dump\n", pr->id, len);
> -	if (!pr->pe->in_parent)
> -		lseek(img_raw_fd(pr->pi), len, SEEK_CUR);
> +	if (!pr->pe->in_parent) {
> +            if(opts.remote) {
> +                    if(skip_remote_bytes(img_raw_fd(pr->pi), len) < 0)
> +                            pr_perror("Unable to seek remote bytes");
> +            }
> +            else {
> +                    if(lseek(img_raw_fd(pr->pi), len, SEEK_CUR) < 0)
> +                            pr_perror("Unable to lseek");
> +            }
> +            	
> +        }

The page-read engine is already modularized. Don't introduce if()-s into the
existing code, just add a new set of options and let open_page_read() select
one of them.
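
A rough sketch of that pattern, with made-up names (struct reader, skip_local,
skip_stream and open_reader are hypothetical and much simpler than the real
struct page_read): the mode is checked once at open time, and callers only go
through the function pointer.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, simplified reader: not CRIU's real struct page_read. */
struct reader {
	int fd;
	/* Backend-specific way to skip len bytes of page data. */
	int (*skip)(struct reader *r, unsigned long len);
};

/* File backend: a seekable image can simply move the offset. */
static int skip_local(struct reader *r, unsigned long len)
{
	return lseek(r->fd, (off_t)len, SEEK_CUR) < 0 ? -1 : 0;
}

/* Stream backend: a socket cannot seek, so read and drop the bytes. */
static int skip_stream(struct reader *r, unsigned long len)
{
	char buf[4096];

	while (len > 0) {
		size_t chunk = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t n = read(r->fd, buf, chunk);

		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0)
			return -1;	/* error or unexpected end of stream */
		len -= (unsigned long)n;
	}
	return 0;
}

/* The only place that knows about the mode; callers just use r->skip(). */
static void open_reader(struct reader *r, int fd, int remote)
{
	r->fd = fd;
	r->skip = remote ? skip_stream : skip_local;
}
```

With this shape the skip path stays a single r->skip(r, len) call, with no
opts.remote check at the call site.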

>  	pr->cvaddr += len;
>  }
>  

> diff -uprN criu-source/page-xfer.c criu-patch/page-xfer.c
> --- criu-source/page-xfer.c	2015-09-01 20:34:37.082774260 +0100
> +++ criu-patch/page-xfer.c	2015-09-02 02:21:44.968518366 +0100
> @@ -728,13 +730,21 @@ static int open_page_local_xfer(struct p
>  		int ret;
>  		int pfd;
>  
> -		pfd = openat(get_service_fd(IMG_FD_OFF), CR_PARENT_LINK, O_RDONLY);
> -		if (pfd < 0 && errno == ENOENT)
> -			goto out;
> +		if(opts.remote) {
> +                        pfd = get_current_namespace_fd() - 1;
> +                        if(get_namespace(pfd) == NULL)
> +                                goto out;
> +                }
> +                else {
> +                        pfd = openat(get_service_fd(IMG_FD_OFF), CR_PARENT_LINK, O_RDONLY);
> +                        if (pfd < 0 && errno == ENOENT)
> +                                goto out;
> +                }

We already have network transfer for page data. How does this relate to the
new mode you introduce?

>  
>  		xfer->parent = xmalloc(sizeof(*xfer->parent));
>  		if (!xfer->parent) {
> -			close(pfd);
> +			if(!opts.remote)
> +				close(pfd);
>  			return -1;
>  		}
>  

-- Pavel


