[CRIU] Process Migration using Sockets v2 - Patch 2/2

Rodrigo Bruno rbruno at gsd.inesc-id.pt
Sun Oct 25 11:21:24 PDT 2015


Hi,

On Thu, 15 Oct 2015 14:32:00 +0300
Pavel Emelyanov <xemul at parallels.com> wrote:

> On 10/13/2015 05:14 PM, Rodrigo Bruno wrote:
> > On Mon, 12 Oct 2015 14:56:00 +0300
> > Pavel Emelyanov <xemul at parallels.com> wrote:
> > 
> >> On 10/12/2015 02:11 PM, Rodrigo Bruno wrote:
> >>> On Mon, 12 Oct 2015 12:38:43 +0300
> >>> Pavel Emelyanov <xemul at parallels.com> wrote:
> >>>
> >>>> On 10/09/2015 05:46 PM, Rodrigo Bruno wrote:
> >>>>> Hi,
> >>>>>
> >>>>> here goes the second part.
> >>>>
> >>>> Thanks, this time it's more clear. Please, find my comments inline.
> >>>>
> >>>>> This patch adds:
> >>>>>
> >>>>> 1. image-cache.c - the component that is on the same node as CRIU restore. It holds 
> >>>>> image files in memory waiting for CRIU restore to ask them
> >>>>>
> >>>>> 2. image-proxy.c - the component that is on the same node as CRIU dump. It holds
> >>>>> image files in memory (because a future dump/predump might need them) and also
> >>>>> forwards them to the image-cache.
> >>>>>
> >>>>> 3. image-remote-pvt.c - implements almost all the image receiving and sending
> >>>>> engine.
> >>>>
> >>>> What does the "pvt" stand for?
> >>>
> >>> Private. Maybe "prvt" would be better. The idead is that existing CRIU code is
> >>> transparent to this code since this is only used by the proxy, cache, and 
> >>> image-remote.
> >>
> >> Ah, I see. Since all this stuff sits in one dir, then private is not applicable here :)
> >> I'd call those img-cache, img-proxy and img-remote.
> > 
> > image-proxy.c -> img-proxy.c
> > image-cache.c -> img-cache.c
> > image-remote-pvt.c -> img-remote.c
> > image-remote.c -> (stays the same?)
> 
> Ouch, we also have the image-remote.c as an API for criu... Maybe like this then:
> 
> image-proxy -> img-proxy
> image-cache -> img-cache
> image-remote-pvt -> img-remote-proto
> image-remote -> img-remote
> 
> ?
> 
> >>
> >>>>
> >>>>> 4. include/image-remote-pvt.h - exports function declarations used by both image-cache
> >>>>> image-proxy and image-remote.
> >>>>>
> >>>>>
> >>>>> Signed-off-by: Rodrigo Bruno <rbruno at gsd.inesc-id.pt>
> >>>>> >From e690070808e6f1b459f033e162c910bc2d02859a Mon Sep 17 00:00:00 2001
> >>>>> From: rodrigo-bruno <rbruno at gsd.inesc-id.pt>
> >>>>> Date: Fri, 9 Oct 2015 15:37:17 +0100
> >>>>> Subject: [PATCH] Process migration using sockets (2/2)
> >>>>>
> >>>>> ---
> >>>>>  image-cache.c              |  54 +++++
> >>>>>  image-proxy.c              |  74 ++++++
> >>>>>  image-remote-pvt.c         | 566 +++++++++++++++++++++++++++++++++++++++++++++
> >>>>>  include/image-remote-pvt.h |  52 +++++
> >>>>>  4 files changed, 746 insertions(+)
> >>>>>  create mode 100644 image-cache.c
> >>>>>  create mode 100644 image-proxy.c
> >>>>>  create mode 100644 image-remote-pvt.c
> >>>>>  create mode 100644 include/image-remote-pvt.h
> >>>>>
> >>>>> diff --git a/image-cache.c b/image-cache.c
> >>>>> new file mode 100644
> >>>>> index 0000000..e2030f4
> >>>>> --- /dev/null
> >>>>> +++ b/image-cache.c
> >>>>> @@ -0,0 +1,54 @@
> >>>>> +#include <unistd.h>
> >>>>> +
> >>>>> +#include "image-remote.h"
> >>>>> +#include "image-remote-pvt.h"
> >>>>> +#include "criu-log.h"
> >>>>> +#include <pthread.h>
> >>>>> +
> >>>>> +static void* cache_remote_image(void* ptr)
> >>>>> +{
> >>>>> +	remote_image* rimg = (remote_image*) ptr;
> >>>>> +
> >>>>> +	if (!strncmp(rimg->path, DUMP_FINISH, sizeof (DUMP_FINISH)))
> >>>>> +	{
> >>>>> +		close(rimg->src_fd);
> >>>>> +		return NULL;
> >>>>> +	}
> >>>>> +
> >>>>> +	prepare_put_rimg();
> >>>>> +	recv_remote_image(rimg->src_fd, rimg->path, &rimg->buf_head, true);
> >>>>> +	finalize_put_rimg(rimg);
> >>>>> +
> >>>>> +	return NULL;
> >>>>> +}
> >>>>> +
> >>>>> +int image_cache(unsigned short cache_put_port)
> >>>>> +{
> >>>>> +	pthread_t get_thr, put_thr;
> >>>>> +	int put_fd, get_fd;
> >>>>> +
> >>>>> +	pr_info("Put Port %d, Get Path %s\n", cache_put_port, CACHE_GET);
> >>>>> +
> >>>>> +	put_fd = setup_TCP_server_socket(cache_put_port);
> >>>>
> >>>> Proxy and cache should be able to work over given connections. Look at the
> >>>> opts.ps_socket field for details how this works in case of page-server.
> >>>
> >>> Okey, this is related to the previous comment on multiple connections. I will
> >>> have a look at it and try to mimic into the proxy-cache communication.
> >>
> >> OK, thanks.
> >>
> >>>>
> >>>>> +	get_fd = setup_UNIX_server_socket(CACHE_GET);
> >>>>> +
> >>>>> +	if (init_daemon(cache_remote_image))
> >>>>> +		return -1;
> >>>>> +
> >>>>> +	if (pthread_create(
> >>>>> +	    &put_thr, NULL, accept_put_image_connections, (void*) &put_fd)) {
> >>>>> +		pr_perror("Unable to create put thread");
> >>>>> +		return -1;
> >>>>> +	}
> >>>>> +	if (pthread_create(
> >>>>> +	    &get_thr, NULL, accept_get_image_connections, (void*) &get_fd)) {
> >>>>> +		pr_perror("Unable to create get thread");
> >>>>> +		return -1;
> >>>>> +	}
> >>>>
> >>>> Can you describe the logic behind the multithreading and sems/mutexes you use?
> >>>
> >>> I use two sems and two mutexes, one sem and one mutex to work with worker threads list,
> >>> and the other mutex and sem to work with the remote image list.
> >>>
> >>> Worker threads are created to serve specific request, either read or write an image.
> >>> When the thread finishes reading or writing the image, it exists. There are a couple
> >>> of auxiliary threads: one for receiving read connections, one for receiving write
> >>> connections, and one for joining other threads.
> >>>
> >>> Mutexes are used to make sure that no one is changing or searching the list at the
> >>> same time. Looking at the code now I realize that I do not allow multiple searches
> >>> in parallel. This can be solved by adding an additional mutex (reader mutex).
> >>>
> >>> Semaphores are used to indicate that a new item was added to the list or, in a 
> >>> special condition for the remote image list, to indicate that no more images will
> >>> come.
> >>
> >> OK, so the semaphores act as wake-up-s, right?
> > 
> > Yes.
> > 
> >>
> >>>>
> >>>>> +
> >>>>> +	join_workers();
> >>>>> +
> >>>>> +	pthread_join(put_thr, NULL);
> >>>>> +	pthread_join(get_thr, NULL);
> >>>>> +	return 0;
> >>>>> +}
> >>>>> diff --git a/image-proxy.c b/image-proxy.c
> >>>>> new file mode 100644
> >>>>> index 0000000..24b8935
> >>>>> --- /dev/null
> >>>>> +++ b/image-proxy.c
> >>>>> @@ -0,0 +1,74 @@
> >>>>> +#include <unistd.h>
> >>>>> +
> >>>>> +#include "image-remote.h"
> >>>>> +#include "image-remote-pvt.h"
> >>>>> +#include "criu-log.h"
> >>>>> +#include <pthread.h>
> >>>>> +
> >>>>> +static char* dst_host;
> >>>>> +static unsigned short dst_port;
> >>>>> +
> >>>>> +static void* proxy_remote_image(void* ptr)
> >>>>> +{
> >>>>> +	remote_image* rimg = (remote_image*) ptr;
> >>>>> +	rimg->dst_fd = setup_TCP_client_socket(dst_host, dst_port);
> >>>>> +	if (rimg->dst_fd < 0) {
> >>>>> +		pr_perror("Unable to open recover image socket");
> >>>>> +		return NULL;
> >>>>> +	}
> >>>>> +
> >>>>> +	if (write_header(rimg->dst_fd, rimg->namespace, rimg->path) < 0) {
> >>>>> +		pr_perror("Error writing header for %s:%s",
> >>>>> +			rimg->path, rimg->namespace);
> >>>>> +		return NULL;
> >>>>> +	}
> >>>>> +
> >>>>> +	prepare_put_rimg();
> >>>>> +
> >>>>> +	if (!strncmp(rimg->path, DUMP_FINISH, sizeof(DUMP_FINISH))) {
> >>>>
> >>>> For control messages I would suggest integer field, rather than magic names
> >>>> in paths :)
> >>>
> >>> Right.
> >>>
> >>>>
> >>>>> +		close(rimg->dst_fd);
> >>>>> +		finalize_put_rimg(rimg);
> >>>>> +		return NULL;
> >>>>> +	}
> >>>>> +	if (recv_remote_image(rimg->src_fd, rimg->path, &(rimg->buf_head), false) < 0) {
> >>>>> +		return NULL;
> >>>>> +	}
> >>>>> +	finalize_put_rimg(rimg);
> >>>>> +	send_remote_image(rimg->dst_fd, rimg->path, &(rimg->buf_head), true);
> >>>>> +	return NULL;
> >>>>> +}
> >>>>> +
> >>>>> +int image_proxy(char* fwd_host, unsigned short fwd_port)
> >>>>> +{
> >>>>> +	pthread_t get_thr, put_thr;
> >>>>> +	int put_fd, get_fd;
> >>>>> +
> >>>>> +	dst_host = fwd_host;
> >>>>> +	dst_port = fwd_port;
> >>>>> +
> >>>>> +	pr_info("Proxy Get Path %s, Put Path %s, Destination Host %s:%hu\n",
> >>>>> +		PROXY_GET, PROXY_PUT, fwd_host, fwd_port);
> >>>>> +
> >>>>> +	put_fd = setup_UNIX_server_socket(PROXY_PUT);
> >>>>> +	get_fd = setup_UNIX_server_socket(PROXY_GET);
> >>>>> +
> >>>>> +	if(init_daemon(proxy_remote_image))
> >>>>> +		return -1;
> >>>>> +
> >>>>> +	if (pthread_create(
> >>>>> +	    &put_thr, NULL, accept_put_image_connections, (void*) &put_fd)) {
> >>>>> +		pr_perror("Unable to create put thread");
> >>>>> +		return -1;
> >>>>> +	}
> >>>>> +	if (pthread_create(
> >>>>> +	    &get_thr, NULL, accept_get_image_connections, (void*) &get_fd)) {
> >>>>> +		pr_perror("Unable to create get thread");
> >>>>> +		return -1;
> >>>>> +	}
> >>>>> +
> >>>>> +	join_workers();
> >>>>> +
> >>>>> +	pthread_join(put_thr, NULL);
> >>>>> +	pthread_join(get_thr, NULL);
> >>>>> +	return 0;
> >>>>> +}
> >>>>> diff --git a/image-remote-pvt.c b/image-remote-pvt.c
> >>>>> new file mode 100644
> >>>>> index 0000000..285bb91
> >>>>> --- /dev/null
> >>>>> +++ b/image-remote-pvt.c
> >>>>> @@ -0,0 +1,566 @@
> >>>>> +#include <unistd.h>
> >>>>> +#include <stdlib.h>
> >>>>> +
> >>>>> +#include <semaphore.h>
> >>>>> +#include <sys/socket.h>
> >>>>> +#include <netinet/in.h>
> >>>>> +#include <netdb.h>
> >>>>> +#include "sys/un.h"
> >>>>> +#include <pthread.h>
> >>>>> +
> >>>>> +#include "image-remote-pvt.h"
> >>>>> +#include "criu-log.h"
> >>>>> +
> >>>>> +#include "protobuf.h"
> >>>>> +#include "protobuf/remote-image.pb-c.h"
> >>>>> +#include "image.h"
> >>>>> +
> >>>>> +typedef struct wthread {
> >>>>> +	pthread_t tid;
> >>>>> +	struct list_head l;
> >>>>> +} worker_thread;
> >>>>> +
> >>>>> +static LIST_HEAD(rimg_head);
> >>>>> +static pthread_mutex_t rimg_lock;
> >>>>> +static sem_t rimg_semph;
> >>>>> +
> >>>>> +static LIST_HEAD(workers_head);
> >>>>> +static pthread_mutex_t workers_lock;
> >>>>> +static sem_t workers_semph;
> >>>>> +
> >>>>> +static int finished = 0;
> >>>>> +static int putting = 0;
> >>>>> +
> >>>>> +static void* (*get_func)(void*);
> >>>>> +static void* (*put_func)(void*);
> >>>>
> >>>> What does get and put stand for here?
> >>>
> >>> Get means read, put means write. I will refactor these two to read and
> >>> write to simplify.
> >>>
> >>>>
> >>>>> +
> >>>>> +static remote_image* get_rimg_by_name(const char* namespace, const char* path)
> >>>>> +{
> >>>>> +	remote_image* rimg = NULL;
> >>>>> +	pthread_mutex_lock(&rimg_lock);
> >>>>> +	list_for_each_entry(rimg, &rimg_head, l) {
> >>>>> +		if( !strncmp(rimg->path, path, PATHLEN) &&
> >>>>> +		    !strncmp(rimg->namespace, namespace, PATHLEN)) {
> >>>>> +			pthread_mutex_unlock(&rimg_lock);
> >>>>> +			return rimg;
> >>>>> +		}
> >>>>> +	}
> >>>>> +	pthread_mutex_unlock(&rimg_lock);
> >>>>> +	return NULL;
> >>>>> +}
> >>>>> +
> >>>>> +static uint64_t sizeof_remote_buffer(struct list_head* rbuff_head)
> >>>>> +{
> >>>>> +    uint64_t res = 0;
> >>>>> +    remote_buffer* aux = NULL;
> >>>>> +    list_for_each_entry(aux, rbuff_head, l) {
> >>>>> +	res += aux->nbytes;
> >>>>> +    }
> >>>>
> >>>> Indentation.
> >>>
> >>> Okey.
> >>>
> >>>>
> >>>>> +    return res;
> >>>>> +}
> >>>>> +
> >>>>> +int init_sync_structures()
> >>>>> +{
> >>>>> +	if (pthread_mutex_init(&rimg_lock, NULL) != 0) {
> >>>>> +		pr_perror("Remote image connection mutex init failed");
> >>>>> +		return -1;
> >>>>> +	}
> >>>>> +
> >>>>> +	if (sem_init(&rimg_semph, 0, 0) != 0) {
> >>>>> +		pr_perror("Remote image connection semaphore init failed");
> >>>>> +		return -1;
> >>>>> +	}
> >>>>> +
> >>>>> +	if (pthread_mutex_init(&workers_lock, NULL) != 0) {
> >>>>> +		pr_perror("Workers mutex init failed");
> >>>>> +		return -1;
> >>>>> +	}
> >>>>> +
> >>>>> +	if (sem_init(&workers_semph, 0, 0) != 0) {
> >>>>> +		pr_perror("Workers semaphore init failed");
> >>>>> +		return -1;
> >>>>> +	}
> >>>>> +	return 0;
> >>>>> +}
> >>>>> +
> >>>>> +static remote_image* wait_for_image(int cli_fd, char* namespace, char* path)
> >>>>> +{
> >>>>
> >>>> This routine waits for image to appear on proxy-cache connection, doesn't it? It looks
> >>>> like the protocol between cache and proxy is more complex than just proxy sends header
> >>>> and data over the wire, can you describe it in details?
> >>>
> >>> This function is used when someone asks for an image that is still not buffered.
> >>> For example, CRIU restore might ask for images that are still on the way. Another
> >>> possible scenario is for CRIU restore to ask for images that will never exist for
> >>> a particular process dump.
> >>>
> >>> If such scenarios occur, the image cache will hold the thread dealing with the
> >>> particular request until: i) a new image is added to the list, ii) the flag 
> >>> finished is set to true and the flag putting (aka receiving remote image) is
> >>> false.
> >>>
> >>> I found this very useful. When I migrate processes, I start the cache and CRIU
> >>> restore before starting the CRIU dump/predump.
> >>
> >> OK this is understood. Can you also describe the protocol between cache and proxy,
> >> i.e. the one over the remote (TCP) connection?
> > 
> > Yes, when the proxy finishes receiving a file from CRIU dump/predump it:
> > 
> > 1- opens a write connection to image-cache
> > 2- write image header (name + namespace) (protobuf object)
> > 3- write image size (uint64_t)
> > 4- writes image
> > 5- closes
> > 
> > 
> > the image-cache side:
> > 1- accepts connection
> > 2- reads header
> > 3- reads size
> > 4- reads image
> > 5- upon reading returns 0, checks if the received bytes match the size.
> > 
> > This will be changed to use only one connection (inherited from the user).
> > 
> > It will look like this:
> > image-proxy:
> > 1- write image header (name + namespace + size) protobuf
> > 2- write image
> > 3- write closing header (to replace the close)
> > 
> > image-cache:
> > 1- read image header
> > 2- read image
> > 3- read image closing header (it knows when this header should came because it knows
> > the size of the image).
> 
> ACK, this would work.
> 
> >> I meant that typedefs are not necessary for structures that are used
> >> internally by some subsystem. Just reading the struct keywork makes
> >> it easier to understand :)
> > 
> > Oh, I get it. I will fix that.
> > 
> > My time is crazy tight right until the end of the next week. Give me a couple of
> > days to re-send a new version of the patch.
> 
> No worries, take your time :)
> Unless you have any requirements on the version of criu you want your code in (the
> next release is December 7).

Thanks, now I have more time to work on this. I hope to send you the new 
version of this patch in the next days.

> 
> -- Pavel
> 


-- 
Rodrigo Bruno <rbruno at gsd.inesc-id.pt>


More information about the CRIU mailing list