[CRIU] Process Migration using Sockets v2 - Patch 2/2

Thu Oct 15 04:32:00 PDT 2015

On 10/13/2015 05:14 PM, Rodrigo Bruno wrote:
> On Mon, 12 Oct 2015 14:56:00 +0300
> Pavel Emelyanov <xemul at parallels.com> wrote:
> 
>> On 10/12/2015 02:11 PM, Rodrigo Bruno wrote:
>>> On Mon, 12 Oct 2015 12:38:43 +0300
>>> Pavel Emelyanov <xemul at parallels.com> wrote:
>>>
>>>> On 10/09/2015 05:46 PM, Rodrigo Bruno wrote:
>>>>> Hi,
>>>>>
>>>>> here goes the second part.
>>>>
>>>> Thanks, this time it's more clear. Please, find my comments inline.
>>>>
>>>>> This patch adds:
>>>>>
>>>>> 1. image-cache.c - the component that is on the same node as CRIU restore. It holds 
>>>>> image files in memory waiting for CRIU restore to ask them
>>>>>
>>>>> 2. image-proxy.c - the component that is on the same node as CRIU dump. It holds
>>>>> image files in memory (because a future dump/predump might need them) and also
>>>>> forwards them to the image-cache.
>>>>>
>>>>> 3. image-remote-pvt.c - implements almost all the image receiving and sending
>>>>> engine.
>>>>
>>>> What does the "pvt" stand for?
>>>
>>> Private. Maybe "prvt" would be better. The idead is that existing CRIU code is
>>> transparent to this code since this is only used by the proxy, cache, and 
>>> image-remote.
>>
>> Ah, I see. Since all this stuff sits in one dir, then private is not applicable here :)
>> I'd call those img-cache, img-proxy and img-remote.
> 
> image-proxy.c -> img-proxy.c
> image-cache.c -> img-cache.c
> image-remote-pvt.c -> img-remote.c
> image-remote.c -> (stays the same?)

Ouch, we also have the image-remote.c as an API for criu... Maybe like this then:

image-proxy -> img-proxy
image-cache -> img-cache
image-remote-pvt -> img-remote-proto
image-remote -> img-remote

?

>>
>>>>
>>>>> 4. include/image-remote-pvt.h - exports function declarations used by both image-cache
>>>>> image-proxy and image-remote.
>>>>>
>>>>>
>>>>> Signed-off-by: Rodrigo Bruno <rbruno at gsd.inesc-id.pt>
>>>>> >From e690070808e6f1b459f033e162c910bc2d02859a Mon Sep 17 00:00:00 2001
>>>>> From: rodrigo-bruno <rbruno at gsd.inesc-id.pt>
>>>>> Date: Fri, 9 Oct 2015 15:37:17 +0100
>>>>> Subject: [PATCH] Process migration using sockets (2/2)
>>>>>
>>>>> ---
>>>>>  image-cache.c              |  54 +++++
>>>>>  image-proxy.c              |  74 ++++++
>>>>>  image-remote-pvt.c         | 566 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>  include/image-remote-pvt.h |  52 +++++
>>>>>  4 files changed, 746 insertions(+)
>>>>>  create mode 100644 image-cache.c
>>>>>  create mode 100644 image-proxy.c
>>>>>  create mode 100644 image-remote-pvt.c
>>>>>  create mode 100644 include/image-remote-pvt.h
>>>>>
>>>>> diff --git a/image-cache.c b/image-cache.c
>>>>> new file mode 100644
>>>>> index 0000000..e2030f4
>>>>> --- /dev/null
>>>>> +++ b/image-cache.c
>>>>> @@ -0,0 +1,54 @@
>>>>> +#include <unistd.h>
>>>>> +
>>>>> +#include "image-remote.h"
>>>>> +#include "image-remote-pvt.h"
>>>>> +#include "criu-log.h"
>>>>> +#include <pthread.h>
>>>>> +
>>>>> +static void* cache_remote_image(void* ptr)
>>>>> +{
>>>>> +	remote_image* rimg = (remote_image*) ptr;
>>>>> +
>>>>> +	if (!strncmp(rimg->path, DUMP_FINISH, sizeof (DUMP_FINISH)))
>>>>> +	{
>>>>> +		close(rimg->src_fd);
>>>>> +		return NULL;
>>>>> +	}
>>>>> +
>>>>> +	prepare_put_rimg();
>>>>> +	recv_remote_image(rimg->src_fd, rimg->path, &rimg->buf_head, true);
>>>>> +	finalize_put_rimg(rimg);
>>>>> +
>>>>> +	return NULL;
>>>>> +}
>>>>> +
>>>>> +int image_cache(unsigned short cache_put_port)
>>>>> +{
>>>>> +	pthread_t get_thr, put_thr;
>>>>> +	int put_fd, get_fd;
>>>>> +
>>>>> +	pr_info("Put Port %d, Get Path %s\n", cache_put_port, CACHE_GET);
>>>>> +
>>>>> +	put_fd = setup_TCP_server_socket(cache_put_port);
>>>>
>>>> Proxy and cache should be able to work over given connections. Look at the
>>>> opts.ps_socket field for details how this works in case of page-server.
>>>
>>> Okey, this is related to the previous comment on multiple connections. I will
>>> have a look at it and try to mimic into the proxy-cache communication.
>>
>> OK, thanks.
>>
>>>>
>>>>> +	get_fd = setup_UNIX_server_socket(CACHE_GET);
>>>>> +
>>>>> +	if (init_daemon(cache_remote_image))
>>>>> +		return -1;
>>>>> +
>>>>> +	if (pthread_create(
>>>>> +	    &put_thr, NULL, accept_put_image_connections, (void*) &put_fd)) {
>>>>> +		pr_perror("Unable to create put thread");
>>>>> +		return -1;
>>>>> +	}
>>>>> +	if (pthread_create(
>>>>> +	    &get_thr, NULL, accept_get_image_connections, (void*) &get_fd)) {
>>>>> +		pr_perror("Unable to create get thread");
>>>>> +		return -1;
>>>>> +	}
>>>>
>>>> Can you describe the logic behind the multithreading and sems/mutexes you use?
>>>
>>> I use two sems and two mutexes, one sem and one mutex to work with worker threads list,
>>> and the other mutex and sem to work with the remote image list.
>>>
>>> Worker threads are created to serve specific request, either read or write an image.
>>> When the thread finishes reading or writing the image, it exists. There are a couple
>>> of auxiliary threads: one for receiving read connections, one for receiving write
>>> connections, and one for joining other threads.
>>>
>>> Mutexes are used to make sure that no one is changing or searching the list at the
>>> same time. Looking at the code now I realize that I do not allow multiple searches
>>> in parallel. This can be solved by adding an additional mutex (reader mutex).
>>>
>>> Semaphores are used to indicate that a new item was added to the list or, in a 
>>> special condition for the remote image list, to indicate that no more images will
>>> come.
>>
>> OK, so the semaphores act as wake-up-s, right?
> 
> Yes.
> 
>>
>>>>
>>>>> +
>>>>> +	join_workers();
>>>>> +
>>>>> +	pthread_join(put_thr, NULL);
>>>>> +	pthread_join(get_thr, NULL);
>>>>> +	return 0;
>>>>> +}
>>>>> diff --git a/image-proxy.c b/image-proxy.c
>>>>> new file mode 100644
>>>>> index 0000000..24b8935
>>>>> --- /dev/null
>>>>> +++ b/image-proxy.c
>>>>> @@ -0,0 +1,74 @@
>>>>> +#include <unistd.h>
>>>>> +
>>>>> +#include "image-remote.h"
>>>>> +#include "image-remote-pvt.h"
>>>>> +#include "criu-log.h"
>>>>> +#include <pthread.h>
>>>>> +
>>>>> +static char* dst_host;
>>>>> +static unsigned short dst_port;
>>>>> +
>>>>> +static void* proxy_remote_image(void* ptr)
>>>>> +{
>>>>> +	remote_image* rimg = (remote_image*) ptr;
>>>>> +	rimg->dst_fd = setup_TCP_client_socket(dst_host, dst_port);
>>>>> +	if (rimg->dst_fd < 0) {
>>>>> +		pr_perror("Unable to open recover image socket");
>>>>> +		return NULL;
>>>>> +	}
>>>>> +
>>>>> +	if (write_header(rimg->dst_fd, rimg->namespace, rimg->path) < 0) {
>>>>> +		pr_perror("Error writing header for %s:%s",
>>>>> +			rimg->path, rimg->namespace);
>>>>> +		return NULL;
>>>>> +	}
>>>>> +
>>>>> +	prepare_put_rimg();
>>>>> +
>>>>> +	if (!strncmp(rimg->path, DUMP_FINISH, sizeof(DUMP_FINISH))) {
>>>>
>>>> For control messages I would suggest integer field, rather than magic names
>>>> in paths :)
>>>
>>> Right.
>>>
>>>>
>>>>> +		close(rimg->dst_fd);
>>>>> +		finalize_put_rimg(rimg);
>>>>> +		return NULL;
>>>>> +	}
>>>>> +	if (recv_remote_image(rimg->src_fd, rimg->path, &(rimg->buf_head), false) < 0) {
>>>>> +		return NULL;
>>>>> +	}
>>>>> +	finalize_put_rimg(rimg);
>>>>> +	send_remote_image(rimg->dst_fd, rimg->path, &(rimg->buf_head), true);
>>>>> +	return NULL;
>>>>> +}
>>>>> +
>>>>> +int image_proxy(char* fwd_host, unsigned short fwd_port)
>>>>> +{
>>>>> +	pthread_t get_thr, put_thr;
>>>>> +	int put_fd, get_fd;
>>>>> +
>>>>> +	dst_host = fwd_host;
>>>>> +	dst_port = fwd_port;
>>>>> +
>>>>> +	pr_info("Proxy Get Path %s, Put Path %s, Destination Host %s:%hu\n",
>>>>> +		PROXY_GET, PROXY_PUT, fwd_host, fwd_port);
>>>>> +
>>>>> +	put_fd = setup_UNIX_server_socket(PROXY_PUT);
>>>>> +	get_fd = setup_UNIX_server_socket(PROXY_GET);
>>>>> +
>>>>> +	if(init_daemon(proxy_remote_image))
>>>>> +		return -1;
>>>>> +
>>>>> +	if (pthread_create(
>>>>> +	    &put_thr, NULL, accept_put_image_connections, (void*) &put_fd)) {
>>>>> +		pr_perror("Unable to create put thread");
>>>>> +		return -1;
>>>>> +	}
>>>>> +	if (pthread_create(
>>>>> +	    &get_thr, NULL, accept_get_image_connections, (void*) &get_fd)) {
>>>>> +		pr_perror("Unable to create get thread");
>>>>> +		return -1;
>>>>> +	}
>>>>> +
>>>>> +	join_workers();
>>>>> +
>>>>> +	pthread_join(put_thr, NULL);
>>>>> +	pthread_join(get_thr, NULL);
>>>>> +	return 0;
>>>>> +}
>>>>> diff --git a/image-remote-pvt.c b/image-remote-pvt.c
>>>>> new file mode 100644
>>>>> index 0000000..285bb91
>>>>> --- /dev/null
>>>>> +++ b/image-remote-pvt.c
>>>>> @@ -0,0 +1,566 @@
>>>>> +#include <unistd.h>
>>>>> +#include <stdlib.h>
>>>>> +
>>>>> +#include <semaphore.h>
>>>>> +#include <sys/socket.h>
>>>>> +#include <netinet/in.h>
>>>>> +#include <netdb.h>
>>>>> +#include "sys/un.h"
>>>>> +#include <pthread.h>
>>>>> +
>>>>> +#include "image-remote-pvt.h"
>>>>> +#include "criu-log.h"
>>>>> +
>>>>> +#include "protobuf.h"
>>>>> +#include "protobuf/remote-image.pb-c.h"
>>>>> +#include "image.h"
>>>>> +
>>>>> +typedef struct wthread {
>>>>> +	pthread_t tid;
>>>>> +	struct list_head l;
>>>>> +} worker_thread;
>>>>> +
>>>>> +static LIST_HEAD(rimg_head);
>>>>> +static pthread_mutex_t rimg_lock;
>>>>> +static sem_t rimg_semph;
>>>>> +
>>>>> +static LIST_HEAD(workers_head);
>>>>> +static pthread_mutex_t workers_lock;
>>>>> +static sem_t workers_semph;
>>>>> +
>>>>> +static int finished = 0;
>>>>> +static int putting = 0;
>>>>> +
>>>>> +static void* (*get_func)(void*);
>>>>> +static void* (*put_func)(void*);
>>>>
>>>> What does get and put stand for here?
>>>
>>> Get means read, put means write. I will refactor these two to read and
>>> write to simplify.
>>>
>>>>
>>>>> +
>>>>> +static remote_image* get_rimg_by_name(const char* namespace, const char* path)
>>>>> +{
>>>>> +	remote_image* rimg = NULL;
>>>>> +	pthread_mutex_lock(&rimg_lock);
>>>>> +	list_for_each_entry(rimg, &rimg_head, l) {
>>>>> +		if( !strncmp(rimg->path, path, PATHLEN) &&
>>>>> +		    !strncmp(rimg->namespace, namespace, PATHLEN)) {
>>>>> +			pthread_mutex_unlock(&rimg_lock);
>>>>> +			return rimg;
>>>>> +		}
>>>>> +	}
>>>>> +	pthread_mutex_unlock(&rimg_lock);
>>>>> +	return NULL;
>>>>> +}
>>>>> +
>>>>> +static uint64_t sizeof_remote_buffer(struct list_head* rbuff_head)
>>>>> +{
>>>>> +    uint64_t res = 0;
>>>>> +    remote_buffer* aux = NULL;
>>>>> +    list_for_each_entry(aux, rbuff_head, l) {
>>>>> +	res += aux->nbytes;
>>>>> +    }
>>>>
>>>> Indentation.
>>>
>>> Okey.
>>>
>>>>
>>>>> +    return res;
>>>>> +}
>>>>> +
>>>>> +int init_sync_structures()
>>>>> +{
>>>>> +	if (pthread_mutex_init(&rimg_lock, NULL) != 0) {
>>>>> +		pr_perror("Remote image connection mutex init failed");
>>>>> +		return -1;
>>>>> +	}
>>>>> +
>>>>> +	if (sem_init(&rimg_semph, 0, 0) != 0) {
>>>>> +		pr_perror("Remote image connection semaphore init failed");
>>>>> +		return -1;
>>>>> +	}
>>>>> +
>>>>> +	if (pthread_mutex_init(&workers_lock, NULL) != 0) {
>>>>> +		pr_perror("Workers mutex init failed");
>>>>> +		return -1;
>>>>> +	}
>>>>> +
>>>>> +	if (sem_init(&workers_semph, 0, 0) != 0) {
>>>>> +		pr_perror("Workers semaphore init failed");
>>>>> +		return -1;
>>>>> +	}
>>>>> +	return 0;
>>>>> +}
>>>>> +
>>>>> +static remote_image* wait_for_image(int cli_fd, char* namespace, char* path)
>>>>> +{
>>>>
>>>> This routine waits for image to appear on proxy-cache connection, doesn't it? It looks
>>>> like the protocol between cache and proxy is more complex than just proxy sends header
>>>> and data over the wire, can you describe it in details?
>>>
>>> This function is used when someone asks for an image that is still not buffered.
>>> For example, CRIU restore might ask for images that are still on the way. Another
>>> possible scenario is for CRIU restore to ask for images that will never exist for
>>> a particular process dump.
>>>
>>> If such scenarios occur, the image cache will hold the thread dealing with the
>>> particular request until: i) a new image is added to the list, ii) the flag 
>>> finished is set to true and the flag putting (aka receiving remote image) is
>>> false.
>>>
>>> I found this very useful. When I migrate processes, I start the cache and CRIU
>>> restore before starting the CRIU dump/predump.
>>
>> OK this is understood. Can you also describe the protocol between cache and proxy,
>> i.e. the one over the remote (TCP) connection?
> 
> Yes, when the proxy finishes receiving a file from CRIU dump/predump it:
> 
> 1- opens a write connection to image-cache
> 2- write image header (name + namespace) (protobuf object)
> 3- write image size (uint64_t)
> 4- writes image
> 5- closes
> 
> 
> the image-cache side:
> 1- accepts connection
> 2- reads header
> 3- reads size
> 4- reads image
> 5- upon reading returns 0, checks if the received bytes match the size.
> 
> This will be changed to use only one connection (inherited from the user).
> 
> It will look like this:
> image-proxy:
> 1- write image header (name + namespace + size) protobuf
> 2- write image
> 3- write closing header (to replace the close)
> 
> image-cache:
> 1- read image header
> 2- read image
> 3- read image closing header (it knows when this header should came because it knows
> the size of the image).

ACK, this would work.

>> I meant that typedefs are not necessary for structures that are used
>> internally by some subsystem. Just reading the struct keywork makes
>> it easier to understand :)
> 
> Oh, I get it. I will fix that.
> 
> My time is crazy tight right until the end of the next week. Give me a couple of
> days to re-send a new version of the patch.

No worries, take your time :)
Unless you have any requirements on the version of criu you want your code in (the
next release is December 7).

-- Pavel