[CRIU] [PATCH v5 3/5] Try to include userfaultfd with criu (part 1)

Mon Mar 14 01:42:11 PDT 2016

On 03/12/2016 07:54 PM, Adrian Reber wrote:
> On Fri, Mar 11, 2016 at 06:25:52PM +0300, Pavel Emelyanov wrote:
>> On 03/11/2016 06:03 PM, Adrian Reber wrote:
>>> On Fri, Mar 11, 2016 at 04:08:10PM +0300, Pavel Emelyanov wrote:
>>>>
>>>>> +static void criu_init()
>>>>> +{
>>>>> +	/* TODO: return code checking */
>>>>> +	check_img_inventory();
>>>>> +	prepare_task_entries();
>>>>> +	prepare_pstree();
>>>>> +	collect_remaps_and_regfiles();
>>>>> +	prepare_shared_reg_files();
>>>>> +	prepare_remaps();
>>>>> +	prepare_mm_pid(root_item);
>>>>> +
>>>>> +	/* We found a PID */
>>>>> +	pr_debug("root_item->pid.virt %d\n", root_item->pid.virt);
>>>>> +	pr_debug("root_item->pid.real %d\n", root_item->pid.real);
>>>>> +}
>>>>
>>>> This portion should be really resolved before merging. All of the above
>>>> has nothing to do with the page_read, so please, find the reason for
>>>> page read engine non working due to absence of this. If you need help
>>>> with the code, just drop me an e-mail, I'll help.
>>>
>>> I had a quick look, but need to look a bit in more detail.
>>>
>>> If I leave away all those lines I get a segfault, I haven't checked yet
>>> but I think when accessing root_item->pid.virt.
>>
>> Ah! Indeed. You open the page read for init task only. I believe the proper
>> fix would be to pass the pid of the process via socket you use to pass uffd.
>>
>> Since we'll have to do it anyway in the future, I think this is worth doing
>> from the very beginning. And the lazy pages daemon should accept only one
>> such message (for your initial case).
> 
> Getting the information about which directory contains the checkpoint
> from the main restore process via the same mechanism as the userfaultfd
> FD was also my initial plan. But, unfortunately, the lazy-pages server
> needs to open the checkpoint directory on its own. Especially as I am
> currently working on the code for remote lazy-restore.

No no no, I'm not talking about passing the directory with images via socket,
but about finding out the PID of the task to work on. You get this value (the
pid) from root_item, but this value should go via socket as a raw integer,
together with the uffd descriptor.
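
To make "a raw integer together with the descriptor" concrete, here is a minimal sketch that uses plain sendmsg()/recvmsg() with SCM_RIGHTS instead of CRIU's own send_fds() helper. The function names are made up for illustration and are not part of CRIU, and a pipe end stands in for the uffd in the self-check.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: one message, pid as payload, fd as SCM_RIGHTS. */
int send_fd_with_pid(int sk, int fd, pid_t pid)
{
	union {
		struct cmsghdr align;	/* forces correct alignment of buf */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct iovec iov = { .iov_base = &pid, .iov_len = sizeof(pid) };
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	return sendmsg(sk, &msg, 0) == (ssize_t)sizeof(pid) ? 0 : -1;
}

/* Hypothetical helper: returns the received fd (or -1) and fills *pid. */
int recv_fd_with_pid(int sk, pid_t *pid)
{
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	struct iovec iov = { .iov_base = pid, .iov_len = sizeof(*pid) };
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sk, &msg, 0) != (ssize_t)sizeof(*pid))
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}

/* Round trip over a socketpair; returns 0 if pid and fd both arrive. */
int selfcheck_single_msg(void)
{
	int sk[2], p[2], fd, ret = -1;
	pid_t pid = 0;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk) < 0)
		return -1;
	if (pipe(p) < 0) {
		close(sk[0]);
		close(sk[1]);
		return -1;
	}

	if (send_fd_with_pid(sk[0], p[1], 1234) == 0 &&
	    (fd = recv_fd_with_pid(sk[1], &pid)) >= 0) {
		/* The received fd must be a working duplicate of p[1]. */
		if (pid == 1234 && write(fd, "x", 1) == 1 &&
		    read(p[0], &c, 1) == 1 && c == 'x')
			ret = 0;
		close(fd);
	}

	close(p[0]);
	close(p[1]);
	close(sk[0]);
	close(sk[1]);
	return ret;
}
```

With something along these lines the receiving side could hand the received pid to open_page_read() without touching root_item.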

This line from patch #3, uffd.c file, uffd_listen() function:

> +	rc = open_page_read(root_item->pid.virt, &pr, PR_TASK);

should not contain any root_item-> dereferences. Instead, the value of pid.virt
should be sent by criu restore here:

> +static int send_uffd(int sendfd)
> +{
> +	int fd;
> +	int len;
> +	int ret = -1;
> +	struct sockaddr_un sun;
> +
...
> +
> +	if (send_fd(fd, NULL, 0, sendfd) < 0) {

Here. The task's virt pid should go along as auxiliary data (there is no API
for this yet, more on this below).

Now on where to put the data. If you look at the send_fds() code that is called
by send_fd(), you'll see that it _can_ send data together with the descriptors.
It does send an array of fd_opts structures.

So you need to either extend this code to accept arbitrary data, not just fd_opts,
or change send_uffd() so that it sends two packets -- one with the pid and the
other one with the uffd itself.

> +		pr_perror("send_fd error:");
> +		goto out;
> +	}
> +	ret = 0;
> +out:
> +	close(fd);
> +	return ret;
> +}
> +
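
The two-packet variant mentioned above could look roughly like the sketch below. It again uses plain socket calls rather than CRIU's send_fd()/send_fds(), assumes a SOCK_STREAM unix socket, and the function names are hypothetical. The first packet is the raw pid, the second one carries the descriptor with a single dummy byte of payload.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper. Packet 1: raw pid. Packet 2: the descriptor. */
int send_pid_then_fd(int sk, int fd, pid_t pid)
{
	union {
		struct cmsghdr align;	/* forces correct alignment of buf */
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	char dummy = '*';	/* ancillary data needs some payload to ride on */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	if (send(sk, &pid, sizeof(pid), 0) != (ssize_t)sizeof(pid))
		return -1;

	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));

	return sendmsg(sk, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: reads the pid first, then the fd; returns the fd. */
int recv_pid_then_fd(int sk, pid_t *pid)
{
	union {
		struct cmsghdr align;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctl;
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	/* Read exactly sizeof(pid) bytes so the second packet stays queued. */
	if (recv(sk, pid, sizeof(*pid), MSG_WAITALL) != (ssize_t)sizeof(*pid))
		return -1;

	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sk, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return fd;
}

/* Round trip over a socketpair; a pipe end stands in for the uffd. */
int selfcheck_two_packets(void)
{
	int sk[2], p[2], fd, ret = -1;
	pid_t pid = 0;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk) < 0)
		return -1;
	if (pipe(p) < 0) {
		close(sk[0]);
		close(sk[1]);
		return -1;
	}

	if (send_pid_then_fd(sk[0], p[1], 4321) == 0 &&
	    (fd = recv_pid_then_fd(sk[1], &pid)) >= 0) {
		if (pid == 4321 && write(fd, "y", 1) == 1 &&
		    read(p[0], &c, 1) == 1 && c == 'y')
			ret = 0;
		close(fd);
	}

	close(p[0]);
	close(p[1]);
	close(sk[0]);
	close(sk[1]);
	return ret;
}
```

This keeps the fd-passing code untouched at the cost of a fixed ordering that both sides have to agree on.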

> I am starting to transfer the userfaultfd requests over TCP. I am using
> the functions provided by the page server to setup the TCP communication
> and I am then transferring the requested pages from one host to another.
> 
> Once the current patches have been accepted I was planning on sending the
> lazy pages over TCP patches. This is, from my point of view, the important
> part. It does not make much sense to have to transfer the pages from
> one host to another and then use them for the lazy restore. The goal has to
> be to transfer as little data as possible for the initial restore and
> once the process is running the required pages are transferred via
> lazy-restore.
> 
> In this case, remote lazy-pages, I need all the initialization to read
> the checkpoint directory in the uffd code anyway, as it is running on
> a different host than the main restore process.
> 
> Is there a way to correctly set up the image dir handling logic so that
> you would accept it as part of uffd.c?

The directory the lazy pages daemon should work on can be passed with the -D
option, this is fine. This is how, e.g., the page server works.

-- Pavel