[CRIU] [PATCH v5 3/5] Try to include userfaultfd with criu (part 1)

Sat Mar 12 08:54:15 PST 2016

On Fri, Mar 11, 2016 at 06:25:52PM +0300, Pavel Emelyanov wrote:
> On 03/11/2016 06:03 PM, Adrian Reber wrote:
> > On Fri, Mar 11, 2016 at 04:08:10PM +0300, Pavel Emelyanov wrote:
> >>
> >>> +static void criu_init()
> >>> +{
> >>> +	/* TODO: return code checking */
> >>> +	check_img_inventory();
> >>> +	prepare_task_entries();
> >>> +	prepare_pstree();
> >>> +	collect_remaps_and_regfiles();
> >>> +	prepare_shared_reg_files();
> >>> +	prepare_remaps();
> >>> +	prepare_mm_pid(root_item);
> >>> +
> >>> +	/* We found a PID */
> >>> +	pr_debug("root_item->pid.virt %d\n", root_item->pid.virt);
> >>> +	pr_debug("root_item->pid.real %d\n", root_item->pid.real);
> >>> +}
> >>
> >> This portion should be really resolved before merging. All of the above
> >> has nothing to do with the page_read, so please, find the reason for
> >> page read engine non working due to absence of this. If you need help
> >> with the code, just drop me an e-mail, I'll help.
> > 
> > I had a quick look, but need to look a bit in more detail.
> > 
> > If I leave away all those lines I get a segfault, I haven't checked yet
> > but I think when accessing root_item->pid.virt.
> 
> Ah! Indeed. You open the page read for init task only. I believe the proper
> fix would be to pass the pid of the process via socket you use to pass uffd.
> 
> Since we'll have to do it anyway in the future, I think this is worth doing
> from the very beginning. And the lazy pages daemon should accept only one
> such message (for your initial case).

Getting the information about which directory contains the checkpoint
from the main restore process, via the same mechanism as the userfaultfd
FD, was also my initial plan. Unfortunately, the lazy-pages server needs
to open the checkpoint directory on its own, especially as I am
currently working on the code for remote lazy-restore.
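
For reference, passing the pid together with the uffd over the UNIX
socket could look roughly like the sketch below. It only uses plain
SCM_RIGHTS ancillary data. The helper names send_fd_with_pid() and
recv_fd_with_pid() are made up for illustration; they are not existing
CRIU functions, and the real code would probably reuse CRIU's own fd
passing helpers instead.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one fd, aligned for struct cmsghdr. */
union fd_cmsg_buf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical helper: send one fd, with the owning pid as payload. */
static int send_fd_with_pid(int sock, int fd, pid_t pid)
{
	union fd_cmsg_buf cbuf;
	struct iovec iov = { .iov_base = &pid, .iov_len = sizeof(pid) };
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);

	memset(&msg, 0, sizeof(msg));
	memset(&cbuf, 0, sizeof(cbuf));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cbuf.buf;
	msg.msg_controllen = sizeof(cbuf.buf);

	/* The fd travels as SCM_RIGHTS ancillary data. */
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(pid) ? 0 : -1;
}

/* Hypothetical helper: receive the fd and the pid sent by the above. */
static int recv_fd_with_pid(int sock, int *fd, pid_t *pid)
{
	union fd_cmsg_buf cbuf;
	struct iovec iov = { .iov_base = pid, .iov_len = sizeof(*pid) };
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&cbuf, 0, sizeof(cbuf));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cbuf.buf;
	msg.msg_controllen = sizeof(cbuf.buf);

	if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(*pid))
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
	return 0;
}
```

This covers only the local case. It does not help once the lazy-pages
server runs on another host, because a file descriptor cannot be passed
over TCP.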

I am starting to transfer the userfaultfd requests over TCP. I am using
the functions provided by the page server to set up the TCP communication,
and I then transfer the requested pages from one host to another.
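
To illustrate what a page request sent over TCP might carry, here is a
minimal sketch of a fixed-size request with the pid and the faulting
address, encoded in network byte order. The struct lazy_page_req, its
fields, and the 12-byte layout are assumptions made up for this example;
this is not the page server's actual protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire size: 4 bytes pid + 8 bytes address. */
#define LAZY_REQ_WIRE_SIZE 12

/* Hypothetical request: which task faulted, and at which address. */
struct lazy_page_req {
	uint32_t pid;
	uint64_t addr;
};

/* Store the low n bytes of v at dst, most significant byte first. */
static void put_be(uint8_t *dst, uint64_t v, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* Read n big-endian bytes from src. */
static uint64_t get_be(const uint8_t *src, size_t n)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < n; i++)
		v = (v << 8) | src[i];
	return v;
}

/* Serialize a request into a LAZY_REQ_WIRE_SIZE byte buffer. */
static void lazy_req_encode(const struct lazy_page_req *req, uint8_t *buf)
{
	assert(req && buf);
	put_be(buf, req->pid, 4);
	put_be(buf + 4, req->addr, 8);
}

/* Parse a LAZY_REQ_WIRE_SIZE byte buffer back into a request. */
static void lazy_req_decode(const uint8_t *buf, struct lazy_page_req *req)
{
	assert(req && buf);
	req->pid = (uint32_t)get_be(buf, 4);
	req->addr = get_be(buf + 4, 8);
}
```

Encoding field by field avoids depending on struct padding or on the
endianness of either host, which matters as soon as the two sides of the
connection are different machines.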

Once the current patches have been accepted, I was planning on sending
the lazy-pages-over-TCP patches. This is, from my point of view, the
important part. It does not make much sense to first transfer all the
pages from one host to another and only then use them for the lazy
restore. The goal has to be to transfer as little data as possible for
the initial restore; once the process is running, the required pages are
transferred via lazy-restore.

In this case, remote lazy-pages, I need all the initialization to read
the checkpoint directory in the uffd code anyway, as it is running on
a different host than the main restore process.

Is there a way to correctly set up the image dir handling logic so that
you would accept it as part of uffd.c?

		Adrian