[CRIU] [RFC PATCH] Try to include userfaultfd with criu

Tue Nov 10 04:34:07 PST 2015

On 11/09/2015 04:31 PM, Adrian Reber wrote:
> On Thu, Sep 24, 2015 at 06:44:16PM +0300, Pavel Emelyanov wrote:
>> On 09/24/2015 04:04 PM, adrian at lisas.de wrote:
>>> From: Adrian Reber <areber at redhat.com>
>>>
>>> This is a first try to include userfaultfd with criu. Right now it
>>> still requires a "normal" checkpoint. After checkpointing the application
>>> it can be restored with the help of userfaultfd.
>>
>> Thanks for looking into this :) Please, see my comments inline.
> 
> Thanks for your comments. Took me some time to look at uffd code again.
> 
>>> The normal restore still copies all pages from the checkpoint to
>>> the right location (this still needs to be disabled) and the usage
>>> of userfaultfd is right now hardcoded in this patch.
>>>
>>> All restored pages with MAP_ANONYMOUS set are marked as being handled by
>>> userfaultfd and also madvise()'d as MADV_DONTNEED.
>>>
>>> If I also enable userfaultfd for pages without MAP_ANONYMOUS the restored process
>>> segfaults. I have not looked in more detail into why it segfaults.
>>
>> Hm... IIRC the uffd from Andrea only worked on anonymous private areas, so
>> when you turned one on (and didn't check the ret code from register ioctl ;)
>> the respective vma just wasn't restored at all.
> 
> Yes, I think I also understood it now.
> 
>>> As soon as the process is restored it blocks on the first memory access
>>> and waits for pages being transferred by userfaultfd.
>>>
>>> To handle the required pages a new criu command has been added. The restore
>>> works now like this:
>>>
>>>   criu restore -D /tmp/3 -j -v4
>>>
>>> This hangs after the restored process is running and needs:
>>>
>>>   criu uffd -v4 -D /tmp/3/
>>
>> So the plan for uffd action is to start a daemon that would read
>> pages from images and put them into some uffd descriptor, right?
> 
> Yes, I have the normal criu restore process running now like this:
> 
> $ ./criu restore -D /tmp/3 -j -v4 --lazy-page

Cool :)

> and it hangs on the uffd file descriptor until criu uffd daemon is
> running:
> 
> $ ./criu uffd -v4 -D /tmp/3/

And how does criu restore "know" where to look for the uffd ... server?

>>> +
>>> +static int ud_open()
>>> +{
>>> +	int udfd;
>>> +	int newfd;
>>> +
>>> +	if ((udfd = client_conn()) < 0) {
>>> +		pr_err("unix domain socket connection error\n");
>>> +		return (-1);
>>> +	}
>>> +
>>> +	newfd = recv_fd(udfd);
>>> +	close(udfd);
>>> +
>>> +	return newfd;
>>> +}
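
For reference, receiving a descriptor over a unix socket the way ud_open()
does boils down to an SCM_RIGHTS control message. Below is a minimal sketch
of both sides. The names send_fd_sketch() and recv_fd_sketch() are made up
for illustration and are not the criu helpers; the real recv_fd() may well
differ in details.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical helper: pass fd over the connected unix socket sk. */
static int send_fd_sketch(int sk, int fd)
{
	char dummy = '*';	/* at least one byte of payload must travel */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctl u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);	/* passing an invalid fd is a caller bug */

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sk, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: returns the received descriptor or -1. */
static int recv_fd_sketch(int sk)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctl u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sk, &msg, 0) <= 0)
		return -1;

	/* Accept only a single SCM_RIGHTS message carrying one fd. */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received number is a new descriptor in the receiver's fd table that
refers to the same open file, so the uffd daemon can use it directly.
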
>>> +
>>> +
>>> +static unsigned long find_page(unsigned long addr)
>>
>> There's a seek_pagemap_page() call for this type of operation. One problem with
>> it is that it only works sequentially, i.e. when subsequent calls to it pass
>> constantly increasing addresses.
>>
>> I'd suggest fixing this seek_pagemap_page() to overcome this limitation and use
>> one here. This would also be much faster than readdir() and open_page_read() for
>> every single page.
> 
> I think I almost have this figured out. I am now doing:
> 
> pr->get_pagemap(pr, &iov);
> seek_pagemap_page(pr, addr, true);
> pr->read_page(pr, addr, buf);
> 
> As this only works for the first page I introduced a rewind_pagemap()
> function with:
> 
> lseek(img_raw_fd(pr->pi), 0, SEEK_SET);
> 
> Unfortunately the next call to get_pagemap() now returns 0 instead of
> the usual 1. So somehow I am missing something which is needed to rewind
> the pagemap reading. Any idea how to do this better/correctly?

The get_pagemap() call reads a pagemap entry from the image file, which shows
where in the virtual memory the corresponding pages should be placed. Then
->read_page() reads the page.

Your aim is to read an arbitrary page by vaddr. With the current code the
only way to do it is by scanning the pagemap, starting either from zero or
from the last address.
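
To illustrate the scan-from-zero variant, here is a rough sketch. It assumes
a simplified in-memory pagemap where each entry describes a run of pages and
the page data is stored back to back in entry order. The struct and the
function name are made up for the example and are not the criu ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical, simplified entry: nr_pages pages starting at vaddr. */
struct pagemap_entry_sketch {
	uint64_t vaddr;
	uint32_t nr_pages;
};

/*
 * Scan the pagemap from entry 0 and return the byte offset of the page
 * containing addr inside the page data, or -1 if no entry covers it.
 */
static int64_t find_page_offset(const struct pagemap_entry_sketch *pm,
				size_t n, uint64_t addr)
{
	uint64_t skipped = 0;	/* pages stored before the current entry */
	size_t i;

	assert(pm != NULL || n == 0);

	/* Round down to the start of the page. */
	addr &= ~(SKETCH_PAGE_SIZE - 1);

	for (i = 0; i < n; i++) {
		uint64_t start = pm[i].vaddr;
		uint64_t end = start + pm[i].nr_pages * SKETCH_PAGE_SIZE;

		if (addr >= start && addr < end) {
			uint64_t idx = skipped + (addr - start) / SKETCH_PAGE_SIZE;

			return (int64_t)(idx * SKETCH_PAGE_SIZE);
		}
		skipped += pm[i].nr_pages;
	}

	return -1;
}
```

Every lookup walks the entries from the beginning, so the cost grows with
the number of entries. Remembering the last position only helps when the
requested addresses keep increasing, which page faults do not guarantee.
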

> I can also post my current uffd patch if that makes discussion easier...

Sure!

-- Pavel