[CRIU] [RFC PATCH 07/12] lazy-pages: add handling of UFFD_EVENT_REMAP

Pavel Emelyanov xemul at virtuozzo.com
Tue Jan 10 23:28:16 PST 2017


On 01/11/2017 10:02 AM, Mike Rapoport wrote:
> On Tue, Jan 10, 2017 at 07:15:28PM +0300, Pavel Emelyanov wrote:
>> On 01/10/2017 05:20 PM, Mike Rapoport wrote:
>>> On Tue, Jan 10, 2017 at 05:07:01PM +0300, Pavel Emelyanov wrote:
>>>> On 01/09/2017 11:23 AM, Mike Rapoport wrote:
>>>>> When the restored process calls mremap(), we will see #PF's on the new
>>>>> addresses and we have to create a correspondence between the addresses
>>>>> found in the dump and the actual addresses the process uses. If the
>>>>> mremap() call causes the mapping to grow, the additional part will receive
>>>>> zero pages, as expected.
>>>>>
>>>>> FIXME: is the mapping shrinks, we will try to fill the dropped part and
>>>>> lazy-pages daemon will fail. It should be possible to track VMA changes in
>>>>> the lazy-pages daemon, but simpler, and, apparently, more correct would be
>>>>> to add UFFD_EVENT_MUNMAP to the kernel.
>>>>>
>>>>> Signed-off-by: Mike Rapoport <rppt at linux.vnet.ibm.com>
>>>>> ---
>>>>>  criu/uffd.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>  1 file changed, 46 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/criu/uffd.c b/criu/uffd.c
>>>>> index 3bc7bc1..8bf4a14 100644
>>>>> --- a/criu/uffd.c
>>>>> +++ b/criu/uffd.c
>>>>> @@ -53,7 +53,12 @@ struct lazy_iovec {
>>>>>  	unsigned long len;
>>>>>  };
>>>>>  
>>>>> -struct lazy_pages_info;
>>>>> +struct lazy_remap {
>>>>> +	struct list_head l;
>>>>> +	unsigned long from;
>>>>> +	unsigned long to;
>>>>> +	unsigned long len;
>>>>> +};
>>>>>  
>>>>>  struct pf_info {
>>>>>  	unsigned long addr;
>>>>> @@ -65,6 +70,7 @@ struct lazy_pages_info {
>>>>>  
>>>>>  	struct list_head iovs;
>>>>>  	struct list_head pfs;
>>>>> +	struct list_head remaps;
>>>>>  
>>>>>  	struct page_read pr;
>>>>>  
>>>>> @@ -92,6 +98,7 @@ static struct lazy_pages_info *lpi_init(void)
>>>>>  	memset(lpi, 0, sizeof(*lpi));
>>>>>  	INIT_LIST_HEAD(&lpi->iovs);
>>>>>  	INIT_LIST_HEAD(&lpi->pfs);
>>>>> +	INIT_LIST_HEAD(&lpi->remaps);
>>>>>  	INIT_LIST_HEAD(&lpi->l);
>>>>>  	lpi->lpfd.revent = handle_uffd_event;
>>>>>  
>>>>> @@ -101,12 +108,15 @@ static struct lazy_pages_info *lpi_init(void)
>>>>>  static void lpi_fini(struct lazy_pages_info *lpi)
>>>>>  {
>>>>>  	struct lazy_iovec *p, *n;
>>>>> +	struct lazy_remap *p1, *n1;
>>>>>  
>>>>>  	if (!lpi)
>>>>>  		return;
>>>>>  	free(lpi->buf);
>>>>>  	list_for_each_entry_safe(p, n, &lpi->iovs, l)
>>>>>  		xfree(p);
>>>>> +	list_for_each_entry_safe(p1, n1, &lpi->remaps, l)
>>>>> +		xfree(p1);
>>>>>  	if (lpi->lpfd.fd > 0)
>>>>>  		close(lpi->lpfd.fd);
>>>>>  	if (lpi->pr.close)
>>>>> @@ -515,9 +525,17 @@ out:
>>>>>  static int uffd_copy(struct lazy_pages_info *lpi, __u64 address, int nr_pages)
>>>>>  {
>>>>>  	struct uffdio_copy uffdio_copy;
>>>>> +	struct lazy_remap *r;
>>>>>  	unsigned long len = nr_pages * page_size();
>>>>>  	int rc;
>>>>>  
>>>>> +	list_for_each_entry(r, &lpi->remaps, l) {
>>>>> +		if (address >= r->from && address < r->from + r->len) {
>>>>> +			address += (r->to - r->from);
>>>>> +			break;
>>>>> +		}
>>>>> +	}
>>>>> +
>>>>>  	uffdio_copy.dst = address;
>>>>>  	uffdio_copy.src = (unsigned long)lpi->buf;
>>>>>  	uffdio_copy.len = len;
>>>>> @@ -670,9 +688,27 @@ static int handle_madv_dontneed(struct lazy_pages_info *lpi,
>>>>>  	return 0;
>>>>>  }
>>>>>  
>>>>> +static int handle_remap(struct lazy_pages_info *lpi, struct uffd_msg *msg)
>>>>> +{
>>>>> +	struct lazy_remap *remap;
>>>>> +
>>>>> +	remap = xmalloc(sizeof(*remap));
>>>>> +	if (!remap)
>>>>> +		return -1;
>>>>> +
>>>>> +	INIT_LIST_HEAD(&remap->l);
>>>>> +	remap->from = msg->arg.remap.from;
>>>>> +	remap->to = msg->arg.remap.to;
>>>>> +	remap->len = msg->arg.remap.len;
>>>>> +	list_add_tail(&remap->l, &lpi->remaps);
>>>>
>>>> Shouldn't we punch a hole in original region? Not to handle #PF there?
>>>
>>> I don't quite follow you here. The #PF's are delivered to uffd only if the
>>> VMA covering the range is registered with uffd. For moving remaps, the
>>> original address range will not be covered by a VMA with uffd. 
>>
>> Indeed :) Aren't you worried with the fact that the in-memory state of lazyd
>> will differ from the real state in the kernel?
> 
> I am worried about all the UFFD_EVENT_* stuff. Except, perhaps,
> MADV_DONTNEED. You example below with swap of two regions clearly shows
> that list of (from,to) patches will not work. We do need to have more
> complex state in lazyd that will be able to correspond old and new
> addresses.

How about extending the existing iovec element with 'request_addr' field
that gets initialized with 1:1 map. Then upon mremap()-s the iovs get
physically remapped in lists, but the 'request_addr' remains unchanged
and is used when sending requests to page_read-s?

-- Pavel



More information about the CRIU mailing list