[CRIU] [RFC PATCH 07/12] lazy-pages: add handling of UFFD_EVENT_REMAP
Pavel Emelyanov
xemul at virtuozzo.com
Tue Jan 10 23:28:16 PST 2017
On 01/11/2017 10:02 AM, Mike Rapoport wrote:
> On Tue, Jan 10, 2017 at 07:15:28PM +0300, Pavel Emelyanov wrote:
>> On 01/10/2017 05:20 PM, Mike Rapoport wrote:
>>> On Tue, Jan 10, 2017 at 05:07:01PM +0300, Pavel Emelyanov wrote:
>>>> On 01/09/2017 11:23 AM, Mike Rapoport wrote:
>>>>> When the restored process calls mremap(), we will see #PF's on the new
>>>>> addresses and we have to create a correspondence between the addresses
>>>>> found in the dump and the actual addresses the process uses. If the
>>>>> mremap() call causes the mapping to grow, the additional part will receive
>>>>> zero pages, as expected.
>>>>>
>>>>> FIXME: if the mapping shrinks, we will try to fill the dropped part and
>>>>> lazy-pages daemon will fail. It should be possible to track VMA changes in
>>>>> the lazy-pages daemon, but simpler, and, apparently, more correct would be
>>>>> to add UFFD_EVENT_MUNMAP to the kernel.
>>>>>
>>>>> Signed-off-by: Mike Rapoport <rppt at linux.vnet.ibm.com>
>>>>> ---
>>>>> criu/uffd.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
>>>>> 1 file changed, 46 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/criu/uffd.c b/criu/uffd.c
>>>>> index 3bc7bc1..8bf4a14 100644
>>>>> --- a/criu/uffd.c
>>>>> +++ b/criu/uffd.c
>>>>> @@ -53,7 +53,12 @@ struct lazy_iovec {
>>>>> unsigned long len;
>>>>> };
>>>>>
>>>>> -struct lazy_pages_info;
>>>>> +struct lazy_remap {
>>>>> + struct list_head l;
>>>>> + unsigned long from;
>>>>> + unsigned long to;
>>>>> + unsigned long len;
>>>>> +};
>>>>>
>>>>> struct pf_info {
>>>>> unsigned long addr;
>>>>> @@ -65,6 +70,7 @@ struct lazy_pages_info {
>>>>>
>>>>> struct list_head iovs;
>>>>> struct list_head pfs;
>>>>> + struct list_head remaps;
>>>>>
>>>>> struct page_read pr;
>>>>>
>>>>> @@ -92,6 +98,7 @@ static struct lazy_pages_info *lpi_init(void)
>>>>> memset(lpi, 0, sizeof(*lpi));
>>>>> INIT_LIST_HEAD(&lpi->iovs);
>>>>> INIT_LIST_HEAD(&lpi->pfs);
>>>>> + INIT_LIST_HEAD(&lpi->remaps);
>>>>> INIT_LIST_HEAD(&lpi->l);
>>>>> lpi->lpfd.revent = handle_uffd_event;
>>>>>
>>>>> @@ -101,12 +108,15 @@ static struct lazy_pages_info *lpi_init(void)
>>>>> static void lpi_fini(struct lazy_pages_info *lpi)
>>>>> {
>>>>> struct lazy_iovec *p, *n;
>>>>> + struct lazy_remap *p1, *n1;
>>>>>
>>>>> if (!lpi)
>>>>> return;
>>>>> free(lpi->buf);
>>>>> list_for_each_entry_safe(p, n, &lpi->iovs, l)
>>>>> xfree(p);
>>>>> + list_for_each_entry_safe(p1, n1, &lpi->remaps, l)
>>>>> + xfree(p1);
>>>>> if (lpi->lpfd.fd > 0)
>>>>> close(lpi->lpfd.fd);
>>>>> if (lpi->pr.close)
>>>>> @@ -515,9 +525,17 @@ out:
>>>>> static int uffd_copy(struct lazy_pages_info *lpi, __u64 address, int nr_pages)
>>>>> {
>>>>> struct uffdio_copy uffdio_copy;
>>>>> + struct lazy_remap *r;
>>>>> unsigned long len = nr_pages * page_size();
>>>>> int rc;
>>>>>
>>>>> + list_for_each_entry(r, &lpi->remaps, l) {
>>>>> + if (address >= r->from && address < r->from + r->len) {
>>>>> + address += (r->to - r->from);
>>>>> + break;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> uffdio_copy.dst = address;
>>>>> uffdio_copy.src = (unsigned long)lpi->buf;
>>>>> uffdio_copy.len = len;
>>>>> @@ -670,9 +688,27 @@ static int handle_madv_dontneed(struct lazy_pages_info *lpi,
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +static int handle_remap(struct lazy_pages_info *lpi, struct uffd_msg *msg)
>>>>> +{
>>>>> + struct lazy_remap *remap;
>>>>> +
>>>>> + remap = xmalloc(sizeof(*remap));
>>>>> + if (!remap)
>>>>> + return -1;
>>>>> +
>>>>> + INIT_LIST_HEAD(&remap->l);
>>>>> + remap->from = msg->arg.remap.from;
>>>>> + remap->to = msg->arg.remap.to;
>>>>> + remap->len = msg->arg.remap.len;
>>>>> + list_add_tail(&remap->l, &lpi->remaps);
>>>>
>>>> Shouldn't we punch a hole in original region? Not to handle #PF there?
>>>
>>> I don't quite follow you here. The #PF's are delivered to uffd only if the
>>> VMA covering the range is registered with uffd. For moving remaps, the
>>> original address range will not be covered by a VMA with uffd.
>>
>> Indeed :) Aren't you worried about the fact that the in-memory state of lazyd
>> will differ from the real state in the kernel?
>
> I am worried about all the UFFD_EVENT_* stuff. Except, perhaps,
> MADV_DONTNEED. Your example below with the swap of two regions clearly shows
> that a list of (from,to) patches will not work. We do need to have more
> complex state in lazyd that will be able to map between old and new
> addresses.
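To spell out the swap case, here is a sketch only: the array, the addresses
and the helper names are made up for illustration, and the swap is assumed to
go through a temporary region T. The lookup itself mirrors the loop the patch
adds to uffd_copy() (first match wins, applied once):

```c
#include <assert.h>

/* Hypothetical array stand-in for the list of struct lazy_remap */
struct remap {
	unsigned long from;
	unsigned long to;
	unsigned long len;
};

#define REGION_A	0x10000UL
#define REGION_B	0x20000UL
#define REGION_T	0x30000UL	/* assumed temporary used for the swap */
#define REGION_LEN	0x1000UL

/* mremap() events in the order lazyd would receive them */
static const struct remap swap_log[] = {
	{ REGION_A, REGION_T, REGION_LEN },	/* A -> T */
	{ REGION_B, REGION_A, REGION_LEN },	/* B -> A */
	{ REGION_T, REGION_B, REGION_LEN },	/* T -> B */
};

/* Same logic as the loop in uffd_copy(): translate once, stop at first hit */
static unsigned long translate_once(const struct remap *r, int n,
				    unsigned long addr)
{
	int i;

	for (i = 0; i < n; i++) {
		if (addr >= r[i].from && addr < r[i].from + r[i].len)
			return addr + (r[i].to - r[i].from);
	}
	return addr;
}

/* The page dumped at A now lives at B, but the lookup stops at T */
static void swap_demo(void)
{
	unsigned long got = translate_once(swap_log, 3, REGION_A);

	assert(got == REGION_T);
	assert(got != REGION_B);
}
```

So the copy for a page dumped at A would be aimed at T, which is no longer
mapped, instead of B.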
How about extending the existing iovec element with a 'request_addr' field
that is initialized with a 1:1 mapping? Then, upon mremap(), the iovs get
physically remapped in the lists, but 'request_addr' remains unchanged
and is used when sending requests to page_read-s.
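
Roughly like this. It is a sketch, not a patch: struct iov_sketch and both
helpers are hypothetical, an array stands in for the list, and only remaps
that move whole iovecs are handled (splitting on partial overlap is left
out):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified iovec; the real one also carries a list_head */
struct iov_sketch {
	unsigned long base;		/* where the range lives in the task now */
	unsigned long request_addr;	/* where it lives in the dump, never changes */
	unsigned long len;
};

/* On a remap event, move every iovec that lies fully inside [from, from + len) */
static void iov_remap(struct iov_sketch *iovs, size_t n,
		      unsigned long from, unsigned long to, unsigned long len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (iovs[i].base >= from &&
		    iovs[i].base + iovs[i].len <= from + len)
			iovs[i].base += to - from;
	}
}

/* Translate a faulting address into the address to ask page_read for */
static int fault_to_request(const struct iov_sketch *iovs, size_t n,
			    unsigned long fault, unsigned long *req)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (fault >= iovs[i].base &&
		    fault < iovs[i].base + iovs[i].len) {
			*req = iovs[i].request_addr + (fault - iovs[i].base);
			return 0;
		}
	}
	return -1;	/* no iovec covers this address */
}

/* Swap of two regions through a temporary: the state stays consistent */
static void request_addr_demo(void)
{
	struct iov_sketch iovs[] = {
		{ 0x10000, 0x10000, 0x1000 },	/* starts as a 1:1 map */
		{ 0x20000, 0x20000, 0x1000 },
	};
	unsigned long req = 0;

	iov_remap(iovs, 2, 0x10000, 0x30000, 0x1000);
	iov_remap(iovs, 2, 0x20000, 0x10000, 0x1000);
	iov_remap(iovs, 2, 0x30000, 0x20000, 0x1000);

	/* A fault at the new location asks for the page from the old one */
	assert(fault_to_request(iovs, 2, 0x20010, &req) == 0);
	assert(req == 0x10010);
}
```

The #PF address is looked up against 'base', and the uffd copy still goes to
the faulting address, so no separate list of remaps is needed.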
-- Pavel