[CRIU] [PATCH v2 3/3] aio: Restore aio ring content

Kirill Tkhai ktkhai at virtuozzo.com
Mon Mar 21 03:10:47 PDT 2016



On 21.03.2016 12:49, Pavel Emelyanov wrote:
> On 03/21/2016 11:06 AM, Kirill Tkhai wrote:
>>
>>
>> On 21.03.2016 09:19, Pavel Emelyanov wrote:
>>> On 03/18/2016 01:31 PM, Kirill Tkhai wrote:
>>>>
>>>>
>>>> On 17.03.2016 22:34, Pavel Emelyanov wrote:
>>>>>
>>>>>>>>> I'm not sure this is safe. How would pre-dumps act on rings?
>>>>>>>>
>>>>>>>> Could you please explain what kind of problems are possible here?
>>>>>>>> I don't see a memory predump.
>>>>>>>
>>>>>>> The vma_entry_is_private() check is too generic. E.g. such vmas are being
>>>>>>> soft-dirty-tracked. Do we want the same for AIO rings? I bet we don't :)
>>>>>>
>>>>>> From userspace, the AIO ring buffer looks like anonymous memory. There is no
>>>>>> difference between them: it is writable and modifiable. So if we track anonymous
>>>>>> memory, we have to track the AIO ring buffer too.
>>>>>
>>>>> Will it get tracked by the kernel's soft-dirty bits? I heavily doubt it.
>>>>
>>>> It's tracked. Below is the proof.
>>>>
>>>> #define _GNU_SOURCE
>>>> #include <stdio.h>
>>>> #include <unistd.h>
>>>> #include <sys/syscall.h>
>>>> #include <linux/aio_abi.h>
>>>> #include <fcntl.h>
>>>> #include <inttypes.h>
>>>>
>>>> static inline int io_setup(unsigned nr, aio_context_t *ctxp)
>>>> {
>>>> 	return syscall(__NR_io_setup, nr, ctxp);
>>>> }
>>>>
>>>> #define PME_SOFT_DIRTY	(1ULL << 55)
>>>> #define PAGE_SHIFT     12
>>>> #define PAGE_SIZE      (1UL << PAGE_SHIFT)
>>>> #define u64 uint64_t
>>>>
>>>> int main()
>>>> {
>>>> 	aio_context_t ctx = 0;
>>>> 	int ret, fd, pm2;
>>>> 	u64 pmap;
>>>>
>>>> 	ret = io_setup(128, &ctx);
>>>> 	if (ret < 0) {
>>>> 		perror("io_setup error");
>>>> 		return -1;
>>>> 	}
>>>>
>>>> 	fd = open("/proc/self/clear_refs", O_WRONLY);
>>>> 	if (fd < 0) {
>>>> 		perror("clear_refs open");
>>>> 		return -1;
>>>> 	}
>>>>
>>>> 	if (write(fd, "4", 1) != 1) {
>>>> 		perror("clear_refs write");
>>>> 		return -1;
>>>> 	}
>>>> 	close(fd);
>>>>
>>>> 	pm2 = open("/proc/self/pagemap", O_RDONLY);
>>>> 	if (pm2 < 0) {
>>>> 		perror("Can't open pagemap file");
>>>> 		return -1;
>>>> 	}
>>>>
>>>> 	((char *)ctx)[0] = '\0';
>>>> 	lseek(pm2, ctx / PAGE_SIZE * sizeof(u64), SEEK_SET);
>>>> 	ret = read(pm2, &pmap, sizeof(pmap));
>>>> 	if (ret != sizeof(pmap)) {
>>>> 		perror("Read pmap err!");
>>>> 		return -1;
>>>> 	}
>>>> 	close(pm2);
>>>> 	if (pmap & PME_SOFT_DIRTY)
>>>> 		printf("Dirty tracking exists on aio\n");
>>>> 	else
>>>> 		printf("Shit happens\n");
>>>
>>> That's not proof. The kernel also updates the ring when completing requests, but
>>> you don't check this case.
>>
>> Those are "in-flight" requests. We have never handled them, and the patch does not
>> change anything in that respect.
> 
> No, it's not in-flight. You make a pre-dump and pick up the ring page, then
> the app submits an AIO request, it completes and the kernel updates the ring.
> You go on to the 2nd pre-dump or the dump and the ring page is _not_ marked
> as soft-dirty. Boom! You've just lost the completed request.

Ah, sure. Thanks.
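
To make the scenario above concrete, here is a small userspace model of it. It
is only a sketch: the header fields follow my reading of the kernel's
struct aio_ring, while struct tracked_page, the soft_dirty flag and all of the
functions are hypothetical names that only imitate what pre-dump and dump do
with a soft-dirty-tracked page. The kernel_marks_dirty parameter is the
assumption under discussion, namely whether a completion written by the kernel
marks the ring page as soft-dirty.

```c
#include <stdbool.h>

/* Leading header fields of the AIO ring; the io_event array is omitted. */
struct ring_hdr {
	unsigned id;
	unsigned nr;	/* number of io_event slots */
	unsigned head;	/* consumer index */
	unsigned tail;	/* producer index, advanced on completion */
};

/* Hypothetical model of one tracked page: ring content plus soft-dirty flag. */
struct tracked_page {
	struct ring_hdr ring;
	bool soft_dirty;
};

/* Number of completed events that have not been reaped yet. */
unsigned ring_pending(const struct ring_hdr *r)
{
	return (r->tail + r->nr - r->head) % r->nr;
}

/* Pre-dump: copy the page into the image and reset soft-dirty tracking. */
void pre_dump(struct tracked_page *p, struct ring_hdr *image)
{
	*image = p->ring;
	p->soft_dirty = false;
}

/* The kernel completes one request and advances the tail. */
void kernel_complete_one(struct tracked_page *p, bool kernel_marks_dirty)
{
	p->ring.tail = (p->ring.tail + 1) % p->ring.nr;
	if (kernel_marks_dirty)
		p->soft_dirty = true;
}

/* Final dump: the page is copied again only if it is marked soft-dirty. */
void final_dump(const struct tracked_page *p, struct ring_hdr *image)
{
	if (p->soft_dirty)
		*image = p->ring;
}

/* Completions present in the live ring but missing from the dumped image. */
unsigned lost_completions(bool kernel_marks_dirty)
{
	struct tracked_page page = { .ring = { .nr = 128 } };
	struct ring_hdr image;

	pre_dump(&page, &image);
	kernel_complete_one(&page, kernel_marks_dirty);
	final_dump(&page, &image);

	return ring_pending(&page.ring) - ring_pending(&image);
}
```

In this model, lost_completions(false) reports one completion missing from the
image, which is the "Boom!" case; lost_completions(true) reports none.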
 
>> I'm going to add a plugin to parasite_check_aios() and wait for in-flight requests
>> from there.
>>  
>>> Anyway, I don't think treating the aio ring buffer as regular anonymous memory
>>> is a good idea.
>>
>> What do you suggest? Add vma_entry_is_private(entry) | vma_entry_is(entry, VMA_AREA_AIORING)
>> in every place we use it?
> 
> Not every. My current opinion is that soft-dirty tracking should NOT
> be done for AIO rings.

So if we skip AIO rings on pre-dump, but still dump them like any other private
memory on the final dump, is that OK with you?
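
Roughly, the split I have in mind could look like the sketch below. It is
simplified and not the actual CRIU code: the flag values are placeholders,
struct vma_entry_sketch stands in for the real VMA entry, and the helper names
vma_track_soft_dirty() and vma_dump_pages() are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values; the real flags are defined in the CRIU headers. */
#define VMA_ANON_PRIVATE	(1u << 0)
#define VMA_FILE_PRIVATE	(1u << 1)
#define VMA_AREA_AIORING	(1u << 2)

/* Hypothetical stand-in for the VMA entry; only the status word is modelled. */
struct vma_entry_sketch {
	uint32_t status;
};

bool vma_is(const struct vma_entry_sketch *e, uint32_t s)
{
	return (e->status & s) == s;
}

/* Simplified privacy check: anonymous or file-backed private mapping. */
bool vma_is_private(const struct vma_entry_sketch *e)
{
	return vma_is(e, VMA_ANON_PRIVATE) || vma_is(e, VMA_FILE_PRIVATE);
}

/* Soft-dirty tracking applies to private memory, but never to AIO rings. */
bool vma_track_soft_dirty(const struct vma_entry_sketch *e)
{
	return vma_is_private(e) && !vma_is(e, VMA_AREA_AIORING);
}

/* Whether the pages of this VMA are written out in the current pass. */
bool vma_dump_pages(const struct vma_entry_sketch *e, bool pre_dump)
{
	if (vma_is(e, VMA_AREA_AIORING))
		return !pre_dump;	/* rings: final dump only, always in full */
	return vma_is_private(e);
}
```

With this, an AIO ring is ignored by every pre-dump pass and is copied in full
by the final dump, so a completion written by the kernel in between cannot be
missed because of a stale soft-dirty bit.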

