[CRIU] [PATCH v2 3/3] aio: Restore aio ring content
Kirill Tkhai
ktkhai at virtuozzo.com
Mon Mar 21 03:10:47 PDT 2016
On 21.03.2016 12:49, Pavel Emelyanov wrote:
> On 03/21/2016 11:06 AM, Kirill Tkhai wrote:
>>
>>
>> On 21.03.2016 09:19, Pavel Emelyanov wrote:
>>> On 03/18/2016 01:31 PM, Kirill Tkhai wrote:
>>>>
>>>>
>>>> On 17.03.2016 22:34, Pavel Emelyanov wrote:
>>>>>
>>>>>>>>> I'm not sure this is safe. How would pre-dumps act on rings?
>>>>>>>>
>>>>>>>> Could you please explain what kind of problems are possible here?
>>>>>>>> I don't see a memory predump.
>>>>>>>
>>>>>>> The vma_entry_is_private() check is too generic. E.g. such vmas are being
>>>>>>> soft-dirty-tracked. Do we want the same for AIO rings? I bet we don't :)
>>>>>>
>>>>>> To user space, the AIO ring buffer looks like anonymous memory. There is no
>>>>>> difference between them: it's writable and modifiable. So if we track anonymous
>>>>>> memory, we have to track the AIO ring buffer too.
>>>>>
>>>>> Will it get tracked by the kernel's soft-dirty bits? I strongly doubt it.
>>>>
>>>> It's tracked. Here is the proof:
>>>>
>>>> #define _GNU_SOURCE
>>>> #include <stdio.h>
>>>> #include <unistd.h>
>>>> #include <sys/syscall.h>
>>>> #include <linux/aio_abi.h>
>>>> #include <fcntl.h>
>>>> #include <inttypes.h>
>>>>
>>>> /* glibc provides no io_setup() wrapper, so invoke the syscall directly */
>>>> static inline int io_setup(unsigned nr, aio_context_t *ctxp)
>>>> {
>>>> 	return syscall(__NR_io_setup, nr, ctxp);
>>>> }
>>>>
>>>> #define PME_SOFT_DIRTY (1ULL << 55)	/* pagemap entry bit 55: soft-dirty */
>>>> #define PAGE_SHIFT 12
>>>> #define PAGE_SIZE (1UL << PAGE_SHIFT)
>>>> #define u64 uint64_t
>>>>
>>>> int main()
>>>> {
>>>> 	aio_context_t ctx = 0;
>>>> 	int ret, fd, pm2;
>>>> 	u64 pmap;
>>>>
>>>> 	ret = io_setup(128, &ctx);
>>>> 	if (ret < 0) {
>>>> 		perror("io_setup error");
>>>> 		return -1;
>>>> 	}
>>>>
>>>> 	/* Writing "4" to clear_refs clears the soft-dirty bits of all pages */
>>>> 	fd = open("/proc/self/clear_refs", O_WRONLY);
>>>> 	if (fd < 0) {
>>>> 		perror("clear_refs open");
>>>> 		return -1;
>>>> 	}
>>>>
>>>> 	if (write(fd, "4", 1) != 1) {
>>>> 		perror("clear_refs write");
>>>> 		return -1;
>>>> 	}
>>>> 	close(fd);
>>>>
>>>> 	pm2 = open("/proc/self/pagemap", O_RDONLY);
>>>> 	if (pm2 < 0) {
>>>> 		perror("Can't open pagemap file");
>>>> 		return -1;
>>>> 	}
>>>>
>>>> 	/* ctx is the user-space address of the ring; dirty it from user space */
>>>> 	((char *)ctx)[0] = '\0';
>>>> 	/* pagemap holds one 64-bit entry per virtual page */
>>>> 	lseek(pm2, ctx / PAGE_SIZE * sizeof(u64), SEEK_SET);
>>>> 	ret = read(pm2, &pmap, sizeof(pmap));
>>>> 	if (ret != (int)sizeof(pmap)) {
>>>> 		perror("Read pmap err!");
>>>> 		return -1;
>>>> 	}
>>>> 	close(pm2);
>>>> 	if (pmap & PME_SOFT_DIRTY)
>>>> 		printf("Dirty tracking exists on aio\n");
>>>> 	else
>>>> 		printf("Shit happens\n");
>>>> 	return 0;
>>>> }
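The pagemap lookup in the test program boils down to an offset computation and a bit test. A minimal sketch of those two steps as helpers, assuming 4 KiB pages as the test does; the bit positions (55 for soft-dirty, 63 for page present) follow the kernel's pagemap documentation, while the helper names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

#define PAGE_SHIFT     12             /* assumption: 4 KiB pages */
#define PME_SOFT_DIRTY (1ULL << 55)   /* page written since last clear_refs "4" */
#define PME_PRESENT    (1ULL << 63)   /* page is present in RAM */

/* Byte offset in /proc/<pid>/pagemap of the entry describing vaddr:
 * one 64-bit entry per virtual page. */
static off_t pagemap_offset(uintptr_t vaddr)
{
	return (off_t)(vaddr >> PAGE_SHIFT) * (off_t)sizeof(uint64_t);
}

/* True if the entry reports the page as soft-dirty. */
static bool pme_is_soft_dirty(uint64_t pme)
{
	return (pme & PME_SOFT_DIRTY) != 0;
}

/* True if the entry reports the page as present. */
static bool pme_is_present(uint64_t pme)
{
	return (pme & PME_PRESENT) != 0;
}
```

Any address inside the same page maps to the same entry, which is why the test can poke the first byte of the ring and then look up the entry for ctx itself.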
>>>
>>> That's not proof. The kernel also updates the ring when completing requests,
>>> but you don't check that case.
>>
>> Those are "in-flight" requests. We didn't handle them before, and this patch
>> doesn't change anything there.
>
> No, it's not in-flight. You make a pre-dump and pick up the ring page, then
> the app submits an AIO request, it completes, and the kernel updates the ring.
> You go on with the 2nd pre-dump or dump, and the ring page is _not_ marked as
> soft-dirty. Boom! You've just lost the completed request.
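The failure described here can be reproduced in a toy model. This is not kernel or CRIU code: it assumes, as stated in this message, that stores made from user space set the page's soft-dirty bit while the kernel's completion writes to the ring do not, and that an incremental dump re-reads only soft-dirty pages, taking clean ones from the pre-dump image. The ring is reduced to its tail index; all names are hypothetical.

```c
#include <stdbool.h>

/* Toy stand-in for one AIO ring page (hypothetical, not kernel code). */
struct toy_page {
	unsigned tail;      /* stand-in for the ring header's tail index */
	bool soft_dirty;    /* stand-in for the PTE soft-dirty bit */
};

/* Saved copy of the page, as produced by pre-dump / dump. */
struct toy_image {
	unsigned tail;
};

/* A store from user space goes through the user PTE: page becomes dirty. */
static void user_store(struct toy_page *p, unsigned tail)
{
	p->tail = tail;
	p->soft_dirty = true;
}

/* Assumption from the thread: the kernel publishes a completion
 * without touching the soft-dirty bit. */
static void kernel_complete(struct toy_page *p)
{
	p->tail++;
}

/* Pre-dump: copy the page, then reset tracking (like clear_refs "4"). */
static void pre_dump(struct toy_page *p, struct toy_image *img)
{
	img->tail = p->tail;
	p->soft_dirty = false;
}

/* Incremental dump: re-read only soft-dirty pages; clean pages keep
 * whatever the pre-dump image holds. */
static void incremental_dump(const struct toy_page *p, struct toy_image *img)
{
	if (p->soft_dirty)
		img->tail = p->tail;
}
```

Running pre_dump(), then kernel_complete(), then incremental_dump() leaves the image one completion behind the live page, which is exactly the lost request.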
Ah, sure. Thanks.
>> I'm going to add a plugin to parasite_check_aios() and wait for in-flight
>> requests from there.
>>
>>> Anyway, I don't think treating the AIO ring buffer as regular anonymous memory
>>> is a good idea.
>>
>> What do you suggest? Add vma_entry_is_private(entry) || vma_entry_is(entry, VMA_AREA_AIORING)
>> at every place where we use it?
>
> Not every. My current opinion is that soft-dirty tracking should NOT
> be done for AIO rings.
So, if we skip AIO rings on pre-dump but dump them like any other private memory,
is that OK with you?
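The proposal amounts to splitting one predicate into two: one deciding whether a VMA's contents are dumped at all, and one deciding whether the VMA takes part in pre-dump and soft-dirty tracking. A minimal sketch, where the flag values, struct, and helper names are hypothetical and are not CRIU's real VMA_AREA_* constants or vma_entry_is*() helpers:

```c
#include <stdbool.h>

/* Hypothetical flag bits, NOT CRIU's real VMA_AREA_* values. */
#define TOY_VMA_PRIVATE (1u << 0)
#define TOY_VMA_AIORING (1u << 1)

struct toy_vma {
	unsigned status;
};

static bool toy_vma_is(const struct toy_vma *v, unsigned flag)
{
	return (v->status & flag) == flag;
}

/* Full dump: AIO rings are saved like private anonymous memory. */
static bool toy_vma_dump_contents(const struct toy_vma *v)
{
	return toy_vma_is(v, TOY_VMA_PRIVATE) || toy_vma_is(v, TOY_VMA_AIORING);
}

/* Pre-dump and soft-dirty tracking: AIO rings are skipped, since
 * kernel-side completion writes may not mark the page soft-dirty. */
static bool toy_vma_predump(const struct toy_vma *v)
{
	return toy_vma_is(v, TOY_VMA_PRIVATE) && !toy_vma_is(v, TOY_VMA_AIORING);
}
```

With this split, the final dump never relies on soft-dirty state for a ring, at the cost of copying the ring pages in full on every dump.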