[CRIU] Manipulating VM areas before parasite code
Maksym Planeta
mplaneta at os.inf.tu-dresden.de
Wed Jun 12 16:13:24 MSK 2019
OK. There were a couple of silly bugs, but this part now seems to be fixed.
On 12/06/2019 10:59, Maksym Planeta wrote:
> At this point I changed the function "unmap_old_vmas" so that it does not
> unmap the region I previously allocated.
>
> Now, instead of segfault, I get:
>
> (00.047557) `- Expecting exit
> (00.047598) 30480 was trapped
> (00.047622) 30480 (native) is going to execute the syscall 11, required
> is 11
> (00.047723) 30480 was stopped
> (00.047759) Running pre-resume scripts
> (00.047796) Writing stats
> (00.048128) Running post-resume scripts
> *** stack smashing detected ***: <unknown> terminated
>
> And again, no core dump is generated, although the following commands do
> produce a core dump on the same terminal:
>
> [ec2-user@ip-172-31-43-32 criu]$ sleep 1000 &
> [1] 30571
> [ec2-user@ip-172-31-43-32 criu]$ killall -SEGV sleep
> [1]+ Segmentation fault (core dumped) sleep 1000
>
>
> On 11/06/2019 22:57, Maksym Planeta wrote:
>> Hi,
>>
>> I think I found the reason for the segfault. It turns out that the
>> parasite code unmaps most of the memory (unmap_old_vmas), so whatever
>> I map before that gets lost.
>>
>> I'll write back if the problem is not fixed.
>>
>> On 11/06/2019 13:02, Maksym Planeta wrote:
>>> Hello,
>>>
>>> I'm trying to add checkpointing support for the ibverbs interface (one
>>> of the network interfaces with RDMA capabilities). To simplify the
>>> implementation, I support only ibverbs with SoftRoCE as the
>>> backend.
>>>
>>> I checkpoint the kernel part of the ibverbs state by adding an
>>> additional ibverbs call, so that the kernel serializes the state itself.
>>> But to restore the state, I try to reuse the existing ibverbs calls.
>>>
>>> For example, ibverbs has the concept of Memory Regions (MRs), which
>>> represent pinned memory that can be used for RDMA. Memory is pinned
>>> by calling the ibv_reg_mr function on an aligned memory range that
>>> was previously allocated, for example by mmap. Registering an MR
>>> also incurs some bookkeeping on the kernel side.
>>>
>>> I reregister the original MR by calling ibv_reg_mr with the same
>>> parameters, but for the call to work I also need to make sure that
>>> the actual memory already exists and is mapped. It turned out that
>>> the last part is hard to guarantee in CRIU.
>>>
>>> The existing memory premapping does not work, because CRIU first maps
>>> memory into a temporary region and then remaps it to its final
>>> destination in the parasite code. Registering the memory region
>>> inside the parasite code does not work for another reason, which I
>>> can explain separately.
>>>
>>> As a result, I am trying to modify the CRIU code to create a mapping
>>> at the proper destination before the parasite code runs. I add an
>>> additional mmap call for each VM area (struct vma_area) that has at
>>> least part of it registered as an ibverbs MR, as follows:
>>>
>>>     addr = mmap((void *)vma->e->start, vma_entry_len(vma->e),
>>>                 vma->e->prot | PROT_WRITE,
>>>                 vma->e->flags | MAP_FIXED,
>>>                 vma->e->fd, vma->e->pgoff);
>>>     if (addr == MAP_FAILED) {
>>>         pr_perror("Unable to map VMA_IBVERBS");
>>>         return -1;
>>>     }
>>>
>>> This call happens in premap_private_vma right before the original
>>> mmap. As a result, the VMA now has two mappings during the recovery.
>>>
>>> Inside the parasite code, I changed the vma_remap function to unmap
>>> the normally created region instead of remapping it to the final
>>> destination, as follows:
>>>
>>>     if (vma_entry_is(vma_entry, VMA_AREA_IBVERBS)) {
>>>         if (guard != 0) {
>>>             pr_err("No idea what to do with guard pages\n");
>>>             return -1;
>>>         }
>>>         sys_munmap((void *)src, len);
>>>         return 0;
>>>     }
>>>
>>> This code is added before "if (guard != 0) {"
>>>
>>> Now, when I try to restore the program, it crashes almost immediately.
>>> Here are the last lines of the restore log:
>>>
>>> (00.030087) Running post-restore scripts
>>> (00.030109) Unlock network
>>> (00.030146) Running iptables [iptables -w -t filter -D INPUT
>>> --protocol tcp -m mark ! --mark 0xC114 --source 172.16.2.189 --sport
>>> 18515 --destination 172.16.2.127 --dport 38660 -j DROP]
>>> (00.037551) Unlocked 172.16.2.127:38660 - 172.16.2.189:18515 connection
>>> (00.037582) Running iptables [iptables -w -t filter -D OUTPUT
>>> --protocol tcp -m mark ! --mark 0xC114 --source 172.16.2.127 --sport
>>> 38660 --destination 172.16.2.189 --dport 18515 -j DROP]
>>> (00.044836) Unlocked 172.16.2.189:18515 - 172.16.2.127:38660 connection
>>> (00.044946) pie: 18861: pie: Turning repair off for 5 (reuse 0)
>>> (00.045007) pie: 18861: seccomp: mode 0 on tid 18861
>>> (00.045119) Force no-breakpoints restore
>>> (00.045144) Restore finished successfully. Resuming tasks.
>>> (00.045184) 18861 was trapped
>>> (00.045212) 18861 (native) is going to execute the syscall 202,
>>> required is 15
>>> (00.045263) 18861 was trapped
>>> (00.045280) `- Expecting exit
>>> (00.045320) 18861 was trapped
>>> (00.045348) 18861 (native) is going to execute the syscall 3,
>>> required is 15
>>> (00.045401) 18861 was trapped
>>> (00.045419) `- Expecting exit
>>> (00.045459) 18861 was trapped
>>> (00.045484) 18861 (native) is going to execute the syscall 3,
>>> required is 15
>>> (00.045529) 18861 was trapped
>>> (00.045547) `- Expecting exit
>>> (00.045587) 18861 was trapped
>>> (00.045612) 18861 (native) is going to execute the syscall 11,
>>> required is 15
>>> (00.045693) 18861 was trapped
>>> (00.045710) `- Expecting exit
>>> (00.045760) 18861 was trapped
>>> (00.045786) 18861 (native) is going to execute the syscall 15,
>>> required is 15
>>> (00.045839) 18861 was stopped
>>> (00.045928) 18861 was trapped
>>> (00.045954) 18861 (native) is going to execute the syscall 11,
>>> required is 11
>>> (00.046055) 18861 was stopped
>>> (00.046089) Running pre-resume scripts
>>> (00.046121) Writing stats
>>> (00.046425) Running post-resume scripts
>>>
>>> In dmesg I see that there was a segfault:
>>>
>>> [84616.758753] ib_send_bw[18861]: segfault at 7f7333733bf0 ip
>>> 00007f7332689579 sp 00007ffdc7a49f28 error 6 in
>>> libc-2.26.so[7f73325c6000+1ad000]
>>> [84616.767105] Code: 05 48 3d 00 f0 ff ff 77 2a 89 d7 89 44 24 0c e8
>>> 9d ee 03 00 8b 44 24 0c 48 83 c4 18 5b 5d c3 66 90 48 8b 15 e9 c8 2e
>>> 00 f7 d8 <64> 89 02 b8 ff ff ff ff c3 48 8b 0d d7 c8 2e 00 f7 d8 64
>>> 89 01 b8
>>>
>>> ib_send_bw is a test application I'm checkpointing.
>>>
>>> And the problem is that I don't know how to debug this issue, because
>>> core dumps are not being created (I set ulimit -c unlimited) and
>>> strace does not give me useful information.
>>>
>>> It seems that the program crashes almost immediately after the
>>> parasite code ends, but I don't know exactly where.
>>>
>>> Could you help me debug this problem?
>>>
>>> If required, I can share all my code (updates to CRIU, the ibverbs
>>> libraries and the kernel).
>>>
>>
>
--
Regards,
Maksym Planeta