[CRIU] Manipulating VM areas before parasite code
Maksym Planeta
mplaneta at os.inf.tu-dresden.de
Wed Jun 12 11:59:33 MSK 2019
At this point I have changed the function "unmap_old_vmas" so that it no
longer unmaps the region I allocated earlier.
Now, instead of a segfault, I get:
(00.047557) `- Expecting exit
(00.047598) 30480 was trapped
(00.047622) 30480 (native) is going to execute the syscall 11, required
is 11
(00.047723) 30480 was stopped
(00.047759) Running pre-resume scripts
(00.047796) Writing stats
(00.048128) Running post-resume scripts
*** stack smashing detected ***: <unknown> terminated
And again no core dump is generated, although the following commands do
produce a core dump in the same terminal:
[ec2-user at ip-172-31-43-32 criu]$ sleep 1000 &
[1] 30571
[ec2-user at ip-172-31-43-32 criu]$ killall -SEGV sleep
[1]+ Segmentation fault (core dumped) sleep 1000
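
As a further sanity check: among other conditions listed in core(5), a
process only produces a core dump if its RLIMIT_CORE soft limit is
non-zero and it is marked dumpable. The sketch below reports both for the
calling process. The helper name is made up, and I have not confirmed
that either setting is what prevents the dump here. The limits of an
already running task can also be read from /proc/<pid>/limits.

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: report whether the calling process looks able
 * to produce a core dump. Returns 1 if the RLIMIT_CORE soft limit is
 * non-zero and the dumpable flag is set, 0 if not, -1 on error.
 */
int core_dump_looks_enabled(void)
{
    struct rlimit lim;
    int dumpable;

    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }

    /* 0 means the kernel will not dump core for this process. */
    dumpable = prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
    if (dumpable < 0) {
        perror("prctl(PR_GET_DUMPABLE)");
        return -1;
    }

    fprintf(stderr, "RLIMIT_CORE soft=%llu hard=%llu dumpable=%d\n",
            (unsigned long long)lim.rlim_cur,
            (unsigned long long)lim.rlim_max, dumpable);

    return lim.rlim_cur != 0 && dumpable != 0;
}
```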
On 11/06/2019 22:57, Maksym Planeta wrote:
> Hi,
>
> I think I found the reason for the segfault. It turns out that the
> parasite code unmaps most of the memory (unmap_old_vmas), so whatever I
> map before that gets lost.
>
> I'll write back if this does not fix the problem.
>
> On 11/06/2019 13:02, Maksym Planeta wrote:
>> Hello,
>>
>> I'm trying to add checkpointing support for the ibverbs interface (one
>> of the network interfaces with RDMA capabilities). To simplify the
>> implementation, I support only ibverbs with SoftRoCE as a backend.
>>
>> I checkpoint the kernel part of the ibverbs state by adding an
>> additional ibverbs call, so that the kernel serializes the state
>> itself. To restore the state, however, I try to reuse the existing
>> ibverbs calls.
>>
>> For example, ibverbs has the concept of Memory Regions (MRs), which
>> represent pinned memory that can be used for RDMA. Memory is pinned by
>> calling the ibv_reg_mr function on an aligned memory range that was
>> previously allocated, for example by mmap. Registering an MR also
>> incurs some bookkeeping on the kernel side.
>>
>> I reregister the original MR by calling ibv_reg_mr with the same
>> parameters, but for the call to work I also need to make sure that the
>> actual memory already exists and is mapped. It turned out that the
>> latter is hard to guarantee in CRIU.
>>
>> The existing memory premapping does not work, because CRIU first maps
>> memory into a temporary region and then remaps it to its final
>> destination in the parasite code. Registering the memory region inside
>> the parasite code does not work either, for another reason that I can
>> explain separately.
>>
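
To make the two-step pattern concrete, here is a minimal standalone
sketch of "map at a temporary address, then move to the final address",
using mremap with MREMAP_FIXED. It is only an illustration, not CRIU
code: the helper name premap_and_move and its parameters are
hypothetical. Until the mremap call completes, the final address does not
hold the restored memory, which is why a registration against that
address cannot succeed earlier.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not a CRIU function): create an anonymous
 * mapping at a kernel-chosen temporary address, fill it with data,
 * then move it to final_addr. Returns the new address (final_addr) on
 * success or MAP_FAILED on error.
 */
void *premap_and_move(void *final_addr, size_t len,
                      const void *data, size_t data_len)
{
    void *tmp, *moved;

    /* Step 1: the temporary mapping, wherever the kernel puts it. */
    tmp = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (tmp == MAP_FAILED)
        return MAP_FAILED;

    if (data_len > len)
        data_len = len;
    memcpy(tmp, data, data_len);

    /*
     * Step 2: move the pages to the final address. MREMAP_FIXED
     * replaces whatever was mapped at final_addr before.
     */
    moved = mremap(tmp, len, len, MREMAP_MAYMOVE | MREMAP_FIXED,
                   final_addr);
    if (moved == MAP_FAILED)
        munmap(tmp, len);

    return moved;
}
```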
>> As a result, I am trying to modify the CRIU code to create a mapping at
>> the proper destination before the parasite code runs. I add an
>> additional mmap call for every VM area (struct vma_area) that has at
>> least part of it registered as an ibverbs MR, as follows:
>>
>>     /* Map the VMA at its final address so ibv_reg_mr can pin it */
>>     addr = mmap((void *)vma->e->start, vma_entry_len(vma->e),
>>                 vma->e->prot | PROT_WRITE,
>>                 vma->e->flags | MAP_FIXED,
>>                 vma->e->fd, vma->e->pgoff);
>>     if (addr == MAP_FAILED) {
>>         pr_perror("Unable to map VMA_IBVERBS");
>>         return -1;
>>     }
>>
>> This call happens in premap_private_vma right before the original
>> mmap. As a result, the VMA now has two mappings during the restore.
>>
>> Inside the parasite code I changed the vma_remap function so that it
>> unmaps the normally created region instead of remapping it to the
>> final destination, as follows:
>>
>>     if (vma_entry_is(vma_entry, VMA_AREA_IBVERBS)) {
>>         if (guard != 0) {
>>             pr_err("No idea what to do with guard pages\n");
>>             return -1;
>>         }
>>         /* Already mapped at the final address; drop the temporary copy */
>>         sys_munmap((void *)src, len);
>>         return 0;
>>     }
>>
>> This code is added before the existing "if (guard != 0) {" check.
>>
>> Now, if I try to restore the program, it crashes almost immediately.
>> Here are the last lines of the restore log:
>>
>> (00.030087) Running post-restore scripts
>> (00.030109) Unlock network
>> (00.030146) Running iptables [iptables -w -t filter -D INPUT
>> --protocol tcp -m mark ! --mark 0xC114 --source 172.16.2.189 --sport
>> 18515 --destination 172.16.2.127 --dport 38660 -j DROP]
>> (00.037551) Unlocked 172.16.2.127:38660 - 172.16.2.189:18515 connection
>> (00.037582) Running iptables [iptables -w -t filter -D OUTPUT
>> --protocol tcp -m mark ! --mark 0xC114 --source 172.16.2.127 --sport
>> 38660 --destination 172.16.2.189 --dport 18515 -j DROP]
>> (00.044836) Unlocked 172.16.2.189:18515 - 172.16.2.127:38660 connection
>> (00.044946) pie: 18861: pie: Turning repair off for 5 (reuse 0)
>> (00.045007) pie: 18861: seccomp: mode 0 on tid 18861
>> (00.045119) Force no-breakpoints restore
>> (00.045144) Restore finished successfully. Resuming tasks.
>> (00.045184) 18861 was trapped
>> (00.045212) 18861 (native) is going to execute the syscall 202,
>> required is 15
>> (00.045263) 18861 was trapped
>> (00.045280) `- Expecting exit
>> (00.045320) 18861 was trapped
>> (00.045348) 18861 (native) is going to execute the syscall 3, required
>> is 15
>> (00.045401) 18861 was trapped
>> (00.045419) `- Expecting exit
>> (00.045459) 18861 was trapped
>> (00.045484) 18861 (native) is going to execute the syscall 3, required
>> is 15
>> (00.045529) 18861 was trapped
>> (00.045547) `- Expecting exit
>> (00.045587) 18861 was trapped
>> (00.045612) 18861 (native) is going to execute the syscall 11,
>> required is 15
>> (00.045693) 18861 was trapped
>> (00.045710) `- Expecting exit
>> (00.045760) 18861 was trapped
>> (00.045786) 18861 (native) is going to execute the syscall 15,
>> required is 15
>> (00.045839) 18861 was stopped
>> (00.045928) 18861 was trapped
>> (00.045954) 18861 (native) is going to execute the syscall 11,
>> required is 11
>> (00.046055) 18861 was stopped
>> (00.046089) Running pre-resume scripts
>> (00.046121) Writing stats
>> (00.046425) Running post-resume scripts
>>
>> In dmesg I see that there was a segfault:
>>
>> [84616.758753] ib_send_bw[18861]: segfault at 7f7333733bf0 ip
>> 00007f7332689579 sp 00007ffdc7a49f28 error 6 in
>> libc-2.26.so[7f73325c6000+1ad000]
>> [84616.767105] Code: 05 48 3d 00 f0 ff ff 77 2a 89 d7 89 44 24 0c e8
>> 9d ee 03 00 8b 44 24 0c 48 83 c4 18 5b 5d c3 66 90 48 8b 15 e9 c8 2e
>> 00 f7 d8 <64> 89 02 b8 ff ff ff ff c3 48 8b 0d d7 c8 2e 00 f7 d8 64 89
>> 01 b8
>>
>> ib_send_bw is a test application I'm checkpointing.
>>
>> And the problem is that I don't know how to debug this issue, because
>> core dumps are not being created (I set ulimit -c unlimited) and
>> strace does not give me useful information.
>>
>> It seems that the program crashes almost immediately after the
>> parasite code ends, but I also don't know where exactly.
>>
>> Could you help me debug this problem?
>>
>> If required, I can share all my code (changes to CRIU, the ibverbs
>> libraries and the kernel).
>>
>
--
Regards,
Maksym Planeta