[CRIU] Manipulating VM areas before parasite code

Wed Jun 12 11:59:33 MSK 2019

At this point I changed the function "unmap_old_vmas" so that it no
longer unmaps the region I mapped earlier.
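
To illustrate the idea only (this is not the actual CRIU code): the
routine has to unmap everything in an address window except a few ranges
that must survive. The sketch below computes the "holes" around such kept
ranges; a restorer-like routine would then call munmap on each hole. The
names struct range and holes_outside are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end). */
struct range {
    uintptr_t start;
    uintptr_t end;
};

/*
 * Compute the parts of [lo, hi) that lie outside the "keep" ranges.
 * keep[] must be sorted by start address and must not overlap.
 * Writes at most nkeep + 1 holes into out[] and returns their count.
 */
size_t holes_outside(const struct range *keep, size_t nkeep,
                     uintptr_t lo, uintptr_t hi, struct range *out)
{
    size_t n = 0;
    uintptr_t cur = lo; /* first address not yet classified */

    for (size_t i = 0; i < nkeep && cur < hi; i++) {
        if (keep[i].end <= cur)   /* kept range lies below the cursor */
            continue;
        if (keep[i].start >= hi)  /* kept range lies above the window */
            break;
        if (keep[i].start > cur) {
            /* Gap between the cursor and this kept range: a hole. */
            out[n].start = cur;
            out[n].end = keep[i].start;
            n++;
        }
        cur = keep[i].end;        /* skip over the kept range */
    }
    if (cur < hi) {
        /* Whatever remains above the last kept range is a hole too. */
        out[n].start = cur;
        out[n].end = hi;
        n++;
    }
    return n;
}
```

With kept ranges [0x3000, 0x5000) and [0x8000, 0x9000) inside the window
[0x1000, 0xa000), this yields three holes to unmap and leaves both kept
ranges untouched.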

Now, instead of a segfault, I get:

(00.047557) `- Expecting exit
(00.047598) 30480 was trapped
(00.047622) 30480 (native) is going to execute the syscall 11, required is 11
(00.047723) 30480 was stopped
(00.047759) Running pre-resume scripts
(00.047796) Writing stats
(00.048128) Running post-resume scripts
*** stack smashing detected ***: <unknown> terminated

And again no core dump is generated, although the following commands do
produce a core dump in the same terminal:

[ec2-user@ip-172-31-43-32 criu]$ sleep 1000 &
[1] 30571
[ec2-user@ip-172-31-43-32 criu]$ killall -SEGV  sleep
[1]+  Segmentation fault      (core dumped) sleep 1000
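
One thing that may be worth checking is the core file limit of the task
itself rather than of the shell. As far as I understand, CRIU saves
resource limits at dump time and restores them, so "ulimit -c unlimited"
in the restoring shell would not necessarily apply to the restored task;
I have not verified that this is the cause here. A minimal sketch that
reads the limit of an arbitrary process from /proc/<pid>/limits (the
function name read_core_limit is made up for this example):

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Copy the "Max core file size" line of /proc/<pid>/limits into buf.
 * Returns 0 on success, -1 if the file cannot be read or the line is
 * missing. buf should be large enough for one full line (256 bytes is
 * plenty).
 */
int read_core_limit(pid_t pid, char *buf, size_t len)
{
    char path[64];
    FILE *f;
    int found = -1;

    snprintf(path, sizeof(path), "/proc/%d/limits", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;

    while (fgets(buf, (int)len, f)) {
        /* "Max core file size" is 18 characters long. */
        if (strncmp(buf, "Max core file size", 18) == 0) {
            buf[strcspn(buf, "\n")] = '\0'; /* drop trailing newline */
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Running this against the PID of the original process before the dump
shows its soft and hard core limits; a soft limit of 0 there would
suppress core files regardless of the shell's ulimit setting.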

On 11/06/2019 22:57, Maksym Planeta wrote:
> Hi,
> 
> I think I found the reason for the segfault. It turns out that the
> parasite code unmaps most of the memory (unmap_old_vmas), so whatever
> I map before that gets lost.
> 
> I'll write back if this does not fix the problem.
> 
> On 11/06/2019 13:02, Maksym Planeta wrote:
>> Hello,
>>
>> I'm trying to add checkpointing support for the ibverbs interface (one
>> of the network interfaces with RDMA capabilities). To simplify the
>> implementation, I support only ibverbs with SoftRoCE as a backend.
>>
>> I checkpoint the kernel part of the ibverbs state by adding an
>> additional ibverbs call, so that the kernel serializes the state
>> itself. To restore the state, however, I try to reuse the existing
>> ibverbs calls.
>>
>> For example, ibverbs has the concept of a Memory Region (MR), which
>> represents pinned memory that can be used for RDMA. Memory is pinned
>> by calling the ibv_reg_mr function on an aligned memory range that was
>> previously allocated, for example by mmap. Registering an MR also
>> incurs some bookkeeping on the kernel side.
>>
>> I re-register the original MR by calling ibv_reg_mr with the same
>> parameters, but for the call to work I also need to make sure that the
>> actual memory already exists and is mapped. It turned out that the
>> latter is hard to guarantee in CRIU.
>>
>> The existing memory premapping does not work, because CRIU first maps
>> memory into a temporary region and then remaps it to its final
>> destination in the parasite code. Registering the memory region inside
>> the parasite code does not work for another reason, which I can
>> explain separately.
>>
>> As a result, I am trying to modify the CRIU code to create the mapping
>> at its proper destination before the parasite code runs. I add an
>> additional mmap call for every VM area (struct vma_area) that has at
>> least part of it registered as an ibverbs MR, as follows:
>>
>>     addr = mmap((void *)vma->e->start, vma_entry_len(vma->e),
>>             vma->e->prot | PROT_WRITE,
>>             vma->e->flags | MAP_FIXED,
>>             vma->e->fd, vma->e->pgoff);
>>     if (addr == MAP_FAILED) {
>>         pr_perror("Unable to map VMA_IBVERBS");
>>         return -1;
>>     }
>>
>> This call happens in premap_private_vma right before the original
>> mmap. As a result, the VMA now has two mappings during restore.
>>
>> Inside the parasite code I changed the vma_remap function so that it
>> unmaps the normally created region instead of remapping it to the
>> final destination, as follows:
>>
>>     if (vma_entry_is(vma_entry, VMA_AREA_IBVERBS)) {
>>         if (guard != 0) {
>>             pr_err("No idea what to do with guard pages\n");
>>             return -1;
>>         }
>>         sys_munmap((void *)src, len);
>>         return 0;
>>     }
>>
>> This code is added before the existing "if (guard != 0) {" check.
>>
>> Now, if I try to restore the program, it crashes almost immediately.
>> Here are the last lines of the restore log:
>>
>> (00.030087) Running post-restore scripts
>> (00.030109) Unlock network
>> (00.030146)     Running iptables [iptables -w -t filter -D INPUT --protocol tcp -m mark ! --mark 0xC114 --source 172.16.2.189 --sport 18515 --destination 172.16.2.127 --dport 38660 -j DROP]
>> (00.037551) Unlocked 172.16.2.127:38660 - 172.16.2.189:18515 connection
>> (00.037582)     Running iptables [iptables -w -t filter -D OUTPUT --protocol tcp -m mark ! --mark 0xC114 --source 172.16.2.127 --sport 38660 --destination 172.16.2.189 --dport 18515 -j DROP]
>> (00.044836) Unlocked 172.16.2.189:18515 - 172.16.2.127:38660 connection
>> (00.044946) pie: 18861: pie: Turning repair off for 5 (reuse 0)
>> (00.045007) pie: 18861: seccomp: mode 0 on tid 18861
>> (00.045119) Force no-breakpoints restore
>> (00.045144) Restore finished successfully. Resuming tasks.
>> (00.045184) 18861 was trapped
>> (00.045212) 18861 (native) is going to execute the syscall 202, required is 15
>> (00.045263) 18861 was trapped
>> (00.045280) `- Expecting exit
>> (00.045320) 18861 was trapped
>> (00.045348) 18861 (native) is going to execute the syscall 3, required is 15
>> (00.045401) 18861 was trapped
>> (00.045419) `- Expecting exit
>> (00.045459) 18861 was trapped
>> (00.045484) 18861 (native) is going to execute the syscall 3, required is 15
>> (00.045529) 18861 was trapped
>> (00.045547) `- Expecting exit
>> (00.045587) 18861 was trapped
>> (00.045612) 18861 (native) is going to execute the syscall 11, required is 15
>> (00.045693) 18861 was trapped
>> (00.045710) `- Expecting exit
>> (00.045760) 18861 was trapped
>> (00.045786) 18861 (native) is going to execute the syscall 15, required is 15
>> (00.045839) 18861 was stopped
>> (00.045928) 18861 was trapped
>> (00.045954) 18861 (native) is going to execute the syscall 11, required is 11
>> (00.046055) 18861 was stopped
>> (00.046089) Running pre-resume scripts
>> (00.046121) Writing stats
>> (00.046425) Running post-resume scripts
>>
>> In dmesg I see that there was a segfault:
>>
>> [84616.758753] ib_send_bw[18861]: segfault at 7f7333733bf0 ip 00007f7332689579 sp 00007ffdc7a49f28 error 6 in libc-2.26.so[7f73325c6000+1ad000]
>> [84616.767105] Code: 05 48 3d 00 f0 ff ff 77 2a 89 d7 89 44 24 0c e8 9d ee 03 00 8b 44 24 0c 48 83 c4 18 5b 5d c3 66 90 48 8b 15 e9 c8 2e 00 f7 d8 <64> 89 02 b8 ff ff ff ff c3 48 8b 0d d7 c8 2e 00 f7 d8 64 89 01 b8
>>
>> ib_send_bw is the test application I'm checkpointing.
>>
>> The problem is that I don't know how to debug this issue, because no
>> core dumps are created (I set ulimit -c unlimited) and strace does not
>> give me any useful information.
>>
>> It seems that the program crashes almost immediately after the
>> parasite code ends, but I don't know where exactly.
>>
>> Could you help me debug this problem?
>>
>> If required, I can share all my code (updates to CRIU, the ibverbs
>> libraries, and the kernel).
>>
> 

-- 
Regards,
Maksym Planeta