[CRIU] ARM: SIGSEGV in parasite code

Alexander Kartashov alekskartashov at parallels.com
Wed Jun 12 05:50:00 EDT 2013


Hi Chanho,


On 06/12/2013 06:15 AM, Chanho Park wrote:
>> I ran the script dump_test.sh 1000 times and I failed to reproduce the
>> problem.
> Hmm. I think this weird problem occurs only for me.
> Did you also use my criu tool?
I tried both the crtools binary you provided and one compiled from the
repository head.
> If so, can you tell me your environment?
I use a debootstrapped Linux environment running in the QEMU model of the
ARM Versatile Express for Cortex-A9; QEMU itself is compiled from the
repository head. I used to catch spurious SEGFAULTs with previous
versions of QEMU; however, I haven't caught any of them recently.
>
>> Could you please reproduce the problem with the attached patch applied?
>> This patch aborts the dumper if the SIGSEGV is intercepted so we may
>> catch the SIGSEGV in the coredump.
> Yeah. I've attached the coredump and log after applying your patch.
> It might also be generated in the parasite code.
> I also attached the log of a failed socket connection between the tool and pie.
> It didn't get a SIGSEGV signal.
> But dumping failed and the infected program was also killed.
> IMHO the parasite code might have some weird problems.
> Can you share your criu tool which compiles with static mode?

Thank you for this dump. I suspect a kernel stack corruption.
Please verify my findings.

(gdb) i r
r0             0xffffffff    4294967295
r1             0xb6fa749c    3069867164
r2             0x0    0
r3             0x10    16
r4             0x0    0
r5             0x0    0
r6             0x0    0
r7             0x0    0
r8             0x0    0
r9             0x0    0
r10            0x0    0
r11            0x0    0
r12            0xb6fa9060    3069874272
sp             0xb6fa9088    0xb6fa9088
lr             0xb6fa46bc    -1225111876
pc             0x0    0
cpsr           0x60000010    1610612752
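
For reference, the CPSR value can be decoded with a small Python sketch.
The helper name decode_cpsr is my own and is not part of gdb or CRIU; the
bit positions are the architectural ones (flags in bits 31..28, T bit in
bit 5, mode in bits 4..0). The result suggests the task was in user mode
and in ARM (not Thumb) state, so instructions are 4 bytes wide.

```python
def decode_cpsr(cpsr):
    """Split a 32-bit ARM CPSR value into condition flags, T bit and mode."""
    modes = {
        0x10: "usr", 0x11: "fiq", 0x12: "irq", 0x13: "svc",
        0x17: "abt", 0x1b: "und", 0x1f: "sys",
    }
    return {
        "N": (cpsr >> 31) & 1,      # negative flag
        "Z": (cpsr >> 30) & 1,      # zero flag
        "C": (cpsr >> 29) & 1,      # carry flag
        "V": (cpsr >> 28) & 1,      # overflow flag
        "thumb": (cpsr >> 5) & 1,   # T bit: 0 means ARM state
        "mode": modes.get(cpsr & 0x1f, "unknown"),
    }

# Value taken from the register dump above.
info = decode_cpsr(0x60000010)
print(info)
```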


The suspicious thing is that the PC register is zero. Let's
analyze the code pointed to by the LR register:

(gdb) x /3i $lr - 8
    0xb6fa46b4:    mov    r2, #0
    0xb6fa46b8:    bl    0xb6fa4b20
    0xb6fa46bc:    cmp    r0, #0

Analyzing the code at 0xb6fa4b20:

(gdb) x /10i 0xb6fa4b20
    0xb6fa4b20:    push    {r7}
    0xb6fa4b24:    movw    r7, #297    ; 0x129
    0xb6fa4b28:    svc    0x00000000
    0xb6fa4b2c:    pop    {r7}
    0xb6fa4b30:    bx    lr

This is the parasite wrapper for the recvmsg() syscall.
It seems the kernel restored the PC register incorrectly.
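
The two facts used above can be cross-checked with a short Python sketch.
The table holds only two entries of the ARM EABI syscall table, quoted
from memory, so please treat it as an assumption and compare it with the
kernel headers; the names ARM_EABI_SYSCALLS and syscall_name are
illustrative only.

```python
# Two entries of the ARM EABI syscall table (assumption: quoted from
# memory, verify against the kernel's unistd.h for ARM).
ARM_EABI_SYSCALLS = {296: "sendmsg", 297: "recvmsg"}

def syscall_name(nr):
    """Return the syscall name for a number loaded into r7, if known."""
    return ARM_EABI_SYSCALLS.get(nr, "unknown")

# The wrapper loads 0x129 into r7 before the svc instruction.
r7 = 0x129
print(r7, syscall_name(r7))

# In ARM state, bl stores the address of the next instruction in LR,
# i.e. the address of the bl itself plus 4.
bl_addr = 0xb6fa46b8
return_addr = bl_addr + 4
print(hex(return_addr))
```

The computed return address matches the LR value in the register dump, so
the wrapper was indeed called from that bl.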

By the way, I'm unable to explain the following mysteries:

* The value of the register R1 is surely a struct msghdr*,
    but the other registers are clobbered. The most suspicious
    register is R0, which contains 0xffffffff; that is probably
    the result of our mishandling of ARM_ORIG_r0.

* The following line in the second log:

    pie: __sent ack msg: -1225390624 -1225390624 0

    is totally incomprehensible.
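
One observation that may help here: the log prints the values as signed
32-bit integers. Reinterpreted as unsigned, the number falls in the same
0xb6f..... range as the addresses in the register dump. Whether it really
is a pointer is only my assumption. The helper as_u32 below is
illustrative and not part of CRIU.

```python
def as_u32(value):
    """Reinterpret a signed 32-bit integer as an unsigned one."""
    return value & 0xffffffff

# The value from the pie log, the signed LR value printed by gdb,
# and -1 (which corresponds to R0 = 0xffffffff).
for v in (-1225390624, -1225111876, -1):
    print(v, hex(as_u32(v)))
```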

Could you please apply the attached patch and report
whether it helps to cope with the spurious SEGFAULTs?

-- 
Sincerely yours,
Alexander Kartashov

Intern
Core team

www.parallels.com

Skype: aleksandr.kartashov
Email: alekskartashov at parallels.com

-------------- next part --------------
A non-text attachment was scrubbed...
Name: remove-arm-orig-r0.patch
Type: text/x-patch
Size: 723 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20130612/6a85ffe3/attachment.bin>

