[CRIU] criu check --extra output and dump failure

Brinkmann, Harald Harald.Brinkmann at bst-international.com
Fri Apr 21 05:33:11 PDT 2017


Hi Dmitry,

On Thu, 2017-04-20 at 19:05 +0300, Dmitry Safonov wrote:
> 2017-04-20 18:12 GMT+03:00 Brinkmann, Harald
> <Harald.Brinkmann at bst-international.com>:
> > On Thu, 2017-04-20 at 17:36 +0300, Dmitry Safonov wrote:
> >> 2017-04-20 15:50 GMT+03:00 Brinkmann, Harald
> >> <Harald.Brinkmann at bst-international.com>:
> >> > On Thu, 2017-04-20 at 15:00 +0300, Dmitry Safonov wrote:
> >> >> 2017-04-20 11:08 GMT+03:00 Brinkmann, Harald
> >> >> <Harald.Brinkmann at bst-international.com>:
> >> >> > On Wed, 2017-04-19 at 22:48 +0300, Dmitry Safonov wrote:
> >> >> >> 2017-04-19 15:26 GMT+03:00 Brinkmann, Harald
> >> >> >> <Harald.Brinkmann at bst-international.com>:
> >> >> >>
> >> >> >> > The bad news is that I still cannot successfully run that "Simple
> >> >> >> > Loop"-example, although the crash looks different:
> >> >> >> > (...)
> >> >> >>
> >> >> >> Hmm, so from dmesg and logs I don't see obvious reasons, why
> >> >> >> it has crashed. All looks quite normal (except PC) :-/
> >> >> >>
> >> >> >> Could you send me your parasite.built-in.o - so I'll dissect it with gdb/etc?
> >> >> >> Crashdump may be also useful.
> >> >> >
> >> >> > Done.
> >> >> >
> >> >> > These should be from the same test run as the stuff above and
> >> >> > yesterday's attachments.
> >> >>
> >> >> Hi Harald,
> >> >>
> >> >> So the fun thing - is that dumping worked on RPI2 with your parasite blob :)
> >> >> I've double checked, that criu loaded your parasite in a task.
> >> >>
> >> >> As your parasite is compiled with THUMB mode, could you check that
> >> >> your kernel has CONFIG_ARM_THUMB enabled?
> >> >>
> >> >> To make sure that the issue is no longer in the parasite, but in the
> >> >> environment/kernel/criu/etc., could you test with the parasite blob from
> >> >> my compilation?
> >> >> Just replace criu/pie/parasite.built-in.o with mine and run `make` - so
> >> >> it'll regenerate parasite-blob.h header.
> >> >
> >> > Yes, CONFIG_ARM_THUMB is enabled. Just to make sure I haven't missed
> >> > anything else, I am attaching the complete kernel config file.
> >> >
> >> > I did use your blob and the same thing happens. :-( Not sure whether
> >> > that was expected, though.
> >>
> >> Hmm, it looks like the problem is not in the parasite, if it fails with
> >> the same blob for you and succeeds for me.
> >>
> >> But as it results in a segfault in the parasite, I have no other idea
> >> than to start debugging from there.
> >> Could you run with attached diff to see, where parasite stops with SIGTRAP?
> >> (please, clear previous pie before building with `git clean -fdx criu/pie`)
> >> It *should* stop in parasite_service() as it does for me, you can check it
> >> from dump message:
> >> (00.090640) Putting parasite blob into 0x76c3a000->0x76fb1000
> >>
> >> and from dmesg message:
> >> [14771.879711] Unhandled prefetch abort: breakpoint debug exception
> >> (0x002) at 0x76fb37d4
> >>
> >> Where parasite_service is:
> >> 000027c8 <parasite_service>:
> >>     27c8:       e92d4070        push    {r4, r5, r6, lr}
> >>     27cc:       e1a04000        mov     r4, r0
> >>     27d0:       e1a05001        mov     r5, r1
> >>     27d4:       e1200071        bkpt    0x0001
> >>
> >> So, by moving this:
> >> +       asm volatile ("\tbkpt #1\n");
> >> around afterwards, you can find what causes the segfault in the parasite.
> >>
> >> If you have gdb on target, you can debug it by single-stepping:
> >> 1. change
> >> asm volatile ("\tbkpt #1\n");
> >> to
> >> asm volatile ("\tbl .\n"); /* busyloop */
> >> 2. try to dump - it'll hang in busyloop
> >> 3. kill criu process (as there can be only one tracer)
> >> 4. attach to busylooping dumpee and jump over bl instruction
> >> by setting PC like: `set $pc = <addr after bl>`
> >> 5. single-step to segfault
> >>
> >> Either way, you can find the point that results in the segfault.
> >> From your dmesg segfault message alone I have no clue what causes it.
> >
> > I have to run in a minute and don't have time to do all the tests you
> > asked for today. However, this is what I have:
>
> Ok.
> What I think - maybe there is an unaligned access or something like
> that in parasite, which works with my kernel/environment, but fails for
> you. So if you find what instruction causes the parasite to jump nowhere
> with LR == 0, which results in segfault - there will be a clue.

I have tried your five-step program with gdb and I stepped right through
the entire parasite back into the busybox main loop. Twice. No crash.
The dumpee was even still running.

In the end I took out 'asm volatile ("...");' from parasite_service()
and ran the same test again. The crash addresses, converted to offsets
in the parasite.built-in.o dump, were 0x15b8, 0x2742 and 0x00b4. That
looks ominously like a concurrency problem to me, which would explain
the previous observations.
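
For reference, the address-to-offset conversion can be sketched as below.
This is only a minimal sketch, assuming the offset is the faulting address
from dmesg minus the address at which the parasite blob was mapped in the
dumpee (the second address in the "Putting parasite blob into" log line).
That assumption is consistent with the numbers quoted earlier in this
thread: 0x76fb37d4 - 0x76fb1000 = 0x27d4, the bkpt in parasite_service.
The names blob_offset and print_blob_offset are hypothetical and not part
of CRIU.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of CRIU: translate a faulting address
 * reported in dmesg into an offset within the parasite blob.
 * Assumes the blob is mapped contiguously starting at blob_base.
 * Returns 0 on success, -1 if the address lies below the blob.
 */
int blob_offset(uint32_t fault_addr, uint32_t blob_base, uint32_t *offset)
{
	if (fault_addr < blob_base)
		return -1;
	*offset = fault_addr - blob_base;
	return 0;
}

/* Print the offset as bare hex digits, in the style of an objdump listing. */
void print_blob_offset(uint32_t fault_addr, uint32_t blob_base)
{
	uint32_t off;

	if (blob_offset(fault_addr, blob_base, &off) == 0)
		printf("%x\n", (unsigned int)off);
	else
		printf("address 0x%x is below the blob base\n",
		       (unsigned int)fault_addr);
}
```

Calling print_blob_offset(0x76fb37d4, 0x76fb1000) prints 27d4, which can
then be looked up in the disassembly of parasite.built-in.o.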

My next chance to test something will be Monday afternoon at the
earliest.


--

Harald

BST eltromat International GmbH
Werk Leopoldshöhe
Herforder Straße 249-251
D-33818 Leopoldshöhe

T:      +49 (5208) 987-513

E:      harald.brinkmann at bst-international.com
W:      http://www.bst-eltromat.com







