[CRIU] criu check --extra output and dump failure

Dmitry Safonov 0x7f454c46 at gmail.com
Fri Apr 21 03:14:51 PDT 2017


2017-04-21 11:53 GMT+03:00 Brinkmann, Harald
<Harald.Brinkmann at bst-international.com>:
>
> Hi Dmitry,
>
> On Thu, 2017-04-20 at 19:05 +0300, Dmitry Safonov wrote:
>> 2017-04-20 18:12 GMT+03:00 Brinkmann, Harald
>> <Harald.Brinkmann at bst-international.com>:
>> > On Thu, 2017-04-20 at 17:36 +0300, Dmitry Safonov wrote:
>> >> 2017-04-20 15:50 GMT+03:00 Brinkmann, Harald
>> >> <Harald.Brinkmann at bst-international.com>:
>> >> > On Thu, 2017-04-20 at 15:00 +0300, Dmitry Safonov wrote:
>> >> >> 2017-04-20 11:08 GMT+03:00 Brinkmann, Harald
>> >> >> <Harald.Brinkmann at bst-international.com>:
>> >> >> > On Wed, 2017-04-19 at 22:48 +0300, Dmitry Safonov wrote:
>> >> >> >> 2017-04-19 15:26 GMT+03:00 Brinkmann, Harald
>> >> >> >> <Harald.Brinkmann at bst-international.com>:
>> >> >> >>
>> >> >> >> > The bad news is that I still cannot successfully run that "Simple
>> >> >> >> > Loop"-example, although the crash looks different:
>> >> >> >> > (...)
>> >> >> >>
>> >> >> >> Hmm, so from dmesg and logs I don't see obvious reasons, why
>> >> >> >> it has crashed. All looks quite normal (except PC) :-/
>> >> >> >>
>> >> >> >> Could you send me your parasite.built-in.o - so I'll dissect it with gdb/etc?
>> >> >> >> Crashdump may be also useful.
>> >> >> >
>> >> >> > Done.
>> >> >> >
>> >> >> > These should be from the same test run as the stuff above and
>> >> >> > yesterday's attachments.
>> >> >>
>> >> >> Hi Harald,
>> >> >>
>> >> >> So the fun thing - is that dumping worked on RPI2 with your parasite blob :)
>> >> >> I've double checked, that criu loaded your parasite in a task.
>> >> >>
>> >> >> As your parasite is compiled with THUMB mode, could you check that
>> >> >> your kernel has CONFIG_ARM_THUMB enabled?
>> >> >>
>> >> >> To make sure that the issue here is no longer in the parasite, but in
>> >> >> environment/kernel/criu/etc, could you run a test with the parasite blob
>> >> >> from my compilation?
>> >> >> Just replace criu/pie/parasite.built-in.o with mine and run `make` - so
>> >> >> it'll regenerate parasite-blob.h header.
>> >> >
>> >> > Yes, CONFIG_ARM_THUMB is enabled. Just to make sure I haven't missed
>> >> > anything else, I am attaching the complete kernel config file.
>> >> >
>> >> > I did use your blob and the same thing happens. :-( Not sure whether
>> >> > that was expected, though.
>> >>
>> >> Hmm, it looks like the problem is not in the parasite, if it fails with
>> >> the same blob for you and succeeds for me.
>> >>
>> >> But as it results in a segfault in the parasite, I have no other idea than
>> >> to start debugging from there.
>> >> Could you run with attached diff to see, where parasite stops with SIGTRAP?
>> >> (please, clear previous pie before building with `git clean -fdx criu/pie`)
>> >> It *should* stop in parasite_service() as it does for me, you can check it
>> >> from dump message:
>> >> (00.090640) Putting parasite blob into 0x76c3a000->0x76fb1000
>> >>
>> >> and from dmesg message:
>> >> [14771.879711] Unhandled prefetch abort: breakpoint debug exception (0x002) at 0x76fb37d4
>> >>
>> >> Where parasite_service is:
>> >> 000027c8 <parasite_service>:
>> >>     27c8:       e92d4070        push    {r4, r5, r6, lr}
>> >>     27cc:       e1a04000        mov     r4, r0
>> >>     27d0:       e1a05001        mov     r5, r1
>> >>     27d4:       e1200071        bkpt    0x0001
>> >>
>> >> So afterwards, by moving this:
>> >> +       asm volatile ("\tbkpt #1\n");
>> >> around, you can find what causes the segfault in the parasite.
>> >>
>> >> If you have gdb on target, you can debug it by single-stepping:
>> >> 1. change
>> >> asm volatile ("\tbkpt #1\n");
>> >> to
>> >> asm volatile ("\tbl .\n"); /* busyloop */
>> >> 2. try to dump - it'll hang in busyloop
>> >> 3. kill criu process (as there can be only one tracer)
>> >> 4. attach to busylooping dumpee and jump over bl instruction
>> >> by setting PC like: `set $pc = <addr after bl>`
>> >> 5. single-step to segfault
>> >>
>> >> By either of these ways, you can find the point which results in the segfault.
>> >> From your dmesg segfault message alone I have no clue what causes it.
>> >
>> > I have to run in a minute and don't have time to do all the tests you
>> > asked for today. However, this is what I have:
>>
>> Ok.
>> What I think - maybe there is an unaligned access or something like
>> that in parasite, which works with my kernel/environment, but fails for
>> you. So if you find what instruction causes the parasite to jump nowhere
>> with LR == 0, which results in segfault - there will be a clue.
>>
>> > It seems to break in the right location, but to me it looks like it
>> > generates the wrong signal (7 instead of 5) or am I missing something?
>>
>> Yes, it may generate SIGBUS instead of SIGTRAP.
>> As long as it works and generates proper pc, that's not a problem.
>
> Something weird is going on here - or I am just too slow again. :-(
>
> I have tried this
>
>> >> 1. change
>> >> asm volatile ("\tbkpt #1\n");
>> >> to
>> >> asm volatile ("\tbl .\n"); /* busyloop */
>
> in compel/plugins/std/infect.c (-181,6 +181,7):
>
> parasite_service(unsigned int cmd, void *args)
>
> and when I then run criu's 'make' the only new files are these:
>
> ./compel/plugins
> ./compel/plugins/std
> ./compel/plugins/std/infect.o
> ./compel/plugins/std/infect.d
> ./compel/plugins/std.lib.a
>
> However, we always took the parasite_service related offsets from
> criu/pie/parasite.built-in.o . That file is unchanged after the 'make'
> run. It even tells me so:
>
> (...)
> make -r -R -f __BUILD_HOST__/platform-imx6/build-target/criu-2017-04-10-master-ge77d36c375ce/scripts/nmk/scripts/main.mk makefile=Makefile.library obj=criu/pie all
> make[3]: Nothing to be done for 'all'.
> make -r -R -f __BUILD_HOST__/platform-imx6/build-target/criu-2017-04-10-master-ge77d36c375ce/scripts/nmk/scripts/main.mk makefile=Makefile obj=criu/pie all
> make[3]: Nothing to be done for 'all'.
> (...)

Yes, that's expected - the build dependencies currently run between
compel and criu, not between compel and the parasite, so a change in
compel does not rebuild criu/pie. That might be worth fixing, but as
this is not a major issue - just do `git clean -fdx criu/pie/` after
changes to compel and then run `make` again.

> If I run that thing, rather than hanging in the busyloop, it segfaults
> again.

Yep, that's the old parasite object.
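
A rough way to check which parasite actually ran, as a sketch only: add
the instruction offset from the objdump listing to the address where the
blob was put, and compare the result with the fault address in dmesg.
The numbers below are from my run quoted above; I am assuming the second
address in the "Putting parasite blob" line is the one in the dumpee.
The variable names are just for illustration.

```shell
# Values taken from the dump log and objdump output quoted above.
blob_base=0x76fb1000   # second address in "Putting parasite blob into ..."
bkpt_off=0x27d4        # offset of the bkpt inside parasite_service

# Expected fault address = blob base + instruction offset.
fault=$(printf '0x%x' $(( $blob_base + $bkpt_off )))
echo "$fault"
```

If the address reported in dmesg does not match the instruction you
inserted, the blob in use is most likely still the stale one.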

> By the way, and I have no idea whether it makes any difference, our
> setup adds compiler flags "-fPIE" and "-pie", unless specifically
> prevented from doing that. Might that be a problem?

Well, as the same parasite works for me and doesn't work for you,
I expect it to be something target-related, not build-related.

-- 
             Dmitry

