[CRIU] Bug report: a process restored with criu crashes on SIGFPE

Dmitry Safonov 0x7f454c46 at gmail.com
Mon Jan 29 16:35:01 MSK 2018


Hi Shlomi,

2018-01-29 12:45 GMT+00:00 Shlomi Matichin <shlomi at binaris.com>:
> hello andrei and dimitry,
>
> (dimitry, that's a cool email address).

;-)

> so i started creating a VM for you guys to reproduce the bug, and found out
> it works great on t2.large aws instances - but crashes consistently on
> c5.large aws instances, which i think explains why you couldn't reproduce
> it...
> dimitry's test fails on the c5.large, and succeeds on the t2.large (rest of
> the tests you requested, pass on both). output attached. (i had to add sudo
> and change protobuf-python -> python protobuf).

Yes, that's what I expected.
As far as I could tell from reading the code over the weekend, the kernel may
expect mxcsr at a different offset in the FPU state structure, depending on
CPU features, compile options, etc. Since you aren't migrating the application
between machines with different CPU features at this point, I think we fail
to parse the ptrace() GETFPREGS output correctly, depending on kernel options
or on the CPU feature set.
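
To illustrate the layout in question, here is a minimal sketch, assuming the
x86_64 `struct user_fpregs_struct` (FXSAVE) layout, where mxcsr is a 32-bit
little-endian field at byte offset 24 of a 512-byte area. This is not CRIU
code; `make_fpregs`, `read_mxcsr` and `unmasked_exceptions` are hypothetical
helpers. It only shows that reading mxcsr from the wrong offset can yield a
value with the exception mask bits cleared, which is the kind of state that
could make a later SSE operation raise SIGFPE.

```python
import struct

FXSAVE_SIZE = 512        # size of the FXSAVE area on x86_64 (assumed layout)
MXCSR_OFFSET = 24        # after cwd, swd, ftw, fop (4 x u16) and rip, rdp (2 x u64)
MXCSR_DEFAULT = 0x1F80   # reset value: all six SSE exceptions masked

# MXCSR exception mask bits 7..12, in bit order.
MASK_BITS = {
    "invalid": 7, "denormal": 8, "divide-by-zero": 9,
    "overflow": 10, "underflow": 11, "precision": 12,
}


def read_mxcsr(fpregs: bytes, offset: int = MXCSR_OFFSET) -> int:
    """Extract a 32-bit little-endian mxcsr field from a raw FPU state buffer."""
    if len(fpregs) < offset + 4:
        raise ValueError("buffer too short")
    return struct.unpack_from("<I", fpregs, offset)[0]


def unmasked_exceptions(mxcsr: int) -> list:
    """Names of SSE exceptions that would trap (mask bit cleared)."""
    return [name for name, bit in MASK_BITS.items() if not mxcsr & (1 << bit)]


def make_fpregs(mxcsr: int = MXCSR_DEFAULT) -> bytes:
    """Hypothetical helper: a zeroed FXSAVE-sized buffer with mxcsr set."""
    buf = bytearray(FXSAVE_SIZE)
    struct.pack_into("<I", buf, MXCSR_OFFSET, mxcsr)
    return bytes(buf)


if __name__ == "__main__":
    state = make_fpregs()
    good = read_mxcsr(state)
    bad = read_mxcsr(state, offset=28)  # deliberately wrong offset
    print(hex(good), unmasked_exceptions(good))
    print(hex(bad), unmasked_exceptions(bad))
```

With the correct offset every exception stays masked; with the wrong one the
parsed value is 0 in this synthetic buffer, so all six would be unmasked.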

Ugh, I need to revisit and review the FPU code.
Anyway, I also think it would be a nice idea to run some zdtm tests with
different CPU features masked; it may reveal more problems. Maybe even dump
on one feature set and restore on a richer one.
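
As a rough sketch of the "richer feature set" idea, assuming x86 Linux where
/proc/cpuinfo carries a "flags" line per CPU: the hypothetical helpers below
compare the flags of a dump host and a restore host. This is not how CRIU or
zdtm checks CPU compatibility; it only shows the set comparison.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags of the first CPU found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_restore(dump_flags: set, restore_flags: set) -> set:
    """Flags present on the dump host but absent on the restore host."""
    return dump_flags - restore_flags


if __name__ == "__main__":
    # Synthetic inputs; on a real host, read the text from /proc/cpuinfo.
    dump_host = "processor : 0\nflags : fpu sse sse2 avx\n"
    restore_host = "processor : 0\nflags : fpu sse sse2 avx avx2 avx512f\n"
    missing = missing_on_restore(parse_cpu_flags(dump_host),
                                 parse_cpu_flags(restore_host))
    print("restore host is richer" if not missing else sorted(missing))
```

An empty result means the restore side has at least every feature the dump
side had, which is the "dump on one set, restore on a richer one" case.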

> pypy crashes after restore on my personal laptop as well, but unfortunately
> the tests didn't run on my laptop, output exception and cpuinfo also
> attached.
>
> i'm creating a VM for you guys on AWS with a c5.large instance type to work
> on, will send connection details later on a private email.
> i'm using the following ami, at eu-west-1 aws region
> "ami-0741d47e" #
> ubuntu/images/hvm-ssd/ubuntu-artful-17.10-amd64-server-20180102
>
> i'll also finish writing a script to reproduce the issue, and will send you
> the instructions in that email.

Thanks for your effort and constant reports.
I didn't reply to your mails because there were some problems with the
mailing list and I just hadn't seen them - Andrey helped with this.

-- 
             Dmitry

