[CRIU] Bug report: a process restored with criu crashes on SIGFPE

Cyrill Gorcunov gorcunov at gmail.com
Mon Jan 29 21:54:10 MSK 2018


On Mon, Jan 29, 2018 at 4:35 PM, Dmitry Safonov <0x7f454c46 at gmail.com> wrote:
> Hi Shlomi,
>
> 2018-01-29 12:45 GMT+00:00 Shlomi Matichin <shlomi at binaris.com>:
>> hello andrei and dmitry,
>>
>> (dmitry, that's a cool email address).
>
> ;-)
>
>> so i started creating a VM for you guys to reproduce the bug, and found out
>> it works great on t2.large aws instances - but crashes consistently on
>> c5.large aws instances, which i think explains why you couldn't reproduce
>> it...
>> dmitry's test fails on the c5.large, and succeeds on the t2.large (the rest
>> of the tests you requested pass on both). output attached. (i had to add sudo
>> and change protobuf-python -> python protobuf).
>
> Yes, that's what I expected..
> As far as I could see from checking the code over the weekend, the kernel
> may expect mxcsr to be placed at a different offset in the fpu state
> structure, depending on cpu features, compile options, etc. Since you don't
> migrate the application between machines with different cpu features at
> this point, I think we fail to parse the ptrace() GETFPREGS output,
> depending on kernel options or on the cpu feature set.
>
> Ugh, need to revisit and review the fpu code..
> Anyway, I also think it would be a nice idea to run some zdtm
> tests with different cpu features masked - it may reveal some more possible
> problems.. Maybe even dump on one feature set and restore on a richer one.
>
>> pypy crashes after restore on my personal laptop as well, but unfortunately
>> the tests didn't run on my laptop; the exception output and cpuinfo are also
>> attached.
>>
>> i'm creating a VM for you guys on AWS with a c5.large instance type to work
>> on, will send connection details later on a private email.
>> i'm using the following ami, in the eu-west-1 aws region
>> "ami-0741d47e" #
>> ubuntu/images/hvm-ssd/ubuntu-artful-17.10-amd64-server-20180102
>>
>> i'll also finish writing a script to reproduce the issue, and will send you
>> the instructions in that email.
>
> Thanks for your effort and constant reports.
> I didn't reply to your mails because there were some problems with ml
> and I just haven't seen them - Andrey helped with this.

JFYI: This issue is due to the lack of AVX-512 extension support in criu.
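
For anyone who wants to check whether a given box is affected, here is a
rough sketch (not criu code) that looks for AVX-512 feature flags. It assumes
the usual x86 Linux /proc/cpuinfo layout with a "flags" line per processor;
the helper name avx512_flags is made up for illustration.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted avx512* flags found in /proc/cpuinfo-style text.

    Assumption: feature flags are listed on lines whose key is "flags",
    separated from the value by a colon, as on x86 Linux.
    """
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Collect every flag that names an AVX-512 sub-feature,
            # e.g. avx512f, avx512dq, avx512bw.
            found.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(found)
```

On a live system this could be called as
avx512_flags(open("/proc/cpuinfo").read()). An empty list means the cpu does
not advertise AVX-512; a non-empty one means the machine is a candidate for
hitting this issue.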

