[CRIU] Bug report: a process restored with criu crashes on SIGFPE

Shlomi Matichin shlomi at binaris.com
Thu Jan 25 00:48:14 MSK 2018


Hello,

First, thank you all for your awesome work on CRIU. I have a bug report I
would like to ask your help with, but please know that CRIU is magic to me,
and it's amazing how well it works.

REPRODUCING CODE ATTACHED:
Attached are two programs written in Python 3. The server side is a simple
loop: accept a TCP connection, compute, return the answer and close the
connection (implemented in two files, main.py and generated.py). The client
side just connects and prints whatever arrives on the TCP connection.
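
The attachments were scrubbed from the archive, so here is a minimal
sketch of the server/client pair described above, for readers who want
to follow along. It is a hypothetical approximation, not the attached
main.py/client.py. compute() is a made-up stand-in for generated.py, and
binding to port 0 (an OS-chosen port) is an assumption made so the demo
runs self-contained in one process.

```python
import socket
import threading


def compute() -> int:
    # Hypothetical stand-in for the work in generated.py (NOT the attached code).
    return sum(i * i for i in range(1000))


def serve(listener: socket.socket, connections: int) -> None:
    # Server loop: accept a connection, compute, send the answer, close it.
    # Between connections the process sits blocked in accept(), which is the
    # state the process is in when it gets dumped in the steps below.
    for _ in range(connections):
        conn, _addr = listener.accept()
        with conn:  # closes the connection after the answer is sent
            conn.sendall(f"{compute()}\n".encode())


def run_client(host: str, port: int) -> str:
    # Client: connect and collect whatever arrives until the server closes.
    chunks = []
    with socket.create_connection((host, port)) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode()


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # assumption: OS picks a free port
    listener.listen()
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve, args=(listener, 1))
    server.start()
    print(run_client("127.0.0.1", port), end="")
    server.join()
    listener.close()
```

In the real setup the server and client are separate processes (pypy
main.py and python3 client.py); the thread here only keeps the sketch
runnable on its own.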

STEPS TO REPRODUCE:
on terminal 1: pypy main.py
on terminal 2: python3 client.py
on terminal 2: cd <dump directory>
on terminal 2: sudo criu dump -t `pidof pypy` --shell-job
on terminal 1: <server dies>
on terminal 2: sudo criu restore --shell-job
on terminal 3: sudo strace -fF -p `pidof pypy`
on terminal 1: python3 client.py
on terminal 2: <pypy crashes, parent process exits>
on terminal 3: <output follows:>
strace: Process 326 attached
accept(3, {sa_family=AF_INET, sin_port=htons(56262), sin_addr=inet_addr("127.0.0.1")}, [16]) = 4
--- SIGFPE {si_signo=SIGFPE, si_code=FPE_FLTRES, si_addr=0x7f6b19ce76d1} ---
+++ killed by SIGFPE (core dumped) +++

Exactly the same scenario works perfectly if the first step runs
"python3 main.py" instead of "pypy main.py". The crash only happens when
running under PyPy.

REPRODUCTION ENVIRONMENT:
1. Tested on a personal Ubuntu 17.10 laptop and on an AWS Ubuntu 17.10 EC2
   server.
2. PyPy installed with "sudo apt-get install pypy".
3. Two versions of CRIU reproduce the bug on both machines: 3.7 stable,
   built from source (downloaded from criu.org), and 3.4, installed with
   "sudo apt-get install criu".

MOTIVATION BEHIND THE PROJECT:
PyPy is a Python JIT that accelerates Python computations significantly.
The use case in generated.py takes ~2 minutes to run with python3, but only
4.1s with PyPy! However, the PyPy JIT needs to "warm up": the same
computation takes 3.6s when run a second time inside the same process. Of
course this is just a sample; in the real application the improvement of a
warm JIT over a cold one is around 2x. The attached sample application
reduces the reproduction to a trivial case (a single TCP socket in the
"accepting" state).
The PyPy team states that the JIT cannot be snapshotted (
http://doc.pypy.org/en/latest/faq.html#couldn-t-the-jit-dump-and-reload-already-compiled-machine-code
), so we thought we could emulate the effect with CRIU.
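
The cold-versus-warm comparison above (first run vs. second run inside
the same process) can be measured with a small harness like the one
below. This is a sketch under assumptions: workload() is a hypothetical
CPU-bound stand-in for generated.py. Under CPython the two timings should
be roughly equal because there is no JIT; a gap would only be expected
under PyPy.

```python
import time
from typing import Callable, Tuple


def workload() -> int:
    # Hypothetical CPU-bound stand-in for the computation in generated.py.
    total = 0
    for i in range(200_000):
        total += (i * i) % 7
    return total


def cold_and_warm(fn: Callable[[], object]) -> Tuple[float, float]:
    # Time the first call (cold JIT) and then a second call (warm JIT),
    # both inside the same process.
    start = time.perf_counter()
    fn()
    cold = time.perf_counter() - start
    start = time.perf_counter()
    fn()
    warm = time.perf_counter() - start
    return cold, warm


if __name__ == "__main__":
    cold, warm = cold_and_warm(workload)
    print(f"cold: {cold:.3f}s  warm: {warm:.3f}s  ratio: {cold / warm:.2f}x")
```

The idea behind using CRIU is to dump the process after this warm-up and
restore it, so that later requests start from the warm timing.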

Please help me!
Thanks in advance,
Shlomi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.py
Type: text/x-python
Size: 302 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20180124/27a4e67e/attachment.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: client.py
Type: text/x-python
Size: 106 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20180124/27a4e67e/attachment-0001.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: generated.py
Type: text/x-python
Size: 7074 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20180124/27a4e67e/attachment-0002.py>