[CRIU] BUG: CRIU corrupt floating point state after checkpoint

Diyu Zhou zhoudiyupku at gmail.com
Wed Sep 25 04:28:12 MSK 2019


Hi Cyrill and Andrei,

Thank you for your help.

I tried running it on a machine that supports the xsave instruction, and the
problem is still there.  The dump log and cpuinfo are attached.

> Another question is -- the problem appears after checkpoint only, you didn't do
> restore procedure?

Correct. I will explain in more detail below.

It seems to me that the checkpoint process corrupts the floating point
register values after it has obtained them.  If I let the floating point
process continue running after checkpointing, it yields an error.
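
For reference, here is a minimal sketch of the kind of floating point
self-check such a process can perform. It is not the actual test program;
the names compute and run_rounds are hypothetical. It recomputes the same
sum repeatedly and counts results that differ from the first one, so a
register clobbered in the middle of a loop would likely show up as a
mismatch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical workload: partial sum of 1/i^2. The accumulator normally
 * lives in a floating point register for the whole loop. */
static double compute(int n)
{
    assert(n > 0);
    double acc = 0.0;
    for (int i = 1; i <= n; i++)
        acc += 1.0 / ((double)i * (double)i);
    return acc;
}

/* Recompute the same value `rounds` times and return how many results
 * differ from the first one. */
static long run_rounds(long rounds, int n)
{
    volatile double reference = compute(n);
    long mismatches = 0;

    for (long r = 0; r < rounds; r++) {
        volatile double v = compute(n);
        if (v != reference) {
            fprintf(stderr, "round %ld: got %.17g, expected %.17g\n",
                    r, v, reference);
            mismatches++;
        }
    }
    return mismatches;
}
```

Calling run_rounds with a large round count from main while the process is
being checkpointed should report zero mismatches if the FPU state is left
intact.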

However, the floating point register values that CRIU obtains are correct. I
verified this with a script that repeatedly dumps the floating point program
(killing it) and then restores it, at a 30ms interval. The floating point
program runs to the end correctly.

So I conclude that the corruption occurs after the FPU register values are
obtained. My guess is that some part of the parasite code executes floating
point instructions, or invokes functions such as memset or memcpy that may
use SSE.


Best,
Diyu

On Tue, Sep 24, 2019 at 3:20 PM Cyrill Gorcunov <gorcunov at gmail.com> wrote:
>
> On Tue, Sep 24, 2019 at 02:09:17PM -0700, Andrei Vagin wrote:
> > On Mon, Sep 23, 2019 at 09:35:14PM -0700, Diyu Zhou wrote:
> > > Hey CRIU,
> > >
> > > It seems to me that CRIU corrupts the floating point of the process after
> > > checkpointing. I was wondering if I did something wrong with CRIU or it is a
> > > bug.
> >
> >
> > (00.000193) cpu: x86_family 6 x86_vendor_id GenuineIntel x86_model_id
> > Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
> > (00.000198) cpu: fpu: x87 FPU will use FXSAVE
> > (00.000201) cpu: fpu:1 fxsr:1 xsave:0 xsaveopt:0 xsavec:0 xgetbv1:0 xsaves:0
> >
> > Cyrill or Duma, could you take a look at this?
> >
>
> I didn't look into details yet (hopefully I'll manage on a week)
> but what is more important -- we've been targeting xsave instruction
> as a requirement and as far as I remember never tested deeply
> on the machines without it.
>
> Another question is -- the problem appears after checkpoint only,
> you didn't do restore procedure?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dump.log
Type: text/x-log
Size: 70683 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20190924/d524956f/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cpuinfo
Type: application/octet-stream
Size: 88745 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20190924/d524956f/attachment-0001.obj>

