[CRIU] BUG: CRIU corrupts floating point state after checkpoint

Dmitry Safonov 0x7f454c46 at gmail.com
Wed Sep 25 04:36:08 MSK 2019


On 9/25/19 2:28 AM, Diyu Zhou wrote:
> Hi Cyrill and Andrei,
> 
> Thank you for your help.
> 
> I have tried running it on a machine that supports the xsave instruction, and
> the problem is still there.  The dump log and cpuinfo are attached.
> 
>> Another question is -- the problem appears after checkpoint only, you didn't
>> do the restore procedure?
> 
> Correct. I will explain more in detail below.
> 
> It seems to me that the checkpoint process corrupts the floating point
> register values after it has obtained them.  If I let the floating point
> process continue to run after checkpointing, the floating point process
> yields an error.
> 
> However, the floating point register values that CRIU obtains are correct. I
> have verified this with a script that repeatedly dumps the floating point
> program (killing it) and restores it at a 30ms interval. The floating point
> program runs to the end correctly.
> 
> So I conclude that the corruption occurs after the FPU register values have
> been obtained. My guess is that some part of the parasite code executes
> floating point instructions, or invokes functions like memset or memcpy that
> potentially use SSE.

Oh, that's a good guess.
I had a TODO item to add a warning (or break the build) when the parasite
blob is found to be using the FPU. But I thought it doesn't happen, so it is
still on the list somewhere.

Could you upload the output of `objdump -dS criu/pie/parasite.built-in.o`
somewhere? (e.g. as a gist on GitHub)
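
For reference, the kind of check that could be run over such a disassembly
can be sketched roughly as below. This is only a heuristic and not anything
CRIU ships: the function name find_fpu_lines is made up for illustration, it
assumes objdump's default AT&T syntax, and it only catches instructions that
name an x87/MMX/SSE/AVX register as an operand (so something like `fldz` or
`fxsave` would slip through).

```python
import re

# An instruction line in objdump output starts with "<hex offset>:<TAB>".
# Interleaved source lines from -S and symbol headers do not match this.
INSN_LINE = re.compile(r"^\s*[0-9a-f]+:\t")

# Assumption: AT&T syntax, so registers appear as %xmm0, %ymm1, %mm2, %st(1).
FPU_REG = re.compile(r"%(?:[xyz]?mm\d+|st\b)")


def find_fpu_lines(disasm):
    """Return the instruction lines that reference an FPU/SIMD register."""
    hits = []
    for line in disasm.splitlines():
        if not INSN_LINE.match(line):
            continue  # skip headers, blank lines and interleaved source
        if FPU_REG.search(line):
            hits.append(line.strip())
    return hits
```

The text passed in would be the output of `objdump -d` on the parasite
object; a non-empty result means the blob likely touches FPU state and is
worth a closer look.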

Thanks,
          Dmitry

