[CRIU] x86: Hardware breakpoints are not always triggered

Andrey Wagin avagin at gmail.com
Thu Jan 28 00:31:41 PST 2016


Hi,

We use hardware breakpoints in CRIU and we found that sometimes we set
a break-point, but a process doesn't stop on it.

I write a small reproducer for this bug. It create two processes,
where a parent process traces a child. The parent process sets a
break-point and each time when the child stop on it, the parent sets
the variable "xxx" to A in a child process. The child runs an infinite
loop, where it check the variable "xxx" and sets it to B. If a child
process finds that xxx is equal to B, it exits with a non-zero code,
what means that a break-point was not triggered. The source code is
attached.

The reproducer uses a different break-point address if it is executed
with arguments than when it executed without arguments.

Then I made a few experiments. The bug is triggered, if we execute
this program a few times in a KVM virtual machine.

[root at fc22-vm ptrace]# ( while :; do ./ptrace_breakpoint > /dev/null
|| { echo "FAIL - $?"; break; }; done ) &
[3] 4088
[root at fc22-vm ptrace]# ( while :; do ./ptrace_breakpoint > /dev/null
|| { echo "FAIL - $?"; break; }; done ) &
[4] 4091
[root at fc22-vm ptrace]# ( while :; do ./ptrace_breakpoint 1 2 >
/dev/null || { echo "FAIL - $?"; break; }; done ) &
[5] 4094
[root at fc22-vm ptrace]# ( while :; do ./ptrace_breakpoint 1 2 >
/dev/null || { echo "FAIL - $?"; break; }; done ) &
[6] 4097
[8] 4103
[root at fc22-vm ptrace]# 0087: exit - 5
0131: exited, status=1
0126: wait: No child processes
FAIL - 3

I tried to execute the reproducer on the host (where kvm VM-s are
running), but the bug was not triggered during one hour.

When I executed the reproducer in VM without stopping processes on the
host, I found that a bug is triggered much faster in this case.

[root at fc22-vm ptrace]# ./ptrace_breakpoint 1
....
stop 24675
cont
child2 1
stop 24676
cont
child2 1
child2 5
0088: exit - 5
stop 24677
0132: exited, status=1
cont
0127: wait: No child processes

I know that this bug can be reproduced starting with the 4.2 kernel. I
haven't test older versions of the kernel.

I tried to print drX registers after a break-point. Looks like they
are set correctly.

Maybe someone has any ideas where a problem is or how it can be investigated.

Here is a criu issue for this problem:
https://github.com/xemul/criu/issues/107

Thanks,
Andrew
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ptrace_breakpoint.c
Type: text/x-csrc
Size: 3520 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20160128/7a78f2e0/attachment-0001.bin>


More information about the CRIU mailing list