[Devel] Re: [PATCH] c/r: tolerate X86_EFLAGS_RF on restart

Oren Laadan orenl at librato.com
Wed Oct 28 18:20:04 PDT 2009


After reading the code a bit more, and seeing that:

1) ptrace uses the debugreg of a process so may be interested in
that particular flag, and

2) even without ptrace userspace can set that flag (by suitably
setting the restore context from a signal handler)

I now think that we should instead:

1) Keep the X86_EFLAGS_RF from the time of checkpoint *as is*

2) If the restarting task already has this flag set prior to
restoring eflags from the saved value, then preserve the existing
flag even if at the time of checkpoint it wasn't set.

Unless someones yells, I'll commit this soon.

Oren

Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl at librato.com):
>> On restart, the c/r code that sanitizes a task's EFLAGS failed with an
>> error if the X86_EFLAGS_RF was set. However, there are cases in which
>> a task legitimately has this flag recorded at checkpoint, such as:
>>
>> 1) We are running inside a KVM guest, and the guest is being debugged
>> (see arch/x86/kvm/{svm,vmx}.c, via callbacks from qemu.
>>
>> 2) The kernel itself is being debugged using kgbd.
>>
>> 3) Exceptions that occur under certain condition will set this flag
>> on the currently running task. For details see page 679 in Intel's
>> manual (http://www.intel.com/Assets/PDF/manual/253668.pdf), like:
>>
>>   When calling an event handler, Intel 64 and IA-32 processors
>>   establish the value of the RF flag in the EFLAGS image pushed on the
>>   stack:
>>     • For any fault-class exception except a debug exception
>>       generated in response to an instruction breakpoint, the value
>>       pushed for RF is 1.
>>     • For any interrupt arriving after any iteration of a repeated
>>       string instruction but the last iteration, the value pushed for
>>       RF is 1.
>>     • For any trap-class exception generated by any iteration of a
>>       repeated string instruction but the last iteration, the value
>>       pushed for RF is 1.
> 
> I wasn't quite sure what a 'trap-class exception' is,
> http://www.yashks.com/exception-basics.html says:
> 
> Example: INT 3(or CC or CD 21) is a trap instruction. If a program has this
> instruction and if the debugger is running, it will return the next instruction
> address to this debugger. Where debugger can continue it debugging.
> 
> Does that not mean that if we checkpoint t1 and t2 where t1 is ptracing t2, t2
> might have RF set to 1?  So this isn't generally safe, right?
> 
> IIUC it also could be set for things like a "General Protection Exception(Type:
> Fault)" at checkpoint in which case we could really mess up the sw at
> restart.
> 
> Or maybe I'm totally misunderstanding...?
> 
>>     • For other cases, the value pushed for RF is the value that was
>>       in EFLAG.RF at the time the event handler was called.
>>   [...]
>>
>> Case #3 is consistent with the fact that in all cases where we
>> observed the issue there was a sendmail process doing IO.
>>
>> Clearly, the X86_EFLAGS_RF is valid to be set.
>>
>> This patches fixes the restart to:
>>
>> 1) Silently ignore the X86_EFLAGS_RF if was set at checkpoint: this
>> allows restart to tolerate the flag and safely complete the restart.
>>
>> 2) Preserve the existing X86_EFLAGS_RF of a restarting task (instead
>> of overwriting it): this ensures that restarting does not interfere
>> with kvm, kgdb, etc.
>>
>> Signed-off-by: Oren Laadan <orenl at cs.columbia.edu>
>> Cc: Alexey Dobriyan <adobriyan at gmail.com>
>> Cc: Matt Helsley <matthltc at us.ibm.com>
>> Cc: Avi Kivity <avi at redhat.com>
>> ---
>>  arch/x86/mm/checkpoint.c |   40 ++++++++++++++++++++++++++++++++++++++--
>>  1 files changed, 38 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/mm/checkpoint.c b/arch/x86/mm/checkpoint.c
>> index 9dd8e12..a93bb6f 100644
>> --- a/arch/x86/mm/checkpoint.c
>> +++ b/arch/x86/mm/checkpoint.c
>> @@ -27,13 +27,49 @@ static int check_eflags(__u32 eflags)
>>  #define X86_EFLAGS_CKPT_MASK  \
>>  	(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF | X86_EFLAGS_ZF | \
>>  	 X86_EFLAGS_SF | X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_OF | \
>> -	 X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_ID)
>> +	 X86_EFLAGS_NT | X86_EFLAGS_AC | X86_EFLAGS_ID | X86_EFLAGS_RF)
>>  
>>  	if ((eflags & ~X86_EFLAGS_CKPT_MASK) != (X86_EFLAGS_IF | 0x2))
>>  		return 0;
>>  	return 1;
>>  }
>>  
>> +static void restore_eflags(struct pt_regs *regs, __u32 eflags)
>> +{
>> +	/*
>> +	 * A task may have had X86_EFLAGS_RF set at checkpoint, .e.g:
>> +	 * 1) It ran in a KVM guest, and the guest was being debugged,
>> +	 * 2) The kernel was debugged using kgbd,
>> +	 * 3) From Intel's manual: "When calling an event handler, 
>> +	 *    Intel 64 and IA-32 processors establish the value of the
>> +	 *    RF flag in the EFLAGS image pushed on the stack:
>> +	 *  - For any fault-class exception except a debug exception
>> +	 *    generated in response to an instruction breakpoint, the
>> +	 *    value pushed for RF is 1.
>> +	 *  - For any interrupt arriving after any iteration of a
>> +	 *    repeated string instruction but the last iteration, the
>> +	 *    value pushed for RF is 1.
>> +	 *  - For any trap-class exception generated by any iteration
>> +	 *    of a repeated string instruction but the last iteration,
>> +	 *    the value pushed for RF is 1.
>> +	 *  - For other cases, the value pushed for RF is the value
>> +	 *    that was in EFLAG.RF at the time the event handler was
>> +	 *    called.
>> +	 *  [from: http://www.intel.com/Assets/PDF/manual/253668.pdf]
>> +	 *
>> +	 * So: (1) silently ignore X86_EFLAGS_RF from checkpoint time,
>> +	 * as it's irrelevant for the current process, (2) preserve
>> +	 * the existing X86_EFLAGS_RF of this process as it it may be
>> +	 * used by kvm/kgdb. Disable preemption to protect test and
>> +	 * change of our eflags.
>> +	 */
>> +	preempt_disable();
>> +	eflags &= ~X86_EFLAGS_RF;
>> +	eflags |= (regs->flags & X86_EFLAGS_RF);
>> +	regs->flags = eflags;
>> +	preempt_enable();
>> +}
>> +
>>  static int check_tls(struct desc_struct *desc)
>>  {
>>  	if (!desc->a && !desc->b)
>> @@ -433,7 +469,7 @@ static int load_cpu_regs(struct ckpt_hdr_cpu *h, struct task_struct *t)
>>  	regs->orig_ax = h->orig_ax;
>>  	regs->ip = h->ip;
>>  
>> -	regs->flags = h->flags;
>> +	restore_eflags(regs, h->flags);
>>  	regs->sp = h->sp;
>>  
>>  	regs->ds = decode_segment(h->ds);
>> -- 
>> 1.6.0.4
>>
>> _______________________________________________
>> Containers mailing list
>> Containers at lists.linux-foundation.org
>> https://lists.linux-foundation.org/mailman/listinfo/containers
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers




More information about the Devel mailing list