[Devel] [PATCH RHEL7 COMMIT] ms/x86/asm/entry: Create and use a 'TOP_OF_KERNEL_STACK_PADDING' macro

Konstantin Khorenko khorenko at virtuozzo.com
Wed Nov 18 01:23:12 PST 2015


The commit is pushed to "branch-rh7-3.10.0-229.7.2.vz7.9.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-229.7.2.vz7.9.10
------>
commit bec1cac9c80e36db4a93412b1fcbccdb42956938
Author: Andrey Ryabinin <aryabinin at virtuozzo.com>
Date:   Wed Nov 18 13:23:12 2015 +0400

    ms/x86/asm/entry: Create and use a 'TOP_OF_KERNEL_STACK_PADDING' macro
    
    Patchset description:
    
    Fix get_wchan() & Silence KASAN warnings in it.
    
    Patches 1-4 are preparatory,
    patch 5 fixes get_wchan(),
    patch 6 is a cleanup,
    patches 7 and 8 silence KASAN false positives.
    
    0001-x86-asm-entry-Create-and-use-a-TOP_OF_KERNEL_STACK_P.patch
    0002-kernel-Provide-READ_ONCE-and-ASSIGN_ONCE.patch
    0003-kernel-make-READ_ONCE-valid-on-const-arguments.patch
    0004-locking-Remove-atomicy-checks-from-READ-WRITE-_ONCE.patch
    
    0005-x86-process-Add-proper-bound-checks-in-64bit-get_wch.patch
    0006-x86-process-Unify-32bit-and-64bit-implementations-of.patch
    0007-compiler-atomics-kasan-Provide-READ_ONCE_NOCHECK.patch
    0008-x86-mm-kasan-Silence-KASAN-warnings-in-get_wchan.patch
    
    Andrey Ryabinin (2):
      compiler, atomics, kasan: Provide READ_ONCE_NOCHECK()
      x86/mm, kasan: Silence KASAN warnings in get_wchan()
    
    Andy Lutomirski (1):
      x86/asm/entry: Create and use a 'TOP_OF_KERNEL_STACK_PADDING' macro
    
    Christian Borntraeger (1):
      kernel: Provide READ_ONCE and ASSIGN_ONCE
    
    Linus Torvalds (1):
      kernel: make READ_ONCE() valid on const arguments
    
    Peter Zijlstra (1):
      locking: Remove atomicy checks from {READ,WRITE}_ONCE
    
    Thomas Gleixner (2):
      x86/process: Add proper bound checks in 64bit get_wchan()
      x86/process: Unify 32bit and 64bit implementations of get_wchan()
    
    ====================================================================
    This patch description:
    
    From: Andy Lutomirski <luto at amacapital.net>
    
    x86_32, unlike x86_64, pads the top of the kernel stack, because the
    hardware stack frame formats are variable in size.
    
    Document this padding and give it a name.
    
    This should make no change whatsoever to the compiled kernel
    image. It also doesn't fix any of the current bugs in this area.
    
    Signed-off-by: Andy Lutomirski <luto at amacapital.net>
    Acked-by: Denys Vlasenko <dvlasenk at redhat.com>
    Cc: Borislav Petkov <bp at alien8.de>
    Cc: H. Peter Anvin <hpa at zytor.com>
    Cc: Linus Torvalds <torvalds at linux-foundation.org>
    Cc: Oleg Nesterov <oleg at redhat.com>
    Cc: Thomas Gleixner <tglx at linutronix.de>
    Link: http://lkml.kernel.org/r/02bf2f54b8dcb76a62a142b6dfe07d4ef7fc582e.1426009661.git.luto@amacapital.net
    [ Fixed small details, such as a missed magic constant in entry_32.S pointed out by Denys Vlasenko. ]
    Signed-off-by: Ingo Molnar <mingo at kernel.org>
    
    (cherry picked from commit 3ee4298f440c81638cbb5ec06f2497fb7a9a9eb4)
    Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
    
    Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
---
 arch/x86/include/asm/processor.h   |  3 ++-
 arch/x86/include/asm/thread_info.h | 27 +++++++++++++++++++++++++++
 arch/x86/kernel/entry_32.S         |  2 +-
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index e3b63b9..873916d 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -878,7 +878,8 @@ extern unsigned long thread_saved_pc(struct task_struct *tsk);
 #define task_pt_regs(task)                                             \
 ({                                                                     \
        struct pt_regs *__regs__;                                       \
-       __regs__ = (struct pt_regs *)(KSTK_TOP(task_stack_page(task))-8); \
+       __regs__ = (struct pt_regs *)(KSTK_TOP(task_stack_page(task)) - \
+				     TOP_OF_KERNEL_STACK_PADDING);     \
        __regs__ - 1;                                                   \
 })
 
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index f9513ef..4ffe7b9 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -12,6 +12,33 @@
 #include <asm/types.h>
 
 /*
+ * TOP_OF_KERNEL_STACK_PADDING is a number of unused bytes that we
+ * reserve at the top of the kernel stack.  We do it because of a nasty
+ * 32-bit corner case.  On x86_32, the hardware stack frame is
+ * variable-length.  Except for vm86 mode, struct pt_regs assumes a
+ * maximum-length frame.  If we enter from CPL 0, the top 8 bytes of
+ * pt_regs don't actually exist.  Ordinarily this doesn't matter, but it
+ * does in at least one case:
+ *
+ * If we take an NMI early enough in SYSENTER, then we can end up with
+ * pt_regs that extends above sp0.  On the way out, in the espfix code,
+ * we can read the saved SS value, but that value will be above sp0.
+ * Without this offset, that can result in a page fault.  (We are
+ * careful that, in this case, the value we read doesn't matter.)
+ *
+ * In vm86 mode, the hardware frame is much longer still, but we neither
+ * access the extra members from NMI context, nor do we write such a
+ * frame at sp0 at all.
+ *
+ * x86_64 has a fixed-length stack frame.
+ */
+#ifdef CONFIG_X86_32
+# define TOP_OF_KERNEL_STACK_PADDING 8
+#else
+# define TOP_OF_KERNEL_STACK_PADDING 0
+#endif
+
+/*
  * low level task data that entry.S needs immediate access to
  * - this struct should fit entirely inside of one cache line
  * - this struct shares the supervisor stack pages
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index e497ebb..acc9bb9 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -409,7 +409,7 @@ sysenter_past_esp:
 	 * A tiny bit of offset fixup is necessary - 4*4 means the 4 words
 	 * pushed above; +8 corresponds to copy_thread's esp0 setting.
 	 */
-	pushl_cfi ((TI_sysenter_return)-THREAD_SIZE+8+4*4)(%esp)
+	pushl_cfi ((TI_sysenter_return)-THREAD_SIZE+TOP_OF_KERNEL_STACK_PADDING+4*4)(%esp)
 	CFI_REL_OFFSET eip, 0
 
 	pushl_cfi %eax
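
The arithmetic behind the two changed expressions can be sketched in plain C,
outside the kernel. This is an illustrative sketch, not kernel code:
THREAD_SIZE and PT_REGS_SIZE are assumed x86_32 values, and regs_addr() and
thread_info_from_esp() are hypothetical helpers that only mirror what
task_pt_regs() and the entry_32.S offset fixup compute.

```c
#include <stdint.h>

/* Assumed values for illustration; the real ones come from kernel headers. */
#define THREAD_SIZE  8192u  /* assumption: two 4 KiB pages of kernel stack */
#define PT_REGS_SIZE 68u    /* assumption: 17 32-bit slots in struct pt_regs */

/*
 * Hypothetical mirror of task_pt_regs(): start at the top of the stack,
 * skip the reserved padding, then step back by one struct pt_regs.
 */
static uintptr_t regs_addr(uintptr_t stack_base, unsigned padding)
{
	uintptr_t top = stack_base + THREAD_SIZE;   /* what KSTK_TOP() yields */

	return top - padding - PT_REGS_SIZE;
}

/*
 * Hypothetical mirror of the entry_32.S fixup: esp sits four 32-bit words
 * below sp0, and sp0 sits 'padding' bytes below the top of the stack, so
 * undoing both and subtracting THREAD_SIZE gives the stack base.
 */
static uintptr_t thread_info_from_esp(uintptr_t esp, unsigned padding)
{
	return esp - THREAD_SIZE + padding + 4 * 4;
}
```

With a padding of 8, regs_addr() lands 8 bytes lower than with a padding of 0,
and passing thread_info_from_esp() an esp that is 16 bytes below sp0 returns
the stack base, which is where thread_info lives in this kernel.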

