[Devel] [PATCH RHEL7 COMMIT] ms/mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
Konstantin Khorenko
khorenko at virtuozzo.com
Thu Aug 31 17:54:33 MSK 2017
Please consider to prepare a ReadyKernel patch for it.
https://readykernel.com/
--
Best regards,
Konstantin Khorenko,
Virtuozzo Linux Kernel Team
On 08/31/2017 05:51 PM, Konstantin Khorenko wrote:
> The commit is pushed to "branch-rh7-3.10.0-514.26.1.vz7.35.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
> after rh7-3.10.0-514.26.1.vz7.35.5
> ------>
> commit f5b413ea4e53d819c8b4e4a4927fb563bd3ec24f
> Author: Keno Fischer <keno at juliacomputing.com>
> Date: Thu Aug 31 17:51:25 2017 +0300
>
> ms/mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
>
> commit 8310d48b125d19fcd9521d83b8293e63eb1646aa upstream.
>
> In commit 19be0eaffa3a ("mm: remove gup_flags FOLL_WRITE games from
> __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
> after a COW was resolved to setting the (newly introduced) FOLL_COW
> instead. Simultaneously, the check in gup.c was updated to still allow
> writes with FOLL_FORCE set if FOLL_COW had also been set.
>
> However, a similar check in huge_memory.c was forgotten. As a result,
> remote memory writes to ro regions of memory backed by transparent huge
> pages cause an infinite loop in the kernel (handle_mm_fault sets
> FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails
> out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is
> true.
>
> While in this state the process is stil SIGKILLable, but little else
> works (e.g. no ptrace attach, no other signals). This is easily
> reproduced with the following code (assuming thp are set to always):
>
> #include <assert.h>
> #include <fcntl.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <unistd.h>
>
> #define TEST_SIZE 5 * 1024 * 1024
>
> int main(void) {
> int status;
> pid_t child;
> int fd = open("/proc/self/mem", O_RDWR);
> void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
> MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
> assert(addr != MAP_FAILED);
> pid_t parent_pid = getpid();
> if ((child = fork()) == 0) {
> void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
> MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
> assert(addr2 != MAP_FAILED);
> memset(addr2, 'a', TEST_SIZE);
> pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
> return 0;
> }
> assert(child == waitpid(child, &status, 0));
> assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
> return 0;
> }
>
> Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
> to the update in gup.c in the original commit. The same pattern exists
> in follow_devmap_pmd. However, we should not be able to reach that
> check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
> ever do.
>
> [akpm at linux-foundation.org: coding-style fixes]
> Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.com
> Signed-off-by: Keno Fischer <keno at juliacomputing.com>
> Acked-by: Kirill A. Shutemov <kirill.shutemov at linux.intel.com>
> Cc: Greg Thelen <gthelen at google.com>
> Cc: Nicholas Piggin <npiggin at gmail.com>
> Cc: Willy Tarreau <w at 1wt.eu>
> Cc: Oleg Nesterov <oleg at redhat.com>
> Cc: Kees Cook <keescook at chromium.org>
> Cc: Andy Lutomirski <luto at kernel.org>
> Cc: Michal Hocko <mhocko at suse.com>
> Cc: Hugh Dickins <hughd at google.com>
> Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
> [bwh: Backported to 3.2:
> - Drop change to follow_devmap_pmd()
> - pmd_dirty() is not available; check the page flags as in
> can_follow_write_pte()
> - Adjust context]
> Signed-off-by: Ben Hutchings <ben at decadent.org.uk>
> [mhocko:
> This has been forward ported from the 3.2 stable tree.
> And fixed to return NULL.]
> Reviewed-by: Michal Hocko <mhocko at suse.cz>
> Signed-off-by: Jiri Slaby <jslaby at suse.cz>
> Signed-off-by: Willy Tarreau <w at 1wt.eu>
>
> https://jira.sw.ru/browse/PSBM-70151
>
> Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
> ---
> mm/huge_memory.c | 19 ++++++++++++++++---
> 1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 477610d..5a07e76 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1317,6 +1317,18 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> return ret;
> }
>
> +/*
> + * foll_force can write to even unwritable pmd's, but only
> + * after we've gone through a cow cycle and they are dirty.
> + */
> +static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
> + unsigned int flags)
> +{
> + return pmd_write(pmd) ||
> + ((flags & FOLL_FORCE) && (flags & FOLL_COW) &&
> + page && PageAnon(page));
> +}
> +
> struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> unsigned long addr,
> pmd_t *pmd,
> @@ -1327,9 +1339,6 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>
> assert_spin_locked(pmd_lockptr(mm, pmd));
>
> - if (flags & FOLL_WRITE && !pmd_write(*pmd))
> - goto out;
> -
> /* Avoid dumping huge zero page */
> if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
> return ERR_PTR(-EFAULT);
> @@ -1340,6 +1349,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>
> page = pmd_page(*pmd);
> VM_BUG_ON_PAGE(!PageHead(page), page);
> +
> + if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
> + return NULL;
> +
> if (flags & FOLL_TOUCH) {
> pmd_t _pmd;
> /*
> .
>
More information about the Devel
mailing list