From 942670d0dc41b5fe9b735c31ca9234d80729bf7e Mon Sep 17 00:00:00 2001
From: Wen Congyang
Date: Fri, 22 Feb 2013 15:11:47 -0800
Subject: x86/mm/numa: Don't check if node is NUMA_NO_NODE

If we aren't debugging per_cpu maps, the cpu's node is stored in the
per_cpu variable numa_node.  If `node' is NUMA_NO_NODE, it means the
caller wants to clear the cpu's node.  So we should also call
set_cpu_numa_node() in this case.

Signed-off-by: Wen Congyang
Cc: Len Brown
Cc: Pavel Machek
Cc: "Rafael J. Wysocki"
Cc: "H. Peter Anvin"
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar
---
 arch/x86/mm/numa.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

(limited to 'arch/x86/mm')

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 2d125be1bae..21d02f0d7a2 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -97,8 +97,7 @@ void __cpuinit numa_set_node(int cpu, int node)
 #endif
         per_cpu(x86_cpu_to_node_map, cpu) = node;
 
-        if (node != NUMA_NO_NODE)
-                set_cpu_numa_node(cpu, node);
+        set_cpu_numa_node(cpu, node);
 }
 
 void __cpuinit numa_clear_node(int cpu)
--
cgit v1.2.3
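
The path this matters for is numa_clear_node(), visible as context at
the bottom of the hunk. Below is a minimal userspace model of the fixed
flow; the arrays, the NR_CPUS value, and the main() driver are
stand-ins for the kernel's per_cpu machinery, not kernel code:

    #include <stdio.h>

    #define NR_CPUS         4
    #define NUMA_NO_NODE    (-1)

    static int x86_cpu_to_node_map[NR_CPUS];  /* arch-specific map */
    static int numa_node[NR_CPUS];            /* generic per-CPU variable */

    static void set_cpu_numa_node(int cpu, int node)
    {
            numa_node[cpu] = node;
    }

    /* The fixed numa_set_node(): update both maps unconditionally. */
    static void numa_set_node(int cpu, int node)
    {
            x86_cpu_to_node_map[cpu] = node;
            set_cpu_numa_node(cpu, node);
    }

    static void numa_clear_node(int cpu)
    {
            numa_set_node(cpu, NUMA_NO_NODE);
    }

    int main(void)
    {
            numa_set_node(1, 0);
            numa_clear_node(1);
            printf("arch map: %d, generic map: %d (both should be %d)\n",
                   x86_cpu_to_node_map[1], numa_node[1], NUMA_NO_NODE);
            return 0;
    }

With the old `if (node != NUMA_NO_NODE)` check, a clear would stop at
the arch map and the generic numa_node variable would keep the stale
node 0.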
From 954f857187033ee3d3704a8206715cf354c38898 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli
Date: Fri, 22 Feb 2013 15:11:49 -0800
Subject: Revert "x86, mm: Make spurious_fault check explicitly check the
 PRESENT bit"

I got a report of a minor regression introduced by commit 027ef6c87853b
("mm: thp: fix pmd_present for split_huge_page and PROT_NONE with
THP").

The problem is that pageattr creates kernel pagetables (ptes and pmds)
that break pte_present/pmd_present, and the commit above exposed this
invariant breakage for pmd_present.

The same problem already existed for ptes and pte_present, and it was
fixed by commit 660a293ea9be709 ("x86, mm: Make spurious_fault check
explicitly check the PRESENT bit") (if it weren't for that commit, this
wouldn't even be a regression).  That fix avoided the use of
pte_present in the page fault path.

I could follow through by stopping the use of pmd_present/pmd_huge too.
However, I think it's more robust to fix pageattr to clear the
PSE/GLOBAL bitflags in addition to the PRESENT bitflag, so the kernel
page fault code can keep using the regular
pte_present/pmd_present/pmd_huge.

The confusion arises because _PAGE_GLOBAL and _PAGE_PROTNONE share the
same bit, and in the pmd case we pretend _PAGE_PSE is set only in
present pmds (to facilitate split_huge_page's final TLB flush).

Signed-off-by: Andrea Arcangeli
Cc: Andi Kleen
Cc: Shaohua Li
Cc: "H. Peter Anvin"
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar
---
 arch/x86/mm/fault.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

(limited to 'arch/x86/mm')

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fb674fd3fc2..2b97525246d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -939,14 +939,8 @@ spurious_fault(unsigned long error_code, unsigned long address)
         if (pmd_large(*pmd))
                 return spurious_fault_check(error_code, (pte_t *) pmd);
 
-        /*
-         * Note: don't use pte_present() here, since it returns true
-         * if the _PAGE_PROTNONE bit is set.  However, this aliases the
-         * _PAGE_GLOBAL bit, which for kernel pages give false positives
-         * when CONFIG_DEBUG_PAGEALLOC is used.
-         */
         pte = pte_offset_kernel(pmd, address);
-        if (!(pte_flags(*pte) & _PAGE_PRESENT))
+        if (!pte_present(*pte))
                 return 0;
 
         ret = spurious_fault_check(error_code, pte);
--
cgit v1.2.3
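
The aliasing that motivates this revert can be demonstrated outside the
kernel. In the sketch below, the bit positions are assumptions taken
from arch/x86/include/asm/pgtable_types.h of this era (where
_PAGE_BIT_PROTNONE is defined as _PAGE_BIT_GLOBAL) and should be
verified against the actual tree:

    #include <stdio.h>

    #define _PAGE_BIT_PRESENT   0
    #define _PAGE_BIT_GLOBAL    8
    #define _PAGE_BIT_PROTNONE  _PAGE_BIT_GLOBAL  /* same bit, reused */

    #define _PAGE_PRESENT   (1UL << _PAGE_BIT_PRESENT)
    #define _PAGE_GLOBAL    (1UL << _PAGE_BIT_GLOBAL)
    #define _PAGE_PROTNONE  (1UL << _PAGE_BIT_PROTNONE)

    /* Model of pte_present(): it accepts PROTNONE so that user
     * PROT_NONE mappings still count as present. */
    static int pte_present(unsigned long flags)
    {
            return (flags & (_PAGE_PRESENT | _PAGE_PROTNONE)) != 0;
    }

    int main(void)
    {
            /* A kernel pte with PRESENT cleared by pageattr but
             * GLOBAL left over looks exactly like a PROT_NONE pte. */
            unsigned long kernel_np_pte = _PAGE_GLOBAL;

            printf("pte_present() = %d (should be 0)\n",
                   pte_present(kernel_np_pte));
            return 0;
    }

This false positive is what the next patch eliminates at the source, by
never leaving GLOBAL (or PSE, for pmds) set on a non-present entry.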
From a8aed3e0752b4beb2e37cbed6df69faae88268da Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli
Date: Fri, 22 Feb 2013 15:11:51 -0800
Subject: x86/mm/pageattr: Prevent PSE and GLOBAL leftovers from confusing
 pmd/pte_present and pmd_huge

Without this patch, any kernel code that reads kernel memory through
non-present kernel ptes/pmds (as set by pageattr.c) will crash.

With this kernel code:

        static struct page *crash_page;
        static unsigned long *crash_address;
        [..]
        crash_page = alloc_pages(GFP_KERNEL, 9);
        crash_address = page_address(crash_page);
        if (set_memory_np((unsigned long)crash_address, 1))
                printk("set_memory_np failure\n");
        [..]

the kernel will lock up if, inside the "crash" tool, one tries to read
the memory at the non-present address:

        crash> p crash_address
        crash_address = $8 = (long unsigned int *) 0xffff88023c000000
        crash> rd 0xffff88023c000000
        [ *lockup* ]

The lockup happens because _PAGE_GLOBAL and _PAGE_PROTNONE share the
same bit, and pageattr leaves _PAGE_GLOBAL set on a kernel pte, which
is then mistaken for _PAGE_PROTNONE (so pte_present returns true by
mistake and the kernel page fault handler then gets confused and
loops).

With THP the same can happen after we taught pmd_present to check
_PAGE_PROTNONE and _PAGE_PSE in commit 027ef6c87853b0a9df5317 ("mm:
thp: fix pmd_present for split_huge_page and PROT_NONE with THP").
THP has the same problem with _PAGE_GLOBAL as the 4k pages, but it
also has a problem with _PAGE_PSE, which must be cleared too.

After the patch is applied, copy_user correctly returns -EFAULT and
doesn't lock up anymore:

        crash> p crash_address
        crash_address = $9 = (long unsigned int *) 0xffff88023c000000
        crash> rd 0xffff88023c000000
        rd: read error: kernel virtual address: ffff88023c000000  type: "64-bit KVADDR"

Signed-off-by: Andrea Arcangeli
Cc: Andi Kleen
Cc: Shaohua Li
Cc: "H. Peter Anvin"
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Ingo Molnar
---
 arch/x86/mm/pageattr.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 3 deletions(-)

(limited to 'arch/x86/mm')

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index a718e0d2350..2713be4bca4 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -444,6 +444,19 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
         pgprot_val(req_prot) &= ~pgprot_val(cpa->mask_clr);
         pgprot_val(req_prot) |= pgprot_val(cpa->mask_set);
 
+        /*
+         * Set the PSE and GLOBAL flags only if the PRESENT flag is
+         * set otherwise pmd_present/pmd_huge will return true even on
+         * a non present pmd. The canon_pgprot will clear _PAGE_GLOBAL
+         * for the ancient hardware that doesn't support it.
+         */
+        if (pgprot_val(new_prot) & _PAGE_PRESENT)
+                pgprot_val(new_prot) |= _PAGE_PSE | _PAGE_GLOBAL;
+        else
+                pgprot_val(new_prot) &= ~(_PAGE_PSE | _PAGE_GLOBAL);
+
+        new_prot = canon_pgprot(new_prot);
+
         /*
          * old_pte points to the large page base address. So we need
          * to add the offset of the virtual address:
@@ -489,7 +502,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
          * The address is aligned and the number of pages
          * covers the full page.
          */
-        new_pte = pfn_pte(pte_pfn(old_pte), canon_pgprot(new_prot));
+        new_pte = pfn_pte(pte_pfn(old_pte), new_prot);
         __set_pmd_pte(kpte, address, new_pte);
         cpa->flags |= CPA_FLUSHTLB;
         do_split = 0;
@@ -540,16 +553,35 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 #ifdef CONFIG_X86_64
         if (level == PG_LEVEL_1G) {
                 pfninc = PMD_PAGE_SIZE >> PAGE_SHIFT;
-                pgprot_val(ref_prot) |= _PAGE_PSE;
+                /*
+                 * Set the PSE flags only if the PRESENT flag is set
+                 * otherwise pmd_present/pmd_huge will return true
+                 * even on a non present pmd.
+                 */
+                if (pgprot_val(ref_prot) & _PAGE_PRESENT)
+                        pgprot_val(ref_prot) |= _PAGE_PSE;
+                else
+                        pgprot_val(ref_prot) &= ~_PAGE_PSE;
         }
 #endif
 
+        /*
+         * Set the GLOBAL flags only if the PRESENT flag is set
+         * otherwise pmd/pte_present will return true even on a non
+         * present pmd/pte. The canon_pgprot will clear _PAGE_GLOBAL
+         * for the ancient hardware that doesn't support it.
+         */
+        if (pgprot_val(ref_prot) & _PAGE_PRESENT)
+                pgprot_val(ref_prot) |= _PAGE_GLOBAL;
+        else
+                pgprot_val(ref_prot) &= ~_PAGE_GLOBAL;
+
         /*
          * Get the target pfn from the original entry:
          */
         pfn = pte_pfn(*kpte);
         for (i = 0; i < PTRS_PER_PTE; i++, pfn += pfninc)
-                set_pte(&pbase[i], pfn_pte(pfn, ref_prot));
+                set_pte(&pbase[i], pfn_pte(pfn, canon_pgprot(ref_prot)));
 
         if (address >= (unsigned long)__va(0) &&
             address < (unsigned long)__va(max_low_pfn_mapped << PAGE_SHIFT))
@@ -659,6 +691,18 @@ repeat:
 
         new_prot = static_protections(new_prot, address, pfn);
 
+        /*
+         * Set the GLOBAL flags only if the PRESENT flag is
+         * set otherwise pte_present will return true even on
+         * a non present pte. The canon_pgprot will clear
+         * _PAGE_GLOBAL for the ancient hardware that doesn't
+         * support it.
+         */
+        if (pgprot_val(new_prot) & _PAGE_PRESENT)
+                pgprot_val(new_prot) |= _PAGE_GLOBAL;
+        else
+                pgprot_val(new_prot) &= ~_PAGE_GLOBAL;
+
         /*
          * We need to keep the pfn from the existing PTE,
          * after all we're only going to change it's attributes
--
cgit v1.2.3
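
The diff open-codes the same guard at three sites. Read as a single
rule: PSE and GLOBAL are only meaningful, and only allowed, while
PRESENT is set. A standalone sketch of that rule follows; the helper
name is hypothetical, and the bit values are assumptions taken from
pgtable_types.h of this era:

    #include <stdio.h>

    #define _PAGE_PRESENT   (1UL << 0)
    #define _PAGE_PSE       (1UL << 7)  /* pmd_present also tests this bit */
    #define _PAGE_GLOBAL    (1UL << 8)  /* aliases _PAGE_PROTNONE */

    /* Hypothetical helper equivalent to the open-coded if/else blocks:
     * keep the extra flags only while PRESENT is set. */
    static unsigned long sync_with_present(unsigned long prot,
                                           unsigned long extra)
    {
            if (prot & _PAGE_PRESENT)
                    prot |= extra;
            else
                    prot &= ~extra;
            return prot;
    }

    int main(void)
    {
            /* Leftover flags on a non-present large-page protection. */
            unsigned long prot = _PAGE_PSE | _PAGE_GLOBAL;

            prot = sync_with_present(prot, _PAGE_PSE | _PAGE_GLOBAL);
            printf("flags: %#lx (should be 0)\n", prot);
            return 0;
    }

canon_pgprot() then additionally clears _PAGE_GLOBAL on hardware that
doesn't support it, which is why the patch applies canon_pgprot() after
these fixups.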