aboutsummaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2015-07-29mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()Oleg Nesterov
Add the additional "vm_flags_t vm_flags" argument to do_mmap_pgoff(), rename it to do_mmap(), and re-introduce do_mmap_pgoff() as a simple wrapper on top of do_mmap(). Perhaps we should update the callers of do_mmap_pgoff() and kill it later. This way mpx_mmap() can simply call do_mmap(vm_flags => VM_MPX) and do not play with vm internals. After this change mmap_region() has a single user outside of mmap.c, arch/tile/mm/elf.c:arch_setup_additional_pages(). It would be nice to change arch/tile/ and unexport mmap_region(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mm: mark most vm_operations_struct constKirill A. Shutemov
With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct structs should be constant. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29ARM: mm: do not use virt_to_idmap() for NOMMU systemsVitaly Andrianov
The "ARM: mm: Introduce virt_to_idmap() with an arch hook" defines arch_virt_to_idmap hook in arch/arm/mm/idmap.c. That breaks systems w/o MMU because that file is not built for them. This patch fixes this bug. Signed-off-by: Vitaly Andrianov <vitalya@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29Revert "kernel/watchdog: move NMI function header declarations from ↵Stephen Rothwell
watchdog.h to nmi.h" This reverts commit 7e6e3bfa2820257d3b722a69bedded0917d543de.
2015-07-29Merge branch 'akpm-current/current'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'clk/clk-next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'pinctrl/for-next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'tty/tty-next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'xen-tip/linux-next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'kvms390/next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'kvm-arm/next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'kvm/linux-next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'rcu/rcu/next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'clockevents/clockevents/next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'tip/auto-latest'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'spi/for-next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'security/next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'drm/drm-next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'crypto/master'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'net-next/master'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'slave-dma/next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'pm/linux-next'Stephen Rothwell
2015-07-29Merge remote-tracking branch 'kbuild/for-next'Stephen Rothwell
2015-07-29kdump, vmcoreinfo: report actual value of phys_baseHATAYAMA Daisuke
Currently, VMCOREINFO note information reports the virtual address of phys_base that is assigned to symbol phys_base. But this doesn't make sense because to refer to phys_base, it's necessary to get the value of phys_base itself we are now about to refer to. Userland tools related to kdump such as makedumpfile and crash utility so far have made some efforts to calculate phys_base on crash dump formats generated by mechanisms running outside Linux kernel, such as virtual machine hypervisor such as qemu dump, which ordinary users use via virsh dump, or ones implemented on vendor specific firmware. That is, find a kernel data whose virtual and physical addresses are available via its note information and calculate phys_base from it. However, such data structure is not the one prepared for phys_base purpose. There's no guarantee that other crash dump mechanisms include such information that can be used to calculate phys_base similarly. To get VMCOREINFO in vmcore, it's easy to use strings and grep commands like this; VMCOREINFO consists of simple string: $ strings vmcore-3.10.0-121.el7.x86_64 | grep -E ".*VMCOREINFO.*" -A 100 VMCOREINFO OSRELEASE=3.10.0-121.el7.x86_64 PAGESIZE=4096 ... This is also useful to get value of phys_base in kdump 2nd kernel contained in vmcore using the above-mentioned external crash dump mechanism; kdump 2nd kernel is an inherently relocated kernel. This commit doesn't remove VMCOREINFO_SYMBOL(phys_base) line because makedumpfile refers to it and if removing it, old versions makedumpfile doesn't work well. Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Dave Anderson <anderson@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29kexec: split kexec_load syscall from kexec core codedyoung@redhat.com
There are two kexec load syscalls, kexec_load another and kexec_file_load. kexec_file_load has been splited as kernel/kexec_file.c. In this patch I split kexec_load syscall code to kernel/kexec.c. And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and use kexec_file_load only, or vice verse. The original requirement is from Ted Ts'o, he want kexec kernel signature being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use kexec_load syscall can bypass the checking. Vivek Goyal proposed to create a common kconfig option so user can compile in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects KEXEC_CORE so that old config files still work. Because there's general code need CONFIG_KEXEC_CORE, so I updated all the architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects KEXEC_CORE in arch Kconfig. Also updated general kernel code with to kexec_load syscall. Signed-off-by: Dave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mm: define MADV_FREE for some archesMinchan Kim
Most architectures use asm-generic, but alpha, mips, parisc, xtensa need their own definitions. This patch defines MADV_FREE for them so it should fix build break for their architectures. Maybe, I should split and feed piecies to arch maintainers but included here for mmotm convenience. Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29arm64: add pmd_[dirty|mkclean] for THPMinchan Kim
MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite of the contents since MADV_FREE syscall is called for THP page. This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE support. Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29arm: add pmd_mkclean for THPMinchan Kim
MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite of the contents since MADV_FREE syscall is called for THP page. This patch adds pmd_mkclean for THP page MADV_FREE support. Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29powerpc: add pmd_[dirty|mkclean] for THPMinchan Kim
MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite of the contents since MADV_FREE syscall is called for THP page. This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE support. Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29sparc-add-pmd_-for-thp-fixAndrew Morton
mm-fix-huge-zero-page-accounting-in-smaps-report.patch already added pmd-dirty() Cc: Minchan Kim <minchan@kernel.org> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29sparc: add pmd_[dirty|mkclean] for THPMinchan Kim
MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite of the contents since MADV_FREE syscall is called for THP page. This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE support. Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29x86-add-pmd_-for-thp-fixAndrew Morton
mm-fix-huge-zero-page-accounting-in-smaps-report.patch also added pmd_dirty() Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29x86: add pmd_[dirty|mkclean] for THPMinchan Kim
MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite of the contents since MADV_FREE syscall is called for THP page. This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE support. Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mm-srcu-ify-shrinkers-fix-fixAndrew Morton
all architectures Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mm-srcu-ify-shrinkers-fixDavidlohr Bueso
temp Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mem-hotplug: handle node hole when initializing numa_meminfo.Tang Chen
When parsing SRAT, all memory ranges are added into numa_meminfo. In numa_init(), before entering numa_cleanup_meminfo(), all possible memory ranges are in numa_meminfo. And numa_cleanup_meminfo() removes all ranges over max_pfn or empty. But, this only works if the nodes are continuous. Let's have a look at the following example: We have an SRAT like this: SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff] SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff] SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff] SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug On boot, only node 0,1,2,3 exist. And the numa_meminfo will look like this: numa_meminfo.nr_blks = 9 1. on node 0: [0, 60000000] 2. on node 0: [100000000, 20000000000] 3. on node 1: [20000000000, 40000000000] 4. on node 4: [40000000000, 60000000000] 5. on node 5: [60000000000, 80000000000] 6. on node 2: [80000000000, a0000000000] 7. on node 3: [a0000000000, a0800000000] 8. on node 6: [c0000000000, a0800000000] 9. on node 7: [e0000000000, a0800000000] And numa_cleanup_meminfo() will merge 1 and 2, and remove 8,9 because the end address is over max_pfn, which is a0800000000. But 4 and 5 are not removed because their end addresses are less then max_pfn. But in fact, node 4 and 5 don't exist. In a word, numa_cleanup_meminfo() is not able to handle holes between nodes. Since memory ranges in node 4 and 5 are in numa_meminfo, in numa_register_memblks(), node 4 and 5 will be mistakenly set to online. If you run lscpu, it will show: NUMA node0 CPU(s): 0-14,128-142 NUMA node1 CPU(s): 15-29,143-157 NUMA node2 CPU(s): NUMA node3 CPU(s): NUMA node4 CPU(s): 62-76,190-204 NUMA node5 CPU(s): 78-92,206-220 In this patch, we use memblock_overlaps_region() to check if ranges in numa_meminfo overlap with ranges in memory_block. Since memory_block contains all available memory at boot time, if they overlap, it means the ranges exist. If not, then remove them from numa_meminfo. After this patch, lscpu will show: NUMA node0 CPU(s): 0-14,128-142 NUMA node1 CPU(s): 15-29,143-157 NUMA node4 CPU(s): 62-76,190-204 NUMA node5 CPU(s): 78-92,206-220 Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29genalloc-add-name-arg-to-gen_pool_get-and-devm_gen_pool_create-v2Vladimir Zapolskiy
* Added the same change in gen_pool_get() argument list to account a recently added client in arm/mach-socfpga/pm.c Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29genalloc: add name arg to gen_pool_get() and devm_gen_pool_create()Vladimir Zapolskiy
This change modifies gen_pool_get() and devm_gen_pool_create() client interfaces adding one more argument "name" of a gen_pool object. Due to implementation gen_pool_get() is capable to retrieve only one gen_pool associated with a device even if multiple gen_pools are created, fortunately right at the moment it is sufficient for the clients, hence provide NULL as a valid argument on both producer devm_gen_pool_create() and consumer gen_pool_get() sides. Because only one created gen_pool per device is addressable, explicitly add a restriction to devm_gen_pool_create() to create only one gen_pool per device, this implies two possible error codes returned by the function, account it on client side (only misc/sram). This completes client side changes related to genalloc updates. Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mm: send one IPI per CPU to TLB flush all entries after unmapping pagesMel Gorman
An IPI is sent to flush remote TLBs when a page is unmapped that was potentially accesssed by other CPUs. There are many circumstances where this happens but the obvious one is kswapd reclaiming pages belonging to a running process as kswapd and the task are likely running on separate CPUs. On small machines, this is not a significant problem but as machine gets larger with more cores and more memory, the cost of these IPIs can be high. This patch uses a simple structure that tracks CPUs that potentially have TLB entries for pages being unmapped. When the unmapping is complete, the full TLB is flushed on the assumption that a refill cost is lower than flushing individual entries. Architectures wishing to do this must give the following guarantee. If a clean page is unmapped and not immediately flushed, the architecture must guarantee that a write to that linear address from a CPU with a cached TLB entry will trap a page fault. This is essentially what the kernel already depends on but the window is much larger with this patch applied and is worth highlighting. The architecture should consider whether the cost of the full TLB flush is higher than sending an IPI to flush each individual entry. An additional architecture helper called flush_tlb_local is required. It's a trivial wrapper with some accounting in the x86 case. The impact of this patch depends on the workload as measuring any benefit requires both mapped pages co-located on the LRU and memory pressure. The case with the biggest impact is multiple processes reading mapped pages taken from the vm-scalability test suite. The test case uses NR_CPU readers of mapped files that consume 10*RAM. Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 159.62 ( 0.00%) 120.68 ( 24.40%) Ops lru-file-mmap-read-time_range 30.59 ( 0.00%) 2.80 ( 90.85%) Ops lru-file-mmap-read-time_stddv 6.70 ( 0.00%) 0.64 ( 90.38%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 581.00 611.43 System 5804.93 4111.76 Elapsed 161.03 122.12 This is showing that the readers completed 24.40% faster with 29% less system CPU time. From vmstats, it is known that the vanilla kernel was interrupted roughly 900K times per second during the steady phase of the test and the patched kernel was interrupts 180K times per second. The impact is lower on a single socket machine. 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 Ops lru-file-mmap-read-elapsed 25.33 ( 0.00%) 20.38 ( 19.54%) Ops lru-file-mmap-read-time_range 0.91 ( 0.00%) 1.44 (-58.24%) Ops lru-file-mmap-read-time_stddv 0.28 ( 0.00%) 0.47 (-65.34%) 4.2.0-rc1 4.2.0-rc1 vanilla flushfull-v7 User 58.09 57.64 System 111.82 76.56 Elapsed 27.29 22.55 It's still a noticeable improvement with vmstat showing interrupts went from roughly 500K per second to 45K per second. The patch will have no impact on workloads with no memory pressure or have relatively few mapped pages. It will have an unpredictable impact on the workload running on the CPU being flushed as it'll depend on how many TLB entries need to be refilled and how long that takes. Worst case, the TLB will be completely cleared of active entries when the target PFNs were not resident at all. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29x86, mm: trace when an IPI is about to be sentMel Gorman
When unmapping pages it is necessary to flush the TLB. If that page was accessed by another CPU then an IPI is used to flush the remote CPU. That is a lot of IPIs if kswapd is scanning and unmapping >100K pages per second. There already is a window between when a page is unmapped and when it is TLB flushed. This series increases the window so multiple pages can be flushed using a single IPI. This should be safe or the kernel is hosed already. Patch 1 simply made the rest of the series easier to write as ftrace could identify all the senders of TLB flush IPIS. Patch 2 tracks what CPUs potentially map a PFN and then sends an IPI to flush the entire TLB. Patch 3 tracks when there potentially are writable TLB entries that need to be batched differently Patch 4 increases SWAP_CLUSTER_MAX to further batch flushes The performance impact is documented in the changelogs but in the optimistic case on a 4-socket machine the full series reduces interrupts from 900K interrupts/second to 60K interrupts/second. This patch (of 4): It is easy to trace when an IPI is received to flush a TLB but harder to detect what event sent it. This patch makes it easy to identify the source of IPIs being transmitted for TLB flushes on x86. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mm-mmap-add-mmap-flag-to-request-vm_lockonfault-v4Eric B Munson
Add missing MAP_LOCKONFAULT to tile Signed-off-by: Eric B Munson <emunson@akamai.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mm: mmap: add mmap flag to request VM_LOCKONFAULTEric B Munson
The cost of faulting in all memory to be locked can be very high when working with large mappings. If only portions of the mapping will be used this can incur a high penalty for locking. Now that we have the new VMA flag for the locked but not present state, expose it as an mmap option like MAP_LOCKED -> VM_LOCKED. Signed-off-by: Eric B Munson <emunson@akamai.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mm: mlock: introduce VM_LOCKONFAULT and add mlock flags to enable itEric B Munson
The cost of faulting in all memory to be locked can be very high when working with large mappings. If only portions of the mapping will be used this can incur a high penalty for locking. For the example of a large file, this is the usage pattern for a large statical language model (probably applies to other statical or graphical models as well). For the security example, any application transacting in data that cannot be swapped out (credit card data, medical records, etc). This patch introduces the ability to request that pages are not pre-faulted, but are placed on the unevictable LRU when they are finally faulted in. This can be done area at a time via the mlock2(MLOCK_ONFAULT) or the mlockall(MCL_ONFAULT) system calls. These calls can be undone via munlock2(MLOCK_ONFAULT) or munlockall2(MCL_ONFAULT). Applying the VM_LOCKONFAULT flag to a mapping with pages that are already present required the addition of a function in gup.c to pin all pages which are present in an address range. It borrows heavily from __mm_populate(). To keep accounting checks out of the page fault path, users are billed for the entire mapping lock as if MLOCK_LOCKED was used. Signed-off-by: Eric B Munson <emunson@akamai.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mm-mlock-add-new-mlock-munlock-and-munlockall-system-calls-fix-2Heiko Carstens
can we just remove the s390 bits which cause the breakage? I will wire up the syscalls as soon as the patch set gets merged. Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Eric B Munson <emunson@akamai.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mm: fix up new locking syscallsMark Brown
Extend the arm64 compat syscall table for the new syscalls to fix build errors introduced by "mm: mlock: add new mlock, munlock, and munlockall system calls": ../arch/arm64/include/asm/unistd32.h:801:1: error: array index in initializer exceeds array bounds ../arch/arm64/include/asm/unistd32.h:801:1: error: (near initialization for 'compat_sys_call_table') ../arch/arm64/include/asm/unistd32.h:803:1: error: array index in initializer exceeds array bounds ../arch/arm64/include/asm/unistd32.h:803:1: error: (near initialization for 'compat_sys_call_table') ../arch/arm64/include/asm/unistd32.h:805:1: error: array index in initializer exceeds array bounds ../arch/arm64/include/asm/unistd32.h:805:1: error: (near initialization for 'compat_sys_call_table') Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Eric B Munson <emunson@akamai.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29mm: mlock: Add new mlock, munlock, and munlockall system callsEric B Munson
With the refactored mlock code, introduce new system calls for mlock, munlock, and munlockall. The new calls will allow the user to specify what lock states are being added or cleared. mlock2 and munlock2 are trivial at the moment, but a follow on patch will add a new mlock state making them useful. munlock2 addresses a limitation of the current implementation. If a user calls mlockall(MCL_CURRENT | MCL_FUTURE) and then later decides that MCL_FUTURE should be removed, they would have to call munlockall() followed by mlockall(MCL_CURRENT) which could potentially be very expensive. The new munlockall2 system call allows a user to simply clear the MCL_FUTURE flag. Signed-off-by: Eric B Munson <emunson@akamai.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Guenter Roeck <linux@roeck-us.net> Cc: linux-alpha@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: adi-buildroot-devel@lists.sourceforge.net Cc: linux-cris-kernel@axis.com Cc: linux-ia64@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-mips@linux-mips.org Cc: linux-am33-list@redhat.com Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-xtensa@linux-xtensa.org Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org
2015-07-29mm: mlock: add new mlock, munlock, and munlockall system callsEric B Munson
With the refactored mlock code, introduce new system calls for mlock, munlock, and munlockall. The new calls will allow the user to specify what lock states are being added or cleared. mlock2 and munlock2 are trivial at the moment, but a follow on patch will add a new mlock state making them useful. munlock2 addresses a limitation of the current implementation. If a user calls mlockall(MCL_CURRENT | MCL_FUTURE) and then later decides that MCL_FUTURE should be removed, they would have to call munlockall() followed by mlockall(MCL_CURRENT) which could potentially be very expensive. The new munlockall2 system call allows a user to simply clear the MCL_FUTURE flag. Signed-off-by: Eric B Munson <emunson@akamai.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29userfaultfd: activate syscall fixStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29userfaultfd: activate syscallAndrea Arcangeli
This activates the userfaultfd syscall. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2015-07-29kernel/watchdog: move NMI function header declarations from watchdog.h to nmi.hGuenter Roeck
The kernel's NMI watchdog has nothing to do with the watchdog subsystem. Its header declarations should be in linux/nmi.h, not linux/watchdog.h. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Stephane Eranian <eranian@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>