aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/vdso
AgeCommit message (Collapse)Author
2015-03-26x86/vdso: Fix the build on GCC5Jiri Slaby
commit e893286918d2cde3a94850d8f7101cd1039e0c62 upstream. On gcc5 the kernel does not link: ld: .eh_frame_hdr table[4] FDE at 0000000000000648 overlaps table[5] FDE at 0000000000000670. Because prior GCC versions always emitted NOPs on ALIGN directives, but gcc5 started omitting them. .LSTARTFDEDLSI1 says: /* HACK: The dwarf2 unwind routines will subtract 1 from the return address to get an address in the middle of the presumed call instruction. Since we didn't get here via a call, we need to include the nop before the real start to make up for it. */ .long .LSTART_sigreturn-1-. /* PC-relative start address */ But commit 69d0627a7f6e ("x86 vDSO: reorder vdso32 code") from 2.6.25 replaced .org __kernel_vsyscall+32,0x90 by ALIGN right before __kernel_sigreturn. Of course, ALIGN need not generate any NOP in there. Esp. gcc5 collapses vclock_gettime.o and int80.o together with no generated NOPs as "ALIGN". So fix this by adding to that point at least a single NOP and make the function ALIGN possibly with more NOPs then. Kudos for reporting and diagnosing should go to Richard. Reported-by: Richard Biener <rguenther@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1425543211-12542-1-git-send-email-jslaby@suse.cz Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-01-16x86_64, vdso: Fix the vdso address randomization algorithmAndy Lutomirski
commit 394f56fe480140877304d342dec46d50dc823d46 upstream. The theory behind vdso randomization is that it's mapped at a random offset above the top of the stack. To avoid wasting a page of memory for an extra page table, the vdso isn't supposed to extend past the lowest PMD into which it can fit. Other than that, the address should be a uniformly distributed address that meets all of the alignment requirements. The current algorithm is buggy: the vdso has about a 50% probability of being at the very end of a PMD. The current algorithm also has a decent chance of failing outright due to incorrect handling of the case where the top of the stack is near the top of its PMD. This fixes the implementation. The paxtest estimate of vdso "randomisation" improves from 11 bits to 18 bits. (Disclaimer: I don't know what the paxtest code is actually calculating.) It's worth noting that this algorithm is inherently biased: the vdso is more likely to end up near the end of its PMD than near the beginning. Ideally we would either nix the PMD sharing requirement or jointly randomize the vdso and the stack to reduce the bias. In the mean time, this is a considerable improvement with basically no risk of compatibility issues, since the allowed outputs of the algorithm are unchanged. As an easy test, doing this: for i in `seq 10000` do grep -P vdso /proc/self/maps |cut -d- -f1 done |sort |uniq -d used to produce lots of output (1445 lines on my most recent run). A tiny subset looks like this: 7fffdfffe000 7fffe01fe000 7fffe05fe000 7fffe07fe000 7fffe09fe000 7fffe0bfe000 7fffe0dfe000 Note the suspicious fe000 endings. With the fix, I get a much more palatable 76 repeated addresses. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-08-07Revert "x86-64, modify_ldt: Make support for 16-bit segments a runtime option"H. Peter Anvin
commit 7ed6fb9b5a5510e4ef78ab27419184741169978a upstream. This reverts commit fa81511bb0bbb2b1aace3695ce869da9762624ff in preparation of merging in the proper fix (espfix64). Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07x86-64, modify_ldt: Make support for 16-bit segments a runtime optionLinus Torvalds
commit fa81511bb0bbb2b1aace3695ce869da9762624ff upstream. Checkin: b3b42ac2cbae x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels disabled 16-bit segments on 64-bit kernels due to an information leak. However, it does seem that people are genuinely using Wine to run old 16-bit Windows programs on Linux. A proper fix for this ("espfix64") is coming in the upcoming merge window, but as a temporary fix, create a sysctl to allow the administrator to re-enable support for 16-bit segments. It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If you hit this issue and care about your old Windows program more than you care about a kernel stack address information leak, you can do echo 1 > /proc/sys/abi/ldt16 as root (add it to your startup scripts), and you should be ok. The sysctl table is only added if you have COMPAT support enabled on x86-64, but I assume anybody who runs old windows binaries very much does that ;) Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-01-20Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu, amd: Fix a shadowed variable situation um, x86: Fix vDSO build x86: Delete non-required instances of include <linux/init.h> x86, realmode: Pointer walk cleanups, pull out invariant use of __pa() x86/traps: Clean up error exception handler definitions
2014-01-12seqlock: Use raw_ prefix instead of _no_lockdepJohn Stultz
Linus disliked the _no_lockdep() naming, so instead use the more-consistent raw_* prefix to the non-lockdep enabled seqcount methods. This also adds raw_ methods for the write operations as well, which will be utilized in a following patch. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Krzysztof Hałasa <khalasa@piap.pl> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Willy Tarreau <w@1wt.eu> Link: http://lkml.kernel.org/r/1388704274-5278-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-06x86: Delete non-required instances of include <linux/init.h>Paul Gortmaker
None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. [ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2013-11-06seqcount: Add lockdep functionality to seqcount/seqlock structuresJohn Stultz
Currently seqlocks and seqcounts don't support lockdep. After running across a seqcount related deadlock in the timekeeping code, I used a less-refined and more focused variant of this patch to narrow down the cause of the issue. This is a first-pass attempt to properly enable lockdep functionality on seqlocks and seqcounts. Since seqcounts are used in the vdso gettimeofday code, I've provided non-lockdep accessors for those needs. I've also handled one case where there were nested seqlock writers and there may be more edge cases. Comments and feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-3-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-18remove sched notifier for cross-cpu migrationsMarcelo Tosatti
Linux as a guest on KVM hypervisor, the only user of the pvclock vsyscall interface, does not require notification on task migration because: 1. cpu ID number maps 1:1 to per-CPU pvclock time info. 2. per-CPU pvclock time info is updated if the underlying CPU changes. 3. that version is increased whenever underlying CPU changes. Which is sufficient to guarantee nanoseconds counter is calculated properly. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-06-19x86/vdso: Convert use of typedef ctl_table to struct ctl_tableJoe Perches
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Cc: Jiri Kosina <trivial@kernel.org> Link: http://lkml.kernel.org/r/a756fa0060e8eea25e8c1863c2764e86c2823617.1371177118.git.joe@perches.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-15timers/x86/hpet: Use HPET_COUNTER to specify the hpet counter in vread_hpet()Satoru Takeuchi
vread_hpet() uses "0xf0" as the offset of the hpet counter. To clarify the meaning of this code, it should use symbolic name, HPET_COUNTER, instead. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-13Merge tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Marcelo Tosatti: "Considerable KVM/PPC work, x86 kvmclock vsyscall support, IA32_TSC_ADJUST MSR emulation, amongst others." Fix up trivial conflict in kernel/sched/core.c due to cross-cpu migration notifier added next to rq migration call-back. * tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (156 commits) KVM: emulator: fix real mode segment checks in address linearization VMX: remove unneeded enable_unrestricted_guest check KVM: VMX: fix DPL during entry to protected mode x86/kexec: crash_vmclear_local_vmcss needs __rcu kvm: Fix irqfd resampler list walk KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary KVM: MMU: optimize for set_spte KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation KVM: PPC: bookehv: Add guest computation mode for irq delivery KVM: PPC: Make EPCR a valid field for booke64 and bookehv KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation KVM: PPC: e500: Add emulation helper for getting instruction ea KVM: PPC: bookehv64: Add support for interrupt handling KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler KVM: PPC: booke: Fix get_tb() compile error on 64-bit KVM: PPC: e500: Silence bogus GCC warning in tlb code ...
2012-12-11mm: use vm_unmapped_area() on x86_64 architectureMichel Lespinasse
Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-27x86: vdso: pvclock gettime supportMarcelo Tosatti
Improve performance of time system calls when using Linux pvclock, by reading time info from fixmap visible copy of pvclock data. Originally from Jeremy Fitzhardinge. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-09-24time: Convert x86_64 to using new update_vsyscallJohn Stultz
Switch x86_64 to using sub-ns precise vsyscall Cc: Tony Luck <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Turner <pjt@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-06-07x86, cpu: Rename checking_wrmsrl() to wrmsrl_safe()H. Peter Anvin
Rename checking_wrmsrl() to wrmsrl_safe(), to match the naming convention used by all the other MSR access functions/macros. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-03-29Merge branch 'x86-x32-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x32 support for x86-64 from Ingo Molnar: "This tree introduces the X32 binary format and execution mode for x86: 32-bit data space binaries using 64-bit instructions and 64-bit kernel syscalls. This allows applications whose working set fits into a 32 bits address space to make use of 64-bit instructions while using a 32-bit address space with shorter pointers, more compressed data structures, etc." Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c} * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) x32: Fix alignment fail in struct compat_siginfo x32: Fix stupid ia32/x32 inversion in the siginfo format x32: Add ptrace for x32 x32: Switch to a 64-bit clock_t x32: Provide separate is_ia32_task() and is_x32_task() predicates x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls x86/x32: Fix the binutils auto-detect x32: Warn and disable rather than error if binutils too old x32: Only clear TIF_X32 flag once x32: Make sure TS_COMPAT is cleared for x32 tasks fs: Remove missed ->fds_bits from cessation use of fd_set structs internally fs: Fix close_on_exec pointer in alloc_fdtable x32: Drop non-__vdso weak symbols from the x32 VDSO x32: Fix coding style violations in the x32 VDSO code x32: Add x32 VDSO support x32: Allow x32 to be configured x32: If configured, add x32 system calls to system call tables x32: Handle process creation x32: Signal-related system calls x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h> ...
2012-03-29Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner. * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ia64: vsyscall: Add missing paranthesis alarmtimer: Don't call rtc_timer_init() when CONFIG_RTC_CLASS=n x86: vdso: Put declaration before code x86-64: Inline vdso clock_gettime helpers x86-64: Simplify and optimize vdso clock_gettime monotonic variants kernel-time: fix s/then/than/ spelling errors time: remove no_sync_cmos_clock time: Avoid scary backtraces when warning of > 11% adj alarmtimer: Make sure we initialize the rtctimer ntp: Fix leap-second hrtimer livelock x86, tsc: Skip refined tsc calibration on systems with reliable TSC rtc: Provide flag for rtc devices that don't support UIE ia64: vsyscall: Use seqcount instead of seqlock x86: vdso: Use seqcount instead of seqlock x86: vdso: Remove bogus locking in update_vsyscall_tz() time: Remove bogus comments time: Fix change_clocksource locking time: x86: Fix race switching from vsyscall to non-vsyscall clock
2012-03-23coredump: remove VM_ALWAYSDUMP flagJason Baron
The motivation for this patchset was that I was looking at a way for a qemu-kvm process, to exclude the guest memory from its core dump, which can be quite large. There are already a number of filter flags in /proc/<pid>/coredump_filter, however, these allow one to specify 'types' of kernel memory, not specific address ranges (which is needed in this case). Since there are no more vma flags available, the first patch eliminates the need for the 'VM_ALWAYSDUMP' flag. The flag is used internally by the kernel to mark vdso and vsyscall pages. However, it is simple enough to check if a vma covers a vdso or vsyscall page without the need for this flag. The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new 'VM_NODUMP' flag, which can be set by userspace using new madvise flags: 'MADV_DONTDUMP', and unset via 'MADV_DODUMP'. The core dump filters continue to work the same as before unless 'MADV_DONTDUMP' is set on the region. The qemu code which implements this features is at: http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch In my testing the qemu core dump shrunk from 383MB -> 13MB with this patch. I also believe that the 'MADV_DONTDUMP' flag might be useful for security sensitive apps, which might want to select which areas are dumped. This patch: The VM_ALWAYSDUMP flag is currently used by the coredump code to indicate that a vma is part of a vsyscall or vdso section. However, we can determine if a vma is in one these sections by checking it against the gate_vma and checking for a non-NULL return value from arch_vma_name(). Thus, freeing a valuable vma bit. Signed-off-by: Jason Baron <jbaron@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-23x86-64: Inline vdso clock_gettime helpersAndy Lutomirski
This is about a 3% speedup on Sandy Bridge. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-03-23x86-64: Simplify and optimize vdso clock_gettime monotonic variantsAndy Lutomirski
We used to store the wall-to-monotonic offset and the realtime base. It's faster to precompute the monotonic base. This is about a 3% speedup on Sandy Bridge for CLOCK_MONOTONIC. It's much more impressive for CLOCK_MONOTONIC_COARSE. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-03-15x86: vdso: Use seqcount instead of seqlockThomas Gleixner
The update of the vdso data happens under xtime_lock, so adding a nested lock is pointless. Just use a seqcount to sync the readers. Reviewed-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-03-15time: x86: Fix race switching from vsyscall to non-vsyscall clockJohn Stultz
When switching from a vsyscall capable to a non-vsyscall capable clocksource, there was a small race, where the last vsyscall gettimeofday before the switch might return a invalid time value using the new non-vsyscall enabled clocksource values after the switch is complete. This is due to the vsyscall code checking the vclock_mode once outside of the seqcount protected section. After it reads the vclock mode, it doesn't re-check that the sampled clock data that is obtained in the seqcount critical section still matches. The fix is to sample vclock_mode inside the protected section, and as long as it isn't VCLOCK_NONE, return the calculated value. If it has changed and is now VCLOCK_NONE, fall back to the syscall gettime calculation. v2: * Cleanup checks as suggested by tglx * Also fix same issue present in gettimeofday path CC: Andy Lutomirski <luto@amacapital.net> CC: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
2012-02-22x32: Drop non-__vdso weak symbols from the x32 VDSOH. Peter Anvin
Drop the legacy weak symbols that don't carry the __vdso prefix from the x32 VDSO. This is a new ABI and we don't need to support that legacy; the actual libc will export the proper symbols. Suggested-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/4F42E171.9080005@mit.edu Cc: H. J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-02-21x32: Fix coding style violations in the x32 VDSO codeH. Peter Anvin
Move the prototype for x32_setup_additional_pages() to a header file, and adjust the coding style to match standard. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: H. J. Lu <hjl.tools@gmail.com>
2012-02-20x32: Add x32 VDSO supportH. J. Lu
Add support for the x32 VDSO. The x32 VDSO takes advantage of the similarity between the x86-64 and the x32 ABIs to contain the same content, only the container is different, as the x32 VDSO obviously is an x32 shared object. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-10-28Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, amd: Include linux/elf.h since we use stuff from asm/elf.h x86: cache_info: Update calculation of AMD L3 cache indices x86: cache_info: Kill the atomic allocation in amd_init_l3_cache() x86: cache_info: Kill the moronic shadow struct x86: cache_info: Remove bogus free of amd_l3_cache data x86, amd: Include elf.h explicitly, prepare the code for the module.h split x86-32, amd: Move va_align definition to unbreak 32-bit build x86, amd: Move BSP code to cpu_dev helper x86: Add a BSP cpu_dev helper x86, amd: Avoid cache aliasing penalties on AMD family 15h
2011-08-23Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, vdso: On system call restart after SYSENTER, use int $0x80 x86, UV: Remove UV delay in starting slave cpus x86, olpc: Wait for last byte of EC command to be accepted
2011-08-23x86-32, vdso: On system call restart after SYSENTER, use int $0x80H. Peter Anvin
When we enter a 32-bit system call via SYSENTER or SYSCALL, we shuffle the arguments to match the int $0x80 calling convention. This was probably a design mistake, but it's what it is now. This causes errors if the system call as to be restarted. For SYSENTER, we have to invoke the instruction from the vdso as the return address is hardcoded. Accordingly, we can simply replace the jump in the vdso with an int $0x80 instruction and use the slower entry point for a post-restart. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFztZ=r5wa0x26KJQxvZOaQq8s2v3u50wCyJcA-Sc4g8gQ@mail.gmail.com Cc: <stable@kernel.org>
2011-08-05x86, amd: Avoid cache aliasing penalties on AMD family 15hBorislav Petkov
This patch provides performance tuning for the "Bulldozer" CPU. With its shared instruction cache there is a chance of generating an excessive number of cache cross-invalidates when running specific workloads on the cores of a compute module. This excessive amount of cross-invalidations can be observed if cache lines backed by shared physical memory alias in bits [14:12] of their virtual addresses, as those bits are used for the index generation. This patch addresses the issue by clearing all the bits in the [14:12] slice of the file mapping's virtual address at generation time, thus forcing those bits the same for all mappings of a single shared library across processes and, in doing so, avoids instruction cache aliases. It also adds the command line option "align_va_addr=(32|64|on|off)" with which virtual address alignment can be enabled for 32-bit or 64-bit x86 individually, or both, or be completely disabled. This change leaves virtual region address allocation on other families and/or vendors unaffected. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1312550110-24160-2-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-08-04x86-64: Pad vDSO to a page boundaryAndy Lutomirski
This avoids an information leak to userspace. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/a63380a3c58a0506a2f5a18ba1b12dbde1f25e58.1312378163.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-21x86-64, vdso: Do not allocate memory for the vDSOAndy Lutomirski
We can map the vDSO straight from kernel data, saving a few page allocations. As an added bonus, the deleted code contained a memory leak. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/2c4ed5c2c2e93603790229e0c3403ae506ccc0cb.1311277573.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-07-18x86, vdso: Drop now wrong commentBorislav Petkov
Now that 1b3f2a72bbcfdf92e368a44448c45eb639b05b5e is in, it is very important that the below lying comment be removed! :-) Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20110718191054.GA18359@liondog.tnic Acked-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14x86-64: Move vread_tsc and vread_hpet into the vDSOAndy Lutomirski
The vsyscall page now consists entirely of trap instructions. Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/637648f303f2ef93af93bae25186e9a1bea093f5.1310639973.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13x86-64: Add --no-undefined to vDSO buildAndy Lutomirski
This gives much nicer diagnostics when something goes wrong. It's supported at least as far back as binutils 2.15. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/de0b50920469ff6359c529526e7639fdd36fa83c.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-13x86-64: Allow alternative patching in the vDSOAndy Lutomirski
This code is short enough and different enough from the module loader that it's not worth trying to share anything. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/e73112e4381fff29e31b882c2d0856822edaea53.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-06-05x86-64: Remove kernel.vsyscall64 sysctlAndy Lutomirski
It's unnecessary overhead in code that's supposed to be highly optimized. Removing it allows us to remove one of the two syscall instructions in the vsyscall page. The only sensible use for it is for UML users, and it doesn't fully address inconsistent vsyscall results on UML. The real fix for UML is to stop using vsyscalls entirely. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Jesper Juhl <jj@chaosbits.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: Mikael Pettersson <mikpe@it.uu.se> Cc: Andi Kleen <andi@firstfloor.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Louis Rilling <Louis.Rilling@kerlabs.com> Cc: Valdis.Kletnieks@vt.edu Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/973ae803fe76f712da4b2740e66dccf452d3b1e4.1307292171.git.luto@mit.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-26x86: vdso: Remove unused variableThomas Gleixner
Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@mit.edu>
2011-05-24x86-64: Optimize vDSO time()Andy Lutomirski
This function just reads a 64-bit variable that's updated atomically, so we don't need any locks. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Link: http://lkml.kernel.org/r/%3C40e2700f8cda4d511e5910be1e633025d28b36c2.1306156808.git.luto%40mit.edu%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24x86-64: Add time to vDSOAndy Lutomirski
The only fast implementation of time(2) we expose is through the vsyscall page and we want to get userspace to stop using the vsyscall page. So make it available through the vDSO as well. This is essentially a cut-n-paste job. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Link: http://lkml.kernel.org/r/%3Cbf963bac5207de4b29613f27c42705e4371812a8.1306156808.git.luto%40mit.edu%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24x86-64: Turn off -pg and turn on -foptimize-sibling-calls for vDSOAndy Lutomirski
The vDSO isn't part of the kernel, so profiling and kernel backtraces don't really matter. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Link: http://lkml.kernel.org/r/%3C23087b738c037342abb53f2f07b9bef89ceaeea3.1306156808.git.luto%40mit.edu%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24x86-64: Vclock_gettime(CLOCK_MONOTONIC) can't ever see nsec < 0Andy Lutomirski
vclock_gettime's do_monotonic helper can't ever generate a negative nsec value, so it doesn't need to check whether it's negative. In the CLOCK_MONOTONIC_COARSE case, ns can't ever exceed 2e9-1, so we can avoid the loop entirely. This saves a single easily-predicted branch. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Link: http://lkml.kernel.org/r/%3Cd6d528d32c7a21618057cfc9005942a0fe5cb54a.1306156808.git.luto%40mit.edu%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24x86-64: Clean up vdso/kernel shared variablesAndy Lutomirski
Variables that are shared between the vdso and the kernel are currently a bit of a mess. They are each defined with their own magic, they are accessed differently in the kernel, the vsyscall page, and the vdso, and one of them (vsyscall_clock) doesn't even really exist. This changes them all to use a common mechanism. All of them are delcared in vvar.h with a fixed address (validated by the linker script). In the kernel (as before), they look like ordinary read-write variables. In the vsyscall page and the vdso, they are accessed through a new macro VVAR, which gives read-only access. The vdso is now loaded verbatim into memory without any fixups. As a side bonus, access from the vdso is faster because a level of indirection is removed. While we're at it, pack jiffies and vgetcpu_mode into the same cacheline. Signed-off-by: Andy Lutomirski <luto@mit.edu> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Borislav Petkov <bp@amd64.org> Link: http://lkml.kernel.org/r/%3C7357882fbb51fa30491636a7b6528747301b7ee9.1306156808.git.luto%40mit.edu%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-03-23mm: arch: rename in_gate_area_no_task to in_gate_area_no_mmStephen Wilson
Now that gate vma's are referenced with respect to a particular mm and not a particular task it only makes sense to propagate the change to this predicate as well. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-23mm: arch: make in_gate_area take an mm_struct instead of a task_structStephen Wilson
Morally, the question of whether an address lies in a gate vma should be asked with respect to an mm, not a particular task. Moreover, dropping the dependency on task_struct will help make existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-03-23mm: arch: make get_gate_vma take an mm_struct instead of a task_structStephen Wilson
Morally, the presence of a gate vma is more an attribute of a particular mm than a particular task. Moreover, dropping the dependency on task_struct will help make both existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-12-13x86, gcc-4.6: Use gcc -m options when building vdsoH. Peter Anvin
The vdso Makefile passes linker-style -m options not to the linker but to gcc. This happens to work with earlier gcc, but fails with gcc 4.6. Pass gcc-style -m options, instead. Note: all currently supported versions of gcc supports -m32, so there is no reason to conditionalize it any more. Reported-by: H. J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <tip-*@git.kernel.org> Cc: <stable@kernel.org>
2010-08-06Merge branches 'x86-cleanups-for-linus', 'x86-vmware-for-linus', ↵Linus Torvalds
'x86-mtrr-for-linus', 'x86-apic-for-linus', 'x86-fpu-for-linus' and 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Clean up arch/x86/kernel/cpu/mtrr/cleanup.c: use ";" not "," to terminate statements * 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, vmware: Preset lpj values when on VMware. * 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mtrr: Use stop machine context to rendezvous all the cpu's * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/apic/es7000_32: Remove unused variable * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Avoid unnecessary __clear_user() and xrstor in signal handling * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, vdso: Unmap vdso pages
2010-08-06Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Ioremap: fix wrong physical address handling in PAT code x86, tlb: Clean up and correct used type x86, iomap: Fix wrong page aligned size calculation in ioremapping code x86, mm: Create symbolic index into address_markers array x86, ioremap: Fix normal ram range check x86, ioremap: Fix incorrect physical address handling in PAE mode x86-64, mm: Initialize VDSO earlier on 64 bits x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages
2010-08-02x86, vdso: Unmap vdso pagesShaohua Li
We mapped vdso pages but never unmapped them and the virtual address is lost after exiting from the function, so unmap vdso pages here. Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <20100802004934.GA2505@sli10-desk.sh.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>