eas/kernel.git - [no description]

Age	Commit message (Collapse)	Author
2014-10-15	arm64: enable context tracking	Larry Bassel
	Backport of the following patch to 3.14 LSK: commit 6c81fe7925cc4c42de49e17be21eb86d1173c3a7 Author: Larry Bassel <larry.bassel@linaro.org> Date: Fri May 30 12:34:15 2014 -0700 arm64: enable context tracking Make calls to ct_user_enter when the kernel is exited and ct_user_exit when the kernel is entered (in el0_da, el0_ia, el0_svc, el0_irq and all of the "error" paths). These macros expand to function calls which will only work properly if el0_sync and related code has been rearranged (in a previous patch of this series). The calls to ct_user_exit are made after hw debugging has been enabled (enable_dbg_and_irq). The call to ct_user_enter is made at the beginning of the kernel_exit macro. This patch is based on earlier work by Kevin Hilman. Save/restore optimizations were also done by Kevin. Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Kevin Hilman <khilman@linaro.org> Tested-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Larry Bassel <larry.bassel@linaro.org> Signed-off-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Larry Bassel <larry.bassel@linaro.org>
2014-10-15	arm64: adjust el0_sync so that a function can be called	Larry Bassel
	Backport of the following patch to 3.14 LSK: commit 6ab6463aeb5fbc75fa3227befb508fc33b34dbf1 Author: Larry Bassel <larry.bassel@linaro.org> Date: Fri May 30 20:34:14 2014 +0100 arm64: adjust el0_sync so that a function can be called To implement the context tracker properly on arm64, a function call needs to be made after debugging and interrupts are turned on, but before the lr is changed to point to ret_to_user(). If the function call is made after the lr is changed the function will not return to the correct place. For similar reasons, defer the setting of x0 so that it doesn't need to be saved around the function call (save far_el1 in x26 temporarily instead). Acked-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Kevin Hilman <khilman@linaro.org> Tested-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Larry Bassel <larry.bassel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Larry Bassel <larry.bassel@linaro.org>
2014-10-15	arm64: Support arch_irq_work_raise() via self IPIs	Larry Bassel
	Backport of the following patch to LSK 3.14: commit eb631bb5bf5b042202aaaee4a8dd8f863ba2a900 Author: Larry Bassel <larry.bassel@linaro.org> Date: Mon May 12 16:48:51 2014 +0100 arm64: Support arch_irq_work_raise() via self IPIs Support for arch_irq_work_raise() was missing from arm64 (a prerequisite for FULL_NOHZ). This patch is based on the arm32 patch ARM 7872/1. commit bf18525fd793101df42a1344ecc48b49b62e48c9 Author: Stephen Boyd <sboyd@codeaurora.org> Date: Tue Oct 29 20:32:56 2013 +0100 ARM: 7872/1: Support arch_irq_work_raise() via self IPIs By default, IRQ work is run from the tick interrupt (see irq_work_run() in update_process_times()). When we're in full NOHZ mode, restarting the tick requires the use of IRQ work and if the only place we run IRQ work is in the tick interrupt we have an unbreakable cycle. Implement arch_irq_work_raise() via self IPIs to break this cycle and get the tick started again. Note that we implement this via IPIs which are only available on SMP builds. This shouldn't be a problem because full NOHZ is only supported on SMP builds anyway. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Reviewed-by: Kevin Hilman <khilman@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Larry Bassel <larry.bassel@linaro.org> Reviewed-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Larry Bassel <larry.bassel@linaro.org>
2014-10-10	Merge tag 'v3.14.21' into linux-linaro-lsk-v3.14	Mark Brown
	This is the 3.14.21 stable release
2014-10-09	mm: per-thread vma caching	Davidlohr Bueso
	commit 615d6e8756c87149f2d4c1b93d471bca002bd849 upstream. This patch is a continuation of efforts trying to optimize find_vma(), avoiding potentially expensive rbtree walks to locate a vma upon faults. The original approach (https://lkml.org/lkml/2013/11/1/410), where the largest vma was also cached, ended up being too specific and random, thus further comparison with other approaches were needed. There are two things to consider when dealing with this, the cache hit rate and the latency of find_vma(). Improving the hit-rate does not necessarily translate in finding the vma any faster, as the overhead of any fancy caching schemes can be too high to consider. We currently cache the last used vma for the whole address space, which provides a nice optimization, reducing the total cycles in find_vma() by up to 250%, for workloads with good locality. On the other hand, this simple scheme is pretty much useless for workloads with poor locality. Analyzing ebizzy runs shows that, no matter how many threads are running, the mmap_cache hit rate is less than 2%, and in many situations below 1%. The proposed approach is to replace this scheme with a small per-thread cache, maximizing hit rates at a very low maintenance cost. Invalidations are performed by simply bumping up a 32-bit sequence number. The only expensive operation is in the rare case of a seq number overflow, where all caches that share the same address space are flushed. Upon a miss, the proposed replacement policy is based on the page number that contains the virtual address in question. Concretely, the following results are seen on an 80 core, 8 socket x86-64 box: 1) System bootup: Most programs are single threaded, so the per-thread scheme does improve ~50% hit rate by just adding a few more slots to the cache. +----------------+----------+------------------+ \| caching scheme \| hit-rate \| cycles (billion) \| +----------------+----------+------------------+ \| baseline \| 50.61% \| 19.90 \| \| patched \| 73.45% \| 13.58 \| +----------------+----------+------------------+ 2) Kernel build: This one is already pretty good with the current approach as we're dealing with good locality. +----------------+----------+------------------+ \| caching scheme \| hit-rate \| cycles (billion) \| +----------------+----------+------------------+ \| baseline \| 75.28% \| 11.03 \| \| patched \| 88.09% \| 9.31 \| +----------------+----------+------------------+ 3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload. +----------------+----------+------------------+ \| caching scheme \| hit-rate \| cycles (billion) \| +----------------+----------+------------------+ \| baseline \| 70.66% \| 17.14 \| \| patched \| 91.15% \| 12.57 \| +----------------+----------+------------------+ 4) Ebizzy: There's a fair amount of variation from run to run, but this approach always shows nearly perfect hit rates, while baseline is just about non-existent. The amounts of cycles can fluctuate between anywhere from ~60 to ~116 for the baseline scheme, but this approach reduces it considerably. For instance, with 80 threads: +----------------+----------+------------------+ \| caching scheme \| hit-rate \| cycles (billion) \| +----------------+----------+------------------+ \| baseline \| 1.06% \| 91.54 \| \| patched \| 99.97% \| 14.18 \| +----------------+----------+------------------+ [akpm@linux-foundation.org: fix nommu build, per Davidlohr] [akpm@linux-foundation.org: document vmacache_valid() logic] [akpm@linux-foundation.org: attempt to untangle header files] [akpm@linux-foundation.org: add vmacache_find() BUG_ON] [hughd@google.com: add vmacache_valid_mm() (from Oleg)] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: adjust and enhance comments] Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-09	Merge remote-tracking branch 'lsk/v3.14/topic/kvm' into linux-linaro-lsk-v3.14	Mark Brown

2014-10-08	Merge remote-tracking branch 'lsk/v3.14/topic/gicv3' into linux-linaro-lsk-v3.14	Mark Brown

2014-10-08	Merge branch 'lsk/v3.14/topic/kvm' into lsk/lsk-with-kvm-v3.14v3.14/topic/kvm	Christoffer Dall
	Conflicts: arch/arm64/include/asm/debug-monitors.h
2014-10-08	mm: export symbol dependencies of is_zero_pfn()	Ard Biesheuvel
	In order to make the static inline function is_zero_pfn() callable by modules, export its symbol dependencies 'zero_pfn' and (for s390 and mips) 'zero_page_mask'. We need this for KVM, as CONFIG_KVM is a tristate for all supported architectures except ARM and arm64, and testing a pfn whether it refers to the zero page is required to correctly distinguish the zero page from other special RAM ranges that may also have the PG_reserved bit set, but need to be treated as MMIO memory. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 0b70068e47e8f0c813a902dc3d6def601fd15acb) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-08	Merge branch 'lsk/v3.14/topic/gic-v3' into lsk/lsk-with-kvm-v3.14	Christoffer Dall

2014-10-06	Merge tag 'v3.14.20' into linux-linaro-lsk-v3.14	Mark Brown
	This is the 3.14.20 stable release
2014-10-05	ARM: DRA7: Add support for soc_is_dra74x() and soc_is_dra72x() variants	Rajendra Nayak
	commit af438fec6cb99fc2e2faf8b16b865af26ce722e6 upstream. Use the corresponding compatibles to identify the devices. Signed-off-by: Rajendra Nayak <rnayak@ti.com> Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Acked-by: Nishanth Menon <nm@ti.com> Tested-by: Nishanth Menon <nm@ti.com> Signed-off-by: Paul Walmsley <paul@pwsan.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU	Venkatesh Srinivas
	commit 24223657806a0ebd0ae5c9caaf7b021091889cf2 upstream. CPUs which should support the RAPL counters according to Family/Model/Stepping may still issue #GP when attempting to access the RAPL MSRs. This may happen when Linux is running under KVM and we are passing-through host F/M/S data, for example. Use rdmsrl_safe to first access the RAPL_POWER_UNIT MSR; if this fails, do not attempt to use this PMU. Signed-off-by: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com Cc: zheng.z.yan@intel.com Cc: eranian@google.com Cc: ak@linux.intel.com Cc: linux-kernel@vger.kernel.org [ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> [backport by whissi] Cc: Thomas D <whissi@whissi.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds	John David Anglin
	commit d26a7730b5874a5fa6779c62f4ad7c5065a94723 upstream. In spite of what the GCC manual says, the -mfast-indirect-calls has never been supported in the 64-bit parisc compiler. Indirect calls have always been done using function descriptors irrespective of the -mfast-indirect-calls option. Recently, it was noticed that a function descriptor was always requested when the -mfast-indirect-calls option was specified. This caused problems when the option was used in application code and doesn't make any sense because the whole point of the option is to avoid using a function descriptor for indirect calls. Fixing this broke 64-bit kernel builds. I will fix GCC but for now we need the attached change. This results in the same kernel code as before. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	parisc: Implement new LWS CAS supporting 64 bit operations.	Guy Martin
	commit 89206491201cbd1571009b36292af781cef74c1b upstream. The current LWS cas only works correctly for 32bit. The new LWS allows for CAS operations of variable size. Signed-off-by: Guy Martin <gmsoft@tuxicoman.be> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	powerpc: Add smp_mb()s to arch_spin_unlock_wait()	Michael Ellerman
	commit 78e05b1421fa41ae8457701140933baa5e7d9479 upstream. Similar to the previous commit which described why we need to add a barrier to arch_spin_is_locked(), we have a similar problem with spin_unlock_wait(). We need a barrier on entry to ensure any spinlock we have previously taken is visibly locked prior to the load of lock->slock. It's also not clear if spin_unlock_wait() is intended to have ACQUIRE semantics. For now be conservative and add a barrier on exit to give it ACQUIRE semantics. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	powerpc: Add smp_mb() to arch_spin_is_locked()	Michael Ellerman
	commit 51d7d5205d3389a32859f9939f1093f267409929 upstream. The kernel defines the function spin_is_locked(), which can be used to check if a spinlock is currently locked. Using spin_is_locked() on a lock you don't hold is obviously racy. That is, even though you may observe that the lock is unlocked, it may become locked at any time. There is (at least) one exception to that, which is if two locks are used as a pair, and the holder of each checks the status of the other before doing any update. Assuming A and B are two locks, and COUNTER is a shared non-atomic value: The first CPU does: spin_lock(A) if spin_is_locked(B) # nothing else smp_mb() LOAD r = COUNTER r++ STORE COUNTER = r spin_unlock(A) And the second CPU does: spin_lock(B) if spin_is_locked(A) # nothing else smp_mb() LOAD r = COUNTER r++ STORE COUNTER = r spin_unlock(B) Although this is a strange locking construct, it should work. It seems to be understood, but not documented, that spin_is_locked() is not a memory barrier, so in the examples above and below the caller inserts its own memory barrier before acting on the result of spin_is_locked(). For now we assume spin_is_locked() is implemented as below, and we break it out in our examples: bool spin_is_locked(LOCK) { LOAD l = LOCK return l.locked } Our intuition is that there should be no problem even if the two code sequences run simultaneously such as: CPU 0 CPU 1 ================================================== spin_lock(A) spin_lock(B) LOAD b = B LOAD a = A if b.locked # true if a.locked # true # nothing # nothing spin_unlock(A) spin_unlock(B) If one CPU gets the lock before the other then it will do the update and the other CPU will back off: CPU 0 CPU 1 ================================================== spin_lock(A) LOAD b = B spin_lock(B) if b.locked # false LOAD a = A else if a.locked # true smp_mb() # nothing LOAD r1 = COUNTER spin_unlock(B) r1++ STORE COUNTER = r1 spin_unlock(A) However in reality spin_lock() itself is not indivisible. On powerpc we implement it as a load-and-reserve and store-conditional. Ignoring the retry logic for the lost reservation case, it boils down to: spin_lock(LOCK) { LOAD l = LOCK l.locked = true STORE LOCK = l ACQUIRE_BARRIER } The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as defined in memory-barriers.txt: This acts as a one-way permeable barrier. It guarantees that all memory operations after the ACQUIRE operation will appear to happen after the ACQUIRE operation with respect to the other components of the system. On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is also know as "lightweight sync", or "sync 1". As described in Power ISA v2.07 section B.2.1.1, in this scenario the lwsync is not the barrier itself. It instead causes the LOAD of LOCK to act as the barrier, preventing any loads or stores in the locked region from occurring prior to the load of LOCK. Whether this behaviour is in accordance with the definition of ACQUIRE semantics in memory-barriers.txt is open to discussion, we may switch to a different barrier in future. What this means in practice is that the following can occur: CPU 0 CPU 1 ================================================== LOAD a = A LOAD b = B a.locked = true b.locked = true LOAD b = B LOAD a = A STORE A = a STORE B = b if b.locked # false if a.locked # false else else smp_mb() smp_mb() LOAD r1 = COUNTER LOAD r2 = COUNTER r1++ r2++ STORE COUNTER = r1 STORE COUNTER = r2 # Lost update spin_unlock(A) spin_unlock(B) That is, the load of B can occur prior to the store that makes A visibly locked. And similarly for CPU 1. The result is both CPUs hold their lock and believe the other lock is unlocked. The easiest fix for this is to add a full memory barrier to the start of spin_is_locked(), so adding to our previous definition would give us: bool spin_is_locked(LOCK) { smp_mb() LOAD l = LOCK return l.locked } The new barrier orders the store to the lock we are locking vs the load of the other lock: CPU 0 CPU 1 ================================================== LOAD a = A LOAD b = B a.locked = true b.locked = true STORE A = a STORE B = b smp_mb() smp_mb() LOAD b = B LOAD a = A if b.locked # true if a.locked # true # nothing # nothing spin_unlock(A) spin_unlock(B) Although the above example is theoretical, there is code similar to this example in sem_lock() in ipc/sem.c. This commit in addition to the next commit appears to be a fix for crashes we are seeing in that code where we believe this race happens in practice. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	powerpc/perf: Fix ABIv2 kernel backtraces	Anton Blanchard
	commit 85101af13bb854a6572fa540df7c7201958624b9 upstream. ABIv2 kernels are failing to backtrace through the kernel. An example: 39.30% readseek2_proce [kernel.kallsyms] [k] find_get_entry \| --- find_get_entry __GI___libc_read The problem is in valid_next_sp() where we check that the new stack pointer is at least STACK_FRAME_OVERHEAD below the previous one. ABIv1 has a minimum stack frame size of 112 bytes consisting of 48 bytes and 64 bytes of parameter save area. ABIv2 changes that to 32 bytes with no paramter save area. STACK_FRAME_OVERHEAD is in theory the minimum stack frame size, but we over 240 uses of it, some of which assume that it includes space for the parameter area. We need to work through all our stack defines and rationalise them but let's fix perf now by creating STACK_FRAME_MIN_SIZE and using in valid_next_sp(). This fixes the issue: 30.64% readseek2_proce [kernel.kallsyms] [k] find_get_entry \| --- find_get_entry pagecache_get_page generic_file_read_iter new_sync_read vfs_read sys_read syscall_exit __GI___libc_read Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	sched: Fix unreleased llc_shared_mask bit during CPU hotplug	Wanpeng Li
	commit 03bd4e1f7265548832a76e7919a81f3137c44fd1 upstream. The following bug can be triggered by hot adding and removing a large number of xen domain0's vcpus repeatedly: BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group PGD 5a9d5067 PUD 13067 PMD 0 Oops: 0000 [#3] SMP [...] Call Trace: load_balance ? _raw_spin_unlock_irqrestore idle_balance __schedule schedule schedule_timeout ? lock_timer_base schedule_timeout_uninterruptible msleep lock_device_hotplug_sysfs online_store dev_attr_store sysfs_write_file vfs_write SyS_write system_call_fastpath Last level cache shared mask is built during CPU up and the build_sched_domain() routine takes advantage of it to setup the sched domain CPU topology. However, llc_shared_mask is not released during CPU disable, which leads to an invalid sched domainCPU topology. This patch fix it by releasing the llc_shared_mask correctly during CPU disable. Yasuaki also reported that this can happen on real hardware: https://lkml.org/lkml/2014/7/22/1018 His case is here: == Here is an example on my system. My system has 4 sockets and each socket has 15 cores and HT is enabled. In this case, each core of sockes is numbered as follows: \| CPU# Socket#0 \| 0-14 , 60-74 Socket#1 \| 15-29, 75-89 Socket#2 \| 30-44, 90-104 Socket#3 \| 45-59, 105-119 Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000. It means that last level cache of Socket#2 is shared with CPU#30-44 and 90-104. When hot-removing socket#2 and #3, each core of sockets is numbered as follows: \| CPU# Socket#0 \| 0-14 , 60-74 Socket#1 \| 15-29, 75-89 But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 remains having 0x3fff80000001fffc0000000. After that, when hot-adding socket#2 and #3, each core of sockets is numbered as follows: \| CPU# Socket#0 \| 0-14 , 60-74 Socket#1 \| 15-29, 75-89 Socket#2 \| 30-59 Socket#3 \| 90-119 Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000. It means that last level cache of Socket#2 is shared with CPU#30-59 and 90-104. So the mask has the wrong value. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Tested-by: Linn Crosetto <linn@hp.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	x86/kaslr: Avoid the setup_data area when picking location	Kees Cook
	commit 0cacbfbeb5077b63d5d3cf6df88b14ac12ad584b upstream. The KASLR location-choosing logic needs to avoid the setup_data list memory areas as well. Without this, it would be possible to have the ASLR position stomp on the memory, ultimately causing the boot to fail. Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Pavel Machek <pavel@ucw.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140911161931.GA12001@www.outflux.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8	Dave Young
	commit 3eddc69ffeba092d288c386646bfa5ec0fce25fd upstream. 3.16 kernel boot fail with earlyprintk=efi, it keeps scrolling at the bottom line of screen. Bisected, the first bad commit is below: commit 86dfc6f339886559d80ee0d4bd20fe5ee90450f0 Author: Lv Zheng <lv.zheng@intel.com> Date: Fri Apr 4 12:38:57 2014 +0800 ACPICA: Tables: Fix table checksums verification before installation. I did some debugging by enabling both serial and efi earlyprintk, below is some debug dmesg, seems early_ioremap fails in scroll up function due to no free slot, see below dmesg output: WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:116 __early_ioremap+0x90/0x1c4() __early_ioremap(ed00c800, 00000c80) not found slot Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.17.0-rc1+ #204 Hardware name: Hewlett-Packard HP Z420 Workstation/1589, BIOS J61 v03.15 05/09/2013 Call Trace: dump_stack+0x4e/0x7a warn_slowpath_common+0x75/0x8e ? __early_ioremap+0x90/0x1c4 warn_slowpath_fmt+0x47/0x49 __early_ioremap+0x90/0x1c4 ? sprintf+0x46/0x48 early_ioremap+0x13/0x15 early_efi_map+0x24/0x26 early_efi_scroll_up+0x6d/0xc0 early_efi_write+0x1b0/0x214 call_console_drivers.constprop.21+0x73/0x7e console_unlock+0x151/0x3b2 ? vprintk_emit+0x49f/0x532 vprintk_emit+0x521/0x532 ? console_unlock+0x383/0x3b2 printk+0x4f/0x51 acpi_os_vprintf+0x2b/0x2d acpi_os_printf+0x43/0x45 acpi_info+0x5c/0x63 ? __acpi_map_table+0x13/0x18 ? acpi_os_map_iomem+0x21/0x147 acpi_tb_print_table_header+0x177/0x186 acpi_tb_install_table_with_override+0x4b/0x62 acpi_tb_install_standard_table+0xd9/0x215 ? early_ioremap+0x13/0x15 ? __acpi_map_table+0x13/0x18 acpi_tb_parse_root_table+0x16e/0x1b4 acpi_initialize_tables+0x57/0x59 acpi_table_init+0x50/0xce acpi_boot_table_init+0x1e/0x85 setup_arch+0x9b7/0xcc4 start_kernel+0x94/0x42d ? early_idt_handlers+0x120/0x120 x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0xf3/0x100 Quote reply from Lv.zheng about the early ioremap slot usage in this case: """ In early_efi_scroll_up(), 2 mapping entries will be used for the src/dst screen buffer. In drivers/acpi/acpica/tbutils.c, we've improved the early table loading code in acpi_tb_parse_root_table(). We now need 2 mapping entries: 1. One mapping entry is used for RSDT table mapping. Each RSDT entry contains an address for another ACPI table. 2. For each entry in RSDP, we need another mapping entry to map the table to perform necessary check/override before installing it. When acpi_tb_parse_root_table() prints something through EFI earlyprintk console, we'll have 4 mapping entries used. The current 4 slots setting of early_ioremap() seems to be too small for such a use case. """ Thus increase the slot to 8 in this patch to fix this issue. boot-time mappings become 512 page with this patch. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	x86/xen: don't copy bogus duplicate entries into kernel page tables	Stefan Bader
	commit 0b5a50635fc916cf46e3de0b819a61fc3f17e7ee upstream. When RANDOMIZE_BASE (KASLR) is enabled; or the sum of all loaded modules exceeds 512 MiB, then loading modules fails with a warning (and hence a vmalloc allocation failure) because the PTEs for the newly-allocated vmalloc address space are not zero. WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128 vmap_page_range_noflush+0x2a1/0x360() This is caused by xen_setup_kernel_pagetables() copying level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present entries. Without KASLR, the normal kernel image size only covers the first half of level2_kernel_pgt and module space starts after that. L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[ 0..255]->kernel [256..511]->module [511]->level2_fixmap_pgt[ 0..505]->module This allows 512 MiB of of module vmalloc space to be used before having to use the corrupted level2_fixmap_pgt entries. With KASLR enabled, the kernel image uses the full PUD range of 1G and module space starts in the level2_fixmap_pgt. So basically: L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel [511]->level2_fixmap_pgt[0..505]->module And now no module vmalloc space can be used without using the corrupt level2_fixmap_pgt entries. Fix this by properly converting the level2_fixmap_pgt entries to MFNs, and setting level1_fixmap_pgt as read-only. A number of comments were also using the the wrong L3 offset for level2_kernel_pgt. These have been corrected. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	KVM: s390/mm: try a cow on read only pages for key ops	Christian Borntraeger
	commit ab3f285f227fec62868037e9b1b1fd18294a83b8 upstream. The PFMF instruction handler blindly wrote the storage key even if the page was mapped R/O in the host. Lets try a COW before continuing and bail out in case of errors. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	MIPS: mcount: Adjust stack pointer for static trace in MIPS32	Markos Chandras
	commit 8a574cfa2652545eb95595d38ac2a0bb501af0ae upstream. Every mcount() call in the MIPS 32-bit kernel is done as follows: [...] move at, ra jal _mcount addiu sp, sp, -8 [...] but upon returning from the mcount() function, the stack pointer is not adjusted properly. This is explained in details in 58b69401c797 (MIPS: Function tracer: Fix broken function tracing). Commit ad8c396936e3 ("MIPS: Unbreak function tracer for 64-bit kernel.) fixed the stack manipulation for 64-bit but it didn't fix it completely for MIPS32. Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7792/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	MIPS: ZBOOT: add missing <linux/string.h> include	Aurelien Jarno
	commit 29593fd5a8149462ed6fad0d522234facdaee6c8 upstream. Commit dc4d7b37 (MIPS: ZBOOT: gather string functions into string.c) moved the string related functions into a separate file, which might cause the following build error, depending on the configuration: \| CC arch/mips/boot/compressed/decompress.o \| In file included from linux/arch/mips/boot/compressed/../../../../lib/decompress_unxz.c:234:0, \| from linux/arch/mips/boot/compressed/decompress.c:67: \| linux/arch/mips/boot/compressed/../../../../lib/xz/xz_dec_stream.c: In function 'fill_temp': \| linux/arch/mips/boot/compressed/../../../../lib/xz/xz_dec_stream.c:162:2: error: implicit declaration of function 'memcpy' [-Werror=implicit-function-declaration] \| cc1: some warnings being treated as errors \| linux/scripts/Makefile.build:308: recipe for target 'arch/mips/boot/compressed/decompress.o' failed \| make[6]: *** [arch/mips/boot/compressed/decompress.o] Error 1 \| linux/arch/mips/Makefile:308: recipe for target 'vmlinuz' failed It does not fail with the standard configuration, as when CONFIG_DYNAMIC_DEBUG is not enabled <linux/string.h> gets included in include/linux/dynamic_debug.h. There might be other ways for it to get indirectly included. We can't add the include directly in xz_dec_stream.c as some architectures might want to use a different version for the boot/ directory (see for example arch/x86/boot/string.h). Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/7420/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	ARM: 8178/1: fix set_tls for !CONFIG_KUSER_HELPERS	Nathan Lynch
	commit 9cc6d9e5daaa147a9a3e31557efcb331989e77be upstream. Joachim Eastwood reports that commit fbfb872f5f41 "ARM: 8148/1: flush TLS and thumbee register state during exec" causes a boot-time crash on a Cortex-M4 nommu system: Freeing unused kernel memory: 68K (281e5000 - 281f6000) Unhandled exception: IPSR = 00000005 LR = fffffff1 CPU: 0 PID: 1 Comm: swapper Not tainted 3.17.0-rc6-00313-gd2205fa30aa7 #191 task: 29834000 ti: 29832000 task.ti: 29832000 PC is at flush_thread+0x2e/0x40 LR is at flush_thread+0x21/0x40 pc : [<2800954a>] lr : [<2800953d>] psr: 4100000b sp : 29833d60 ip : 00000000 fp : 00000001 r10: 00003cf8 r9 : 29b1f000 r8 : 00000000 r7 : 29b0bc00 r6 : 29834000 r5 : 29832000 r4 : 29832000 r3 : ffff0ff0 r2 : 29832000 r1 : 00000000 r0 : 282121f0 xPSR: 4100000b CPU: 0 PID: 1 Comm: swapper Not tainted 3.17.0-rc6-00313-gd2205fa30aa7 #191 [<2800afa5>] (unwind_backtrace) from [<2800a327>] (show_stack+0xb/0xc) [<2800a327>] (show_stack) from [<2800a963>] (__invalid_entry+0x4b/0x4c) The problem is that set_tls is attempting to clear the TLS location in the kernel-user helper page, which isn't set up on V7M. Fix this by guarding the write to the kuser helper page with a CONFIG_KUSER_HELPERS ifdef. Fixes: fbfb872f5f41 ARM: 8148/1: flush TLS and thumbee register state during exec Reported-by: Joachim Eastwood <manabian@gmail.com> Tested-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	ARM: 8165/1: alignment: don't break misaligned NEON load/store	Robin Murphy
	commit 5ca918e5e3f9df4634077c06585c42bc6a8d699a upstream. The alignment fixup incorrectly decodes faulting ARM VLDn/VSTn instructions (where the optional alignment hint is given but incorrect) as LDR/STR, leading to register corruption. Detect these and correctly treat them as unhandled, so that userspace gets the fault it expects. Reported-by: Simon Hosie <simon.hosie@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	ARM: 8148/1: flush TLS and thumbee register state during exec	Nathan Lynch
	commit fbfb872f5f417cea48760c535e0ff027c88b507a upstream. The TPIDRURO and TPIDRURW registers need to be flushed during exec; otherwise TLS information is potentially leaked. TPIDRURO in particular needs careful treatment. Since flush_thread basically needs the same code used to set the TLS in arm_syscall, pull that into a common set_tls helper in tls.h and use it in both places. Similarly, TEEHBR needs to be cleared during exec as well. Clearing its save slot in thread_info isn't right as there is no guarantee that a thread switch will occur before the new program runs. Just setting the register directly is sufficient. Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	ARM: 8133/1: use irq_set_affinity with force=false when migrating irqs	Sudeep Holla
	commit a040803a9d6b8c1876d3487a5cb69602ebcbb82c upstream. Since commit 1dbfa187dad ("ARM: irq migration: force migration off CPU going down") the ARM interrupt migration code on cpu offline calls irqchip.irq_set_affinity() with the argument force=true. At the point of this change the argument had no effect because it was not used by any interrupt chip driver and there was no semantics defined. This changed with commit 01f8fa4f01d8 ("genirq: Allow forcing cpu affinity of interrupts") which made the force argument useful to route interrupts to not yet online cpus without checking the target cpu against the cpu online mask. The following commit ffde1de64012 ("irqchip: gic: Support forced affinity setting") implemented this for the GIC interrupt controller. As a consequence the ARM cpu offline irq migration fails if CPU0 is offlined, because CPU0 is still set in the affinity mask and the validataion against cpu online mask is skipped to the force argument being true. The following first_cpu(mask) selection always selects CPU0 as the target. Solve the issue by calling irq_set_affinity() with force=false from the CPU offline irq migration code so the GIC driver validates the affinity mask against CPU online mask and therefore removes CPU0 from the possible target candidates. Tested on TC2 hotpluging CPU0 in and out. Without this patch the system locks up as the IRQs are not migrated away from CPU0. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	ARM: dts: dra7-evm: Fix spi1 mux documentation	Nishanth Menon
	commit 68e4d9e58dbae2fb178e8b74806f521adb16f0d3 upstream. While auditing the various pin ctrl configurations using the following command: grep PIN_ arch/arm/boot/dts/dra7-evm.dts\|(while read line; do v=`echo "$line" \| sed -e "s/\s\s*/\|/g" \| cut -d '\|' -f1 \| cut -d 'x' -f2\|tr [a-z] [A-Z]`; HEX=`echo "obase=16;ibase=16;4A003400+$v"\| bc`; echo "$HEX ===> $line"; done) against DRA75x/74x NDA TRM revision S(SPRUHI2S August 2014), documentation errors were found for spi1 pinctrl. Fix the same. Fixes: 6e58b8f1daaf1af ("ARM: dts: DRA7: Add the dts files for dra7 SoC and dra7-evm board") Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	ARM: dts: DRA7: fix interrupt-cells for GPIO	Nishanth Menon
	commit e49d519c456f4fb6f4a0473bc04ba30bb805fce5 upstream. GPIO modules are also interrupt sources. However, they require both the GPIO number and IRQ type to function properly. By declaring that GPIO uses interrupt-cells=<1>, we essentially do not allow users of the nodes to use the interrupt property appropritely. With this change, the following now works: interrupt-parent = <&gpio6>; interrupts = <5 IRQ_TYPE_LEVEL_LOW>; Fixes: 6e58b8f1daaf ('ARM: dts: DRA7: Add the dts files for dra7 SoC and dra7-evm board') Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	ARM: DRA7: hwmod: Add dra74x and dra72x specific ocp interface lists	Rajendra Nayak
	commit f7f7a29bf0cf25af23f37e5b5bf1368b85705286 upstream. To deal with IPs which are specific to dra74x and dra72x, maintain seperate ocp interface lists, while keeping the common list for all common IPs. Move USB OTG SS4 to dra74x only list since its unavailable in dra72x and is giving an abort during boot. The dra72x only list is empty for now and a placeholder for future hwmod additions which are specific to dra72x. Fixes: d904b38df0db13 ("ARM: DRA7: hwmod: Add SYSCONFIG for usb_otg_ss") Reported-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Rajendra Nayak <rnayak@ti.com> Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com> Tested-by: Nishanth Menon <nm@ti.com> [paul@pwsan.com: fixed comment style to conform with CodingStyle] Signed-off-by: Paul Walmsley <paul@pwsan.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	ARM: 8128/1: abort: don't clear the exclusive monitors	Mark Rutland
	commit 85868313177700d20644263a782351262d2aff84 upstream. The ARMv6 and ARMv7 early abort handlers clear the exclusive monitors upon entry to the kernel, but this is redundant: - We clear the monitors on every exception return since commit 200b812d0084 ("Clear the exclusive monitor when returning from an exception"), so this is not necessary to ensure the monitors are cleared before returning from a fault handler. - Any dummy STREX will target a temporary scratch area in memory, and may succeed or fail without corrupting useful data. Its status value will not be used. - Any other STREX in the kernel must be preceded by an LDREX, which will initialise the monitors consistently and will not depend on the earlier state of the monitors. Therefore we have no reason to care about the initial state of the exclusive monitors when a data abort is taken, and clearing the monitors prior to exception return (as we already do) is sufficient. This patch removes the redundant clearing of the exclusive monitors from the early abort handlers. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	xtensa: fix a6 and a7 handling in fast_syscall_xtensa	Max Filippov
	commit d1b6ba82a50cecf94be540a3a153aa89d97511a0 upstream. Remove restoring a6 on some return paths and instead modify and restore it in a single place, using symbolic name. Correctly restore a7 from PT_AREG7 in case of illegal a6 value. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	xtensa: fix TLBTEMP_BASE_2 region handling in fast_second_level_miss	Max Filippov
	commit 7128039fe2dd3d59da9e4ffa036f3aaa3ba87b9f upstream. Current definition of TLBTEMP_BASE_2 is always 32K above the TLBTEMP_BASE_1, whereas fast_second_level_miss handler for the TLBTEMP region analyzes virtual address bit (PAGE_SHIFT + DCACHE_ALIAS_ORDER) to determine TLBTEMP region where the fault happened. The size of the TLBTEMP region is also checked incorrectly: not 64K, but twice data cache way size (whicht may as well be less than the instruction cache way size). Fix TLBTEMP_BASE_2 to be TLBTEMP_BASE_1 + data cache way size. Provide TLBTEMP_SIZE that is a greater of doubled data cache way size or the instruction cache way size, and use it to determine if the second level TLB miss occured in the TLBTEMP region. Practical occurence of page faults in the TLBTEMP area is extremely rare, this code can be tested by deletion of all w[di]tlb instructions in the tlbtemp_mapping region. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	xtensa: fix access to THREAD_RA/THREAD_SP/THREAD_DS	Max Filippov
	commit 52247123749cc3cbc30168b33ad8c69515c96d23 upstream. With SMP and a lot of debug options enabled task_struct::thread gets out of reach of s32i/l32i instructions with base pointing at task_struct, breaking build with the following messages: arch/xtensa/kernel/entry.S: Assembler messages: arch/xtensa/kernel/entry.S:1002: Error: operand 3 of 'l32i.n' has invalid value '1048' arch/xtensa/kernel/entry.S:1831: Error: operand 3 of 's32i.n' has invalid value '1040' arch/xtensa/kernel/entry.S:1832: Error: operand 3 of 's32i.n' has invalid value '1044' Change base to point to task_struct::thread in such cases. Don't use a10 in _switch_to to save/restore prev pointer as a2 is not clobbered. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	xtensa: fix address checks in dma_{alloc,free}_coherent	Alan Douglas
	commit 1ca49463c44c970b1ab1d71b0f268bfdf8427a7e upstream. Virtual address is translated to the XCHAL_KSEG_CACHED region in the dma_free_coherent, but is checked to be in the 0...XCHAL_KSEG_SIZE range. Change check for end of the range from 'addr >= X' to 'addr > X - 1' to handle the case of X == 0. Replace 'if (C) BUG();' construct with 'BUG_ON(C);'. Signed-off-by: Alan Douglas <adouglas@cadence.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	xtensa: replace IOCTL code definitions with constants	Max Filippov
	commit f61bf8e7d19e0a3456a7a9ed97c399e4353698dc upstream. This fixes userspace code that builds on other architectures but fails on xtensa due to references to structures that other architectures don't refer to. E.g. this fixes the following issue with python-2.7.8: python-2.7.8/Modules/termios.c:861:25: error: invalid application of 'sizeof' to incomplete type 'struct serial_multiport_struct' {"TIOCSERGETMULTI", TIOCSERGETMULTI}, python-2.7.8/Modules/termios.c:870:25: error: invalid application of 'sizeof' to incomplete type 'struct serial_multiport_struct' {"TIOCSERSETMULTI", TIOCSERSETMULTI}, python-2.7.8/Modules/termios.c:900:24: error: invalid application of 'sizeof' to incomplete type 'struct tty_struct' {"TIOCTTYGSTRUCT", TIOCTTYGSTRUCT}, Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	arm64: ptrace: fix compat hardware watchpoint reporting	Will Deacon
	commit 27d7ff273c2aad37b28f6ff0cab2cfa35b51e648 upstream. I'm not sure what I was on when I wrote this, but when iterating over the hardware watchpoint array (hbp_watch_array), our index is off by ARM_MAX_BRP, so we walk off the end of our thread_struct... ... except, a dodgy condition in the loop means that it never executes at all (bp cannot be NULL). This patch fixes the code so that we remove the bp check and use the correct index for accessing the watchpoint structures. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	ARM/ARM64: KVM: Nuke Hyp-mode tlbs before enabling MMU	Pranavkumar Sawargaonkar
	commit f6edbbf36da3a27b298b66c7955fc84e1dcca305 upstream. X-Gene u-boot runs in EL2 mode with MMU enabled hence we might have stale EL2 tlb enteris when we enable EL2 MMU on each host CPU. This can happen on any ARM/ARM64 board running bootloader in Hyp-mode (or EL2-mode) with MMU enabled. This patch ensures that we flush all Hyp-mode (or EL2-mode) TLBs on each host CPU before enabling Hyp-mode (or EL2-mode) MMU. Tested-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org> Signed-off-by: Anup Patel <anup.patel@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	arm/arm64: KVM: Complete WFI/WFE instructions	Christoffer Dall
	commit 05e0127f9e362b36aa35f17b1a3d52bca9322a3a upstream. The architecture specifies that when the processor wakes up from a WFE or WFI instruction, the instruction is considered complete, however we currrently return to EL1 (or EL0) at the WFI/WFE instruction itself. While most guests may not be affected by this because their local exception handler performs an exception returning setting the event bit or with an interrupt pending, some guests like UEFI will get wedged due this little mishap. Simply skip the instruction when we have completed the emulation. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	arm64: use irq_set_affinity with force=false when migrating irqs	Sudeep Holla
	commit 3d8afe3099ebc602848aa7f09235cce3a9a023ce upstream. The arm64 interrupt migration code on cpu offline calls irqchip.irq_set_affinity() with the argument force=true. Originally this argument had no effect because it was not used by any interrupt chip driver and there was no semantics defined. This changed with commit 01f8fa4f01d8 ("genirq: Allow forcing cpu affinity of interrupts") which made the force argument useful to route interrupts to not yet online cpus without checking the target cpu against the cpu online mask. The following commit ffde1de64012 ("irqchip: gic: Support forced affinity setting") implemented this for the GIC interrupt controller. As a consequence the cpu offline irq migration fails if CPU0 is offlined, because CPU0 is still set in the affinity mask and the validation against cpu online mask is skipped to the force argument being true. The following first_cpu(mask) selection always selects CPU0 as the target. Commit 601c942176d8("arm64: use cpu_online_mask when using forced irq_set_affinity") intended to fix the above mentioned issue but introduced another issue where affinity can be migrated to a wrong CPU due to unconditional copy of cpu_online_mask. As with for arm, solve the issue by calling irq_set_affinity() with force=false from the CPU offline irq migration code so the GIC driver validates the affinity mask against CPU online mask and therefore removes CPU0 from the possible target candidates. Also revert the changes done in the commit 601c942176d8 as it's no longer needed. Tested on Juno platform. Fixes: 601c942176d8("arm64: use cpu_online_mask when using forced irq_set_affinity") Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-05	arm64: flush TLS registers during exec	Will Deacon
	commit eb35bdd7bca29a13c8ecd44e6fd747a84ce675db upstream. Nathan reports that we leak TLS information from the parent context during an exec, as we don't clear the TLS registers when flushing the thread state. This patch updates the flushing code so that we: (1) Unconditionally zero the tpidr_el0 register (since this is fully context switched for native tasks and zeroed for compat tasks) (2) Zero the tp_value state in thread_info before clearing the tpidrr0_el0 register for compat tasks (since this is only writable by the set_tls compat syscall and therefore not fully switched). A missing compiler barrier is also added to the compat set_tls syscall. Acked-by: Nathan Lynch <Nathan_Lynch@mentor.com> Reported-by: Nathan Lynch <Nathan_Lynch@mentor.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-10-03	ARM: 7990/1: asm: rename logical shift macros push pull into lspush lspull	Victor Kamensky
	Renames logical shift macros, 'push' and 'pull', defined in arch/arm/include/asm/assembler.h, into 'lspush' and 'lspull'. That eliminates name conflict between 'push' logical shift macro and 'push' instruction mnemonic. That allows assembler.h to be included in .S files that use 'push' instruction. Suggested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> (cherry picked from commit d98b90ea22b0a28d9d787769704a9cf1ea5a513a) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03	arm64: barriers: wire up new barrier options	Will Deacon
	Now that all callers of the barrier macros are updated to pass the mandatory options, update the macros so the option is actually used. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> (cherry picked from commit 493e68747e07b69da3d746352525a1ebd6b61d82) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03	arm: kvm: fix CPU hotplug	Vladimir Murzin
	On some platforms with no power management capabilities, the hotplug implementation is allowed to return from a smp_ops.cpu_die() call as a function return. Upon a CPU onlining event, the KVM CPU notifier tries to reinstall the hyp stub, which fails on platform where no reset took place following a hotplug event, with the message: CPU1: smp_ops.cpu_die() returned, trying to resuscitate CPU1: Booted secondary processor Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x80409540 unexpected data abort in Hyp mode at: 0x80401fe8 unexpected HVC/SVC trap in Hyp mode at: 0x805c6170 since KVM code is trying to reinstall the stub on a system where it is already configured. To prevent this issue, this patch adds a check in the KVM hotplug notifier that detects if the HYP stub really needs re-installing when a CPU is onlined and skips the installation call if the stub is already in place, which means that the CPU has not been reset. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> (cherry picked from commit 37a34ac1d4775aafbc73b9db53c7daebbbc67e6a) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03	arm/arm64: KVM: Report correct FSC for unsupported fault types	Christoffer Dall
	When we catch something that's not a permission fault or a translation fault, we log the unsupported FSC in the kernel log, but we were masking off the bottom bits of the FSC which was not very helpful. Also correctly report the FSC for data and instruction faults rather than telling people it was a DFCS, which doesn't exist in the ARM ARM. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> (cherry picked from commit 0496daa5cf99741ce8db82686b4c7446a37feabb) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03	arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc	Joel Schopp
	The current aarch64 calculation for VTTBR_BADDR_MASK masks only 39 bits and not all the bits in the PA range. This is clearly a bug that manifests itself on systems that allocate memory in the higher address space range. [ Modified from Joel's original patch to be based on PHYS_MASK_SHIFT instead of a hard-coded value and to move the alignment check of the allocation to mmu.c. Also added a comment explaining why we hardcode the IPA range and changed the stage-2 pgd allocation to be based on the 40 bit IPA range instead of the maximum possible 48 bit PA range. - Christoffer ] Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Joel Schopp <joel.schopp@amd.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> (cherry picked from commit dbff124e29fa24aff9705b354b5f4648cd96e0bb) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03	arm/arm64: KVM: vgic: make number of irqs a configurable attribute	Marc Zyngier
	In order to make the number of interrupts configurable, use the new fancy device management API to add KVM_DEV_ARM_VGIC_GRP_NR_IRQS as a VGIC configurable attribute. Userspace can now specify the exact size of the GIC (by increments of 32 interrupts). Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> (cherry picked from commit a98f26f183801685ef57333de4bafd4bbc692c7c) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-10-03	arm/arm64: KVM: vgic: delay vgic allocation until init time	Marc Zyngier
	It is now quite easy to delay the allocation of the vgic tables until we actually require it to be up and running (when the first vcpu is kicking around, or someones tries to access the GIC registers). This allow us to allocate memory for the exact number of CPUs we have. As nobody configures the number of interrupts just yet, use a fallback to VGIC_NR_IRQS_LEGACY. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> (cherry picked from commit 4956f2bc1fdee4bc336532f3f34635a8534cedfd) Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>