Age  Commit message  Author
2015-08-25  Merge tag 'v4.1.5-rt5-lno1' of git://git.linaro.org/people/anders.roxell/linux-rt into linux-linaro-lsk-v4.1-rt  [lsk-v4.1-15.08-rt]  (Kevin Hilman)
Linux 4.1.5-rt5. Changes since v4.1.3-rt3: don't disable preemption in dump_stack(). We should not see a backtrace on a production kernel, but if we do trigger one it should not increase the latency. * tag 'v4.1.5-rt5-lno1' of git://git.linaro.org/people/anders.roxell/linux-rt: (270 commits)
  localversion: Add RT specific localversion file
  workqueue: Prevent deadlock/stall on RT
  md: disable bcache
  rt,ntp: Move call to schedule_delayed_work() to helper thread
  memcontrol: Prevent scheduling while atomic in cgroup code
  cgroups: use simple wait in css_release()
  i915: bogus warning from i915 when running on PREEMPT_RT
  drm/i915: drop trace_i915_gem_ring_dispatch on rt
  gpu/i915: don't open code these things
  cpufreq: drop K8's driver from being selected
  mmc: sdhci: don't provide hard irq handler
  mmci: Remove bogus local_irq_save()
  i2c/omap: drop the lock in hard irq context
  leds: trigger: disable CPU trigger on -RT
  arch/arm64: Add lazy preempt support
  powerpc: Add support for lazy preemption
  arm: Add support for lazy preemption
  x86: Support for lazy preemption
  sched: Add support for lazy preemption
  rcu: make RCU_BOOST default on RT
  ...
2015-08-17  Merge branch 'v4.1.5-rt5' into v4.1-rt  (Anders Roxell)
2015-08-17  Merge branch 'v4.1.3-rt3' into v4.1.5-rt5  (Anders Roxell)
Used the "ours" merge strategy to throw away the previous -rt releases
2015-08-17  localversion: Add RT specific localversion file  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-8vdw4bfcsds27cvox6rpb334@git.kernel.org
2015-08-17  workqueue: Prevent deadlock/stall on RT  (Thomas Gleixner)
Austin reported an XFS deadlock/stall on RT where scheduled work never gets executed and tasks are waiting for each other forever. The underlying problem is the modification of the RT code to the handling of workers which are about to go to sleep. In mainline a worker thread which goes to sleep wakes an idle worker if there is more work to do. This happens from the guts of the schedule() function. On RT this must be outside and the accessed data structures are not protected against scheduling due to the spinlock to rtmutex conversion. So the naive solution to this was to move the code outside of the scheduler and protect the data structures with the pool lock. That approach turned out to be a little naive as we cannot call into that code when the thread blocks on a lock, as it is not allowed to block on two locks in parallel. So we don't call into the worker wakeup magic when the worker is blocked on a lock, which causes the deadlock/stall observed by Austin and Mike. Looking deeper into that worker code it turns out that the only relevant data structure which needs to be protected is the list of idle workers which can be woken up. So the solution is to protect the list manipulation operations with preempt_enable/disable pairs on RT and call unconditionally into the worker code even when the worker is blocked on a lock. The preemption protection is safe as there is nothing which can fiddle with the list outside of thread context. Reported-and-tested-by: Austin Schuh <austin@peloton-tech.com> Reported-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://vger.kernel.org/r/alpine.DEB.2.10.1406271249510.5170@nanos Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>
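A minimal sketch of that idea, purely illustrative (the helper below is made up; the real change is spread across kernel/workqueue.c):

  #include <linux/list.h>
  #include <linux/preempt.h>

  /*
   * On RT the idle-worker list is only touched from thread context, so a
   * preempt_disable()/preempt_enable() pair around the list manipulation is
   * enough, and this path can be called even when the worker is about to
   * block on a (sleeping) lock.
   */
  static void idle_list_op_sketch(struct list_head *idle_list, struct list_head *entry)
  {
          preempt_disable();              /* exclude other thread-context users */
          list_add(entry, idle_list);     /* the protected list manipulation */
          preempt_enable();
  }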
2015-08-17  md: disable bcache  (Sebastian Andrzej Siewior)
It uses anon semaphores |drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’: |drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration] | up_read_non_owner(&dc->writeback_lock); | ^ |drivers/md/bcache/request.c: In function ‘request_write’: |drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration] | down_read_non_owner(&dc->writeback_lock); | ^ either we get rid of those or we have to introduce them… Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  rt,ntp: Move call to schedule_delayed_work() to helper thread  (Steven Rostedt)
The ntp code for notify_cmos_timer() is called from hard interrupt context. schedule_delayed_work() under PREEMPT_RT_FULL takes spinlocks that have been converted to mutexes, thus calling schedule_delayed_work() from interrupt context is not safe. Add a helper thread that does the call to schedule_delayed_work() and wake up that thread instead of calling schedule_delayed_work() directly. This is only for CONFIG_PREEMPT_RT_FULL, otherwise the code still calls schedule_delayed_work() directly in irq context. Note: There are a few places in the kernel that do this. Perhaps the RT code should have a dedicated thread that does the checks. Just register a notifier on boot up for your check and wake up the thread when needed. This will be a todo. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
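A rough sketch of the helper-thread pattern (illustrative only; sync_cmos_work is assumed to be the existing delayed work item, and the names below do not necessarily match the real ntp code):

  #include <linux/kthread.h>
  #include <linux/workqueue.h>

  static struct task_struct *cmos_delay_thread;
  static bool do_cmos_delay;

  static int do_cmos_delay_thread_fn(void *ignore)
  {
          while (!kthread_should_stop()) {
                  set_current_state(TASK_INTERRUPTIBLE);
                  if (do_cmos_delay) {
                          do_cmos_delay = false;
                          /* thread context: sleeping locks are fine here */
                          schedule_delayed_work(&sync_cmos_work, 0);
                  }
                  schedule();
          }
          __set_current_state(TASK_RUNNING);
          return 0;
  }

  /* hard interrupt context: only set a flag and wake the helper thread */
  void notify_cmos_timer(void)
  {
          do_cmos_delay = true;
          wake_up_process(cmos_delay_thread);
  }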
2015-08-17  memcontrol: Prevent scheduling while atomic in cgroup code  (Mike Galbraith)
mm, memcg: make refill_stock() use get_cpu_light() Nikita reported the following memcg scheduling while atomic bug: Call Trace: [e22d5a90] [c0007ea8] show_stack+0x4c/0x168 (unreliable) [e22d5ad0] [c0618c04] __schedule_bug+0x94/0xb0 [e22d5ae0] [c060b9ec] __schedule+0x530/0x550 [e22d5bf0] [c060bacc] schedule+0x30/0xbc [e22d5c00] [c060ca24] rt_spin_lock_slowlock+0x180/0x27c [e22d5c70] [c00b39dc] res_counter_uncharge_until+0x40/0xc4 [e22d5ca0] [c013ca88] drain_stock.isra.20+0x54/0x98 [e22d5cc0] [c01402ac] __mem_cgroup_try_charge+0x2e8/0xbac [e22d5d70] [c01410d4] mem_cgroup_charge_common+0x3c/0x70 [e22d5d90] [c0117284] __do_fault+0x38c/0x510 [e22d5df0] [c011a5f4] handle_pte_fault+0x98/0x858 [e22d5e50] [c060ed08] do_page_fault+0x42c/0x6fc [e22d5f40] [c000f5b4] handle_page_fault+0xc/0x80 What happens: refill_stock() get_cpu_var() drain_stock() res_counter_uncharge() res_counter_uncharge_until() spin_lock() <== boom Fix it by replacing get/put_cpu_var() with get/put_cpu_light(). Reported-by: Nikita Yushchenko <nyushchenko@dev.rtsoft.ru> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
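The shape of the fix, sketched (illustrative; the real refill_stock() has more bookkeeping, and get_cpu_light()/put_cpu_light() are -rt primitives built on migrate_disable()):

  static void refill_stock_sketch(struct mem_cgroup *memcg, unsigned int nr_pages)
  {
          struct memcg_stock_pcp *stock;
          int cpu = get_cpu_light();      /* was: &get_cpu_var(memcg_stock) */

          stock = &per_cpu(memcg_stock, cpu);
          if (stock->cached != memcg)
                  drain_stock(stock);     /* may take sleeping locks on RT */
          stock->nr_pages += nr_pages;
          put_cpu_light();                /* was: put_cpu_var(memcg_stock) */
  }

The task stays pinned to the CPU but remains preemptible, so the rt_spin_lock taken further down in res_counter_uncharge() is legal.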
2015-08-17  cgroups: use simple wait in css_release()  (Sebastian Andrzej Siewior)
To avoid: |BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914 |in_atomic(): 1, irqs_disabled(): 0, pid: 92, name: rcuc/11 |2 locks held by rcuc/11/92: | #0: (rcu_callback){......}, at: [<ffffffff810e037e>] rcu_cpu_kthread+0x3de/0x940 | #1: (rcu_read_lock_sched){......}, at: [<ffffffff81328390>] percpu_ref_call_confirm_rcu+0x0/0xd0 |Preemption disabled at:[<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0 |CPU: 11 PID: 92 Comm: rcuc/11 Not tainted 3.18.7-rt0+ #1 | ffff8802398cdf80 ffff880235f0bc28 ffffffff815b3a12 0000000000000000 | 0000000000000000 ffff880235f0bc48 ffffffff8109aa16 0000000000000000 | ffff8802398cdf80 ffff880235f0bc78 ffffffff815b8dd4 000000000000df80 |Call Trace: | [<ffffffff815b3a12>] dump_stack+0x4f/0x7c | [<ffffffff8109aa16>] __might_sleep+0x116/0x190 | [<ffffffff815b8dd4>] rt_spin_lock+0x24/0x60 | [<ffffffff8108d2cd>] queue_work_on+0x6d/0x1d0 | [<ffffffff8110c881>] css_release+0x81/0x90 | [<ffffffff8132844e>] percpu_ref_call_confirm_rcu+0xbe/0xd0 | [<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0 | [<ffffffff810e03e5>] rcu_cpu_kthread+0x445/0x940 | [<ffffffff81098a2d>] smpboot_thread_fn+0x18d/0x2d0 | [<ffffffff810948d8>] kthread+0xe8/0x100 | [<ffffffff815b9c3c>] ret_from_fork+0x7c/0xb0 Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  i915: bogus warning from i915 when running on PREEMPT_RT  (Clark Williams)
The i915 driver has a 'WARN_ON(!in_interrupt())' in the display handler, which whines constantly on the RT kernel (since the interrupt is actually handled in a threaded handler and not in actual interrupt context). Change the WARN_ON to WARN_ON_NORT. Tested-by: Joakim Hernberg <jhernberg@alchemy.lu> Signed-off-by: Clark Williams <williams@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  drm/i915: drop trace_i915_gem_ring_dispatch on rt  (Sebastian Andrzej Siewior)
This tracepoint is responsible for: |[<814cc358>] __schedule_bug+0x4d/0x59 |[<814d24cc>] __schedule+0x88c/0x930 |[<814d3b90>] ? _raw_spin_unlock_irqrestore+0x40/0x50 |[<814d3b95>] ? _raw_spin_unlock_irqrestore+0x45/0x50 |[<810b57b5>] ? task_blocks_on_rt_mutex+0x1f5/0x250 |[<814d27d9>] schedule+0x29/0x70 |[<814d3423>] rt_spin_lock_slowlock+0x15b/0x278 |[<814d3786>] rt_spin_lock+0x26/0x30 |[<a00dced9>] gen6_gt_force_wake_get+0x29/0x60 [i915] |[<a00e183f>] gen6_ring_get_irq+0x5f/0x100 [i915] |[<a00b2a33>] ftrace_raw_event_i915_gem_ring_dispatch+0xe3/0x100 [i915] |[<a00ac1b3>] i915_gem_do_execbuffer.isra.13+0xbd3/0x1430 [i915] |[<810f8943>] ? trace_buffer_unlock_commit+0x43/0x60 |[<8113e8d2>] ? ftrace_raw_event_kmem_alloc+0xd2/0x180 |[<8101d063>] ? native_sched_clock+0x13/0x80 |[<a00acf29>] i915_gem_execbuffer2+0x99/0x280 [i915] |[<a00114a3>] drm_ioctl+0x4c3/0x570 [drm] |[<8101d0d9>] ? sched_clock+0x9/0x10 |[<a00ace90>] ? i915_gem_execbuffer+0x480/0x480 [i915] |[<810f1c18>] ? rb_commit+0x68/0xa0 |[<810f1c6c>] ? ring_buffer_unlock_commit+0x1c/0xa0 |[<81197467>] do_vfs_ioctl+0x97/0x540 |[<81021318>] ? ftrace_raw_event_sys_enter+0xd8/0x130 |[<811979a1>] sys_ioctl+0x91/0xb0 |[<814db931>] tracesys+0xe1/0xe6 Chris Wilson does not like to move i915_trace_irq_get() out of the macro |No. This enables the IRQ, as well as making a number of |very expensively serialised read, unconditionally. so it is gone now on RT. Reported-by: Joakim Hernberg <jbh@alchemy.lu> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  gpu/i915: don't open code these things  (Sebastian Andrzej Siewior)
The open coded part is gone in 1f83fee0 ("drm/i915: clear up wedged transitions"); the owner check is still there. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  cpufreq: drop K8's driver from being selected  (Sebastian Andrzej Siewior)
Ralf posted a picture of a backtrace from | powernowk8_target_fn() -> transition_frequency_fidvid() and then at the | end: | 932 policy = cpufreq_cpu_get(smp_processor_id()); | 933 cpufreq_cpu_put(policy); crashing the system on -RT. I assumed that policy was a NULL pointer but that was ruled out. Since Ralf can't do any more investigation on this and I have no machine with this hardware, I simply switch it off. Reported-by: Ralf Mardorf <ralf.mardorf@alice-dsl.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  mmc: sdhci: don't provide hard irq handler  (Sebastian Andrzej Siewior)
The sdhci code provides both irq handlers: the primary and the thread handler. Initially the primary handler was meant to be very short. The result is that on -RT the primary handler grabs locks, and this isn't really working. As a hack for now I just push both handlers into threaded mode. Reported-By: Michal Šmucr <msmucr@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  mmci: Remove bogus local_irq_save()  (Thomas Gleixner)
On !RT the interrupt handler runs with interrupts disabled. On RT it runs in a thread, so there is no need to disable interrupts at all. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  i2c/omap: drop the lock in hard irq context  (Sebastian Andrzej Siewior)
The lock is taken while reading two registers. On RT the lock is first taken in hard irq context, where it might sleep, and then again in the threaded irq handler. The threaded irq runs in oneshot mode, so the hard irq does not run again until the thread completes; there is therefore no reason to grab the lock in the hard irq handler. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  leds: trigger: disable CPU trigger on -RT  (Sebastian Andrzej Siewior)
as it triggers: |CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141 |[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20) |[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c) |[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170) |[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38) |[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c) |[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c) |[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c) |[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) |[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234) |[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0) |[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380) Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  arch/arm64: Add lazy preempt support  (Anders Roxell)
arm64 is missing support for PREEMPT_RT. The main feature which is lacking is support for lazy preemption. The arch-specific entry code, thread information structure definitions, and associated data tables have to be extended to provide this support. Then the Kconfig file has to be extended to indicate the support is available, and also to indicate that support for full RT preemption is now available. Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-08-17  powerpc: Add support for lazy preemption  (Thomas Gleixner)
Implement the powerpc pieces for lazy preempt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  arm: Add support for lazy preemption  (Thomas Gleixner)
Implement the arm pieces for lazy preempt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  x86: Support for lazy preemption  (Thomas Gleixner)
Implement the x86 pieces for lazy preempt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  sched: Add support for lazy preemption  (Thomas Gleixner)
It has become an obsession to mitigate the determinism vs. throughput loss of RT. Looking at the mainline semantics of preemption points gives a hint why RT sucks throughput-wise for ordinary SCHED_OTHER tasks. One major issue is the wakeup of tasks which right away preempt the waking task while the waking task holds a lock on which the woken task will block right after having preempted the waker. In mainline this is prevented due to the implicit preemption disable of spin/rw_lock held regions. On RT this is not possible due to the fully preemptible nature of sleeping spinlocks. Though for a SCHED_OTHER task preempting another SCHED_OTHER task this is really not a correctness issue. RT folks are concerned about SCHED_FIFO/RR task preemption and not about the purely fairness driven SCHED_OTHER preemption latencies. So I introduced a lazy preemption mechanism which only applies to SCHED_OTHER tasks preempting another SCHED_OTHER task. Besides the existing preempt_count, each task now sports a preempt_lazy_count which is manipulated on lock acquisition and release. This is slightly incorrect, as for laziness reasons I coupled this to migrate_disable/enable so some other mechanisms get the same treatment (e.g. get_cpu_light). Now on the scheduler side, instead of setting NEED_RESCHED this sets NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and therefore allows the waking task to exit the lock-held region before the woken task preempts it. That also works better for cross-CPU wakeups as the other side can stay in the adaptive spinning loop. For RT class preemption there is no change. This simply sets NEED_RESCHED and forgoes the lazy preemption counter. Initial tests do not expose any observable latency increase, but history shows that I've been proven wrong before :) The lazy preemption mode is on by default, but with CONFIG_SCHED_DEBUG enabled it can be disabled via: # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features and re-enabled via # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features The test results so far are very machine and workload dependent, but there is a clear trend that it enhances non-RT workload performance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
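A conceptual sketch of the decision point (not the actual scheduler code; TIF_NEED_RESCHED_LAZY and the lazy count are -rt additions and live in arch/thread_info specific places):

  /* SCHED_OTHER preempting SCHED_OTHER only gets the lazy flag; RT classes
   * and mandatory preemptions get the real reschedule request. */
  static void resched_curr_sketch(struct task_struct *curr, bool lazy)
  {
          if (lazy)
                  set_tsk_thread_flag(curr, TIF_NEED_RESCHED_LAZY);
          else
                  set_tsk_need_resched(curr);
  }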
2015-08-17  rcu: make RCU_BOOST default on RT  (Sebastian Andrzej Siewior)
Since it is no longer invoked from softirq, people run into OOM more often if the priority of the RCU thread is too low. Making boosting the default on RT should help in those cases, and it can be switched off if someone knows better. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  rcu: Eliminate softirq processing from rcutree  (Paul E. McKenney)
Running RCU out of softirq is a problem for some workloads that would like to manage RCU core processing independently of other softirq work, for example, setting kthread priority. This commit therefore moves the RCU core work from softirq to a per-CPU/per-flavor SCHED_OTHER kthread named rcuc. The SCHED_OTHER approach avoids the scalability problems that appeared with the earlier attempt to move RCU core processing from softirq to kthreads. That said, kernels built with RCU_BOOST=y will run the rcuc kthreads at the RCU-boosting priority. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  rcu: Disable RCU_FAST_NO_HZ on RT  (Thomas Gleixner)
This uses a timer_list timer from the irq disabled guts of the idle code. Disable it for now to prevent wreckage. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  perf: Make swevent hrtimer run in irq instead of softirq  (Yong Zhang)
Otherwise we get a deadlock like below: [ 1044.042749] BUG: scheduling while atomic: ksoftirqd/21/141/0x00010003 [ 1044.042752] INFO: lockdep is turned off. [ 1044.042754] Modules linked in: [ 1044.042757] Pid: 141, comm: ksoftirqd/21 Tainted: G W 3.4.0-rc2-rt3-23676-ga723175-dirty #29 [ 1044.042759] Call Trace: [ 1044.042761] <IRQ> [<ffffffff8107d8e5>] __schedule_bug+0x65/0x80 [ 1044.042770] [<ffffffff8168978c>] __schedule+0x83c/0xa70 [ 1044.042775] [<ffffffff8106bdd2>] ? prepare_to_wait+0x32/0xb0 [ 1044.042779] [<ffffffff81689a5e>] schedule+0x2e/0xa0 [ 1044.042782] [<ffffffff81071ebd>] hrtimer_wait_for_timer+0x6d/0xb0 [ 1044.042786] [<ffffffff8106bb30>] ? wake_up_bit+0x40/0x40 [ 1044.042790] [<ffffffff81071f20>] hrtimer_cancel+0x20/0x40 [ 1044.042794] [<ffffffff8111da0c>] perf_swevent_cancel_hrtimer+0x3c/0x50 [ 1044.042798] [<ffffffff8111da31>] task_clock_event_stop+0x11/0x40 [ 1044.042802] [<ffffffff8111da6e>] task_clock_event_del+0xe/0x10 [ 1044.042805] [<ffffffff8111c568>] event_sched_out+0x118/0x1d0 [ 1044.042809] [<ffffffff8111c649>] group_sched_out+0x29/0x90 [ 1044.042813] [<ffffffff8111ed7e>] __perf_event_disable+0x18e/0x200 [ 1044.042817] [<ffffffff8111c343>] remote_function+0x63/0x70 [ 1044.042821] [<ffffffff810b0aae>] generic_smp_call_function_single_interrupt+0xce/0x120 [ 1044.042826] [<ffffffff81022bc7>] smp_call_function_single_interrupt+0x27/0x40 [ 1044.042831] [<ffffffff8168d50c>] call_function_single_interrupt+0x6c/0x80 [ 1044.042833] <EOI> [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20 [ 1044.042840] [<ffffffff8168b970>] ? _raw_spin_unlock_irq+0x30/0x70 [ 1044.042844] [<ffffffff8168b976>] ? _raw_spin_unlock_irq+0x36/0x70 [ 1044.042848] [<ffffffff810702e2>] run_hrtimer_softirq+0xc2/0x200 [ 1044.042853] [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20 [ 1044.042857] [<ffffffff81045265>] __do_softirq_common+0xf5/0x3a0 [ 1044.042862] [<ffffffff81045c3d>] __thread_do_softirq+0x15d/0x200 [ 1044.042865] [<ffffffff81045dda>] run_ksoftirqd+0xfa/0x210 [ 1044.042869] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200 [ 1044.042873] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200 [ 1044.042877] [<ffffffff8106b596>] kthread+0xb6/0xc0 [ 1044.042881] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70 [ 1044.042886] [<ffffffff8168d994>] kernel_thread_helper+0x4/0x10 [ 1044.042889] [<ffffffff8107d98c>] ? finish_task_switch+0x8c/0x110 [ 1044.042894] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70 [ 1044.042897] [<ffffffff8168bd5d>] ? retint_restore_args+0xe/0xe [ 1044.042900] [<ffffffff8106b4e0>] ? kthreadd+0x1e0/0x1e0 [ 1044.042902] [<ffffffff8168d990>] ? gs_change+0xb/0xb Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1341476476-5666-1-git-send-email-yong.zhang0@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-08-17  lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals  (Josh Cartwright)
"lockdep: Selftest: Only do hardirq context test for raw spinlock" disabled the execution of certain tests with PREEMPT_RT_FULL, but did not prevent the tests from still being defined. This leads to warnings like: ./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function] ./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function] ./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function] ./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function] ./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function] ... Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT_FULL conditionals. Signed-off-by: Josh Cartwright <josh.cartwright@ni.com> Signed-off-by: Xander Huff <xander.huff@ni.com> Acked-by: Gratian Crisan <gratian.crisan@ni.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  lockdep: selftest: Only do hardirq context test for raw spinlock  (Yong Zhang)
On -rt there is no softirq context any more and rwlock is sleepable, disable softirq context test and rwlock+irq test. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Yong Zhang <yong.zhang@windriver.com> Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  crypto: Convert crypto notifier chain to SRCU  (Peter Zijlstra)
The crypto notifier deadlocks on RT. Though this can be a real deadlock on mainline as well due to fifo fair rwsems. The involved parties here are: [ 82.172678] swapper/0 S 0000000000000001 0 1 0 0x00000000 [ 82.172682] ffff88042f18fcf0 0000000000000046 ffff88042f18fc80 ffffffff81491238 [ 82.172685] 0000000000011cc0 0000000000011cc0 ffff88042f18c040 ffff88042f18ffd8 [ 82.172688] 0000000000011cc0 0000000000011cc0 ffff88042f18ffd8 0000000000011cc0 [ 82.172689] Call Trace: [ 82.172697] [<ffffffff81491238>] ? _raw_spin_unlock_irqrestore+0x6c/0x7a [ 82.172701] [<ffffffff8148fd3f>] schedule+0x64/0x66 [ 82.172704] [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0 [ 82.172708] [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c [ 82.172713] [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141 [ 82.172716] [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f [ 82.172719] [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182 [ 82.172722] [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e [ 82.172726] [<ffffffff811debfd>] crypto_wait_for_test+0x49/0x6b [ 82.172728] [<ffffffff811ded32>] crypto_register_alg+0x53/0x5a [ 82.172730] [<ffffffff811ded6c>] crypto_register_algs+0x33/0x72 [ 82.172734] [<ffffffff81ad7686>] ? aes_init+0x12/0x12 [ 82.172737] [<ffffffff81ad76ea>] aesni_init+0x64/0x66 [ 82.172741] [<ffffffff81000318>] do_one_initcall+0x7f/0x13b [ 82.172744] [<ffffffff81ac4d34>] kernel_init+0x199/0x22c [ 82.172747] [<ffffffff81ac44ef>] ? loglevel+0x31/0x31 [ 82.172752] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10 [ 82.172755] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13 [ 82.172759] [<ffffffff81ac4b9b>] ? start_kernel+0x3ca/0x3ca [ 82.172761] [<ffffffff814987c0>] ? gs_change+0x13/0x13 [ 82.174186] cryptomgr_test S 0000000000000001 0 41 2 0x00000000 [ 82.174189] ffff88042c971980 0000000000000046 ffffffff81d74830 0000000000000292 [ 82.174192] 0000000000011cc0 0000000000011cc0 ffff88042c96eb80 ffff88042c971fd8 [ 82.174195] 0000000000011cc0 0000000000011cc0 ffff88042c971fd8 0000000000011cc0 [ 82.174195] Call Trace: [ 82.174198] [<ffffffff8148fd3f>] schedule+0x64/0x66 [ 82.174201] [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0 [ 82.174204] [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c [ 82.174206] [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141 [ 82.174209] [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f [ 82.174212] [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182 [ 82.174215] [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e [ 82.174218] [<ffffffff811e4883>] cryptomgr_notify+0x280/0x385 [ 82.174221] [<ffffffff814943de>] notifier_call_chain+0x6b/0x98 [ 82.174224] [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12 [ 82.174227] [<ffffffff810677cd>] __blocking_notifier_call_chain+0x70/0x8d [ 82.174230] [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16 [ 82.174234] [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50 [ 82.174236] [<ffffffff811dd7a1>] crypto_alg_mod_lookup+0x3e/0x74 [ 82.174238] [<ffffffff811dd949>] crypto_alloc_base+0x36/0x8f [ 82.174241] [<ffffffff811e9408>] cryptd_alloc_ablkcipher+0x6e/0xb5 [ 82.174243] [<ffffffff811dd591>] ? kzalloc.clone.5+0xe/0x10 [ 82.174246] [<ffffffff8103085d>] ablk_init_common+0x1d/0x38 [ 82.174249] [<ffffffff8103852a>] ablk_ecb_init+0x15/0x17 [ 82.174251] [<ffffffff811dd8c6>] __crypto_alloc_tfm+0xc7/0x114 [ 82.174254] [<ffffffff811e0caa>] ? 
crypto_lookup_skcipher+0x1f/0xe4 [ 82.174256] [<ffffffff811e0dcf>] crypto_alloc_ablkcipher+0x60/0xa5 [ 82.174258] [<ffffffff811e5bde>] alg_test_skcipher+0x24/0x9b [ 82.174261] [<ffffffff8106d96d>] ? finish_task_switch+0x3f/0xfa [ 82.174263] [<ffffffff811e6b8e>] alg_test+0x16f/0x1d7 [ 82.174267] [<ffffffff811e45ac>] ? cryptomgr_probe+0xac/0xac [ 82.174269] [<ffffffff811e45d8>] cryptomgr_test+0x2c/0x47 [ 82.174272] [<ffffffff81061161>] kthread+0x7e/0x86 [ 82.174275] [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa [ 82.174278] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10 [ 82.174281] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13 [ 82.174284] [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c [ 82.174287] [<ffffffff814987c0>] ? gs_change+0x13/0x13 [ 82.174329] cryptomgr_probe D 0000000000000002 0 47 2 0x00000000 [ 82.174332] ffff88042c991b70 0000000000000046 ffff88042c991bb0 0000000000000006 [ 82.174335] 0000000000011cc0 0000000000011cc0 ffff88042c98ed00 ffff88042c991fd8 [ 82.174338] 0000000000011cc0 0000000000011cc0 ffff88042c991fd8 0000000000011cc0 [ 82.174338] Call Trace: [ 82.174342] [<ffffffff8148fd3f>] schedule+0x64/0x66 [ 82.174344] [<ffffffff814901ad>] __rt_mutex_slowlock+0x85/0xbe [ 82.174347] [<ffffffff814902d2>] rt_mutex_slowlock+0xec/0x159 [ 82.174351] [<ffffffff81089c4d>] rt_mutex_fastlock.clone.8+0x29/0x2f [ 82.174353] [<ffffffff81490372>] rt_mutex_lock+0x33/0x37 [ 82.174356] [<ffffffff8108a0f2>] __rt_down_read+0x50/0x5a [ 82.174358] [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12 [ 82.174360] [<ffffffff8108a11c>] rt_down_read+0x10/0x12 [ 82.174363] [<ffffffff810677b5>] __blocking_notifier_call_chain+0x58/0x8d [ 82.174366] [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16 [ 82.174369] [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50 [ 82.174372] [<ffffffff811debd6>] crypto_wait_for_test+0x22/0x6b [ 82.174374] [<ffffffff811decd3>] crypto_register_instance+0xb4/0xc0 [ 82.174377] [<ffffffff811e9b76>] cryptd_create+0x378/0x3b6 [ 82.174379] [<ffffffff811de512>] ? __crypto_lookup_template+0x5b/0x63 [ 82.174382] [<ffffffff811e4545>] cryptomgr_probe+0x45/0xac [ 82.174385] [<ffffffff811e4500>] ? crypto_alloc_pcomp+0x1b/0x1b [ 82.174388] [<ffffffff81061161>] kthread+0x7e/0x86 [ 82.174391] [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa [ 82.174394] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10 [ 82.174398] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13 [ 82.174401] [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c [ 82.174403] [<ffffffff814987c0>] ? gs_change+0x13/0x13 cryptomgr_test spawns the cryptomgr_probe thread from the notifier call. The probe thread fires the same notifier as the test thread and deadlocks on the rwsem on RT. Now this is a potential deadlock in mainline as well, because we have fifo fair rwsems. If another thread blocks with a down_write() on the notifier chain before the probe thread issues the down_read() it will block the probe thread and the whole party is dead locked. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
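In rough shape, the conversion looks like this (illustrative sketch; the real crypto code wraps the chain in its own helpers):

  #include <linux/notifier.h>

  static struct srcu_notifier_head crypto_chain_sketch;

  static int __init crypto_chain_sketch_init(void)
  {
          srcu_init_notifier_head(&crypto_chain_sketch);
          return 0;
  }

  /* Callers no longer serialize on a blocking-notifier rwsem, so the probe
   * thread spawned from one notifier callback can fire the chain again
   * without deadlocking against the test thread. */
  static int crypto_probing_notify_sketch(unsigned long val, void *v)
  {
          return srcu_notifier_call_chain(&crypto_chain_sketch, val, v);
  }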
2015-08-17  net: Add a mutex around devnet_rename_seq  (Sebastian Andrzej Siewior)
On RT write_seqcount_begin() disables preemption, and device_rename() allocates memory with GFP_KERNEL and later grabs the sysfs_mutex. Serialize with a mutex and use the non-preemption-disabling __write_seqcount_begin(). To avoid writer starvation, let the reader grab the mutex and release it when it detects a writer in progress. This keeps the normal case (no reader on the fly) fast. [ tglx: Instead of replacing the seqcount by a mutex, add the mutex ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
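The pattern, sketched (illustrative; not the exact net/core/dev.c hunks):

  static seqcount_t rename_seq_sketch = SEQCNT_ZERO(rename_seq_sketch);
  static DEFINE_MUTEX(rename_mutex_sketch);

  static void rename_write_sketch(void)
  {
          mutex_lock(&rename_mutex_sketch);
          /* the mutex already excludes writers, so the -rt tree can use the
           * non-preempt-disabling __write_seqcount_begin() variant here */
          write_seqcount_begin(&rename_seq_sketch);
          /* ... perform the rename ... */
          write_seqcount_end(&rename_seq_sketch);
          mutex_unlock(&rename_mutex_sketch);
  }

  static void rename_read_sketch(void)
  {
          unsigned int seq;
  retry:
          seq = read_seqcount_begin(&rename_seq_sketch);
          /* ... copy the name ... */
          if (read_seqcount_retry(&rename_seq_sketch, seq)) {
                  /* writer in progress: block on (and boost) it instead of
                   * spinning, then try again */
                  mutex_lock(&rename_mutex_sketch);
                  mutex_unlock(&rename_mutex_sketch);
                  goto retry;
          }
  }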
2015-08-17  net: netfilter: Serialize xt_write_recseq sections on RT  (Thomas Gleixner)
The netfilter code relies only on the implicit semantics of local_bh_disable() for serializing xt_write_recseq sections. RT breaks that and needs explicit serialization here. Reported-by: Peter LaDow <petela@gocougs.wsu.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  net: Another local_irq_disable/kmalloc headache  (Thomas Gleixner)
Replace it by a local lock. Though that's pretty inefficient :( Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
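For reference, a hedged sketch of what "a local lock" means here (DEFINE_LOCAL_IRQ_LOCK()/local_lock_irqsave() are -rt patch primitives, not mainline; the usage below is illustrative):

  static DEFINE_LOCAL_IRQ_LOCK(frag_lock_sketch);

  static void alloc_frag_sketch(void)
  {
          unsigned long flags;

          /* !RT: behaves like local_irq_save(); RT: a per-CPU sleeping lock,
           * so the allocation below stays legal */
          local_lock_irqsave(frag_lock_sketch, flags);
          /* ... kmalloc()/per-CPU fragment handling ... */
          local_unlock_irqrestore(frag_lock_sketch, flags);
  }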
2015-08-17  net: Remove preemption disabling in netif_rx()  (Priyanka Jain)
1) enqueue_to_backlog() (called from netif_rx) should be bound to a particular CPU. This can be achieved by disabling migration; there is no need to disable preemption.
2) Fixes crash "BUG: scheduling while atomic: ksoftirqd" in case of RT. If preemption is disabled, enqueue_to_backlog() is called in atomic context, and if the backlog exceeds its count, kfree_skb() is called. But in RT, kfree_skb() might get scheduled out, so it expects a non-atomic context.
3) When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable() and migrate_disable() map to preempt_enable() and preempt_disable(), so there is no change in functionality in the non-RT case.
- Replace preempt_enable(), preempt_disable() with migrate_enable(), migrate_disable() respectively
- Replace get_cpu(), put_cpu() with get_cpu_light(), put_cpu_light() respectively
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com> Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com> Cc: <rostedt@goodmis.orgn> Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
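Point 3 relies on the !RT fallback of these primitives, roughly (illustrative; see the -rt preempt headers for the real definitions):

  #ifndef CONFIG_PREEMPT_RT_FULL
  /* on non-RT kernels the substitution is functionally a no-op */
  # define migrate_disable()      preempt_disable()
  # define migrate_enable()       preempt_enable()
  #endif

get_cpu_light()/put_cpu_light() fall back to get_cpu()/put_cpu() in the same spirit.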
2015-08-17  scsi: qla2xxx: Use local_irq_save_nort() in qla2x00_poll  (John Kacur)
RT triggers the following: [ 11.307652] [<ffffffff81077b27>] __might_sleep+0xe7/0x110 [ 11.307663] [<ffffffff8150e524>] rt_spin_lock+0x24/0x60 [ 11.307670] [<ffffffff8150da78>] ? rt_spin_lock_slowunlock+0x78/0x90 [ 11.307703] [<ffffffffa0272d83>] qla24xx_intr_handler+0x63/0x2d0 [qla2xxx] [ 11.307736] [<ffffffffa0262307>] qla2x00_poll+0x67/0x90 [qla2xxx] Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_handler which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed to call them with interrupts disabled. Therefore we use local_irq_save_nort() instead which saves flags without disabling interrupts. This fix needs to be applied to v3.0-rt, v3.2-rt and v3.4-rt Suggested-by: Thomas Gleixner Signed-off-by: John Kacur <jkacur@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: David Sommerseth <davids@redhat.com> Link: http://lkml.kernel.org/r/1335523726-10024-1-git-send-email-jkacur@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
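The _nort() helpers come from the -rt patch set; their effect is roughly this (illustrative definition, not copied from the tree):

  #ifdef CONFIG_PREEMPT_RT_FULL
  /* RT: only record the flags, leave interrupts enabled so the sleeping
   * spinlock inside the handler is legal */
  # define local_irq_save_nort(flags)     do { local_save_flags(flags); } while (0)
  # define local_irq_restore_nort(flags)  (void)(flags)
  #else
  # define local_irq_save_nort(flags)     local_irq_save(flags)
  # define local_irq_restore_nort(flags)  local_irq_restore(flags)
  #endif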
2015-08-17  hotplug: Use set_cpus_allowed_ptr() in sync_unplug_thread()  (Mike Galbraith)
do_set_cpus_allowed() is not safe vs ->sched_class change. crash> bt PID: 11676 TASK: ffff88026f979da0 CPU: 22 COMMAND: "sync_unplug/22" #0 [ffff880274d25bc8] machine_kexec at ffffffff8103b41c #1 [ffff880274d25c18] crash_kexec at ffffffff810d881a #2 [ffff880274d25cd8] oops_end at ffffffff81525818 #3 [ffff880274d25cf8] do_invalid_op at ffffffff81003096 #4 [ffff880274d25d90] invalid_op at ffffffff8152d3de [exception RIP: set_cpus_allowed_rt+18] RIP: ffffffff8109e012 RSP: ffff880274d25e48 RFLAGS: 00010202 RAX: ffffffff8109e000 RBX: ffff88026f979da0 RCX: ffff8802770cb6e8 RDX: 0000000000000000 RSI: ffffffff81add700 RDI: ffff88026f979da0 RBP: ffff880274d25e78 R8: ffffffff816112e0 R9: 0000000000000001 R10: 0000000000000001 R11: 0000000000011940 R12: ffff88026f979da0 R13: ffff8802770cb6d0 R14: ffff880274d25fd8 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #5 [ffff880274d25e60] do_set_cpus_allowed at ffffffff8108e65f #6 [ffff880274d25e80] sync_unplug_thread at ffffffff81058c08 #7 [ffff880274d25ed8] kthread at ffffffff8107cad6 #8 [ffff880274d25f50] ret_from_fork at ffffffff8152bbbc crash> task_struct ffff88026f979da0 | grep class sched_class = 0xffffffff816111e0 <fair_sched_class+64>, Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  cpu_down: move migrate_enable() back  (Tiejun Chen)
Commit 08c1ab68, "hotplug-use-migrate-disable.patch", intends to use migrate_enable()/migrate_disable() to replace that combination of preempt_enable() and preempt_disable(), but actually in the !CONFIG_PREEMPT_RT_FULL case migrate_enable()/migrate_disable() are still equal to preempt_enable()/preempt_disable(). So the following cpu_hotplug_begin()/cpu_unplug_begin(cpu) would call schedule() and trigger schedule_debug() like this:
  _cpu_down()
    + migrate_disable() = preempt_disable()
    + cpu_hotplug_begin() or cpu_unplug_begin()
      + schedule()
        + __schedule()
          + preempt_disable();
          + __schedule_bug() is true!
So we should move migrate_enable() back, as in the original scheme. Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
2015-08-17  kernel/hotplug: restore original cpu mask on cpu/down  (Sebastian Andrzej Siewior)
If a task which is allowed to run only on CPU X puts CPU Y down, then after it comes back from the kernel it will be allowed to run on all CPUs except CPU Y. This patch ensures that we don't lose the initial affinity setting unless the CPU the task is running on is going down. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  kernel/cpu: fix cpu down problem if kthread's cpu is going down  (Sebastian Andrzej Siewior)
If a kthread is pinned to CPUx and CPUx is going down then we get into trouble:
- first the unplug thread is created
- it will set itself to hp->unplug. As a result, every task that is going to take a lock has to leave the CPU.
- the CPU_DOWN_PREPARE notifiers are started. The worker thread will start a new process for the "high priority worker".
Now the kthread would like to take a lock but since it can't leave the CPU it will never complete its task. We could fire the unplug thread after the notifiers, but then the cpu is no longer marked "online" and the unplug thread will run on CPU0 which was fixed before :) So instead the unplug thread is started and kept waiting until the notifiers complete their work. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  cpu hotplug: Document why PREEMPT_RT uses a spinlock  (Steven Rostedt)
The patch "cpu: Make hotplug.lock a "sleeping" spinlock on RT":
  Tasks can block on hotplug.lock in pin_current_cpu(), but their state might be != RUNNING. So the mutex wakeup will set the state unconditionally to RUNNING. That might cause spurious unexpected wakeups. We could provide a state preserving mutex_lock() function, but this is semantically backwards. So instead we convert the hotplug.lock() to a spinlock for RT, which has the state preserving semantics already.
fixed a bug where the hotplug lock on PREEMPT_RT could be taken after a task set its state to TASK_UNINTERRUPTIBLE and before it called schedule. If the hotplug_lock used a mutex, and there was contention, the current task's state would be turned to TASK_RUNNABLE and the schedule call would not sleep. This caused unexpected results. Although the patch had a description of the change, the code had no comments about it. This causes confusion to those that review the code, and as PREEMPT_RT is held in a quilt queue and not git, it's not as easy to see why a change was made. Even if it was in git, the code should still have a comment for something as subtle as this. Document the rationale for using a spinlock on PREEMPT_RT in the hotplug lock code. Reported-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  cpu/rt: Rework cpu down for PREEMPT_RT  (Steven Rostedt)
Bringing a CPU down is a pain with the PREEMPT_RT kernel because tasks can be preempted in many more places than in non-RT. In order to handle per_cpu variables, tasks may be pinned to a CPU for a while, and even sleep. But these tasks need to be off the CPU if that CPU is going down. Several synchronization methods have been tried, but when stressed they failed. This is a new approach. A sync_tsk thread is still created and tasks may still block on a lock when the CPU is going down, but how that works is a bit different. When cpu_down() starts, it will create the sync_tsk and wait on it to report that the tasks currently pinned to the CPU are no longer pinned. But new tasks that are about to be pinned will still be allowed to do so at this time. Then the notifiers are called. Several notifiers will bring down tasks that will enter these locations. Some of these tasks will take locks of other tasks that are on the CPU. If we don't let those other tasks continue, but make them block until CPU down is done, the tasks that the notifiers are waiting on will never complete as they are waiting for the locks held by the tasks that are blocked. Thus we still let the task pin the CPU until the notifiers are done. After the notifiers run, we then make new tasks entering the pinned CPU sections grab a mutex and wait. This mutex is now a per CPU mutex in the hotplug_pcp descriptor. To help things along, a new function called migrate_me() is created in the scheduler code. This function will try to migrate the current task off the CPU that is going down if possible. When the sync_tsk is created, all tasks will then try to migrate off the CPU going down. There are several cases where this won't work, but it helps in most cases. After the notifiers are called, if a task can't migrate off but enters the pinned CPU sections, it will be forced to wait on the hotplug_pcp mutex until the CPU down is complete. Then the scheduler will force the migration anyway. Also, I found that THREAD_BOUND tasks need to also be accounted for in the pinned CPU handling, and migrate_disable no longer treats them specially. This helps fix issues with ksoftirqd and workqueue that unbind on CPU down. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  cpu: Make hotplug.lock a "sleeping" spinlock on RT  (Steven Rostedt)
Tasks can block on hotplug.lock in pin_current_cpu(), but their state might be != RUNNING. So the mutex wakeup will set the state unconditionally to RUNNING. That might cause spurious unexpected wakeups. We could provide a state preserving mutex_lock() function, but this is semantically backwards. So instead we convert the hotplug.lock() to a spinlock for RT, which has the state preserving semantics already. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Carsten Emde <C.Emde@osadl.org> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <clark.williams@gmail.com> Link: http://lkml.kernel.org/r/1330702617.25686.265.camel@gandalf.stny.rr.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  seqlock: Prevent rt starvation  (Thomas Gleixner)
If a low prio writer gets preempted while holding the seqlock write locked, a high prio reader spins forever on RT. To prevent this, let the reader grab the spinlock, so it blocks and eventually boosts the writer. This way the writer can proceed and endless spinning is prevented. For seqcount writers we disable preemption over the update code path. Thanks to Al Viro for disentangling some VFS code to make that possible. Nicholas Mc Guire: - spin_lock+unlock => spin_unlock_wait - __write_seqcount_begin => __raw_write_seqcount_begin Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
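In sketch form, the RT read side described above behaves like this (illustrative; the real change lives in include/linux/seqlock.h of the -rt tree):

  static inline unsigned int read_seqbegin_rt_sketch(seqlock_t *sl)
  {
          unsigned int ret;

  repeat:
          ret = ACCESS_ONCE(sl->seqcount.sequence);
          if (unlikely(ret & 1)) {
                  /* writer active: block on its lock so it gets boosted,
                   * then try again instead of spinning */
                  spin_unlock_wait(&sl->lock);
                  goto repeat;
          }
          smp_rmb();
          return ret;
  }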
2015-08-17  random: Make it work on rt  (Thomas Gleixner)
Delegate the random insertion to the forced threaded interrupt handler. Store the return IP of the hard interrupt handler in the irq descriptor and feed it into the random generator as a source of entropy. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  cpumask: Disable CONFIG_CPUMASK_OFFSTACK for RT  (Thomas Gleixner)
We can't deal with the cpumask allocations which happen in atomic context (see arch/x86/kernel/apic/io_apic.c) on RT right now. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  acpi/rt: Convert acpi_gbl_hardware lock back to a raw_spinlock_t  (Steven Rostedt)
We hit the following bug with 3.6-rt: [ 5.898990] BUG: scheduling while atomic: swapper/3/0/0x00000002 [ 5.898991] no locks held by swapper/3/0. [ 5.898993] Modules linked in: [ 5.898996] Pid: 0, comm: swapper/3 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1 [ 5.898997] Call Trace: [ 5.899011] [<ffffffff810804e7>] __schedule_bug+0x67/0x90 [ 5.899028] [<ffffffff81577923>] __schedule+0x793/0x7a0 [ 5.899032] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200 [ 5.899034] [<ffffffff81577b89>] schedule+0x29/0x70 [ 5.899036] BUG: scheduling while atomic: swapper/7/0/0x00000002 [ 5.899037] no locks held by swapper/7/0. [ 5.899039] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0 [ 5.899040] Modules linked in: [ 5.899041] [ 5.899045] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90 [ 5.899046] Pid: 0, comm: swapper/7 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1 [ 5.899047] Call Trace: [ 5.899049] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40 [ 5.899052] [<ffffffff810804e7>] __schedule_bug+0x67/0x90 [ 5.899054] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80 [ 5.899056] [<ffffffff81577923>] __schedule+0x793/0x7a0 [ 5.899059] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23 [ 5.899062] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200 [ 5.899068] [<ffffffff8130be64>] acpi_write_bit_register+0x33/0xb0 [ 5.899071] [<ffffffff81577b89>] schedule+0x29/0x70 [ 5.899072] [<ffffffff8130be13>] ? acpi_read_bit_register+0x33/0x51 [ 5.899074] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0 [ 5.899077] [<ffffffff8131d1fc>] acpi_idle_enter_bm+0x8a/0x28e [ 5.899079] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90 [ 5.899081] [<ffffffff8107e5da>] ? this_cpu_load+0x1a/0x30 [ 5.899083] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40 [ 5.899087] [<ffffffff8144c759>] cpuidle_enter+0x19/0x20 [ 5.899088] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80 [ 5.899090] [<ffffffff8144c777>] cpuidle_enter_state+0x17/0x50 [ 5.899092] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23 [ 5.899094] [<ffffffff8144d1a1>] cpuidle899101] [<ffffffff8130be13>] ? As the acpi code disables interrupts in acpi_idle_enter_bm, and calls code that grabs the acpi lock, it causes issues as the lock is currently in RT a sleeping lock. The lock was converted from a raw to a sleeping lock due to some previous issues, and tests that showed it didn't seem to matter. Unfortunately, it did matter for one of our boxes. This patch converts the lock back to a raw lock. I've run this code on a few of my own machines, one being my laptop that uses the acpi quite extensively. I've been able to suspend and resume without issues. [ tglx: Made the change exclusive for acpi_gbl_hardware_lock ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: John Kacur <jkacur@gmail.com> Cc: Clark Williams <clark@redhat.com> Link: http://lkml.kernel.org/r/1360765565.23152.5.camel@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  dm: Make rt aware  (Thomas Gleixner)
Use the BUG_ON_NORT variant for the irq_disabled() checks. RT has interrupts legitimately enabled here as we can't deadlock against the irq thread due to the "sleeping spinlocks" conversion. Reported-by: Luis Claudio R. Goncalves <lclaudio@uudg.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-17  crypto: Reduce preempt disabled regions, more algos  (Sebastian Andrzej Siewior)
Don Estabrook reported | kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100() | kernel: WARNING: CPU: 2 PID: 858 at kernel/sched/core.c:2462 migrate_enable+0x17b/0x200() | kernel: WARNING: CPU: 3 PID: 865 at kernel/sched/core.c:2428 migrate_disable+0xed/0x100() and his backtrace showed some crypto functions which looked fine. The problem is the following sequence:
  glue_xts_crypt_128bit()
  {
          blkcipher_walk_virt();          /* normal migrate_disable() */
          glue_fpu_begin();               /* get atomic */
          while (nbytes) {
                  __glue_xts_crypt_128bit();
                  blkcipher_walk_done();  /* with nbytes = 0, migrate_enable()
                                           * while we are atomic */
          };
          glue_fpu_end()                  /* no longer atomic */
  }
and this is why the counter gets out of sync and the warning is printed. The other problem is that we are non-preemptible between glue_fpu_begin() and glue_fpu_end() and the latency grows. To fix this, I shorten the FPU off region and ensure blkcipher_walk_done() is called with preemption enabled. This might hurt the performance because we now enable/disable the FPU state more often, but we gain lower latency and the bug is gone. Reported-by: Don Estabrook <don.estabrook@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  x86: crypto: Reduce preempt disabled regions  (Peter Zijlstra)
Restrict the preempt disabled regions to the actual floating point operations and enable preemption for the administrative actions. This is necessary on RT to avoid that kfree and other operations are called with preemption disabled. Reported-and-tested-by: Carsten Emde <cbe@osadl.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
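Schematically, the restructuring in these two crypto patches looks like this (illustrative, not the exact arch/x86/crypto diff):

  static int ecb_crypt_sketch(struct blkcipher_desc *desc,
                              struct scatterlist *dst, struct scatterlist *src,
                              unsigned int nbytes)
  {
          struct blkcipher_walk walk;
          int err;

          blkcipher_walk_init(&walk, dst, src, nbytes);
          err = blkcipher_walk_virt(desc, &walk);

          while (walk.nbytes) {
                  kernel_fpu_begin();     /* atomic only around the FPU work */
                  /* ... AES-NI / SSE2 processing of walk.src.virt.addr ... */
                  kernel_fpu_end();       /* preemptible again ... */
                  err = blkcipher_walk_done(desc, &walk, 0);  /* ... so this may sleep */
          }
          return err;
  }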
2015-08-17  sas-ata/isci: don't disable interrupts in qc_issue handler  (Paul Gortmaker)
On 3.14-rt we see the following trace on Canoe Pass for SCSI_ISCI "Intel(R) C600 Series Chipset SAS Controller" when the sas qc_issue handler is run: BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:905 in_atomic(): 0, irqs_disabled(): 1, pid: 432, name: udevd CPU: 11 PID: 432 Comm: udevd Not tainted 3.14.28-rt22 #2 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.01.0002.082220131453 08/22/2013 ffff880fab500000 ffff880fa9f239c0 ffffffff81a2d273 0000000000000000 ffff880fa9f239d8 ffffffff8107f023 ffff880faac23dc0 ffff880fa9f239f0 ffffffff81a33cc0 ffff880faaeb1400 ffff880fa9f23a40 ffffffff815de891 Call Trace: [<ffffffff81a2d273>] dump_stack+0x4e/0x7a [<ffffffff8107f023>] __might_sleep+0xe3/0x160 [<ffffffff81a33cc0>] rt_spin_lock+0x20/0x50 [<ffffffff815de891>] isci_task_execute_task+0x171/0x2f0 <----- [<ffffffff815cfecb>] sas_ata_qc_issue+0x25b/0x2a0 [<ffffffff81606363>] ata_qc_issue+0x1f3/0x370 [<ffffffff8160c600>] ? ata_scsi_invalid_field+0x40/0x40 [<ffffffff8160c8f5>] ata_scsi_translate+0xa5/0x1b0 [<ffffffff8160efc6>] ata_sas_queuecmd+0x86/0x280 [<ffffffff815ce446>] sas_queuecommand+0x196/0x230 [<ffffffff81081fad>] ? get_parent_ip+0xd/0x50 [<ffffffff815b05a4>] scsi_dispatch_cmd+0xb4/0x210 [<ffffffff815b7744>] scsi_request_fn+0x314/0x530 and gdb shows: (gdb) list * isci_task_execute_task+0x171 0xffffffff815ddfb1 is in isci_task_execute_task (drivers/scsi/isci/task.c:138). 133 dev_dbg(&ihost->pdev->dev, "%s: num=%d\n", __func__, num); 134 135 for_each_sas_task(num, task) { 136 enum sci_status status = SCI_FAILURE; 137 138 spin_lock_irqsave(&ihost->scic_lock, flags); <----- 139 idev = isci_lookup_device(task->dev); 140 io_ready = isci_device_io_ready(idev, task); 141 tag = isci_alloc_tag(ihost); 142 spin_unlock_irqrestore(&ihost->scic_lock, flags); (gdb) In addition to the scic_lock, the function also contains locking of the task_state_lock -- which is clearly not a candidate for raw lock conversion. As can be seen by the comment nearby, we really should be running the qc_issue code with interrupts enabled anyway. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-08-17  scsi/fcoe: Make RT aware.  (Thomas Gleixner)
Do not disable preemption while taking sleeping locks. All users look safe for migrate_disable() only. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>