path: root/kernel/rcutree.c
Age  Commit message  Author
2012-05-14Merge branch 'rcu/next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

Pull the v3.5 RCU tree from Paul E. McKenney:

1) A set of improvements and fixes to the RCU_FAST_NO_HZ feature (with more on the way for 3.6). Posted to LKML: https://lkml.org/lkml/2012/4/23/324 (commits 1-3 and 5), https://lkml.org/lkml/2012/4/16/611 (commit 4), https://lkml.org/lkml/2012/4/30/390 (commit 6), and https://lkml.org/lkml/2012/5/4/410 (commit 7, combined with the other commits for the convenience of the tester).
2) Changes to make rcu_barrier() avoid disrupting execution of CPUs that have no RCU callbacks. Posted to LKML: https://lkml.org/lkml/2012/4/23/322.
3) A couple of commits that improve the efficiency of the interaction between preemptible RCU and the scheduler, these two being all that survived an abortive attempt to allow preemptible RCU's __rcu_read_lock() to be inlined. The full set was posted to LKML at https://lkml.org/lkml/2012/4/14/143, and the first and third patches of that set remain.
4) Lai Jiangshan's algorithmic implementation of SRCU, which includes call_srcu() and srcu_barrier(). A major feature of this new implementation is that synchronize_srcu() no longer disturbs the execution of other CPUs. This work is based on earlier implementations by Peter Zijlstra and Paul E. McKenney. Posted to LKML: https://lkml.org/lkml/2012/2/22/82.
5) A number of miscellaneous bug fixes and improvements which were posted to LKML at: https://lkml.org/lkml/2012/4/23/353 with subsequent updates posted to LKML.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-11Merge branches 'barrier.2012.05.09a', 'fixes.2012.04.26a', ↵Paul E. McKenney
'inline.2012.05.02b' and 'srcu.2012.05.07b' into HEAD

barrier: Reduce the amount of disturbance by rcu_barrier() to the rest of the system. This branch also includes improvements to RCU_FAST_NO_HZ, which are included here due to conflicts.
fixes: Miscellaneous fixes.
inline: Remaining changes from an abortive attempt to inline preemptible RCU's __rcu_read_lock(). These are (1) making exit_rcu() avoid unnecessary work and (2) avoiding having preemptible RCU record a blocked thread when the scheduler declines to do a context switch.
srcu: Lai Jiangshan's algorithmic implementation of SRCU, including call_srcu().
2012-05-09rcu: Make rcu_barrier() less disruptivePaul E. McKenney
The rcu_barrier() primitive interrupts each and every CPU, registering a callback on every CPU. Once all of these callbacks have been invoked, rcu_barrier() knows that every callback that was registered before the call to rcu_barrier() has also been invoked. However, there is no point in registering a callback on a CPU that currently has no callbacks, most especially if that CPU is in a deep idle state. This commit therefore makes rcu_barrier() avoid interrupting CPUs that have no callbacks. Doing this requires reworking the handling of orphaned callbacks, otherwise callbacks could slip through rcu_barrier()'s net by being orphaned from a CPU that rcu_barrier() had not yet interrupted to a CPU that rcu_barrier() had already interrupted. This reworking was needed anyway to take a first step towards weaning RCU from the CPU_DYING notifier's use of stop_cpu(). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
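The idea of skipping callback-free CPUs can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct cpu_model` and `model_rcu_barrier_post()` are hypothetical names invented for illustration, and the model ignores callback orphaning and all locking, which the real change has to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-CPU RCU state; not the kernel's data. */
struct cpu_model {
    bool online;
    long qlen;            /* callbacks currently queued on this CPU */
    bool barrier_posted;  /* set when a barrier callback is queued here */
};

/*
 * Post a barrier callback only on online CPUs that already have callbacks.
 * Returns the number of CPUs that had to be disturbed.
 */
int model_rcu_barrier_post(struct cpu_model *cpus, size_t ncpus)
{
    int disturbed = 0;

    assert(cpus != NULL);
    for (size_t i = 0; i < ncpus; i++) {
        cpus[i].barrier_posted = false;
        if (!cpus[i].online || cpus[i].qlen == 0)
            continue;     /* nothing to wait for: leave this CPU alone */
        cpus[i].barrier_posted = true;
        cpus[i].qlen++;   /* the barrier callback is queued behind the rest */
        disturbed++;
    }
    return disturbed;
}
```

In this model, a four-CPU system in which only two online CPUs have callbacks queued disturbs exactly those two CPUs.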
2012-05-02rcu: Move PREEMPT_RCU preemption to switch_to() invocationPaul E. McKenney
Currently, PREEMPT_RCU readers are enqueued upon entry to the scheduler. This is inefficient because enqueuing is required only if there is a context switch, and entry to the scheduler does not guarantee a context switch. The commit therefore moves the enqueuing to immediately precede the call to switch_to() from the scheduler. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
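A rough userspace sketch of the ordering described above, assuming a toy scheduler: `struct model_task` and `model_schedule()` are hypothetical and stand in for the scheduler path, not for any real kernel function. The point is only that the blocked-reader bookkeeping happens after the decision to switch, not on every scheduler entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task state for illustration only. */
struct model_task {
    int id;
    bool in_rcu_reader;    /* inside a preemptible-RCU read-side section */
    bool on_blocked_list;  /* recorded as a reader blocking a grace period */
};

/* Returns true if a context switch happened. */
bool model_schedule(struct model_task *prev, struct model_task *next)
{
    assert(prev != NULL && next != NULL);
    if (prev == next)
        return false;                  /* no switch, so nothing to record */
    if (prev->in_rcu_reader)
        prev->on_blocked_list = true;  /* done just before the switch */
    return true;
}
```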
2012-04-24rcu: Make RCU_FAST_NO_HZ account for pauses out of idlePaul E. McKenney
Both Steven Rostedt's new idle-capable trace macros and the RCU_NONIDLE() macro can cause RCU to momentarily pause out of idle without the rest of the system being involved. This can cause rcu_prepare_for_idle() to run through its state machine too quickly, which can in turn result in needless scheduling-clock interrupts. This commit therefore adds code to enable rcu_prepare_for_idle() to distinguish between an initial entry to idle on the one hand (which needs to advance the rcu_prepare_for_idle() state machine) and an idle reentry due to idle-capable trace macros and RCU_NONIDLE() on the other hand (which should avoid advancing the rcu_prepare_for_idle() state machine). Additional state is maintained to allow the timer to be correctly reposted when returning after a momentary pause out of idle, and even more state is maintained to detect when new non-lazy callbacks have been enqueued (which may require re-evaluation of the approach to idleness). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-04-24rcu: Document why rcu_blocking_is_gp() is safePaul E. McKenney
The rcu_blocking_is_gp() function tests to see if there is only one online CPU, and if so, synchronize_sched() and friends become no-ops. However, for larger systems, num_online_cpus() scans a large vector, and might be preempted while doing so. While preempted, any number of CPUs might come online and go offline, potentially resulting in num_online_cpus() returning 1 when there had never been only one CPU online. This could result in a too-short RCU grace period, which could in turn result in total failure, except that the only way that the grace period is too short is if there is an RCU read-side critical section spanning it. For RCU-sched and RCU-bh (which are the only cases using rcu_blocking_is_gp()), RCU read-side critical sections have either preemption or bh disabled, which prevents CPUs from going offline. This in turn prevents actual failures from occurring. This commit therefore adds a large block comment to rcu_blocking_is_gp() documenting why it is safe. This commit also moves rcu_blocking_is_gp() into kernel/rcutree.c, which should help prevent unwary developers from mistaking it for a generally useful function. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
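The single-CPU shortcut can be sketched as follows. This is a userspace model, not the kernel function: the online-CPU count is passed in as a parameter (an assumption made so the example is self-contained), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* True when simply being able to block already implies a grace period. */
bool model_blocking_is_gp(int online_cpus)
{
    assert(online_cpus >= 1);
    return online_cpus == 1;
}

/* Returns true if a real grace-period wait would have been needed. */
bool model_synchronize_sched(int online_cpus)
{
    if (model_blocking_is_gp(online_cpus))
        return false;  /* single CPU: the call degenerates to a no-op */
    /* ... a real implementation would wait for a grace period here ... */
    return true;
}
```

The model deliberately omits the race discussed in the commit message; it only shows where the shortcut sits relative to the grace-period wait.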
2012-04-24rcu: Reduce cache-miss initialization latencies for large systemsPaul E. McKenney
Commit #0209f649 (rcu: limit rcu_node leaf-level fanout) set an upper limit of 16 on the leaf-level fanout for the rcu_node tree. This was needed to reduce lock contention that was induced by the synchronization of scheduling-clock interrupts, which was in turn needed to improve energy efficiency for moderate-sized lightly loaded servers. However, reducing the leaf-level fanout means that there are more leaf-level rcu_node structures in the tree, which in turn means that RCU's grace-period initialization incurs more cache misses. This is not a problem on moderate-sized servers with only a few tens of CPUs, but becomes a major source of real-time latency spikes on systems with many hundreds of CPUs. In addition, the workloads running on these large systems tend to be CPU-bound, which eliminates the energy-efficiency advantages of synchronizing scheduling-clock interrupts. Therefore, these systems need maximal values for the rcu_node leaf-level fanout. This commit addresses this problem by introducing a new kernel parameter named RCU_FANOUT_LEAF that directly controls the leaf-level fanout. This parameter defaults to 16 to handle the common case of moderate-sized lightly loaded servers, but may be set higher on larger systems. Reported-by: Mike Galbraith <efault@gmx.de> Reported-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
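The trade-off is simple arithmetic: the number of leaf rcu_node structures that grace-period initialization must touch is the CPU count divided by the leaf fanout, rounded up. A minimal sketch, where `model_leaf_nodes()` is a hypothetical helper and not a kernel function:

```c
#include <assert.h>

/* Number of leaf nodes needed to cover ncpus at a given leaf fanout. */
unsigned int model_leaf_nodes(unsigned int ncpus, unsigned int fanout_leaf)
{
    assert(fanout_leaf > 0);
    return (ncpus + fanout_leaf - 1) / fanout_leaf;  /* ceiling division */
}
```

For an illustrative 4096-CPU system, a leaf fanout of 16 gives 256 leaf structures, while a fanout of 64 would give 64, so a quarter as many structures are touched at the leaf level.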
2012-04-17rcu: Permit call_rcu() from CPU_DYING notifiersPaul E. McKenney
As of: 29494be71afe ("rcu,cleanup: simplify the code when cpu is dying") RCU adopts callbacks from the dying CPU in its CPU_DYING notifier, which means that any callbacks posted by later CPU_DYING notifiers are ignored until the CPU comes back online. A WARN_ON_ONCE() was added to __call_rcu() by: e56014000816 ("rcu: Simplify offline processing") to check for this condition. Although this condition did not trigger (at least as far as I know) during -next testing, it did recently trigger in mainline: https://lkml.org/lkml/2012/4/2/34 What is needed longer term is for RCU's CPU_DEAD notifier to adopt any callbacks that were posted by CPU_DYING notifiers, however, the Linux kernel has been running with this sort of thing happening for quite some time. So the only thing that qualifies as a regression is the WARN_ON_ONCE(), which this commit removes. Making RCU's CPU_DEAD notifier adopt callbacks posted by CPU_DYING notifiers is a topic for the 3.5 release of the Linux kernel. Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Stop spurious warnings from synchronize_sched_expeditedHugh Dickins
synchronize_sched_expedited() is spamming CONFIG_DEBUG_PREEMPT=y users with an unintended warning from the cpu_is_offline() check: use raw_smp_processor_id() instead of smp_processor_id() there. Because the warning is under a get_online_cpus(), it is not possible for any CPUs to go offline, though it is quite possible that the task might migrate between the raw_smp_processor_id() and the check of cpu_is_offline(). This is not a problem because the task cannot migrate from an offline CPU to an online one or vice versa. The point of the check is to verify that synchronize_sched_expedited() is not called from an offline CPU, for example, from a CPU_DYING notifier, or, more important, from an outgoing CPU making its way from its CPU_DYING notifiers to the idle loop. Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Add RCU_NONIDLE() for idle-loop RCU read-side critical sectionsPaul E. McKenney
RCU, RCU-bh, and RCU-sched read-side critical sections are forbidden in the inner idle loop, that is, between the rcu_idle_enter() and the rcu_idle_exit() -- RCU will happily ignore any such read-side critical sections. However, things like powertop need tracepoints in the inner idle loop. This commit therefore provides an RCU_NONIDLE() macro that can be used to wrap code in the idle loop that requires RCU read-side critical sections. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
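A wrapper macro of this kind can be modeled in userspace as below. Everything prefixed with `model_` or `MODEL_` is a hypothetical stand-in: the stubs only flip a flag, whereas the real rcu_idle_exit() and rcu_idle_enter() update RCU's dyntick-idle state.

```c
#include <assert.h>

/* Stub state: 1 while RCU is watching this CPU, 0 while it counts as idle. */
int model_rcu_watching = 1;

void model_rcu_idle_enter(void) { model_rcu_watching = 0; }
void model_rcu_idle_exit(void)  { model_rcu_watching = 1; }

/* Momentarily leave idle, run the statement, then return to idle. */
#define MODEL_RCU_NONIDLE(stmt)          \
    do {                                 \
        model_rcu_idle_exit();           \
        do { stmt; } while (0);          \
        model_rcu_idle_enter();          \
    } while (0)

/* Hypothetical tracepoint that is only legal while RCU is watching. */
int model_trace_hits;

void model_tracepoint(void)
{
    assert(model_rcu_watching);
    model_trace_hits++;
}
```

Calling `MODEL_RCU_NONIDLE(model_tracepoint())` from within the modeled idle region runs the tracepoint with RCU watching and leaves the CPU marked idle again afterwards.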
2012-02-21rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit()Paul E. McKenney
Although use of RCU in the idle loop is incorrect, quite a few instances of just that have made their way into mainline, primarily event tracing. The problem with RCU read-side critical sections on CPUs that RCU believes to be idle is that RCU is completely ignoring the CPU, along with any attempts at RCU read-side critical sections. The approaches of eliminating the offending uses and of pushing the definition of idle down beyond the offending uses have both proved impractical. The new approach is to encapsulate offending uses of RCU with rcu_idle_exit() and rcu_idle_enter(), but this requires nesting for code that is invoked both during idle and during normal execution. Therefore, this commit modifies rcu_idle_enter() and rcu_idle_exit() to permit nesting. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
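Nesting of this sort is typically handled with a depth counter, so that only the outermost enter call actually transitions to idle. The following is a minimal userspace sketch under that assumption; `struct model_dynticks` and both functions are hypothetical and much simpler than the kernel's counter handling.

```c
#include <assert.h>

/* Hypothetical nesting depth: >0 means RCU is watching, 0 means idle. */
struct model_dynticks {
    long nesting;
    int idle_transitions;  /* counts real transitions into idle */
};

void model_idle_enter(struct model_dynticks *d)
{
    assert(d->nesting > 0);
    if (--d->nesting == 0)
        d->idle_transitions++;  /* only the outermost call really goes idle */
}

void model_idle_exit(struct model_dynticks *d)
{
    d->nesting++;  /* the first increment from 0 leaves idle */
}
```

With this model, code that brackets itself with exit/enter behaves the same whether it is invoked from the idle loop (depth goes 0, 1, 0) or from normal execution (depth goes 1, 2, 1, with no idle transition).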
2012-02-21rcu: Call out dangers of expedited RCU primitivesPaul E. McKenney
The expedited RCU primitives can be quite useful, but they have some high costs as well. This commit updates and creates docbook comments calling out the costs, and updates the RCU documentation as well. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Rework detection of use of RCU by offline CPUsPaul E. McKenney
Because newly offlined CPUs continue executing after completing the CPU_DYING notifiers, they legitimately enter the scheduler and use RCU while appearing to be offline. This calls for a more sophisticated approach as follows:

1. RCU marks the CPU online during the CPU_UP_PREPARE phase.
2. RCU marks the CPU offline during the CPU_DEAD phase.
3. Diagnostics regarding use of read-side RCU by offline CPUs use RCU's accounting rather than the cpu_online_map. (Note that __call_rcu() still uses cpu_online_map to detect illegal invocations within CPU_DYING notifiers.)
4. Offline CPUs are prevented from hanging the system by force_quiescent_state(), which pays attention to cpu_online_map. Some additional work (in a later commit) will be needed to guarantee that force_quiescent_state() waits a full jiffy before assuming that a CPU is offline, for example, when called from idle entry. (This commit also makes the one-jiffy wait explicit, since the old-style implicit wait can now be defeated by RCU_FAST_NO_HZ and by rcutorture.)

This approach avoids the false positives encountered when attempting to use more exact classification of CPU online/offline state.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Move synchronize_sched_expedited() to rcutree.cPaul E. McKenney
Now that TREE_RCU and TREE_PREEMPT_RCU no longer do anything different for the single-CPU case, there is no need for multiple definitions of synchronize_sched_expedited(). It is no longer in any sense a plug-in, so move it from kernel/rcutree_plugin.h to kernel/rcutree.c. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Check for illegal use of RCU from offlined CPUsPaul E. McKenney
Although it is legal to use RCU during early boot, it is anything but legal to use RCU at runtime from an offlined CPU. After all, RCU explicitly ignores offlined CPUs. This commit therefore adds checks for runtime use of RCU from offlined CPUs. These checks are not perfect, in particular, they can be subverted through use of things like rcu_dereference_raw(). Note that it is not possible to put checks in rcu_read_lock() and friends due to the fact that these primitives are used in code that might be used under either RCU or lock-based protection, which means that checking rcu_read_lock() gets you fat piles of false positives. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Print scheduling-clock information on RCU CPU stall-warning messagesPaul E. McKenney
There have been situations where RCU CPU stall warnings were caused by issues in scheduling-clock timer initialization. To make it easier to track these down, this commit causes the RCU CPU stall-warning messages to print out the number of scheduling-clock interrupts taken in the current grace period for each stalled CPU. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Set RCU CPU stall times via sysfsPaul E. McKenney
The default CONFIG_RCU_CPU_STALL_TIMEOUT value of 60 seconds has served Linux users well for production use for quite some time. However, for debugging, there will be more than three minutes between subsequent stall-warning messages. This can be an annoyingly long wait if you are trying to work out where the offending infinite loop is hiding. Therefore, this commit provides a rcu_cpu_stall_timeout sysfs parameter that may be adjusted at boot time and at runtime to speed up debugging. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Remove #ifdef CONFIG_SMP from TREE_RCUPaul E. McKenney
Now that both TINY_RCU and TINY_PREEMPT_RCU have been in place for awhile, it is time to remove UP support from TREE_RCU, which is what this commit does. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Check for idle-loop entry while in RCU read-side critical sectionPaul E. McKenney
The inner idle loop is an extended quiescent state for all flavors of RCU, but there have been recent bugs involving use of RCU read-side primitives from within the idle loop. Therefore, this commit enlists lockdep-RCU to detect attempts to enter the inner idle loop while in an RCU read-side critical section, emitting a lockdep-RCU splat if so. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Clean up straggling rcu_preempt_needs_cpu() namePaul E. McKenney
The recent updates to RCU_CPU_FAST_NO_HZ have an rcu_needs_cpu() that does more than just check for callbacks, so get the name for rcu_preempt_needs_cpu() consistent with that change, now calling it rcu_preempt_cpu_has_callbacks(). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Remove single-rcu_node optimization in rcu_start_gp()Paul E. McKenney
The grace-period initialization sequence in rcu_start_gp() has a special case for systems where the rcu_node tree is a single rcu_node structure. This made sense some years ago when systems were smaller and up to 64 CPUs could share a single rcu_node structure, but now that large systems are common and a given leaf rcu_node structure can support only 16 CPUs (due to lock contention on the rcu_node's ->lock field), this optimization is almost never taken. And even the small mobile platforms that might make use of it might rather have the kernel text reduction. Therefore, this commit removes the check for single-rcu_node trees. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2012-02-21rcu: Don't make callbacks go through second full grace periodPaul E. McKenney
RCU's current CPU-offline code path dumps all of the outgoing CPU's callbacks onto the RCU_NEXT_TAIL portion of the surviving CPU's callback list. This means that all the ready-to-invoke callbacks from the outgoing CPU must wait for another full RCU grace period. This was just fine when CPU-hotplug events were rare, but there is increasing evidence that users are planning to make increasing use of CPU hotplug. Therefore, this commit changes the callback-dumping procedure so that callbacks that are ready to invoke are moved to the RCU_DONE_TAIL portion of the surviving CPU's callback list. This avoids running these callbacks through a second unnecessary grace period. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
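The effect of the change can be shown with a deliberately simplified model that tracks only counts per list segment. This is an assumption-laden sketch: the kernel keeps a single linked list with per-segment tail pointers, whereas `struct model_cblist` and `model_adopt_callbacks()` are hypothetical and reduce each segment to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-segment callback list, reduced to counts. */
struct model_cblist {
    long done;     /* grace period already elapsed: ready to invoke */
    long waiting;  /* still need a (new) grace period */
};

/* Move an outgoing CPU's callbacks to a surviving CPU, keeping readiness. */
void model_adopt_callbacks(struct model_cblist *survivor,
                           struct model_cblist *outgoing)
{
    assert(survivor != NULL && outgoing != NULL);
    survivor->done += outgoing->done;        /* ready callbacks stay ready */
    survivor->waiting += outgoing->waiting;  /* the rest wait as before */
    outgoing->done = 0;
    outgoing->waiting = 0;
}
```

Under the older behavior described above, the first addition would instead have gone into `waiting`, forcing already-ready callbacks through another grace period.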
2012-02-21rcu: Check for callback invocation from offline CPUsPaul E. McKenney
Because quiescent states are now reported from offline CPUs in CPU_DYING state, there is some possibility that such a CPU might note the end of a grace period and attempt to start invoking callbacks. This would be a very bad thing, and is supposed to be prevented by the fact that the CPU_DYING CPU gets rid of all its callbacks before reporting the quiescent state. However, there is other CPU-offline code in the kernel, and it is quite possible that someone will invoke RCU core processing from that code. Therefore, this commit adds a warning for this case. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Simplify offline processingPaul E. McKenney
Move ->qsmaskinit and blkd_tasks[] manipulation to the CPU_DYING notifier. This simplifies the code by eliminating a potential deadlock and by reducing the responsibilities of force_quiescent_state(). Also rename functions to make their connection to the CPU-hotplug stages explicit. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-02-21rcu: Avoid waking up CPUs having only kfree_rcu() callbacksPaul E. McKenney
When CONFIG_RCU_FAST_NO_HZ is enabled, RCU will allow a given CPU to enter dyntick-idle mode even if it still has RCU callbacks queued. RCU avoids system hangs in this case by scheduling a timer for several jiffies in the future. However, if all of the callbacks on that CPU are from kfree_rcu(), there is no reason to wake the CPU up, as it is not a problem to defer freeing of memory. This commit therefore tracks the number of callbacks on a given CPU that are from kfree_rcu(), and avoids scheduling the timer if all of a given CPU's callbacks are from kfree_rcu(). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
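The bookkeeping amounts to keeping two counters per CPU and comparing them. A minimal userspace sketch, assuming hypothetical names (`struct model_cbcount`, `model_needs_wakeup_timer()`) rather than the kernel's own fields and functions:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU callback counts. */
struct model_cbcount {
    long qlen;       /* all queued callbacks */
    long qlen_lazy;  /* those that only free memory, as kfree_rcu() does */
};

/* A wakeup timer is needed only if some callback is not merely freeing memory. */
bool model_needs_wakeup_timer(const struct model_cbcount *c)
{
    assert(c->qlen_lazy >= 0 && c->qlen_lazy <= c->qlen);
    return c->qlen != c->qlen_lazy;
}
```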
2012-02-21rcu: Add diagnostic for misaligned rcu_head structuresPaul E. McKenney
The push for energy efficiency will require that RCU tag rcu_head structures to indicate whether or not their invocation is time critical. This tagging is best carried out in the bottom bits of the ->next pointers in the rcu_head structures. This tagging requires that the rcu_head structures be properly aligned, so this commit adds the required diagnostics. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
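Tagging in the bottom bits of a pointer only works if those bits are always zero, which is what an alignment check verifies. The sketch below is a userspace illustration: `struct model_rcu_head` and `model_head_is_aligned()` are hypothetical, and pointer-size alignment is an assumption chosen for the example rather than the exact requirement the kernel diagnostic enforces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same general shape as a callback header. */
struct model_rcu_head {
    struct model_rcu_head *next;
    void (*func)(struct model_rcu_head *head);
};

/* True if the low-order bits of the address are free for use as tag bits. */
bool model_head_is_aligned(const void *head)
{
    assert(head != NULL);
    return ((uintptr_t)head & (sizeof(void *) - 1)) == 0;
}
```

A diagnostic built on such a check would warn when a structure is queued at an address for which the function returns false.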
2012-02-21rcu: Add lockdep-RCU checks for simple self-deadlockPaul E. McKenney
It is illegal to have a grace period within a same-flavor RCU read-side critical section, so this commit adds lockdep-RCU checks to splat when such abuse is encountered. This commit does not detect more elaborate RCU deadlock situations. These situations might be a job for lockdep enhancements. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Augment rcu_batch_end tracing for idle and callback statePaul E. McKenney
The current rcu_batch_end event trace records only the name of the RCU flavor and the total number of callbacks that remain queued on the current CPU. This is insufficient for testing and tuning the new dyntick-idle RCU_FAST_NO_HZ code, so this commit adds idle state along with whether or not any of the callbacks that were ready to invoke at the beginning of rcu_do_batch() are still queued. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Remove redundant rcu_cpu_stall_suppress declarationPaul E. McKenney
No point in having two identical rcu_cpu_stall_suppress declarations, so remove the more obscure of the two. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Keep invoking callbacks if CPU otherwise idlePaul E. McKenney
The rcu_do_batch() function that invokes callbacks for TREE_RCU and TREE_PREEMPT_RCU normally throttles callback invocation to avoid degrading scheduling latency. However, as long as the CPU would otherwise be idle, there is no downside to continuing to invoke any callbacks that have passed through their grace periods. In fact, processing such callbacks in a timely manner has the benefit of increasing the probability that the CPU can enter the power-saving dyntick-idle mode. Therefore, this commit allows callback invocation to continue beyond the preset limit as long as the scheduler does not have some other task to run and as long as context is that of the idle task or the relevant RCU kthread. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
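The throttling rule can be sketched as a loop that stops at the batch limit unless the CPU has nothing better to do. This is a userspace model with hypothetical names; in particular, `cpu_otherwise_idle` is a fixed parameter here, whereas a real implementation would re-evaluate the scheduler's needs on each pass.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns how many of the `ready` callbacks are invoked in one batch.
 * blimit is the normal per-batch limit; cpu_otherwise_idle is true when
 * no other task wants the CPU and we run as the idle task or an RCU kthread.
 */
long model_do_batch(long ready, long blimit, bool cpu_otherwise_idle)
{
    long invoked = 0;

    assert(ready >= 0 && blimit > 0);
    while (invoked < ready) {
        invoked++;  /* "invoke" one callback */
        if (invoked >= blimit && !cpu_otherwise_idle)
            break;  /* throttle to protect scheduling latency */
    }
    return invoked;
}
```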
2011-12-11rcu: Irq nesting is always 0 on rcu_enter_idle_commonFrederic Weisbecker
Because tasks don't nest, the ->dynticks_nesting counter must always be zero upon entry to rcu_idle_enter_common(). Therefore, pass "0" rather than the counter itself. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Don't check irq nesting from rcu idle entry/exitFrederic Weisbecker
Because tasks do not nest, rcu_idle_enter() and rcu_idle_exit() do not need to check for nesting. This commit therefore moves nesting checks from rcu_idle_enter_common() to rcu_irq_exit() and from rcu_idle_exit_common() to rcu_irq_enter(). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Permit dyntick-idle with callbacks pendingPaul E. McKenney
The current implementation of RCU_FAST_NO_HZ prevents CPUs from entering dyntick-idle state if they have RCU callbacks pending. Unfortunately, this has the side-effect of often preventing them from entering this state, especially if at least one other CPU is not in dyntick-idle state. However, the resulting per-tick wakeup is wasteful in many cases: if the CPU has already fully responded to the current RCU grace period, there will be nothing for it to do until this grace period ends, which will frequently take several jiffies. This commit therefore permits a CPU that has done everything that the current grace period has asked of it (rcu_pending() == 0) to enter dyntick-idle mode even if it still has RCU callbacks pending. However, such a CPU posts a timer to wake it up several jiffies later (6 jiffies, based on experience with grace-period lengths). This wakeup is required to handle situations that can result in all CPUs being in dyntick-idle mode, thus failing to ever complete the current grace period. If a CPU wakes up before the timer goes off, then it cancels that timer, thus avoiding spurious wakeups. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
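The decision described here has three outcomes, which a small userspace model makes explicit. The enum, the constant name, and `model_prepare_for_idle()` are hypothetical; only the 6-jiffy delay comes from the commit message, and the real code path is a multi-pass state machine rather than a single function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_idle_action {
    MODEL_STAY_AWAKE,      /* RCU still needs something from this CPU */
    MODEL_IDLE_NO_TIMER,   /* nothing pending at all */
    MODEL_IDLE_WITH_TIMER  /* callbacks pending: idle, but wake up later */
};

#define MODEL_IDLE_GP_DELAY 6  /* jiffies, the value cited in the commit */

enum model_idle_action model_prepare_for_idle(bool rcu_pending,
                                              long ncallbacks,
                                              unsigned long now,
                                              unsigned long *timer_expires)
{
    assert(timer_expires != NULL && ncallbacks >= 0);
    if (rcu_pending)
        return MODEL_STAY_AWAKE;
    if (ncallbacks == 0)
        return MODEL_IDLE_NO_TIMER;
    *timer_expires = now + MODEL_IDLE_GP_DELAY;  /* guard against all-idle hangs */
    return MODEL_IDLE_WITH_TIMER;
}
```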
2011-12-11rcu: Identify dyntick-idle CPUs on first force_quiescent_state() passPaul E. McKenney
Fixes and workarounds for a number of issues (for example, that in df4012edc) make it safe to once again detect dyntick-idle CPUs on the first pass of force_quiescent_state(), so this commit makes that change. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Remove dynticks false positives and RCU failuresPaul E. McKenney
Assertions in rcu_init_percpu_data() unknowingly relied on outgoing CPUs being turned off before reaching the idle loop. Unfortunately, when running under kvm/qemu on x86, CPUs really can get to idle before being shut off. These CPUs are then born in dyntick-idle mode from an RCU perspective, which results in splats in rcu_init_percpu_data() and in RCU wrongly ignoring those CPUs despite them being active. This in turn can cause RCU to end grace periods prematurely, potentially freeing up memory that the newly onlined CPUs were still using. This is most decidedly not what we need to see in an RCU implementation. This commit therefore replaces the assertions in rcu_init_percpu_data() with code that forces RCU's dyntick-idle view of newly onlined CPUs to match reality. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Eliminate RCU_FAST_NO_HZ grace-period hangPaul E. McKenney
With the new implementation of RCU_FAST_NO_HZ, it was possible to hang RCU grace periods as follows:

o CPU 0 attempts to go idle, cycles several times through the rcu_prepare_for_idle() loop, then goes dyntick-idle when RCU needs nothing more from it, while still having at least one RCU callback pending.
o CPU 1 goes idle with no callbacks.

Both CPUs can then stay in dyntick-idle mode indefinitely, preventing the RCU grace period from ever completing, possibly hanging the system. This commit therefore prevents CPUs that have RCU callbacks from entering dyntick-idle mode. This approach also eliminates the need for the end-of-grace-period IPIs used previously.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Make RCU use the new is_idle_task() APIPaul E. McKenney
Change from direct comparison of ->pid with zero to is_idle_task(). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11rcu: Fix idle-task checksPaul E. McKenney
RCU has traditionally relied on idle_cpu() to determine whether a given CPU is running in the context of an idle task, but commit 908a3283 (Fix idle_cpu()) has invalidated this approach. After commit 908a3283, idle_cpu() will return true if the current CPU is currently running the idle task, and will be doing so for the foreseeable future. RCU instead needs to know whether or not the current CPU is currently running the idle task, regardless of what the near future might bring. This commit therefore switches from idle_cpu() to "current->pid != 0". Reported-by: Wu Fengguang <fengguang.wu@intel.com> Suggested-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Allow dyntick-idle mode for CPUs with callbacksPaul E. McKenney
Currently, RCU does not permit a CPU to enter dyntick-idle mode if that CPU has any RCU callbacks queued. This means that workloads for which each CPU wakes up and does some RCU updates every few ticks will never enter dyntick-idle mode. This can result in significant unnecessary power consumption, so this patch permits a given CPU to enter dyntick-idle mode if it has callbacks, but only if that same CPU has completed all current work for the RCU core. We use rcu_pending() to determine whether a given CPU has completed all current work for the RCU core. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Add more information to the wrong-idle-task complaintPaul E. McKenney
The current code just complains if the current task is not the idle task. This commit therefore adds printing of the identity of the idle task. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11rcu: Deconfuse dynticks entry-exit tracingPaul E. McKenney
The trace_rcu_dyntick() trace event did not print both the old and the new value of the nesting level, and furthermore printed only the low-order 32 bits of it. This could result in some confusion when interpreting trace-event dumps, so this commit prints both the old and the new value, prints the full 64 bits, and also selects the process-entry/exit increment to print nicely in hexadecimal. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
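Why the full 64 bits matter is easy to see with a value whose interesting bits sit above bit 31. The sketch below is a userspace illustration only: `model_format_dyntick()`, its output layout, and the "Start" label used in the usage note are assumptions for the example, not the trace event's actual definition.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format old and new nesting values as full-width hexadecimal. */
int model_format_dyntick(char *buf, size_t len, const char *polarity,
                         long long oldnesting, long long newnesting)
{
    assert(buf != NULL && len > 0 && polarity != NULL);
    return snprintf(buf, len, "%s %llx %llx", polarity,
                    (unsigned long long)oldnesting,
                    (unsigned long long)newnesting);
}
```

For example, formatting a transition from `0x100000000` to `0` with the label "Start" yields `Start 100000000 0`, whereas truncating the old value to its low-order 32 bits would print `0` for both and hide the transition entirely.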
2011-12-11rcu: Detect illegal rcu dereference in extended quiescent stateFrederic Weisbecker
Report that none of the rcu read lock maps are held while in an RCU extended quiescent state (the section between rcu_idle_enter() and rcu_idle_exit()). This helps detect any use of rcu_dereference() and friends from within the section in idle where RCU is not allowed. This way we can guarantee an extended quiescent window where the CPU can be put in dyntick idle mode or can simply avoid being part of any global grace period completion while in the idle loop. Uses of RCU from such mode are totally ignored by RCU, hence the importance of these checks. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-12-11rcu: Omit self-awaken when setting up expedited grace periodThomas Gleixner
When setting up an expedited grace period, if there were no readers, the task will awaken itself. This commit removes this useless self-awakening. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-12-11rcu: Disable preemption in rcu_is_cpu_idle()Paul E. McKenney
Because rcu_is_cpu_idle() is to be used to check for extended quiescent states in RCU-preempt read-side critical sections, it cannot assume that preemption is disabled. And preemption must be disabled when accessing the dyntick-idle state, because otherwise the following sequence of events could occur: 1. Task A on CPU 1 enters rcu_is_cpu_idle() and picks up the pointer to CPU 1's per-CPU variables. 2. Task B preempts Task A and starts running on CPU 1. 3. Task A migrates to CPU 2. 4. Task B blocks, leaving CPU 1 idle. 5. Task A continues execution on CPU 2, accessing CPU 1's dyntick-idle information using the pointer fetched in step 1 above, and finds that CPU 1 is idle. 6. Task A therefore incorrectly concludes that it is executing in an extended quiescent state, possibly issuing a spurious splat. Therefore, this commit disables preemption within the rcu_is_cpu_idle() function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
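The six-step sequence above can be replayed deterministically in userspace. The following is only a sketch under stated assumptions: `cpu_model`, `task_model`, `fetch_this_cpu()` and both `replay_*()` functions are hypothetical names invented for illustration and are not kernel code, and "preemption" is modeled simply by choosing whether steps 2-4 are allowed to run between the pointer fetch and the read.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4                 /* hypothetical CPU count */

struct cpu_model { bool idle; };        /* stands in for per-CPU dyntick-idle state */
struct task_model { int cpu; };         /* CPU the task is currently running on */

static struct cpu_model cpus[MODEL_NR_CPUS];

/* Step 1: pick up a pointer to the state of the CPU the task runs on. */
static const struct cpu_model *fetch_this_cpu(const struct task_model *t)
{
    return &cpus[t->cpu];
}

/* Replays steps 1-6 with preemption allowed between fetch and read. */
static bool replay_race(void)
{
    struct task_model a = { .cpu = 1 };
    const struct cpu_model *stale;

    cpus[1].idle = false;
    cpus[2].idle = false;
    stale = fetch_this_cpu(&a);         /* step 1: pointer to CPU 1's state */
    a.cpu = 2;                          /* steps 2-3: A is preempted, then migrates */
    cpus[1].idle = true;                /* step 4: B blocks, CPU 1 goes idle */
    return stale->idle;                 /* steps 5-6: A wrongly sees "idle" */
}

/* Same check when nothing can intervene between fetch and read,
 * which is what disabling preemption is meant to guarantee. */
static bool replay_with_preemption_disabled(void)
{
    struct task_model a = { .cpu = 1 };
    const struct cpu_model *p;

    cpus[1].idle = false;
    p = fetch_this_cpu(&a);             /* fetch ... */
    return p->idle;                     /* ... and read on the same CPU */
}
```

In this model `replay_race()` reports idle even though Task A is running on a busy CPU 2, while the variant without the intervening steps reports the state of the CPU the task is actually on.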
2011-12-11rcu: Track idleness independent of idle tasksPaul E. McKenney
Earlier versions of RCU used the scheduling-clock tick to detect idleness by checking for the idle task, but handled idleness differently for CONFIG_NO_HZ=y. But there are now a number of uses of RCU read-side critical sections in the idle task, for example, for tracing. A more fine-grained detection of idleness is therefore required. This commit presses the old dyntick-idle code into full-time service, so that rcu_idle_enter(), previously known as rcu_enter_nohz(), is always invoked at the beginning of an idle-loop iteration. Similarly, rcu_idle_exit(), previously known as rcu_exit_nohz(), is always invoked at the end of an idle-loop iteration. This allows the idle task to use RCU everywhere except between consecutive rcu_idle_enter() and rcu_idle_exit() calls, in turn allowing architecture maintainers to specify exactly where in the idle loop RCU may be used. Because some of the userspace upcall uses can result in what looks to RCU like half of an interrupt, it is not possible to expect that the irq_enter() and irq_exit() hooks will give exact counts. This patch therefore expands the ->dynticks_nesting counter to 64 bits and uses two separate bitfields to count process/idle transitions and interrupt entry/exit transitions. It is presumed that userspace upcalls do not happen in the idle loop or from usermode execution (though usermode might do a system call that results in an upcall). The counter is hard-reset on each process/idle transition, which prevents the interrupt entry/exit error from accumulating. Overflow is avoided by the 64-bitness of the ->dynticks_nesting counter. This commit also adds warnings if a non-idle task asks RCU to enter idle state (and these checks will need some adjustment before applying Frederic's OS-jitter patches (http://lkml.org/lkml/2011/10/7/246)). In addition, validation of ->dynticks and ->dynticks_nesting is added. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
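The hard-reset idea in this commit message can be illustrated with a small userspace model. This is a sketch under assumptions, not the kernel's implementation: the bit layout (`MODEL_TASK_UNIT`), the `dynticks_model` structure and all `model_*()` helpers are hypothetical, and the values the kernel actually uses may differ. It shows only why resetting the counter on each process/idle transition stops an unmatched interrupt entry from accumulating.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: the low 32 bits count
 * interrupt nesting, the bits above them count process-level nesting. */
#define MODEL_TASK_UNIT (INT64_C(1) << 32)

struct dynticks_model {
    int64_t nesting;                    /* stands in for ->dynticks_nesting */
};

/* Process -> idle: hard-reset, discarding any stray interrupt count. */
static void model_idle_enter(struct dynticks_model *d)
{
    d->nesting = 0;
}

/* Idle -> process: hard-reset to exactly one process-level unit. */
static void model_idle_exit(struct dynticks_model *d)
{
    d->nesting = MODEL_TASK_UNIT;
}

static void model_irq_enter(struct dynticks_model *d) { d->nesting++; }
static void model_irq_exit(struct dynticks_model *d)  { d->nesting--; }

/* RCU treats the CPU as idle only when nothing at all is nested. */
static bool model_is_idle(const struct dynticks_model *d)
{
    return d->nesting == 0;
}

/* Interrupt nesting depth: the low bitfield of the counter. */
static int64_t model_irq_depth(const struct dynticks_model *d)
{
    return d->nesting & (MODEL_TASK_UNIT - 1);
}
```

In this model, an upcall that looks like "half of an interrupt" (a `model_irq_enter()` with no matching exit) leaves the low bitfield off by one, but the next `model_idle_enter()` clears it, so the CPU is still seen as idle afterwards.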
2011-12-11rcu: ->signaled better named ->fqs_statePaul E. McKenney
The ->signaled field was named before complications in the form of dyntick-idle mode and offlined CPUs. These complications have required that force_quiescent_state() be implemented as a state machine, instead of simply unconditionally sending reschedule IPIs. Therefore, this commit renames ->signaled to ->fqs_state to catch up with the new force_quiescent_state() reality. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-10-31kernel: Map most files to use export.h instead of module.hPaul Gortmaker
The changed files were only including linux/module.h for the EXPORT_SYMBOL infrastructure, and nothing else. Revector them onto the isolated export header for faster compile times. Nothing to see here but a whole lot of instances of: -#include <linux/module.h> +#include <linux/export.h> This commit is only changing the kernel dir; next targets will probably be mm, fs, the arch dirs, etc. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-28rcu: Move propagation of ->completed from rcu_start_gp() to rcu_report_qs_rsp()Paul E. McKenney
It is possible for the CPU that noted the end of the prior grace period to not need a new one, and therefore to decide to propagate ->completed throughout the rcu_node tree without starting another grace period. However, in so doing, it releases the root rcu_node structure's lock, which can allow some other CPU to start another grace period. The first CPU will be propagating ->completed in parallel with the second CPU initializing the rcu_node tree for the new grace period. In theory this is harmless, but in practice we need to keep things simple. This commit therefore moves the propagation of ->completed to rcu_report_qs_rsp(), and refrains from marking the old grace period as having been completed until it has finished doing this. This prevents anyone from starting a new grace period concurrently with marking the old grace period as having been completed. Of course, the optimization where a CPU needing a new grace period doesn't bother marking the old one completed is still in effect: In that case, the marking happens implicitly as part of initializing the new grace period. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent statesPaul E. McKenney
The purpose of rcu_needs_cpu_flush() was to iterate on pushing the current grace period in order to help the current CPU enter dyntick-idle mode. However, this can result in failures if the CPU starts entering dyntick-idle mode, but then backs out. In this case, the call to rcu_pending() from rcu_needs_cpu_flush() might end up announcing a non-existing quiescent state. This commit therefore removes rcu_needs_cpu_flush() in favor of letting the dyntick-idle machinery at the end of the softirq handler push the loop along via its call to rcu_pending(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2011-09-28rcu: Wire up RCU_BOOST_PRIO for rcutreeMike Galbraith
RCU boost threads start life at RCU_BOOST_PRIO, while others remain at RCU_KTHREAD_PRIO. While here, change thread names to match other kthreads, and adjust rcu_yield() to not override the priority set by the user. This last change sets the stage for runtime changes to priority in the -rt tree. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>