aboutsummaryrefslogtreecommitdiff
path: root/kernel/sched/core.c
AgeCommit message (Collapse)Author
2012-05-08Merge branch 'smp/threadalloc' into smp/hotplugThomas Gleixner
Reason: Pull in the separate branch which was created so arch/tile can base further work on it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-03userns: Convert sched_set_affinity and sched_set_scheduler's permission checksEric W. Biederman
- Compare kuids with uid_eq - kuid are uniuqe across all user namespaces so there is no longer the need for a user_namespace comparison. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-02rcu: Move PREEMPT_RCU preemption to switch_to() invocationPaul E. McKenney
Currently, PREEMPT_RCU readers are enqueued upon entry to the scheduler. This is inefficient because enqueuing is required only if there is a context switch, and entry to the scheduler does not guarantee a context switch. The commit therefore moves the enqueuing to immediately precede the call to switch_to() from the scheduler. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-26sched: Fix OOPS when build_sched_domains() percpu allocation failshe, bo
Under extreme memory used up situations, percpu allocation might fail. We hit it when system goes to suspend-to-ram, causing a kworker panic: EIP: [<c124411a>] build_sched_domains+0x23a/0xad0 Kernel panic - not syncing: Fatal exception Pid: 3026, comm: kworker/u:3 3.0.8-137473-gf42fbef #1 Call Trace: [<c18cc4f2>] panic+0x66/0x16c [...] [<c1244c37>] partition_sched_domains+0x287/0x4b0 [<c12a77be>] cpuset_update_active_cpus+0x1fe/0x210 [<c123712d>] cpuset_cpu_inactive+0x1d/0x30 [...] With this fix applied build_sched_domains() will return -ENOMEM and the suspend attempt fails. Signed-off-by: he, bo <bo.he@intel.com> Reviewed-by: Zhang, Yanmin <yanmin.zhang@intel.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/1335355161.5892.17.camel@hebo [ So, we fail to deallocate a CPU because we cannot allocate RAM :-/ I don't like that kind of sad behavior but nevertheless it should not crash under high memory load. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-04-26smp: Provide generic idle thread allocationThomas Gleixner
All SMP architectures have magic to fork the idle task and to store it for reusage when cpu hotplug is enabled. Provide a generic infrastructure for it. Create/reinit the idle thread for the cpu which is brought up in the generic code and hand the thread pointer to the architecture code via __cpu_up(). Note, that fork_idle() is called via a workqueue, because this guarantees that the idle thread does not get a reference to a user space VM. This can happen when the boot process did not bring up all possible cpus and a later cpu_up() is initiated via the sysfs interface. In that case fork_idle() would be called in the context of the user space task and take a reference on the user space VM. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Richard Weinberger <richard@nod.at> Cc: x86@kernel.org Acked-by: Venkatesh Pallipadi <venki@google.com> Link: http://lkml.kernel.org/r/20120420124557.102478630@linutronix.de
2012-04-07userns: Use cred->user_ns instead of cred->user->user_nsEric W. Biederman
Optimize performance and prepare for the removal of the user_ns reference from user_struct. Remove the slow long walk through cred->user->user_ns and instead go straight to cred->user_ns. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-04-01cgroup: convert all non-memcg controllers to the new cftype interfaceTejun Heo
Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio, net_cls and device controllers to use the new cftype based interface. Termination entry is added to cftype arrays and populate callbacks are replaced with cgroup_subsys->base_cftypes initializations. This is functionally identical transformation. There shouldn't be any visible behavior change. memcg is rather special and will be converted separately. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <paul@paulmenage.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Vivek Goyal <vgoyal@redhat.com>
2012-03-31Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar. * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix incorrect usage of for_each_cpu_mask() in select_fallback_rq() sched: Fix __schedule_bug() output when called from an interrupt sched/arch: Introduce the finish_arch_post_lock_switch() scheduler callback
2012-03-31sched: Fix incorrect usage of for_each_cpu_mask() in select_fallback_rq()Srivatsa S. Bhat
The function for_each_cpu_mask() expects a *pointer* to struct cpumask as its second argument, whereas select_fallback_rq() passes the value itself. And moreover, for_each_cpu_mask() has been marked as obselete in include/linux/cpumask.h. So move to the more appropriate for_each_cpu() variant. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Dave Jones <davej@redhat.com> Cc: Liu Chuansheng <chuansheng.liu@intel.com> Cc: vapier@gentoo.org Cc: rusty@rustcorp.com.au Link: http://lkml.kernel.org/r/4F75BED4.9050005@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-29Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar. * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpusets: Remove an unused variable sched/rt: Improve pick_next_highest_task_rt() sched: Fix select_fallback_rq() vs cpu_active/cpu_online sched/x86/smp: Do not enable IRQs over calibrate_delay() sched: Fix compiler warning about declared inline after use MAINTAINERS: Update email address for SCHEDULER and PERF EVENTS
2012-03-29Merge branch 'sched/arch' into sched/urgentIngo Molnar
Merge reason: It has not gone upstream via the ARM tree, merge it via the scheduler tree. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-29sched: Fix __schedule_bug() output when called from an interruptStephen Boyd
If schedule is called from an interrupt handler __schedule_bug() will call show_regs() with the registers saved during the interrupt handling done in do_IRQ(). This means we'll see the registers and the backtrace for the process that was interrupted and not the full backtrace explaining who called schedule(). This is due to 838225b ("sched: use show_regs() to improve __schedule_bug() output", 2007-10-24) which improperly assumed that get_irq_regs() would return the registers for the current stack because it is being called from within an interrupt handler. Simply remove the show_reg() code so that we dump a backtrace for the interrupt handler that called schedule(). [ I ran across this when I was presented with a scheduling while atomic log with a stacktrace pointing at spin_unlock_irqrestore(). It made no sense and I had to guess what interrupt handler could be called and poke around for someone calling schedule() in an interrupt handler. A simple test of putting an msleep() in an interrupt handler works better with this patch because you can actually see the msleep() call in the backtrace. ] Also-reported-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Satyam Sharma <satyam@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1332979847-27102-1-git-send-email-sboyd@codeaurora.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-28Add #includes needed to permit the removal of asm/system.hDavid Howells
asm/system.h is a cause of circular dependency problems because it contains commonly used primitive stuff like barrier definitions and uncommonly used stuff like switch_to() that might require MMU definitions. asm/system.h has been disintegrated by this point on all arches into the following common segments: (1) asm/barrier.h Moved memory barrier definitions here. (2) asm/cmpxchg.h Moved xchg() and cmpxchg() here. #included in asm/atomic.h. (3) asm/bug.h Moved die() and similar here. (4) asm/exec.h Moved arch_align_stack() here. (5) asm/elf.h Moved AT_VECTOR_SIZE_ARCH here. (6) asm/switch_to.h Moved switch_to() here. Signed-off-by: David Howells <dhowells@redhat.com>
2012-03-27sched: Fix select_fallback_rq() vs cpu_active/cpu_onlinePeter Zijlstra
Commit 5fbd036b55 ("sched: Cleanup cpu_active madness"), which was supposed to finally sort the cpu_active mess, instead uncovered more. Since CPU_STARTING is ran before setting the cpu online, there's a (small) window where the cpu has active,!online. If during this time there's a wakeup of a task that used to reside on that cpu select_task_rq() will use select_fallback_rq() to compute an alternative cpu to run on since we find !online. select_fallback_rq() however will compute the new cpu against cpu_active, this means that it can return the same cpu it started out with, the !online one, since that cpu is in fact marked active. This results in us trying to scheduling a task on an offline cpu and triggering a WARN in the IPI code. The solution proposed by Chuansheng Liu of setting cpu_active in set_cpu_online() is buggy, firstly not all archs actually use set_cpu_online(), secondly, not all archs call set_cpu_online() with IRQs disabled, this means we would introduce either the same race or the race from fd8a7de17 ("x86: cpu-hotplug: Prevent softirq wakeup on wrong CPU") -- albeit much narrower. [ By setting online first and active later we have a window of online,!active, fresh and bound kthreads have task_cpu() of 0 and since cpu0 isn't in tsk_cpus_allowed() we end up in select_fallback_rq() which excludes !active, resulting in a reset of ->cpus_allowed and the thread running all over the place. ] The solution is to re-work select_fallback_rq() to require active _and_ online. This makes the active,!online case work as expected, OTOH archs running CPU_STARTING after setting online are now vulnerable to the issue from fd8a7de17 -- these are alpha and blackfin. Reported-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Frysinger <vapier@gentoo.org> Cc: linux-alpha@vger.kernel.org Link: http://lkml.kernel.org/n/tip-hubqk1i10o4dpvlm06gq7v6j@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-03-21Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates for 3.4 from James Morris: "The main addition here is the new Yama security module from Kees Cook, which was discussed at the Linux Security Summit last year. Its purpose is to collect miscellaneous DAC security enhancements in one place. This also marks a departure in policy for LSM modules, which were previously limited to being standalone access control systems. Chromium OS is using Yama, and I believe there are plans for Ubuntu, at least. This patchset also includes maintenance updates for AppArmor, TOMOYO and others." Fix trivial conflict in <net/sock.h> due to the jumo_label->static_key rename. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits) AppArmor: Fix location of const qualifier on generated string tables TOMOYO: Return error if fails to delete a domain AppArmor: add const qualifiers to string arrays AppArmor: Add ability to load extended policy TOMOYO: Return appropriate value to poll(). AppArmor: Move path failure information into aa_get_name and rename AppArmor: Update dfa matching routines. AppArmor: Minor cleanup of d_namespace_path to consolidate error handling AppArmor: Retrieve the dentry_path for error reporting when path lookup fails AppArmor: Add const qualifiers to generated string tables AppArmor: Fix oops in policy unpack auditing AppArmor: Fix error returned when a path lookup is disconnected KEYS: testing wrong bit for KEY_FLAG_REVOKED TOMOYO: Fix mount flags checking order. security: fix ima kconfig warning AppArmor: Fix the error case for chroot relative path name lookup AppArmor: fix mapping of META_READ to audit and quiet flags AppArmor: Fix underflow in xindex calculation AppArmor: Fix dropping of allowed operations that are force audited AppArmor: Add mising end of structure test to caps unpacking ...
2012-03-20Merge branch 'for-3.4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: "Out of the 8 commits, one fixes a long-standing locking issue around tasklist walking and others are cleanups." * 'for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Walk task list under tasklist_lock in cgroup_enable_task_cg_list cgroup: Remove wrong comment on cgroup_enable_task_cg_list() cgroup: remove cgroup_subsys argument from callbacks cgroup: remove extra calls to find_existing_css_set cgroup: replace tasklist_lock with rcu_read_lock cgroup: simplify double-check locking in cgroup_attach_proc cgroup: move struct cgroup_pidlist out from the header file cgroup: remove cgroup_attach_task_current_cg()
2012-03-20Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes for v3.4 from Ingo Molnar * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) printk: Make it compile with !CONFIG_PRINTK sched/x86: Fix overflow in cyc2ns_offset sched: Fix nohz load accounting -- again! sched: Update yield() docs printk/sched: Introduce special printk_sched() for those awkward moments sched/nohz: Correctly initialize 'next_balance' in 'nohz' idle balancer sched: Cleanup cpu_active madness sched: Fix load-balance wreckage sched: Clean up parameter passing of proc_sched_autogroup_set_nice() sched: Ditch per cgroup task lists for load-balancing sched: Rename load-balancing fields sched: Move load-balancing arguments into helper struct sched/rt: Do not submit new work when PI-blocked sched/rt: Prevent idle task boosting sched/wait: Add __wake_up_all_locked() API sched/rt: Document scheduler related skip-resched-check sites sched/rt: Use schedule_preempt_disabled() sched/rt: Add schedule_preempt_disabled() sched/rt: Do not throttle when PI boosting sched/rt: Keep period timer ticking when rt throttling is active ...
2012-03-13sched/arch: Introduce the finish_arch_post_lock_switch() scheduler callbackCatalin Marinas
This callback is called by the scheduler after rq->lock has been released and interrupts enabled. It will be used in subsequent patches on the ARM architecture. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Frank Rowand <frank.rowand@am.sony.com> Tested-by: Will Deacon <will.deacon@arm.com> Tested-by: Marc Zyngier <Marc.Zyngier@arm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/20120313110840.7b444deb6b1bb902c15f3cdf@canb.auug.org.au Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12sched: Fix nohz load accounting -- again!Peter Zijlstra
Various people reported nohz load tracking still being wrecked, but Doug spotted the actual problem. We fold the nohz remainder in too soon, causing us to loose samples and under-account. So instead of playing catch-up up-front, always do a single load-fold with whatever state we encounter and only then fold the nohz remainder and play catch-up. Reported-by: Doug Smythies <dsmythies@telus.net> Reported-by: LesÅ=82aw Kope=C4=87 <leslaw.kopec@nasza-klasa.pl> Reported-by: Aman Gupta <aman@tmm1.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-4v31etnhgg9kwd6ocgx3rxl8@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12sched: Update yield() docsPeter Zijlstra
Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1331056466.11248.327.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12printk/sched: Introduce special printk_sched() for those awkward momentsPeter Zijlstra
There's a few awkward printk()s inside of scheduler guts that people prefer to keep but really are rather deadlock prone. Fudge around it by storing the text in a per-cpu buffer and poll it using the existing printk_tick() handler. This will drop output when its more frequent than once a tick, however only the affinity thing could possible go that fast and for that just one should suffice to notify the admin he's done something silly.. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-wua3lmkt3dg8nfts66o6brne@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12sched: Cleanup cpu_active madnessPeter Zijlstra
Stepan found: CPU0 CPUn _cpu_up() __cpu_up() boostrap() notify_cpu_starting() set_cpu_online() while (!cpu_active()) cpu_relax() <PREEMPT-out> smp_call_function(.wait=1) /* we find cpu_online() is true */ arch_send_call_function_ipi_mask() /* wait-forever-more */ <PREEMPT-in> local_irq_enable() cpu_notify(CPU_ONLINE) sched_cpu_active() set_cpu_active() Now the purpose of cpu_active is mostly with bringing down a cpu, where we mark it !active to avoid the load-balancer from moving tasks to it while we tear down the cpu. This is required because we only update the sched_domain tree after we brought the cpu-down. And this is needed so that some tasks can still run while we bring it down, we just don't want new tasks to appear. On cpu-up however the sched_domain tree doesn't yet include the new cpu, so its invisible to the load-balancer, regardless of the active state. So instead of setting the active state after we boot the new cpu (and consequently having to wait for it before enabling interrupts) set the cpu active before we set it online and avoid the whole mess. Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1323965362.18942.71.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-07Revert "CPU hotplug, cpusets, suspend: Don't touch cpusets during ↵Linus Torvalds
suspend/resume" This reverts commit 8f2f748b0656257153bcf0941df8d6060acc5ca6. It causes some odd regression that we have not figured out, and it's too late in the -rc series to try to figure it out now. As reported by Konstantin Khlebnikov, it causes consistent hangs on his laptop (Thinkpad x220: 2x cores + HT). They can be avoided by adding calls to "rebuild_sched_domains();" in cpuset_cpu_[in]active() for the CPU_{ONLINE/DOWN_FAILED/DOWN_PREPARE}_FROZEN cases, but it's not at all clear why, and it makes no sense. Konstantin's config doesn't even have CONFIG_CPUSETS enabled, just to make things even more interesting. So it's not the cpusets, it's just the scheduling domains. So until this is understood, revert. Bisected-reported-and-tested-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-05Merge branch 'perf/urgent' into perf/coreIngo Molnar
Conflicts: tools/perf/builtin-record.c tools/perf/builtin-top.c tools/perf/perf.h tools/perf/util/top.h Merge reason: resolve these cherry-picking conflicts. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-01sched: Ditch per cgroup task lists for load-balancingPeter Zijlstra
Per cgroup load-balance has numerous problems, chief amongst them that there is no real sane order in them. So stop pretending it makes sense and enqueue all tasks on a single list. This also allows us to more easily fix the fwd progress issue uncovered by the lock-break stuff. Rotate the list on failure to migreate and limit the total iterations to nr_running (which with releasing the lock isn't strictly accurate but close enough). Also add a filter that skips very light tasks on the first attempt around the list, this attempts to avoid shooting whole cgroups around without affecting over balance. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: pjt@google.com Link: http://lkml.kernel.org/n/tip-tx8yqydc7eimgq7i4rkc3a4g@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-01sched/rt: Do not submit new work when PI-blockedThomas Gleixner
When we are PI-blocked then we want to get things done ASAP. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-vw8et3445km5b8mpihf4trae@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-01sched/rt: Prevent idle task boostingThomas Gleixner
Idle task boosting is a nono in general. There is one exception, when PREEMPT_RT and NOHZ is active: The idle task calls get_next_timer_interrupt() and holds the timer wheel base->lock on the CPU and another CPU wants to access the timer (probably to cancel it). We can safely ignore the boosting request, as the idle CPU runs this code with interrupts disabled and will complete the lock protected section without being interrupted. So there is no real need to boost. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-755rvsosz7sdzot12a3gbha6@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-01sched/wait: Add __wake_up_all_locked() APIThomas Gleixner
For code which protects the waitqueue itself with another lock it makes no sense to acquire the waitqueue lock for wakeup all. Provide __wake_up_all_locked(). This is an optimization on the vanilla kernel (to be used by the PCI code) and an important semantic distinction on -rt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-ux6m4b8jonb9inx8xafh77ds@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-01sched/rt: Document scheduler related skip-resched-check sitesThomas Gleixner
Create a distinction between scheduler related preempt_enable_no_resched() calls and the nearly one hundred other places in the kernel that do not want to reschedule, for one reason or another. This distinction matters for -rt, where the scheduler and the non-scheduler preempt models (and checks) are different. For upstream it's purely documentational. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/n/tip-gs88fvx2mdv5psnzxnv575ke@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-01sched/rt: Add schedule_preempt_disabled()Thomas Gleixner
Add helper to get rid of the ever repeating: preempt_enable_no_resched(); schedule(); preempt_disable(); patterns. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-wxx7btox7coby6ifv5vzhzgp@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-01Merge branch 'linus' into sched/coreIngo Molnar
Merge reason: we'll queue up dependent patches. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-27CPU hotplug, cpusets, suspend: Don't touch cpusets during suspend/resumeSrivatsa S. Bhat
Currently, during CPU hotplug, the cpuset callbacks modify the cpusets to reflect the state of the system, and this handling is asymmetric. That is, upon CPU offline, that CPU is removed from all cpusets. However when it comes back online, it is put back only to the root cpuset. This gives rise to a significant problem during suspend/resume. During suspend, we offline all non-boot cpus and during resume we online them back. Which means, after a resume, all cpusets (except the root cpuset) will be restricted to just one single CPU (the boot cpu). But the whole point of suspend/resume is to restore the system to a state which is as close as possible to how it was before suspend. So to fix this, don't touch cpusets during suspend/resume. That is, modify the cpuset-related CPU hotplug callback to just ignore CPU hotplug when it is initiated as part of the suspend/resume sequence. Reported-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/4F460D7B.1020703@linux.vnet.ibm.com Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-24static keys: Introduce 'struct static_key', static_key_true()/false() and ↵Ingo Molnar
static_key_slow_[inc|dec]() So here's a boot tested patch on top of Jason's series that does all the cleanups I talked about and turns jump labels into a more intuitive to use facility. It should also address the various misconceptions and confusions that surround jump labels. Typical usage scenarios: #include <linux/static_key.h> struct static_key key = STATIC_KEY_INIT_TRUE; if (static_key_false(&key)) do unlikely code else do likely code Or: if (static_key_true(&key)) do likely code else do unlikely code The static key is modified via: static_key_slow_inc(&key); ... static_key_slow_dec(&key); The 'slow' prefix makes it abundantly clear that this is an expensive operation. I've updated all in-kernel code to use this everywhere. Note that I (intentionally) have not pushed through the rename blindly through to the lowest levels: the actual jump-label patching arch facility should be named like that, so we want to decouple jump labels from the static-key facility a bit. On non-jump-label enabled architectures static keys default to likely()/unlikely() branches. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jason Baron <jbaron@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: a.p.zijlstra@chello.nl Cc: mathieu.desnoyers@efficios.com Cc: davem@davemloft.net Cc: ddaney.cavm@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-22sched/events: Revert trace_sched_stat_sleeptime()Peter Zijlstra
Commit 1ac9bc69 ("sched/tracing: Add a new tracepoint for sleeptime") added a new sched:sched_stat_sleeptime tracepoint. It's broken: the first sample we get on a task might be bad because of a stale sleep_start value that wasn't reset at the last task switch because the tracepoint was not active. It also breaks the existing schedstat samples due to the side effects of: - se->statistics.sleep_start = 0; ... - se->statistics.block_start = 0; Nor do I see means to fix it without adding overhead to the scheduler fast path, which I'm not willing to for the sake of redundant instrumentation. Most importantly, sleep time information can already be constructed by tracing context switches and wakeups, and taking the timestamp difference between the schedule-out, the wakeup and the schedule-in. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Vagin <avagin@openvz.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/n/tip-pc4c9qhl8q6vg3bs4j6k0rbd@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-14security: trim security.hAl Viro
Trim security.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>
2012-02-02cgroup: remove cgroup_subsys argument from callbacksLi Zefan
The argument is not used at all, and it's not necessary, because a specific callback handler of course knows which subsys it belongs to. Now only ->pupulate() takes this argument, because the handlers of this callback always call cgroup_add_file()/cgroup_add_files(). So we reduce a few lines of code, though the shrinking of object size is minimal. 16 files changed, 113 insertions(+), 162 deletions(-) text data bss dec hex filename 5486240 656987 7039960 13183187 c928d3 vmlinux.o.orig 5486170 656987 7039960 13183117 c9288d vmlinux.o Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2012-01-27sched, block: Unify cache detectionPeter Zijlstra
The block layer has some code trying to determine if two CPUs share a cache, the scheduler has a similar function. Expose the function used by the scheduler and make the block layer use it, thereby removing the block layers usage of CONFIG_SCHED* and topology bits. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jens Axboe <axboe@kernel.dk> Link: http://lkml.kernel.org/r/1327579450.2446.95.camel@twins
2012-01-26sched/s390: Fix compile error in sched/core.cChristian Borntraeger
Commit 029632fbb7b7c9d85063cc9eb470de6c54873df3 ("sched: Make separate sched*.c translation units") removed the include of asm/mutex.h from sched.c. This breaks the combination of: CONFIG_MUTEX_SPIN_ON_OWNER=yes CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX=yes like s390 without mutex debugging: CC kernel/sched/core.o kernel/sched/core.c: In function ‘mutex_spin_on_owner’: kernel/sched/core.c:3287: error: implicit declaration of function ‘arch_mutex_cpu_relax’ Lets re-add the include to kernel/sched/core.c Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1326268696-30904-1-git-send-email-borntraeger@de.ibm.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-26sched: Fix rq->nr_uninterruptible update racePeter Zijlstra
KOSAKI Motohiro noticed the following race: > CPU0 CPU1 > -------------------------------------------------------- > deactivate_task() > task->state = TASK_UNINTERRUPTIBLE; > activate_task() > rq->nr_uninterruptible--; > > schedule() > deactivate_task() > rq->nr_uninterruptible++; > Kosaki-San's scenario is possible when CPU0 runs __sched_setscheduler() against CPU1's current @task. __sched_setscheduler() does a dequeue/enqueue in order to move the task to its new queue (position) to reflect the newly provided scheduling parameters. However it should be completely invariant to nr_uninterruptible accounting, sched_setscheduler() doesn't affect readyness to run, merely policy on when to run. So convert the inappropriate activate/deactivate_task usage to enqueue/dequeue_task, which avoids the nr_uninterruptible accounting. Also convert the two other sites: __migrate_task() and normalize_task() that still use activate/deactivate_task. These sites aren't really a problem since __migrate_task() will only be called on non-running task (and therefore are immume to the described problem) and normalize_task() isn't ever used on regular systems. Also remove the comments from activate/deactivate_task since they're misleading at best. Reported-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1327486224.2614.45.camel@laptop Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-14Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-securityLinus Torvalds
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security: capabilities: remove __cap_full_set definition security: remove the security_netlink_recv hook as it is equivalent to capable() ptrace: do not audit capability check when outputing /proc/pid/stat capabilities: remove task_ns_* functions capabitlies: ns_capable can use the cap helpers rather than lsm call capabilities: style only - move capable below ns_capable capabilites: introduce new has_ns_capabilities_noaudit capabilities: call has_ns_capability from has_capability capabilities: remove all _real_ interfaces capabilities: introduce security_capable_noaudit capabilities: reverse arguments to security_capable capabilities: remove the task from capable LSM hook entirely selinux: sparse fix: fix several warnings in the security server cod selinux: sparse fix: fix warnings in netlink code selinux: sparse fix: eliminate warnings for selinuxfs selinux: sparse fix: declare selinux_disable() in security.h selinux: sparse fix: move selinux_complete_init selinux: sparse fix: make selinux_secmark_refcount static SELinux: Fix RCU deref check warning in sel_netport_insert() Manually fix up a semantic mis-merge wrt security_netlink_recv(): - the interface was removed in commit fd7784615248 ("security: remove the security_netlink_recv hook as it is equivalent to capable()") - a new user of it appeared in commit a38f7907b926 ("crypto: Add userspace configuration API") causing no automatic merge conflict, but Eric Paris pointed out the issue.
2012-01-11Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix lockup by limiting load-balance retries on lock-break sched: Fix CONFIG_CGROUP_SCHED dependency sched: Remove empty #ifdefs
2012-01-10sched: Remove empty #ifdefsHiroshi Shimamoto
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4F0B8525.8070901@ct.jp.nec.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-09Merge branch 'for-3.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits) cgroup: fix to allow mounting a hierarchy by name cgroup: move assignement out of condition in cgroup_attach_proc() cgroup: Remove task_lock() from cgroup_post_fork() cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end() cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static cgroup: only need to check oldcgrp==newgrp once cgroup: remove redundant get/put of task struct cgroup: remove redundant get/put of old css_set from migrate cgroup: Remove unnecessary task_lock before fetching css_set on migration cgroup: Drop task_lock(parent) on cgroup_fork() cgroups: remove redundant get/put of css_set from css_set_check_fetched() resource cgroups: remove bogus cast cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task() cgroup, cpuset: don't use ss->pre_attach() cgroup: don't use subsys->can_attach_task() or ->attach_task() cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach() cgroup: improve old cgroup handling in cgroup_attach_proc() cgroup: always lock threadgroup during migration threadgroup: extend threadgroup_lock() to cover exit and exec threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem ... Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups: fix a css_set not found bug in cgroup_attach_proc" that already mentioned that the bug is fixed (differently) in Tejun's cgroup patchset. This one, in other words.
2012-01-08Merge branch 'for-linus2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits) reiserfs: Properly display mount options in /proc/mounts vfs: prevent remount read-only if pending removes vfs: count unlinked inodes vfs: protect remounting superblock read-only vfs: keep list of mounts for each superblock vfs: switch ->show_options() to struct dentry * vfs: switch ->show_path() to struct dentry * vfs: switch ->show_devname() to struct dentry * vfs: switch ->show_stats to struct dentry * switch security_path_chmod() to struct path * vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb vfs: trim includes a bit switch mnt_namespace ->root to struct mount vfs: take /proc/*/mounts and friends to fs/proc_namespace.c vfs: opencode mntget() mnt_set_mountpoint() vfs: spread struct mount - remaining argument of next_mnt() vfs: move fsnotify junk to struct mount vfs: move mnt_devname vfs: move mnt_list to struct mount vfs: switch pnode.h macros to struct mount * ...
2012-01-07Merge branch 'driver-core-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits) arm: fix up some samsung merge sysdev conversion problems firmware: Fix an oops on reading fw_priv->fw in sysfs loading file Drivers:hv: Fix a bug in vmbus_driver_unregister() driver core: remove __must_check from device_create_file debugfs: add missing #ifdef HAS_IOMEM arm: time.h: remove device.h #include driver-core: remove sysdev.h usage. clockevents: remove sysdev.h arm: convert sysdev_class to a regular subsystem arm: leds: convert sysdev_class to a regular subsystem kobject: remove kset_find_obj_hinted() m86k: gpio - convert sysdev_class to a regular subsystem mips: txx9_sram - convert sysdev_class to a regular subsystem mips: 7segled - convert sysdev_class to a regular subsystem sh: dma - convert sysdev_class to a regular subsystem sh: intc - convert sysdev_class to a regular subsystem power: suspend - convert sysdev_class to a regular subsystem power: qe_ic - convert sysdev_class to a regular subsystem power: cmm - convert sysdev_class to a regular subsystem s390: time - convert sysdev_class to a regular subsystem ... Fix up conflicts with 'struct sysdev' removal from various platform drivers that got changed: - arch/arm/mach-exynos/cpu.c - arch/arm/mach-exynos/irq-eint.c - arch/arm/mach-s3c64xx/common.c - arch/arm/mach-s3c64xx/cpu.c - arch/arm/mach-s5p64x0/cpu.c - arch/arm/mach-s5pv210/common.c - arch/arm/plat-samsung/include/plat/cpu.h - arch/powerpc/kernel/sysfs.c and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2011-12-23sched/tracing: Add a new tracepoint for sleeptimeArun Sharma
If CONFIG_SCHEDSTATS is defined, the kernel maintains information about how long the task was sleeping or in the case of iowait, blocking in the kernel before getting woken up. This will be useful for sleep time profiling. Note: this information is only provided for sched_fair. Other scheduling classes may choose to provide this in the future. Note: the delay includes the time spent on the runqueue as well. Signed-off-by: Arun Sharma <asharma@fb.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/1324512940-32060-2-git-send-email-asharma@fb.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-23sched: Disable scheduler warnings during oopsesDave Jones
The panic-on-framebuffer code seems to cause a schedule to occur during an oops. This causes a bunch of extra spew as can be seen in: https://bugzilla.redhat.com/attachment.cgi?id=549230 Don't do scheduler debug checks when we are oopsing already. Signed-off-by: Dave Jones <davej@redhat.com> Link: http://lkml.kernel.org/r/20111222213929.GA4722@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21sched: Remove cfs bandwidth period check in tg_set_cfs_period()Kamalesh Babulal
Remove cfs bandwidth period check from tg_set_cfs_period. Invalid bandwidth period's lower/upper limits are denoted by min_cfs_quota_period/max_cfs_quota_period repsectively, and are checked against valid period in tg_set_cfs_bandwidth(). As pjt pointed out, negative input will result in very large unsigned numbers and will be caught by the max allowed period test. Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Acked-by: Paul Turner <pjt@google.com> [ammended changelog to mention negative values] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20111210135925.GA14593@linux.vnet.ibm.com -- kernel/sched/core.c | 3 --- 1 file changed, 3 deletions(-) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21sched: Only queue remote wakeups when crossing cache boundariesPeter Zijlstra
Mike reported a 13% drop in netperf TCP_RR performance due to the new remote wakeup code. Suresh too noticed some performance issues with it. Reducing the IPIs to only cross cache domains solves the observed performance issues. Reported-by: Suresh Siddha <suresh.b.siddha@intel.com> Reported-by: Mike Galbraith <efault@gmx.de> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Link: http://lkml.kernel.org/r/1323338531.17673.7.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-19Merge branch 'sched/core' of ↵Martin Schwidefsky
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into cputime-tip Conflicts: drivers/cpufreq/cpufreq_conservative.c drivers/cpufreq/cpufreq_ondemand.c drivers/macintosh/rack-meter.c fs/proc/stat.c fs/proc/uptime.c kernel/sched/core.c