linaro-lsk.git - [no description]

Age	Commit message (Collapse)	Author
2011-01-07	perf_events: Add perf_event_time()	Stephane Eranian
	Adds perf_event_time() to try and centralize access to event timing and in particular ctx->time. Prepares for cgroup support. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4d22059c.122ae30a.5e0e.ffff8b8b@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07	perf_events: Generalize use of event_filter_match()	Stephane Eranian
	Replace all occurrences of: event->cpu != -1 && event->cpu == smp_processor_id() by a call to: event_filter_match(event) This makes the code more consistent and will make the cgroup patch smaller. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4d220593.2308e30a.48c5.ffff8ae9@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07	perf_events: Move code around to prepare for cgroup	Stephane Eranian
	In particular this patch move perf_event_exit_task() before cgroup_exit() to allow for cgroup support. The cgroup_exit() function detaches the cgroups attached to a task. Other movements include hoisting some definitions and inlines at the top of perf_event.c Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4d22058b.cdace30a.4657.ffff95b1@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16	perf: Sysfs enumeration	Peter Zijlstra
	Simple sysfs emumeration of the PMUs. Use a "event_source" bus, and add PMU devices using their name. Each PMU device has a type attribute which contrains the value needed for perf_event_attr::type to identify this PMU. This is the minimal stub needed to start using this interface, we'll consider extending the sysfs usage later. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222056.316982569@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16	perf: Dynamic pmu types	Peter Zijlstra
	Extend the perf_pmu_register() interface to allow for named and dynamic pmu types. Because we need to support the existing static types we cannot use dynamic types for everything, hence provide a type argument. If we want to enumerate the PMUs they need a name, provide one. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222056.259707703@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16	Merge branch 'perf/urgent' into perf/core	Ingo Molnar
	Merge reason: We want to apply a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-16	perf: Fix off by one in perf_swevent_init()	Dan Carpenter
	The perf_swevent_enabled[] array has PERF_COUNT_SW_MAX elements. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101024195041.GT5985@bicker> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08	perf: Stop all counters on reboot	Peter Zijlstra
	Use the reboot notifier to detach all running counters on reboot, this solves a problem with kexec where the new kernel doesn't expect running counters (rightly so). It will however decrease the coverage of the NMI watchdog. Making a kexec specific reboot notifier callback would be best, however that would require touching all notifier callback handlers as they are not properly structured to deal with new state. As a compromise, place the perf reboot notifier at the very last position in the list. Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Don Zickus <dzickus@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-08	perf: Fix duplicate events with multiple-pmu vs software events	Peter Zijlstra
	Because the multi-pmu bits can share contexts between struct pmu instances we could get duplicate events by iterating the pmu list. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-12-04	perf events: Make sample_type identity fields available in all PERF_RECORD_ ↵	Arnaldo Carvalho de Melo
	events If perf_event_attr.sample_id_all is set it will add the PERF_SAMPLE_ identity info: TID, TIME, ID, CPU, STREAM_ID As a trailer, so that older perf tools can process new files, just ignoring the extra payload. With this its possible to do further analysis on problems in the event stream, like detecting reordering of MMAP and FORK events, etc. V2: Fixup header size in comm, mmap and task processing, as we have to take into account different sample_types for each matching event, noticed by Thomas Gleixner. Thomas also noticed a problem in v2 where if we didn't had space in the buffer we wouldn't restore the header size. Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-04	perf events: Separate the routines handling the PERF_SAMPLE_ identity fields	Arnaldo Carvalho de Melo
	Those will be made available in sample like events like MMAP, EXEC, etc in a followup patch. So precalculate the extra id header space and have a separate routine to fill them up. V2: Thomas noticed that the id header needs to be precalculated at inherit_events too: LKML-Reference: <alpine.LFD.2.00.1012031245220.2653@localhost6.localdomain6> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1291318772-30880-2-git-send-email-acme@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-04	perf events: Fix event inherit fallout of precalculated headers	Thomas Gleixner
	The precalculated header size is not updated when an event is inherited. That results in bogus sample entries for all child events. Bug introduced in c320c7b. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <alpine.LFD.2.00.1012031245220.2653@localhost6.localdomain6> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-30	perf events: Precalculate the header space for PERF_SAMPLE_ fields	Arnaldo Carvalho de Melo
	PERF_SAMPLE_{CALLCHAIN,RAW} have variable lenghts per sample, but the others can be precalculated, reducing a bit the per sample cost. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-26	perf: Ignore non-sampling overflows	Peter Zijlstra
	Some arch implementations call perf_event_overflow() by 'accident', ignore this. Reported-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26	perf: Don't bother to init the hrtimer for no SW sampling counters	Franck Bui-Huu
	Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290525705-6265-3-git-send-email-fbuihuu@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26	perf: Limit event refresh to sampling event	Franck Bui-Huu
	Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290525705-6265-2-git-send-email-fbuihuu@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26	perf: Introduce is_sampling_event()	Franck Bui-Huu
	and use it when appropriate. Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290525705-6265-1-git-send-email-fbuihuu@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26	Merge branch 'perf/urgent' into perf/core	Ingo Molnar
	Conflicts: arch/x86/kernel/apic/hw_nmi.c Merge reason: Resolve conflict, queue up dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26	perf: Fix the software context switch counter	Peter Zijlstra
	Stephane noticed that because the perf_sw_event() call is inside the perf_event_task_sched_out() call it won't get called unless we have a per-task counter. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-26	perf: Fix inherit vs. context rotation bug	Thomas Gleixner
	It was found that sometimes children of tasks with inherited events had one extra event. Eventually it turned out to be due to the list rotation no being exclusive with the list iteration in the inheritance code. Cure this by temporarily disabling the rotation while we inherit the events. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18	Merge branch 'perf/core' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core
2010-11-18	tracing: New flag to allow non privileged users to use a trace event	Frederic Weisbecker
	This adds a new trace event internal flag that allows them to be used in perf by non privileged users in case of task bound tracing. This is desired for syscalls tracepoint because they don't leak global system informations, like some other tracepoints. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Jason Baron <jbaron@redhat.com>
2010-11-18	perf: Fix owner-list vs exit	Peter Zijlstra
	Oleg noticed that a perf-fd keeping a reference on the creating task leads to a few funny side effects. There's two different aspects to this: - kernel based perf-events, these should not take out a reference on the creating task and appear on the task's event list since they're not bound to fds nor visible to userspace. - fork() and pthread_create(), these can lead to the creating task dying (and thus the task's event-list becomming useless) but keeping the list and ref alive until the event is closed. Combined they lead to malfunction of the ptrace hw_tracepoints. Cure this by not considering kernel based perf_events for the owner-list and destroying the owner-list when the owner dies. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Oleg Nesterov <oleg@redhat.com> LKML-Reference: <1289576883.2084.286.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-11-18	Merge branch 'perf/urgent' of ↵	Ingo Molnar
	git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent
2010-11-12	perf,hw_breakpoint: Initialize hardware api earlier	Jason Wessel
	When using early debugging, the kernel does not initialize the hw_breakpoint API early enough and causes the late initialization of the kernel debugger to fail. The boot arguments are: earlyprintk=vga ekgdboc=kbd kgdbwait Then simply type "go" at the kdb prompt and boot. The kernel will later emit the message: kgdb: Could not allocate hwbreakpoints And at that point the kernel debugger will cease to work correctly. The solution is to initialize the hw_breakpoint at the same time that all the other perf call backs are initialized instead of using a core_initcall() initialization which happens well after the kernel debugger can make use of hardware breakpoints. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4CD3396D.1090308@windriver.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-11-10	perf_events: Fix time tracking in samples	Stephane Eranian
	This patch corrects time tracking in samples. Without this patch both time_enabled and time_running are bogus when user asks for PERF_SAMPLE_READ. One uses PERF_SAMPLE_READ to sample the values of other counters in each sample. Because of multiplexing, it is necessary to know both time_enabled, time_running to be able to scale counts correctly. In this second version of the patch, we maintain a shadow copy of ctx->time which allows us to compute ctx->time without calling update_context_time() from NMI context. We avoid the issue that update_context_time() must always be called with ctx->lock held. We do not keep shadow copies of the other event timings because if the lead event is overflowing then it is active and thus it's been scheduled in via event_sched_in() in which case neither tstamp_stopped, tstamp_running can be modified. This timing logic only applies to samples when PERF_SAMPLE_READ is used. Note that this patch does not address timing issues related to sampling inheritance between tasks. This will be addressed in a future patch. With this patch, the libpfm4 example task_smpl now reports correct counts (shown on 2.4GHz Core 2): $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears noploop 5 noploop for 5 seconds IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3 2,400,000,254 unhalted_core_cycles:u (33) 2,399,273,744 instructions_retired:u (34) 53,340 baclears (35) Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4cc6e14b.1e07e30a.256e.5190@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22	perf_events: Fix for transaction recovery in group_sched_in()	Stephane Eranian
	This new version (see commit 8e5fc1a) is much simpler and ensures that in case of error in group_sched_in() during event_sched_in(), the events up to the failed event go through regular event_sched_out(). But the failed event and the remaining events in the group have their timings adjusted as if they had also gone through event_sched_in() and event_sched_out(). This ensures timing uniformity across all events in a group. This also takes care of the tstamp_stopped problem in case the group could never be scheduled. The tstamp_stopped is updated as if the event had actually run. With this patch, the following now reports correct time_enabled, in case the NMI watchdog is active: $ task -e unhalted_core_cycles,instructions_retired,baclears,baclears noploop 1 noploop for 1 seconds 0 unhalted_core_cycles (100.00% scaling, ena=997,552,872, run=0) 0 instructions_retired (100.00% scaling, ena=997,552,872, run=0) 0 baclears (100.00% scaling, ena=997,552,872, run=0) 0 baclears (100.00% scaling, ena=997,552,872, run=0) And the older test case also works: $ task -einstructions_retired,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5 1680885 instructions_retired (69.39% scaling, ena=950756, run=291006) 10735 baclears (69.39% scaling, ena=950756, run=291006) 10735 baclears (69.39% scaling, ena=950756, run=291006) 0 unhalted_core_cycles (100.00% scaling, ena=817932, run=0) 0 baclears (100.00% scaling, ena=817932, run=0) 0 baclears (100.00% scaling, ena=817932, run=0) Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4cbeeebc.8ee7d80a.5a28.0d5f@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-22	perf_events: Revert: Fix transaction recovery in group_sched_in()	Stephane Eranian
	This patch reverts commit 8e5fc1a (perf_events: Fix transaction recovery in group_sched_in()) because it had one flaw in case the group could never be scheduled. It would cause time_enabled to get negative. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4cbeeeb7.0aefd80a.6e40.0e2f@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-21	Merge branch 'perf-core-for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (163 commits) tracing: Fix compile issue for trace_sched_wakeup.c [S390] hardirq: remove pointless header file includes [IA64] Move local_softirq_pending() definition perf, powerpc: Fix power_pmu_event_init to not use event->ctx ftrace: Remove recursion between recordmcount and scripts/mod/empty jump_label: Add COND_STMT(), reducer wrappery perf: Optimize sw events perf: Use jump_labels to optimize the scheduler hooks jump_label: Add atomic_t interface jump_label: Use more consistent naming perf, hw_breakpoint: Fix crash in hw_breakpoint creation perf: Find task before event alloc perf: Fix task refcount bugs perf: Fix group moving irq_work: Add generic hardirq context callbacks perf_events: Fix transaction recovery in group_sched_in() perf_events: Fix bogus AMD64 generic TLB events perf_events: Fix bogus context time tracking tracing: Remove parent recording in latency tracer graph options tracing: Use one prologue for the preempt irqs off tracer function tracers ...
2010-10-18	perf: Optimize sw events	Peter Zijlstra
	Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18	perf: Use jump_labels to optimize the scheduler hooks	Peter Zijlstra
	Trades a call + conditional + ret for an unconditional jmp. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101014203625.501657727@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18	perf, hw_breakpoint: Fix crash in hw_breakpoint creation	Peter Zijlstra
	hw_breakpoint creation needs to account stuff per-task to ensure there is always sufficient hardware resources to back these things due to ptrace. With the perf per pmu context changes the event initialization no longer has access to the event context, for the simple reason that we need to first find the pmu (result of initialization) before we can find the context. This makes hw_breakpoints unhappy, because it can no longer do per task accounting, cure this by frobbing a task pointer in the event::hw bits for now... Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20101014203625.391543667@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18	perf: Find task before event alloc	Peter Zijlstra
	So that we can pass the task pointer to the event allocation, so that we can use task associated data during event initialization. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101014203625.340789919@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18	perf: Fix task refcount bugs	Peter Zijlstra
	Currently it looks like find_lively_task_by_vpid() takes a task ref and relies on find_get_context() to drop it. The problem is that perf_event_create_kernel_counter() shouldn't be dropping task refs. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Matt Helsley <matthltc@us.ibm.com> LKML-Reference: <20101014203625.278436085@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18	perf: Fix group moving	Peter Zijlstra
	Matt found we trigger the WARN_ON_ONCE() in perf_group_attach() when we take the move_group path in perf_event_open(). Since we cannot de-construct the group (we rely on it to move the events), we have to simply ignore the double attach. The group state is context invariant and doesn't need changing. Reported-by: Matt Fleming <matt@console-pimps.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1287135757.29097.1368.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18	irq_work: Add generic hardirq context callbacks	Peter Zijlstra
	Provide a mechanism that allows running code in IRQ context. It is most useful for NMI code that needs to interact with the rest of the system -- like wakeup a task to drain buffers. Perf currently has such a mechanism, so extract that and provide it as a generic feature, independent of perf so that others may also benefit. The IRQ context callback is generated through self-IPIs where possible, or on architectures like powerpc the decrementer (the built-in timer facility) is set to generate an interrupt immediately. Architectures that don't have anything like this get to do with a callback from the timer tick. These architectures can call irq_work_run() at the tail of any IRQ handlers that might enqueue such work (like the perf IRQ handler) to avoid undue latencies in processing the work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [ various fixes ] Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1287036094.7768.291.camel@yhuang-dev> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18	perf_events: Fix transaction recovery in group_sched_in()	Stephane Eranian
	The group_sched_in() function uses a transactional approach to schedule a group of events. In a group, either all events can be scheduled or none are. To schedule each event in, the function calls event_sched_in(). In case of error, event_sched_out() is called on each event in the group. The problem is that event_sched_out() does not completely cancel the effects of event_sched_in(). Furthermore event_sched_out() changes the state of the event as if it had run which is not true is this particular case. Those inconsistencies impact time tracking fields and may lead to events in a group not all reporting the same time_enabled and time_running values. This is demonstrated with the example below: $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5 1946101 unhalted_core_cycles (32.85% scaling, ena=829181, run=556827) 11423 baclears (32.85% scaling, ena=829181, run=556827) 7671 baclears (0.00% scaling, ena=556827, run=556827) 2250443 unhalted_core_cycles (57.83% scaling, ena=962822, run=405995) 11705 baclears (57.83% scaling, ena=962822, run=405995) 11705 baclears (57.83% scaling, ena=962822, run=405995) Notice that in the first group, the last baclears event does not report the same timings as its siblings. This issue comes from the fact that tstamp_stopped is updated by event_sched_out() as if the event had actually run. To solve the issue, we must ensure that, in case of error, there is no change in the event state whatsoever. That means timings must remain as they were when entering group_sched_in(). To do this we defer updating tstamp_running until we know the transaction succeeded. Therefore, we have split event_sched_in() in two parts separating the update to tstamp_running. Similarly, in case of error, we do not want to update tstamp_stopped. Therefore, we have split event_sched_out() in two parts separating the update to tstamp_stopped. With this patch, we now get the following output: $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5 2492050 unhalted_core_cycles (71.75% scaling, ena=1093330, run=308841) 11243 baclears (71.75% scaling, ena=1093330, run=308841) 11243 baclears (71.75% scaling, ena=1093330, run=308841) 1852746 unhalted_core_cycles (0.00% scaling, ena=784489, run=784489) 9253 baclears (0.00% scaling, ena=784489, run=784489) 9253 baclears (0.00% scaling, ena=784489, run=784489) Note that the uneven timing between groups is a side effect of the process spending most of its time sleeping, i.e., not enough event rotations (but that's a separate issue). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4cb86b4c.41e9d80a.44e9.3e19@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-18	perf_events: Fix bogus context time tracking	Stephane Eranian
	You can only call update_context_time() when the context is active, i.e., the thread it is attached to is still running. However, perf_event_read() can be called even when the context is inactive, e.g., user read() the counters. The call to update_context_time() must be conditioned on the status of the context, otherwise, bogus time_enabled, time_running may be returned. Here is an example on AMD64. The task program is an example from libpfm4. The -p prints deltas every 1s. $ task -p -e cpu_clk_unhalted sleep 5 2,266,610 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982) 5,242,358,071 cpu_clk_unhalted (99.95% scaling, ena=5,000,359,984, run=2,319,270) Whereas if you don't read deltas, e.g., no call to perf_event_read() until the process terminates: $ task -e cpu_clk_unhalted sleep 5 2,497,783 cpu_clk_unhalted (0.00% scaling, ena=2,376,899, run=2,376,899) Notice that time_enable, time_running are bogus in the first example causing bogus scaling. This patch fixes the problem, by conditionally calling update_context_time() in perf_event_read(). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org LKML-Reference: <4cb856dc.51edd80a.5ae0.38fb@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-15	Merge remote branch 'tip/perf/core' into oprofile/core	Robert Richter
	Conflicts: arch/arm/oprofile/common.c kernel/perf_event.c
2010-10-12	perf: Fix incorrect copy_from_user() usage	John Blackwood
	perf events: repair incorrect use of copy_from_user This makes the perf_event_period() return 0 instead of -EFAULT on success. Signed-off-by: John Blackwood<john.blackwood@ccur.com> Signed-off-by: Joe Korty <joe.korty@ccur.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100928220311.GA18145@tsunami.ccur.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-10-11	Merge branch 'oprofile/urgent' (early part) into oprofile/perf	Robert Richter

2010-10-11	perf: New helper function for pmu name	Matt Fleming
	Introduce perf_pmu_name() helper function that returns the name of the pmu. This gives us a generic way to get the name of a pmu regardless of how an architecture identifies it internally. Signed-off-by: Matt Fleming <matt@console-pimps.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-10-04	perf_events: Fix invalid pointer when pid is invalid	Stephane Eranian
	This patch fixes an error in perf_event_open() when the pid provided by the user is invalid. find_lively_task_by_vpid() does not return NULL on error but an error code. Without the fix the error code was silently passed to find_get_context() which would eventually cause a invalid pointer dereference. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: paulus@samba.org Cc: davem@davemloft.net Cc: fweisbec@gmail.com Cc: perfmon2-devel@lists.sf.net Cc: eranian@gmail.com Cc: robert.richter@amd.com LKML-Reference: <4ca9a5d1.e8e9d80a.3dbb.ffff8f2e@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-21	perf: Avoid RCU vs preemption assumptions	Peter Zijlstra
	The per-pmu per-cpu context patch converted things from get_cpu_var() to this_cpu_ptr(), but that only works if rcu_read_lock() actually disables preemption, and since there is no such guarantee, we need to fix that. Use the newly introduced {get,put}_cpu_ptr(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <20100917093009.308453028@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-17	perf: Undo the per cpu-context timer stuff	Peter Zijlstra
	Revert the timer per cpu-context timers because of unfortunate nohz interaction. Fixing that would have been somewhat ugly, so go back to driving things from the regular tick. Provide a jiffies interval feature for people who want slower rotations. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20100917093009.519845633@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-17	perf: Fix perf_event_exit_cpu_context()	Peter Zijlstra
	Use the right cpu-context.. spotted by preempt warning on hot-unplug Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <20100917093009.461794357@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-17	perf: Complete software pmu grouping	Peter Zijlstra
	Aside from allowing software events into a !software group, allow adding !software events to pure software groups. Once we've moved the software group and attached the first !software event, the group will no longer be a pure software group and hence no longer be eligible for movement, at which point the straight ctx comparison is correct again. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20100917093009.410784731@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-17	perf_events: Fix broken event grouping	Stephane Eranian
	Events were not grouped anymore. The reason was that in perf_event_open(), the field event->group_leader was initialized before the function looked up the group_fd to find the event leader. This patch fixes this by reordering the code correctly. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <20100917093009.360420946@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-15	perf events: Clean up pid passing	Matt Helsley
	The kernel perf event creation path shouldn't use find_task_by_vpid() because a vpid exists in a specific namespace. find_task_by_vpid() uses current's pid namespace which isn't always the correct namespace to use for the vpid in all the places perf_event_create_kernel_counter() (and thus find_get_context()) is called. The goal is to clean up pid namespace handling and prevent bugs like: https://bugzilla.kernel.org/show_bug.cgi?id=17281 Instead of using pids switch find_get_context() to use task struct pointers directly. The syscall is responsible for resolving the pid to a task struct. This moves the pid namespace resolution into the syscall much like every other syscall that takes pid parameters. Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robin Green <greenrd@greenrd.org> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> LKML-Reference: <a134e5e392ab0204961fd1a62c84a222bf5874a9.1284407763.git.matthltc@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-15	perf events: Split out task search into helper	Matt Helsley
	Split out the code which searches for non-exiting tasks into its own helper. Creating this helper not only makes the code slightly more readable it prepares to move the search out of find_get_context() in a subsequent commit. Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robin Green <greenrd@greenrd.org> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> LKML-Reference: <561205417b450b8a4bf7488374541d64b4690431.1284407762.git.matthltc@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>