aboutsummaryrefslogtreecommitdiff
path: root/tools/perf/util/sort.c
AgeCommit message (Collapse)Author
2013-04-01perf tools: Fix output of symbol_daddr offsetNamhyung Kim
The symbol addresses in a dso have relative offsets from the start of a mapping. So in order to ouput correct offset value from @ip, one of them should be converted. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1359040242-8269-19-git-send-email-eranian@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-04-01perf tools: Add mem access sampling core supportStephane Eranian
This patch adds the sorting and histogram support functions to enable profiling of memory accesses. The following sorting orders are added: - symbol_daddr: data address symbol (or raw address) - dso_daddr: data address shared object - locked: access uses locked transaction - tlb : TLB access - mem : memory level of the access (L1, L2, L3, RAM, ...) - snoop: access snoop mode Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1359040242-8269-12-git-send-email-eranian@google.com [ committer note: changed to cope with fc5871ed, the move of methods to machine.[ch], and the rename of dsrc to data_src, to match the change made in the PERF_SAMPLE_DSRC in a previous patch. ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-04-01perf tools: Add support for weight v7 (modified)Andi Kleen
perf record has a new option -W that enables weightened sampling. Add sorting support in top/report for the average weight per sample and the total weight sum. This allows to both compare relative cost per event and the total cost over the measurement period. Add the necessary glue to perf report, record and the library. v2: Merge with new hist refactoring. v3: Fix manpage. Remove value check. Rename global_weight to weight and weight to local_weight. v4: Readd sort keys to manpage v5: Move weight to end v6: Move weight to template v7: Rename weight key. Original patch from Andi modified by Stephane Eranian <eranian@google.com> to include ONLY the weight supporting code and apply to pristine 3.8.0-rc4. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1359040242-8269-6-git-send-email-eranian@google.com [ committer note: changed to cope with fc5871ed and the hists_link perf test entry ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-02-06perf sort: Check return value of strdup()Namhyung Kim
When setup_sorting() is called, 'str' is passed to strtok_r() but it's not checked to have a valid pointer. As strtok_r() accepts NULL pointer on a first argument and use the third argument in that case, it can cause a trouble since our third argument, tmp, is not initialized. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1360130237-9963-3-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-02-06perf sort: Make setup_sorting returns an error codeNamhyung Kim
Currently the setup_sorting() is called for parsing sort keys and exits if it failed to add the sort key. As it's included in libperf it'd be better returning an error code rather than exiting application inside of the library. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Suggested-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1360130237-9963-2-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-02-06perf sort: Drop ip_[lr] arguments from _sort__sym_cmp()Namhyung Kim
Current _sort__sym_cmp() function is used for comparing symbols between two hist entries on symbol, symbol_from and symbol_to sort keys. Those functions pass addresses of symbols but it's meaningless since it gets over-written inside of the _sort__sym_cmp function to a start address of the symbol. So just get rid of them. This might cause a difference than prior output for branch stacks since it seems not using start address of the symbol but branch address. However AFAICS it'd be same as it gets overwritten anyway. Also remove redundant part of code in sort__sym_cmp(). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1360130237-9963-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-30perf sort: Use pclose() instead of fclose() on pipe streamThomas Jarosch
cppcheck message: [tools/perf/util/sort.c:277]: (error) Mismatching allocation and deallocation: fp Also fix descriptor leak on error and always initialize the "fp" variable. Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Link: http://lkml.kernel.org/r/1359112354.yZcisNZ4k0@storm Link: http://lkml.kernel.org/r/2266358.qvDXKLvJ67@storm Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-24perf sort: Separate out branch stack specific sort keysNamhyung Kim
Current perf report gets segmentation fault when a branch stack specific sort key is provided by --sort option to a perf.data file which contains no branch infomation. It's because those sort keys reference branch info of a hist entry unconditionally. Maybe we can change it checks whether such branch info is valid or not. But if the branch stacks are not recorded, it'd be nop. Thus it'd be better to make those keys are unselectable. This patch separates those keys to a different dimension array, so that if user passes such a key to a file which has no branch stack will get following message rather than a segfault. Error: Invalid --sort key: `symbol_from' Signed-off-by: Namhyung Kim <namhyung@kernel.org> Reported-by: Stefan Beller <stefanbeller@googlemail.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1356599507-14226-10-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-24perf sort: Clean up sort__first_dimension settingNamhyung Kim
It doesn't need to compare to every sort key names since the index already has the required information. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1356599507-14226-9-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-24perf sort: Align cpu column to rightNamhyung Kim
Since cpu number is a natural number, it'd be more appropriate aligning it to right. Before: # Overhead CPU Command: Pid Shared Object # ........ ... ................. ..................... # 8.91% 8 gnome-shell: 1497 perf-1497.map 8.90% 7 gnome-shell: 1497 perf-1497.map 8.86% 9 gnome-shell: 1497 perf-1497.map 8.83% 6 gnome-shell: 1497 perf-1497.map 8.81% 10 gnome-shell: 1497 perf-1497.map 7.44% 5 gnome-shell: 1497 perf-1497.map 6.20% 3 gnome-shell: 1497 perf-1497.map 5.10% 0 gnome-shell: 1497 perf-1497.map After: # Overhead CPU Command: Pid Shared Object # ........ ... ................. ..................... # 8.91% 8 gnome-shell: 1497 perf-1497.map 8.90% 7 gnome-shell: 1497 perf-1497.map 8.86% 9 gnome-shell: 1497 perf-1497.map 8.83% 6 gnome-shell: 1497 perf-1497.map 8.81% 10 gnome-shell: 1497 perf-1497.map 7.44% 5 gnome-shell: 1497 perf-1497.map 6.20% 3 gnome-shell: 1497 perf-1497.map 5.10% 0 gnome-shell: 1497 perf-1497.map Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1356599507-14226-5-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-24perf sort: Fix --sort pid outputNamhyung Kim
The "pid" sort key prints "Command: Pid" output but it's misaligned. It's because of the offset of 6 was added to the column length during the calculation in order to reserve an space for Pid part but it isn't honored when printed. The output before this patch was like this: # Overhead Command: Pid Shared Object # ........ ............. ................. # 99.70% noploop:17814 noploop 0.29% noploop:17814 [kernel.kallsyms] 0.01% noploop:17814 ld-2.15.so Fix it by subtracting 6 for printing comm part. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1356599507-14226-4-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-24perf sort: Get rid of unnecessary __maybe_unusedNamhyung Kim
Some functions have set __maybe_unused on its arguments that are used actually. Remove them. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1356599507-14226-3-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-24perf sort: Move misplaced sort entry functionsNamhyung Kim
Some functions are misplaced along with other entries. Move them to a right place so that it can be found together with related functions. No functional change intended. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1356599507-14226-2-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-24perf tools: remove redundant checks from _sort__sym_cmpSasha Levin
We already check that sym_l and sum_r are non-NULLs, no need to do it twice. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1356030701-16284-12-git-send-email-sasha.levin@oracle.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-10-16perf tools: Remove warnings on JIT samples for srcline sort keyNamhyung Kim
When using the srcline sort key with perf report, I see many lines of warning related to JIT samples like below: addr2line: '/tmp/perf-1397.map': No such file Since it's not a ELF binary and doesn't provide such information, just use the raw ip address. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Irina Tirdea <irina.tirdea@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1350272383-7016-2-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-10-16perf tools: Fix segfault when using srcline sort keyNamhyung Kim
The srcline sort key is for grouping samples based on their source file and line number. It use addr2line tool to get the information but it requires dso name. It caused a segfault when a sample does not have the name by dereferencing a NULL pointer. Fix it by using raw ip addresses for those samples. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1350272383-7016-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-17perf tools: Add sort__has_symNamhyung Kim
The sort__has_sym variable is for checking whether the sort_list includes 'symbol' as a sort key. It will be used for later patch. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1347611729-16994-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-11perf tools: Use __maybe_used for unused variablesIrina Tirdea
perf defines both __used and __unused variables to use for marking unused variables. The variable __used is defined to __attribute__((__unused__)), which contradicts the kernel definition to __attribute__((__used__)) for new gcc versions. On Android, __used is also defined in system headers and this leads to warnings like: warning: '__used__' attribute ignored __unused is not defined in the kernel and is not a standard definition. If __unused is included everywhere instead of __used, this leads to conflicts with glibc headers, since glibc has a variables with this name in its headers. The best approach is to use __maybe_unused, the definition used in the kernel for __attribute__((unused)). In this way there is only one definition in perf sources (instead of 2 definitions that point to the same thing: __used and __unused) and it works on both Linux and Android. This patch simply replaces all instances of __used and __unused with __maybe_unused. Signed-off-by: Irina Tirdea <irina.tirdea@intel.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com [ committer note: fixed up conflict with a116e05 in builtin-sched.c ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-07perf tools: Replace sort's standalone field_sep with symbol_conf.field_sepJiri Olsa
The repsep_snprintf function was still using standalone field_sep, which not even set anymore. Replacing it with 'symbol_conf.field_sep'. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1346946426-13496-3-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-06-19perf tools: Add sort by src line/numberArnaldo Carvalho de Melo
Using addr2line for now, requires debuginfo, needs more work to support detached debuginfo, aka foo-debuginfo packages. Example: [root@sandy ~]# perf record -a sleep 3 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.555 MB perf.data (~24236 samples) ] [root@sandy ~]# perf report -s dso,srcline 2>&1 | grep -v ^# | head -5 22.41% [kernel.kallsyms] /home/git/linux/drivers/idle/intel_idle.c:280 4.79% [kernel.kallsyms] /home/git/linux/drivers/cpuidle/cpuidle.c:148 4.78% [kernel.kallsyms] /home/git/linux/arch/x86/include/asm/atomic64_64.h:121 4.49% [kernel.kallsyms] /home/git/linux/kernel/sched/core.c:1690 4.30% [kernel.kallsyms] /home/git/linux/include/linux/seqlock.h:90 [root@sandy ~]# [root@sandy ~]# perf top -U -s dso,symbol,srcline Samples: 1K of event 'cycles', Event count (approx.): 589617389 18.66% [kernel] [k] copy_user_generic_unrolled /home/git/linux/arch/x86/lib/copy_user_64.S:143 7.83% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:39 6.59% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:38 3.66% [kernel] [k] page_fault /home/git/linux/arch/x86/kernel/entry_64.S:1379 3.25% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:40 3.12% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:37 2.74% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:36 2.39% [kernel] [k] clear_page /home/git/linux/arch/x86/lib/clear_page_64.S:43 2.12% [kernel] [k] ioread32 /home/git/linux/lib/iomap.c:90 1.51% [kernel] [k] copy_user_generic_unrolled /home/git/linux/arch/x86/lib/copy_user_64.S:144 1.19% [kernel] [k] copy_user_generic_unrolled /home/git/linux/arch/x86/lib/copy_user_64.S:154 Suggested-by: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-pdmqbng9twz06jzkbgtuwbp8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-20Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events changes for v3.4 from Ingo Molnar: - New "hardware based branch profiling" feature both on the kernel and the tooling side, on CPUs that support it. (modern x86 Intel CPUs with the 'LBR' hardware feature currently.) This new feature is basically a sophisticated 'magnifying glass' for branch execution - something that is pretty difficult to extract from regular, function histogram centric profiles. The simplest mode is activated via 'perf record -b', and the result looks like this in perf report: $ perf record -b any_call,u -e cycles:u branchy $ perf report -b --sort=symbol 52.34% [.] main [.] f1 24.04% [.] f1 [.] f3 23.60% [.] f1 [.] f2 0.01% [k] _IO_new_file_xsputn [k] _IO_file_overflow 0.01% [k] _IO_vfprintf_internal [k] _IO_new_file_xsputn 0.01% [k] _IO_vfprintf_internal [k] strchrnul 0.01% [k] __printf [k] _IO_vfprintf_internal 0.01% [k] main [k] __printf This output shows from/to branch columns and shows the highest percentage (from,to) jump combinations - i.e. the most likely taken branches in the system. "branches" can also include function calls and any other synchronous and asynchronous transitions of the instruction pointer that are not 'next instruction' - such as system calls, traps, interrupts, etc. This feature comes with (hopefully intuitive) flat ascii and TUI support in perf report. - Various 'perf annotate' visual improvements for us assembly junkies. It will now recognize function calls in the TUI and by hitting enter you can follow the call (recursively) and back, amongst other improvements. - Multiple threads/processes recording support in perf record, perf stat, perf top - which is activated via a comma-list of PIDs: perf top -p 21483,21485 perf stat -p 21483,21485 -ddd perf record -p 21483,21485 - Support for per UID views, via the --uid paramter to perf top, perf report, etc. For example 'perf top --uid mingo' will only show the tasks that I am running, excluding other users, root, etc. - Jump label restructurings and improvements - this includes the factoring out of the (hopefully much clearer) include/linux/static_key.h generic facility: struct static_key key = STATIC_KEY_INIT_FALSE; ... if (static_key_false(&key)) do unlikely code else do likely code ... static_key_slow_inc(); ... static_key_slow_inc(); ... The static_key_false() branch will be generated into the code with as little impact to the likely code path as possible. the static_key_slow_*() APIs flip the branch via live kernel code patching. This facility can now be used more widely within the kernel to micro-optimize hot branches whose likelihood matches the static-key usage and fast/slow cost patterns. - SW function tracer improvements: perf support and filtering support. - Various hardenings of the perf.data ABI, to make older perf.data's smoother on newer tool versions, to make new features integrate more smoothly, to support cross-endian recording/analyzing workflows better, etc. - Restructuring of the kprobes code, the splitting out of 'optprobes', and a corner case bugfix. - Allow the tracing of kernel console output (printk). - Improvements/fixes to user-space RDPMC support, allowing user-space self-profiling code to extract PMU counts without performing any system calls, while playing nice with the kernel side. - 'perf bench' improvements - ... and lots of internal restructurings, cleanups and fixes that made these features possible. And, as usual this list is incomplete as there were also lots of other improvements * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits) perf report: Fix annotate double quit issue in branch view mode perf report: Remove duplicate annotate choice in branch view mode perf/x86: Prettify pmu config literals perf report: Enable TUI in branch view mode perf report: Auto-detect branch stack sampling mode perf record: Add HEADER_BRANCH_STACK tag perf record: Provide default branch stack sampling mode option perf tools: Make perf able to read files from older ABIs perf tools: Fix ABI compatibility bug in print_event_desc() perf tools: Enable reading of perf.data files from different ABI rev perf: Add ABI reference sizes perf report: Add support for taken branch sampling perf record: Add support for sampling taken branch perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK x86/kprobes: Split out optprobe related code to kprobes-opt.c x86/kprobes: Fix a bug which can modify kernel code permanently x86/kprobes: Fix instruction recovery on optimized path perf: Add callback to flush branch_stack on context switch perf: Disable PERF_SAMPLE_BRANCH_* when not supported perf/x86: Add LBR software filter support for Intel CPUs ...
2012-03-14perf tools: Incorrect use of snprintf results in SEGVAnton Blanchard
I have a workload where perf top scribbles over the stack and we SEGV. What makes it interesting is that an snprintf is causing this. The workload is a c++ gem that has method names over 3000 characters long, but snprintf is designed to avoid overrunning buffers. So what went wrong? The problem is we assume snprintf returns the number of characters written: ret += repsep_snprintf(bf + ret, size - ret, "[%c] ", self->level); ... ret += repsep_snprintf(bf + ret, size - ret, "%s", self->ms.sym->name); Unfortunately this is not how snprintf works. snprintf returns the number of characters that would have been written if there was enough space. In the above case, if the first snprintf returns a value larger than size, we pass a negative size into the second snprintf and happily scribble over the stack. If you have 3000 character c++ methods thats a lot of stack to trample. This patch fixes repsep_snprintf by clamping the value at size - 1 which is the maximum snprintf can write before adding the NULL terminator. I get the sinking feeling that there are a lot of other uses of snprintf that have this same bug, we should audit them all. Cc: David Ahern <dsahern@gmail.com> Cc: Eric B Munson <emunson@mgebm.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/20120307114249.44275ca3@kryten Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-03-09perf report: Auto-detect branch stack sampling modeStephane Eranian
This patch enhances perf report to auto-detect when the perf.data file contains samples with branch stacks. That way it is not necessary to use the -b option. To force branch view mode to off, simply use --no-branch-stack. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: asharma@fb.com Cc: ravitillo@lbl.gov Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1331246868-19905-4-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-09perf tools: Add code to support PERF_SAMPLE_BRANCH_STACKRoberto Agostino Vitillo
This patch adds: - ability to parse samples with PERF_SAMPLE_BRANCH_STACK - sort on branches (dso_from, symbol_from, dso_to, symbol_to, mispredict) - build histograms on branches Signed-off-by: Roberto Agostino Vitillo <ravitillo@lbl.gov> Signed-off-by: Stephane Eranian <eranian@google.com> Cc: peterz@infradead.org Cc: acme@redhat.com Cc: robert.richter@amd.com Cc: ming.m.lin@intel.com Cc: andi@firstfloor.org Cc: asharma@fb.com Cc: vweaver1@eecs.utk.edu Cc: khandual@linux.vnet.ibm.com Cc: dsahern@gmail.com Link: http://lkml.kernel.org/r/1328826068-11713-12-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-10-20perf hists browser: Elide DSO column when it is set to just one DSO, ditto ↵Arnaldo Carvalho de Melo
for threads And also no leed to show the [.] (level: k, . for userspace) when showing just one DSO. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-4h3f6ro5o7ebepjbssxf0dd3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-09-23perf sort: Fix symbol sort output by separating unresolved samples by typeAnton Blanchard
I took a profile that suggested 60% of total CPU time was in the hypervisor: ... 60.20% [H] 0x33d43c 4.43% [k] ._spin_lock_irqsave 1.07% [k] ._spin_lock Using perf stat to get the user/kernel/hypervisor breakdown contradicted this. The problem is we merge all unresolved samples into the one unknown bucket. If add a comparison by sample type to sort__sym_cmp we get the real picture: ... 57.11% [.] 0x80fbf63c 4.43% [k] ._spin_lock_irqsave 1.07% [k] ._spin_lock 0.65% [H] 0x33d43c So it was almost all userspace, not hypervisor as the initial profile suggested. I found another issue while adding this. Symbol sorting sometimes shows multiple entries for the unknown bucket: ... 16.65% [.] 0x6cd3a8 7.25% [.] 0x422460 5.37% [.] yylex 4.79% [.] malloc 4.78% [.] _int_malloc 4.03% [.] _int_free 3.95% [.] hash_source_code_string 2.82% [.] 0x532908 2.64% [.] 0x36b538 0.94% [H] 0x8000000000e132a4 0.82% [H] 0x800000000000e8b0 This happens because we aren't consistent with our sorting. On one hand we check to see if both symbols match and for two unresolved samples sym is NULL so we match: if (left->ms.sym == right->ms.sym) return 0; On the other hand we use sample IP for unresolved samples when comparing against a symbol: ip_l = left->ms.sym ? left->ms.sym->start : left->ip; ip_r = right->ms.sym ? right->ms.sym->start : right->ip; This means unresolved samples end up spread across the rbtree and we can't merge them all. If we use cmp_null all unresolved samples will end up in the one bucket and the output makes more sense: ... 39.12% [.] 0x36b538 5.37% [.] yylex 4.79% [.] malloc 4.78% [.] _int_malloc 4.03% [.] _int_free 3.95% [.] hash_source_code_string 2.26% [H] 0x800000000000e8b0 Acked-by: Eric B Munson <emunson@mgebm.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ian Munsie <imunsie@au1.ibm.com> Link: http://lkml.kernel.org/r/20110831115145.4f598ab2@kryten Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-30perf tools: Allow sort dimensions to be registered more than onceFrederic Weisbecker
So that the parent sort dimension can be registered twice: once if we add it as an explicit sort dimension (-s parent) and twice if we request a parent filter (-p foo). We'll have only one parent sort dimension in the end but this allows to override the default parent filter with we gave in "-p" option. The goal of this is to prepare to allow the use of "-s parent" and "-p foo" at the same time, ie: sort by filtered parent. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Sam Liao <phyomh@gmail.com>
2011-06-30perf tools: Make sort operations staticFrederic Weisbecker
These don't need to be globally visible. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Sam Liao <phyomh@gmail.com>
2010-12-06perf hist: Better displaying of unresolved DSOs and symbolsIan Munsie
In the event that a DSO has not been identified, just print out [unknown] instead of the instruction pointer as we previously were doing, which is pretty meaningless for a shared object (at least to the users perspective). The IP we print out is fairly meaningless in general anyway - it's just one (the first) of the many addresses that were lumped together as unidentified, and could span many shared objects and symbols. In reality if we see this [unknown] output then the report -D output is going to be more useful anyway as we can see all the different address that it represents. If we are printing the symbols we are still going to see this IP in that column anyway since they shouldn't resolve either. This patch also changes the symbol address printouts so that they print out 0x before the address, are left aligned, and changes the %L format string (which relies on a glibc bug) to %ll. Before: 74.11% :3259 4a6c [k] 4a6c After: 74.11% :3259 [unknown] [k] 0x4a6c Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <1291603026-11785-2-git-send-email-imunsie@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-08-05perf hists: Fixup addr snprintf width on 32 bit archesArnaldo Carvalho de Melo
By using BITS_PER_LONG/4 as the width specifier. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-07-23perf sort: Make column width code per hists instanceArnaldo Carvalho de Melo
They were globals, and since we support multiple hists and sessions at the same time, it doesn't make sense to calculate those values considereing all symbols in all sessions. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-06-05perf report: Implement --sort cpuArun Sharma
In a shared multi-core environment, users want to analyze why their program was slow. In particular, if the code ran slower only on certain CPUs due to interference from other programs or kernel threads, the user should be able to notice that. Sample usage: perf record -f -a -- sleep 3 perf report --sort cpu,comm Workload: program is running on 16 CPUs Experiencing interference from an antagonist only on 4 CPUs. Samples: 106218177676 cycles Overhead CPU Command ........ ... ............... 6.25% 2 program 6.24% 6 program 6.24% 11 program 6.24% 5 program 6.24% 9 program 6.24% 10 program 6.23% 15 program 6.23% 7 program 6.23% 3 program 6.23% 14 program 6.22% 1 program 6.20% 13 program 3.17% 12 program 3.15% 8 program 3.14% 0 program 3.13% 4 program 3.11% 4 antagonist 3.11% 0 antagonist 3.10% 8 antagonist 3.07% 12 antagonist Cc: David S. Miller <davem@davemloft.net> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <20100505181612.GA5091@sharma-home.net> Signed-off-by: Arun Sharma <aruns@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-17perf options: Type check all the remaining OPT_ variantsArnaldo Carvalho de Melo
OPT_SET_INT was renamed to OPT_SET_UINT since the only use in these tools is to set something that has an enum type, that is builtin compatible with unsigned int. Several string constifications were done to make OPT_STRING require a const char * type. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-14perf tools: Fix accidentally preprocessed snprintf callbackFrederic Weisbecker
struct sort_entry has a callback named snprintf that turns an entry into a string result. But there are glibc versions that implement snprintf through a macro. The following expression is then going to get the snprintf call preprocessed: ent->snprintf(...) to finally end up in a build error: util/hist.c: Dans la fonction «hist_entry__snprintf» : util/hist.c:539: erreur: «struct sort_entry» has no member named «__builtin___snprintf_chk» To fix this, prepend struct sort_entry callbacks with an "se_" prefix. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-02perf tools: sort_dimension__add shouldn't dieArnaldo Carvalho de Melo
Propagate error instead. LKML-Reference: <new-submission> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-02perf hist: Replace ->print() routines by ->snprintf() equivalentsArnaldo Carvalho de Melo
Then hist_entry__fprintf will just us the newly introduced hist_entry__snprintf, add the newline and fprintf it to the supplied FILE descriptor. This allows us to remove the use_browser checking in the color_printf routines, that now got color_snprintf variants too. The newt TUI browser (and other GUIs that may come in the future) don't have to worry about stdio specific stuff in the strings they get from the se->snprintf routines and instead use whatever means to do the equivalent. Also the newt TUI browser don't have to use the fmemopen() hack, instead it can use the se->snprintf routines directly. For now tho use the hist_entry__snprintf routine to reduce the patch size. Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-03-26perf tools: Introduce struct map_symbolArnaldo Carvalho de Melo
That will be in both struct hist_entry and struct callchain_list, so that the TUI can store a pointer to the pair (map, symbol) in the trees where hist_entries and callchain_lists are present, to allow precise annotation instead of looking for the first symbol with the selected name. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1269459619-982-4-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-16perf diff: Use perf_session__fprintf_hists just like 'perf record'Arnaldo Carvalho de Melo
That means that almost everything you can do with 'perf report' can be done with 'perf diff', for instance: $ perf record -f find / > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.062 MB perf.data (~2699 samples) ] $ perf record -f find / > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.062 MB perf.data (~2687 samples) ] perf diff | head -8 9.02% +1.00% find libc-2.10.1.so [.] _IO_vfprintf_internal 2.91% -1.00% find [kernel] [k] __kmalloc 2.85% -1.00% find [kernel] [k] ext4_htree_store_dirent 1.99% -1.00% find [kernel] [k] _atomic_dec_and_lock 2.44% find [kernel] [k] half_md4_transform $ So if you want to zoom into libc: $ perf diff --dsos libc-2.10.1.so | head -8 37.34% find [.] _IO_vfprintf_internal 10.34% find [.] __GI_memmove 8.25% +2.00% find [.] _int_malloc 5.07% -1.00% find [.] __GI_mempcpy 7.62% +2.00% find [.] _int_free $ And if there were multiple commands using libc, it is also possible to aggregate them all by using --sort symbol: $ perf diff --dsos libc-2.10.1.so --sort symbol | head -8 37.34% [.] _IO_vfprintf_internal 10.34% [.] __GI_memmove 8.25% +2.00% [.] _int_malloc 5.07% -1.00% [.] __GI_mempcpy 7.62% +2.00% [.] _int_free $ The displacement column now is off by default, to use it: perf diff -m --dsos libc-2.10.1.so --sort symbol | head -8 37.34% [.] _IO_vfprintf_internal 10.34% [.] __GI_memmove 8.25% +2.00% [.] _int_malloc 5.07% -1.00% +2 [.] __GI_mempcpy 7.62% +2.00% -1 [.] _int_free $ Using -t/--field-separator can be used for scripting: $ perf diff -t, -m --dsos libc-2.10.1.so --sort symbol | head -8 37.34, , ,[.] _IO_vfprintf_internal 10.34, , ,[.] __GI_memmove 8.25,+2.00%, ,[.] _int_malloc 5.07,-1.00%, +2,[.] __GI_mempcpy 7.62,+2.00%, -1,[.] _int_free 6.99,+1.00%, -1,[.] _IO_new_file_xsputn 1.89,-2.00%, +4,[.] __readdir64 $ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260978567-550-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15perf util: Remove setup_sorting dupsArnaldo Carvalho de Melo
And it is also needed by 'perf diff'. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1260828571-3613-1-git-send-email-acme@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-23perf tools: Bind callchains to the first sort dimension columnFrederic Weisbecker
Currently, the callchains are displayed using a constant left margin. So depending on the current sort dimension configuration, callchains may appear to be well attached to the first sort dimension column field which is mostly the case, except when the first dimension of sorting is done by comm, because these are right aligned. This patch binds the callchain to the first letter in the first column, whatever type of column it is (dso, comm, symbol). Before: 0.80% perf [k] __lock_acquire __lock_acquire lock_acquire | |--58.33%-- _spin_lock | | | |--28.57%-- inotify_should_send_event | | fsnotify | | __fsnotify_parent After: 0.80% perf [k] __lock_acquire __lock_acquire lock_acquire | |--58.33%-- _spin_lock | | | |--28.57%-- inotify_should_send_event | | fsnotify | | __fsnotify_parent Also, for clarity, we don't put anymore the callchain as is but: - If we have a top level ancestor in the callchain, start it with a first ascii hook. Before: 0.80% perf [kernel] [k] __lock_acquire __lock_acquire lock_acquire | |--58.33%-- _spin_lock | | | |--28.57%-- inotify_should_send_event | | fsnotify [..] [..] After: 0.80% perf [kernel] [k] __lock_acquire | --- __lock_acquire lock_acquire | |--58.33%-- _spin_lock | | | |--28.57%-- inotify_should_send_event | | fsnotify [..] [..] - Otherwise, if we have several top level ancestors, then display these like we did before: 1.69% Xorg | |--21.21%-- vread_hpet | 0x7fffd85b46fc | 0x7fffd85b494d | 0x7f4fafb4e54d | |--15.15%-- exaOffscreenAlloc | |--9.09%-- I830WaitLpRing Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> LKML-Reference: <1256246604-17156-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-23perf tools: Fix missing top level callchainFrederic Weisbecker
While recursively printing the branches of each callchains, we forget to display the root. It is never printed. Say we have: symbol f1 f2 | -------- f3 | f4 | ---------f5 f6 Actually we never see that, instead it displays: symbol | --------- f3 | f4 | --------- f5 f6 However f1 is always the same than "symbol" and if we are sorting by symbols first then "symbol", f1 and f2 will be well aligned like in the above example, so displaying f1 looks redundant here. But if we are sorting by something else first (dso, comm, etc...), displaying f1 doesn't look redundant but rather necessary because the symbol is not well aligned anymore with its callchain: comm dso symbol f1 f2 | --------- [...] And we want the callchain to be obvious. So we fix the bug by printing the root branch, but we also filter its first entry if we are sorting by symbols first. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1256246604-17156-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-02perf tools: Rewrite and improve support for kernel modulesArnaldo Carvalho de Melo
Representing modules as struct map entries, backed by a DSO, etc, using /proc/modules to find where the module is loaded. DSOs now can have a short and long name, so that in verbose mode we can show exactly which .ko or vmlinux image was used. As kernel modules now are a DSO separate from the kernel, we can ask for just the hits for a particular set of kernel modules, just like we can do with shared libraries: [root@doppio linux-2.6-tip]# perf report -n --vmlinux /home/acme/git/build/tip-recvmmsg/vmlinux --modules --dsos \[drm\] | head -15 84.58% 13266 Xorg [k] drm_clflush_pages 4.02% 630 Xorg [k] trace_kmalloc.clone.0 3.95% 619 Xorg [k] drm_ioctl 2.07% 324 Xorg [k] drm_addbufs 1.68% 263 Xorg [k] drm_gem_close_ioctl 0.77% 120 Xorg [k] drm_setmaster_ioctl 0.70% 110 Xorg [k] drm_lastclose 0.68% 106 Xorg [k] drm_open 0.54% 85 Xorg [k] drm_mm_search_free [root@doppio linux-2.6-tip]# Specifying --dsos /lib/modules/2.6.31-tip/kernel/drivers/gpu/drm/drm.ko would have the same effect. Allowing specifying just 'drm.ko' is left for another patch. Processing kallsyms so that per kernel module struct map are instantiated was also left for another patch. That will allow removing the module name from each of its symbols. struct symbol was reduced by removing the ->module backpointer and moving it (well now the map) to struct symbol_entry in perf top, that is its only user right now. The total linecount went down by ~500 lines. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24perf tools: Create util/sort.and use itJohn Kacur
Create util/sort.[ch] and move common functionality for builtin-report.c and builtin-annotate.c there, and make use of it. Signed-off-by: John Kacur <jkacur@redhat.com> LKML-Reference: <alpine.LFD.2.00.0909241758390.11383@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>