perf/Documentation/perf-stat.txt

1 perf-stat(1)
5 ----
6 perf-stat - Run a command and gather performance counter statistics
9 --------
11 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
12 'perf stat' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>]
13 'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] \-- <command> [<options>]
14 'perf stat' report [-i file]
17 -----------
23 -------
33 -e::
34 --event=::
37 	- a symbolic event name (use 'perf list' to list all events)
39 	- a raw PMU event in the form of rN where N is a hexadecimal value
44         - a symbolic or raw PMU event followed by an optional colon
45 	  and a list of event modifiers, e.g., cpu-cycles:p.  See the
46 	  linkperf:perf-list[1] man page for details on event modifiers.
48 	- a symbolically formed event like 'pmu/param1=0x3,param2/' where
54 	  perf stat -A -a -e cpu/event,percore=1/,otherevent ...
56 	- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
69 -i::
70 --no-inherit::
72 -p::
73 --pid=<pid>::
76 -t::
77 --tid=<tid>::
80 -b::
81 --bpf-prog::
83         requiring root rights. bpftool-prog could be used to find program
86   # bpftool prog | head -n 1
89   # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
94              28,982      instructions              #    0.34  insn per cycle
98 --bpf-counters::
100 	allows multiple perf-stat sessions that are counting the same metric (cycles,
103 	"perf config stat.bpf-counter-events=<list_of_events>".
105 --bpf-attr-map::
106 	With option "--bpf-counters", different perf-stat sessions share
108 	Use "--bpf-attr-map" to specify the path of this pinned hashmap.
112 --pfm-events events::
114 including support for event filters. For example '--pfm-events
117 events cannot be mixed together. The latter must be used with the -e
118 option. The -e option and this one can be mixed and matched.  Events
122 -a::
123 --all-cpus::
124         system-wide collection from all CPUs (default if no target is specified)
126 --no-scale::
129 -d::
130 --detailed::
133 	   -d:          detailed events, L1 and LLC data cache
134         -d -d:     more detailed events, dTLB and iTLB events
135      -d -d -d:     very detailed events, adding prefetch events
137 -r::
138 --repeat=<n>::
141 -B::
142 --big-num::
144 	Enabled by default. Use "--no-big-num" to disable.
145 	Default setting can be changed with "perf config stat.big-num=false".
147 -C::
148 --cpu=::
150 comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
151 In per-thread mode, this option is ignored. The -a option is still necessary
152 to activate system-wide monitoring. Default is to count on all CPUs.
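
The CPU list syntax described above (comma-separated entries, with ranges written as first-last) can be expanded with a few lines of shell. This is only an illustrative sketch: `expand_cpu_list` is a hypothetical helper and not part of perf, and it assumes well-formed input such as `0-2,5`.

```shell
# Hypothetical helper (not part of perf): expand a -C/--cpu style list
# such as "0-2,5" into one CPU number per line.
# The body runs in a subshell so the IFS change does not leak out.
expand_cpu_list() (
    IFS=','
    for part in $1; do
        case "$part" in
            *-*) seq "${part%-*}" "${part#*-}" ;;  # range: first-last
            *)   echo "$part" ;;                   # single CPU number
        esac
    done
)

expand_cpu_list 0-2,5
```

Such a helper may be useful for checking which CPUs a list covers before passing it to -C.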
154 -A::
155 --no-aggr::
158 -n::
159 --null::
160 null run - Don't start any counters.
162 This can be useful to measure just elapsed wall-clock time - or to assess the
165 -v::
166 --verbose::
169 -x SEP::
170 --field-separator SEP::
171 print counts using a CSV-style output to make it easy to import directly into
174 --table:: Display time for each run (-r option), in a table format, e.g.:
176   $ perf stat --null -r 5 --table perf bench sched pipe
181              5.189 (-0.293) #
182              5.189 (-0.294) #
183              5.186 (-0.296) #
188              5.483 +- 0.198 seconds time elapsed  ( +-  3.62% )
190 -G name::
191 --cgroup name::
193 in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
197 an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
200 use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
203 command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
205 --for-each-cgroup name::
208 effect as repeating the -e option and -G option for each event x name.  This option
209 cannot be used with -G/--cgroup option.
211 -o file::
212 --output file::
215 --append::
216 Append to the output file designated with the -o option. Ignored if -o is not specified.
218 --log-fd::
220 Log output to fd, instead of stderr.  Complementary to --output, and mutually exclusive
221 with it.  --append may be used here.  Examples:
222      3>results  perf stat --log-fd 3          \-- $cmd
223      3>>results perf stat --log-fd 3 --append \-- $cmd
225 --control=fifo:ctl-fifo[,ack-fifo]::
226 --control=fd:ctl-fd[,ack-fd]::
227 ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
228 Listen on ctl-fd descriptor for command to control measurement ('enable': enable events,
230 --delay=-1 option. Optionally send control command completion ('ack\n') to ack-fd descriptor
239  test -p ${ctl_fifo} && unlink ${ctl_fifo}
244  test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
248  perf stat -D -1 -e cpu-cycles -a -I 1000       \
249            --control fd:${ctl_fd},${ctl_fd_ack} \
250            \-- sleep 30 &
253  sleep 5  && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
254  sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
256  exec {ctl_fd_ack}>&-
259  exec {ctl_fd}>&-
262  wait -n ${perf_pid}
266 --pre::
267 --post::
270 perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' \-- make -s -j64 O=defc…
272 -I msecs::
273 --interval-print msecs::
276 	example: 'perf stat -I 1000 -e cycles -a sleep 5'
280 --interval-count times::
282 This option should be used together with the "-I" option.
283 	example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
285 --interval-clear::
288 --timeout msecs::
290 This option is not supported with the "-I" option.
291 	example: 'perf stat --timeout 2000 -e cycles -a'
293 --metric-only::
295 Don't show any raw values. Not supported with --per-thread.
297 --per-socket::
298 Aggregate counts per processor socket for system-wide mode measurements.  This
300 use --per-socket in addition to -a (system-wide).  The output includes the
304 --per-die::
305 Aggregate counts per processor die for system-wide mode measurements.  This
307 use --per-die in addition to -a (system-wide).  The output includes the
311 --per-cluster::
312 Aggregate counts per processor cluster for system-wide mode measurements.  This
314 use --per-cluster in addition to -a (system-wide).  The output includes the
319 --per-cache::
320 Aggregate counts per cache instance for system-wide mode measurements.  By
323 alongside the option in the format [Ll][1-9][0-9]*. For example:
324 Using option "--per-cache=l3" or "--per-cache=L3" will aggregate the
327 --per-core::
328 Aggregate counts per physical processor for system-wide mode measurements.  This
330 use --per-core in addition to -a (system-wide).  The output includes the
331 core number and the number of online logical processors on that physical processor.
333 --per-thread::
334 Aggregate counts per monitored thread, when monitoring threads (-t option)
335 or processes (-p option).
337 --per-node::
338 Aggregate counts per NUMA node for system-wide mode measurements. This
340 mode, use --per-node in addition to -a (system-wide).
342 -D msecs::
343 --delay msecs::
344 After starting the program, wait msecs before measuring (-1: start with events
348 -T::
349 --transaction::
353 --metric-no-group::
356 --metric-no-group option places events outside of groups and may
357 increase the chance of the event being scheduled - leading to more
359 for metrics like instructions per cycle can be lower - as both metrics
362 --metric-no-merge::
372 --metric-no-threshold::
381 --quiet::
386 -----------
389 -o file::
390 --output file::
394 -----------
397 -i file::
398 --input file::
401 --per-socket::
402 Aggregate counts per processor socket for system-wide mode measurements.
404 --per-die::
405 Aggregate counts per processor die for system-wide mode measurements.
407 --per-cluster::
408 Aggregate counts per processor cluster for system-wide mode measurements.
410 --per-cache::
411 Aggregate counts per cache instance for system-wide mode measurements.  By
414 alongside the option in the format [Ll][1-9][0-9]*. For example: Using
415 option "--per-cache=l3" or "--per-cache=L3" will aggregate the
418 --per-core::
419 Aggregate counts per physical processor for system-wide mode measurements.
421 -M::
422 --metrics::
434 -A::
435 --no-aggr::
436 --no-merge::
459 --hybrid-merge::
465 --topdown::
466 Print top-down metrics supported by the CPU. This makes it possible to determine
479 mode like -I 1000, as the bottleneck of workloads can change often.
481 This enables --metric-only, unless overridden with --no-metric-only.
486 The top down metrics are collected per core instead of per
487 CPU thread. Per core mode is automatically enabled
488 and -a (global monitoring) is needed, requiring root rights or
489 kernel.perf_event_paranoid=-1.
501 --record-tpebs::
509 --td-level::
510 Print the top-down statistics for the given level. It allows
511 users to print the top-down metrics level they are interested in instead of the
512 level 1 top-down metrics.
520 'perf stat -M tma_frontend_bound_group...'.
524 --smi-cost::
530 The cost of SMI can be measured by (aperf - unhalted core cycles).
533 oriented analysis. --metric-only will be applied by default.
534 The output is SMI cycles%, which equals (aperf - unhalted core cycles) / aperf
536 Users who want to get the actual value can apply --no-metric-only.
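
As a worked example of the ratio above, the following sketch computes (aperf - unhalted core cycles) / aperf from two raw counts. The helper name `smi_cycles_pct` and the input numbers are hypothetical, and scaling the ratio by 100 to obtain a percentage is an assumption based on the "SMI cycles%" label.

```shell
# Hypothetical helper: SMI cost as a percentage of aperf.
#   $1 = aperf count, $2 = unhalted core cycles count
# Assumption: "SMI cycles%" is the ratio scaled by 100.
smi_cycles_pct() {
    awk -v aperf="$1" -v core="$2" \
        'BEGIN { printf "%.2f\n", (aperf - core) / aperf * 100 }'
}

# Made-up counts for illustration only.
smi_cycles_pct 1000000 990000
```

With these made-up counts, 10000 of 1000000 aperf cycles are unaccounted for by unhalted core cycles, so the helper prints 1.00.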
538 --all-kernel::
541 --all-user::
544 --percore-show-thread::
546 for all hardware threads in a core and show the counts per core.
549 counts for all hardware threads in a core but show the sum counts per
553 --summary::
554 Print summary for interval mode (-I).
556 --no-csv-summary::
558 This option must be used with -x and --summary.
561 'stat.no-csv-summary'.
563 $ perf config stat.no-csv-summary=true
565 --cputype::
570 --------
572 $ perf stat \-- make
576         83723.452481      task-clock:u (msec)       #    1.004 CPUs utilized
577                    0      context-switches:u        #    0.000 K/sec
578                    0      cpu-migrations:u          #    0.000 K/sec
579            3,228,188      page-faults:u             #    0.039 M/sec
581      313,163,853,778      instructions:u            #    1.36  insn per cycle
583        2,078,861,393      branch-misses:u           #    2.98% of all branches
591 -------
606 ----------
608 With -x, perf stat can produce not-quite-CSV output
610 it is recommended to use a different character like -x \;
614 	- optional usec time stamp in fractions of second (with -I xxx)
615 	- optional CPU, core, or socket identifier
616 	- optional number of logical CPUs aggregated
617 	- counter value
618 	- unit of the counter value or empty
619 	- event name
620 	- run time of counter
621 	- percentage of measurement time the counter was running
622 	- optional variance if multiple values are collected with -r
623 	- optional metric value
624 	- optional unit of metric
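
The field order listed above can be consumed with awk. The sketch below assumes the output was produced with -x \; and without -I or a per-CPU/core/socket aggregation mode, so the three optional leading fields are absent and the counter value is the first field. `csv_event_value` is a hypothetical helper, and the sample line is made up for illustration.

```shell
# Hypothetical helper: print "event=value" for each ';'-separated line.
# Assumes no timestamp, identifier, or CPU-count fields precede the
# counter value, so: field 1 = counter value, field 3 = event name.
csv_event_value() {
    awk -F';' '{ print $3 "=" $1 }'
}

# Made-up sample line: value;unit;event;run time;percentage;metric;unit
printf '%s\n' '1234567;;cycles;1000000;100.00;;' | csv_event_value
```

When optional leading fields are present, the field numbers shift accordingly and the helper would need to be adjusted.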
628 include::intel-hybrid.txt[]
631 -----------
633 With -j, perf stat can print JSON format output
636 - timestamp : optional usec time stamp in fractions of second (with -I)
637 - optional aggregate options:
638 		- core : core identifier (with --per-core)
639 		- die : die identifier (with --per-die)
640 		- socket : socket identifier (with --per-socket)
641 		- node : node identifier (with --per-node)
642 		- thread : thread identifier (with --per-thread)
643 - counter-value : counter value
644 - unit : unit of the counter value or empty
645 - event : event name
646 - variance : optional variance if multiple values are collected (with -r)
647 - runtime : run time of counter
648 - metric-value : optional metric value
649 - metric-unit : optional unit of metric
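
A single field can be pulled out of such output with sed, as sketched below. This assumes each record is a flat JSON object on one line with string values; the sample record is made up for illustration and `json_field` is a hypothetical helper. A real JSON parser is more robust than this pattern match.

```shell
# Hypothetical helper: print the string value of key $1 from each
# one-line, flat JSON object read on stdin.
# Assumption: values are double-quoted strings without escaped quotes.
json_field() {
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Made-up sample record using the field names listed above.
printf '%s\n' '{"counter-value" : "1234567.000000", "unit" : "", "event" : "cycles"}' \
    | json_field event
```

The key is matched together with its surrounding quotes, so asking for `unit` does not match `metric-unit`.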
652 --------
653 linkperf:perf-top[1], linkperf:perf-list[1]