
1 perf-stat(1)
5 ----
6 perf-stat - Run a command and gather performance counter statistics
9 --------
11 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
12 'perf stat' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>]
13 'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] \-- <command> [<options>]
14 'perf stat' report [-i file]
17 -----------
23 -------
33 -e::
34 --event=::
37 - a symbolic event name (use 'perf list' to list all events)
39 - a raw PMU event in the form of rN where N is a hexadecimal value
42 /sys/bus/event_source/devices/cpu/format/*.
44 - a symbolic or raw PMU event followed by an optional colon
45 and a list of event modifiers, e.g., cpu-cycles:p. See the
46 linkperf:perf-list[1] man page for details on event modifiers.
48 - a symbolically formed event like 'pmu/param1=0x3,param2/' where
54 perf stat -A -a -e cpu/event,percore=1/,otherevent ...
56 - a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
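
As an illustration of combining these forms (the raw code r01c4 and the config
value 0x3c below are placeholder encodings, and the PMU name 'cpu' may differ
on your system, e.g. cpu_core/cpu_atom on hybrid platforms):

  perf stat -e cycles:u -e r01c4 -e cpu/config=0x3c/ \-- sleep 1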
69 -i::
70 --no-inherit::
72 -p::
73 --pid=<pid>::
76 -t::
77 --tid=<tid>::
80 -b::
81 --bpf-prog::
83 requiring root rights. bpftool-prog could be used to find program
86 # bpftool prog | head -n 1
89 # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
94 28,982 instructions # 0.34 insn per cycle
98 --bpf-counters::
100 allows multiple perf-stat sessions that are counting the same metric (cycles,
103 "perf config stat.bpf-counter-events=<list_of_events>".
105 --bpf-attr-map::
106 With option "--bpf-counters", different perf-stat sessions share
108 Use "--bpf-attr-map" to specify the path of this pinned hashmap.
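
For example, two concurrent sessions started as below would share one set of
hardware counters for cycles via BPF (a sketch; the shared pinned map lives in
the BPF filesystem unless --bpf-attr-map points elsewhere):

  perf stat --bpf-counters -e cycles -a \-- sleep 10 &
  perf stat --bpf-counters -e cycles -a \-- sleep 10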
112 --pfm-events events::
114 including support for event filters. For example '--pfm-events
117 events cannot be mixed together. The latter must be used with the -e
118 option. The -e option and this one can be mixed and matched. Events
122 -a::
123 --all-cpus::
124 system-wide collection from all CPUs (default if no target is specified)
126 --no-scale::
129 -d::
130 --detailed::
133 -d: detailed events, L1 and LLC data cache
134 -d -d: more detailed events, dTLB and iTLB events
135 -d -d -d: very detailed events, adding prefetch events
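
For example, the most detailed level is requested with three -d flags
(illustrative):

  perf stat -d -d -d \-- sleep 1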
137 -r::
138 --repeat=<n>::
141 -B::
142 --big-num::
144 Enabled by default. Use "--no-big-num" to disable.
145 Default setting can be changed with "perf config stat.big-num=false".
147 -C::
148 --cpu=::
150 comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
151 In per-thread mode, this option is ignored. The -a option is still necessary
152 to activate system-wide monitoring. Default is to count on all CPUs.
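
For example, to count only on CPUs 0-2 and 5 while monitoring system-wide
(illustrative):

  perf stat -C 0-2,5 -a -e cycles \-- sleep 1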
154 -A::
155 --no-aggr::
158 -n::
159 --null::
160 null run - Don't start any counters.
162 This can be useful to measure just elapsed wall-clock time - or to assess the
165 -v::
166 --verbose::
169 -x SEP::
170 --field-separator SEP::
171 print counts using a CSV-style output to make it easy to import directly into
174 --table:: Display time for each run (-r option), in a table format, e.g.:
176 $ perf stat --null -r 5 --table perf bench sched pipe
181 5.189 (-0.293) #
182 5.189 (-0.294) #
183 5.186 (-0.296) #
188 5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
190 -G name::
191 --cgroup name::
193 in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
197 an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
200 use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
203 command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
205 --for-each-cgroup name::
208 effect as repeating the -e and -G options for each event x name. This option
209 cannot be used with -G/--cgroup option.
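
A sketch, assuming cgroups named foo and bar exist under the mounted cgroup
filesystem:

  perf stat -e cycles,instructions --for-each-cgroup foo,bar -a \-- sleep 1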
211 -o file::
212 --output file::
215 --append::
216 Append to the output file designated with the -o option. Ignored if -o is not specified.
218 --log-fd::
220 Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive
221 with it. --append may be used here. Examples:
222 3>results perf stat --log-fd 3 \-- $cmd
223 3>>results perf stat --log-fd 3 --append \-- $cmd
225 --control=fifo:ctl-fifo[,ack-fifo]::
226 --control=fd:ctl-fd[,ack-fd]::
227 ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
228 Listen on ctl-fd descriptor for command to control measurement ('enable': enable events,
230 --delay=-1 option. Optionally send control command completion ('ack\n') to ack-fd descriptor
239 test -p ${ctl_fifo} && unlink ${ctl_fifo}
244 test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
248 perf stat -D -1 -e cpu-cycles -a -I 1000 \
249 --control fd:${ctl_fd},${ctl_fd_ack} \
250 \-- sleep 30 &
253 sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
254 sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
256 exec {ctl_fd_ack}>&-
259 exec {ctl_fd}>&-
262 wait -n ${perf_pid}
266 --pre::
267 --post::
270 perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' \-- make -s -j64 O=defc…
272 -I msecs::
273 --interval-print msecs::
276 example: 'perf stat -I 1000 -e cycles -a sleep 5'
280 --interval-count times::
282 This option should be used together with the "-I" option.
283 example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
285 --interval-clear::
288 --timeout msecs::
290 This option is not supported with the "-I" option.
291 example: 'perf stat --timeout 2000 -e cycles -a'
293 --metric-only::
295 Don't show any raw values. Not supported with --per-thread.
297 --per-socket::
298 Aggregate counts per processor socket for system-wide mode measurements. This
300 use --per-socket in addition to -a (system-wide). The output includes the
304 --per-die::
305 Aggregate counts per processor die for system-wide mode measurements. This
307 use --per-die in addition to -a (system-wide). The output includes the
311 --per-cluster::
312 Aggregate counts per processor cluster for system-wide mode measurements. This
314 use --per-cluster in addition to -a (system-wide). The output includes the
317 related CPUs can be found in /sys/devices/system/cpu/cpuX/topology/cluster_{id, cpus}.
319 --per-cache::
320 Aggregate counts per cache instance for system-wide mode measurements. By
323 alongside the option in the format [Ll][1-9][0-9]*. For example:
324 Using option "--per-cache=l3" or "--per-cache=L3" will aggregate the
327 --per-core::
328 Aggregate counts per physical processor for system-wide mode measurements. This
330 use --per-core in addition to -a (system-wide). The output includes the
333 --per-thread::
334 Aggregate counts per monitored thread, when monitoring threads (-t option)
335 or processes (-p option).
337 --per-node::
338 Aggregate counts per NUMA node for system-wide mode measurements. This
340 mode, use --per-node in addition to -a (system-wide).
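
For example, the same system-wide counts can be broken down per socket or per
core (illustrative):

  perf stat --per-socket -a -e cycles \-- sleep 1
  perf stat --per-core -a -e cycles \-- sleep 1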
342 -D msecs::
343 --delay msecs::
344 After starting the program, wait msecs before measuring (-1: start with events
348 -T::
349 --transaction::
353 --metric-no-group::
356 --metric-no-group option places events outside of groups and may
357 increase the chance of the event being scheduled - leading to more
359 for metrics like instructions per cycle can be lower - as both metrics
362 --metric-no-merge::
372 --metric-no-threshold::
381 --quiet::
386 -----------
389 -o file::
390 --output file::
394 -----------
397 -i file::
398 --input file::
401 --per-socket::
402 Aggregate counts per processor socket for system-wide mode measurements.
404 --per-die::
405 Aggregate counts per processor die for system-wide mode measurements.
407 --per-cluster::
408 Aggregate counts per processor cluster for system-wide mode measurements.
410 --per-cache::
411 Aggregate counts per cache instance for system-wide mode measurements. By
414 alongside the option in the format [Ll][1-9][0-9]*. For example: Using
415 option "--per-cache=l3" or "--per-cache=L3" will aggregate the
418 --per-core::
419 Aggregate counts per physical processor for system-wide mode measurements.
421 -M::
422 --metrics::
434 -A::
435 --no-aggr::
436 --no-merge::
447 CPU. For example, a system with 8 SMT threads will have one event
459 --hybrid-merge::
463 a behavior closer to having a single CPU type in the system.
465 --topdown::
466 Print top-down metrics supported by the CPU. This allows determining
467 bottlenecks in the CPU pipeline for CPU-bound workloads, by breaking
471 Frontend bound means that the CPU cannot fetch and decode instructions fast
473 neck. Bad Speculation means that the CPU wasted cycles due to branch
474 mispredictions and similar issues. Retiring means that the CPU computed without
476 if the workload is actually bound by the CPU and not by something else.
479 mode like -I 1000, as the bottleneck of workloads can change often.
481 This enables --metric-only, unless overridden with --no-metric-only.
486 The top-down metrics are collected per core instead of per
487 CPU thread. Per-core mode is automatically enabled
488 and -a (global monitoring) is needed, requiring root rights or
489 kernel.perf_event_paranoid=-1.
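
A minimal sketch of collecting top-down metrics system-wide in interval mode
(the metric columns printed depend on what the CPU supports):

  perf stat --topdown -a -I 1000 \-- sleep 10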
501 --record-tpebs::
509 --td-level::
510 Print the top-down statistics that equal the input level. It allows
511 users to print the top-down metrics at the level of interest instead of the
512 default level 1 top-down metrics.
520 'perf stat -M tma_frontend_bound_group...'.
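
For example, assuming the CPU exposes level 2 top-down events, the second
level can be requested with (a sketch):

  perf stat --topdown --td-level=2 -a \-- sleep 1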
524 --smi-cost::
527 During the measurement, /sys/devices/cpu/freeze_on_smi will be set to
530 The cost of SMI can be measured by (aperf - unhalted core cycles).
533 oriented analysis. --metric-only will be applied by default.
534 The output is SMI cycles%, which equals (aperf - unhalted core cycles) / aperf
536 Users who want to get the actual value can apply --no-metric-only.
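
A sketch of a system-wide measurement, assuming the msr/aperf/ and msr/smi/
events are supported (root is typically required to toggle freeze_on_smi):

  perf stat --smi-cost -a \-- sleep 10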
538 --all-kernel::
541 --all-user::
544 --percore-show-thread::
546 for all hardware threads in a core and show the counts per core.
549 counts for all hardware threads in a core but show the sum counts per
553 --summary::
554 Print summary for interval mode (-I).
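
For example (illustrative), per-interval counts followed by a summary at the
end:

  perf stat -I 1000 --summary -e cycles -a \-- sleep 3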
556 --no-csv-summary::
558 This option must be used with -x and --summary.
561 'stat.no-csv-summary'.
563 $ perf config stat.no-csv-summary=true
565 --cputype::
566 Only enable events on CPUs of the given type on a hybrid platform
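
A sketch for an Intel hybrid system, where the accepted type names commonly
follow the hybrid PMUs (for example core or atom, matching cpu_core/cpu_atom);
check your platform:

  perf stat --cputype core -e cycles \-- sleep 1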
570 --------
572 $ perf stat \-- make
576 83723.452481 task-clock:u (msec) # 1.004 CPUs utilized
577 0 context-switches:u # 0.000 K/sec
578 0 cpu-migrations:u # 0.000 K/sec
579 3,228,188 page-faults:u # 0.039 M/sec
581 313,163,853,778 instructions:u # 1.36 insn per cycle
583 2,078,861,393 branch-misses:u # 2.98% of all branches
591 -------
606 ----------
608 With -x, perf stat is able to output a not-quite-CSV format.
610 it is recommended to use a different character like -x \;
614 - optional usec time stamp in fractions of second (with -I xxx)
615 - optional CPU, core, or socket identifier
616 - optional number of logical CPUs aggregated
617 - counter value
618 - unit of the counter value or empty
619 - event name
620 - run time of counter
621 - percentage of measurement time the counter was running
622 - optional variance if multiple values are collected with -r
623 - optional metric value
624 - optional unit of metric
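
An illustrative invocation and output line (the numbers are made up; with no
-I, -r, or aggregation option the optional fields are absent, leaving counter
value, unit, event name, run time, running percentage, metric value and metric
unit):

  $ perf stat -x \; -e cycles \-- true
  1234567;;cycles:u;501234;100.00;;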
628 include::intel-hybrid.txt[]
631 -----------
633 With -j, perf stat is able to print out a JSON format output
636 - timestamp : optional usec time stamp in fractions of second (with -I)
637 - optional aggregate options:
638 - core : core identifier (with --per-core)
639 - die : die identifier (with --per-die)
640 - socket : socket identifier (with --per-socket)
641 - node : node identifier (with --per-node)
642 - thread : thread identifier (with --per-thread)
643 - counter-value : counter value
644 - unit : unit of the counter value or empty
645 - event : event name
646 - variance : optional variance if multiple values are collected (with -r)
647 - runtime : run time of counter
648 - metric-value : optional metric value
649 - metric-unit : optional unit of metric
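
An illustrative record (values are made up; key names follow the list above
and the exact keys and formatting may vary between perf versions):

  $ perf stat -j -e cycles \-- true
  {"counter-value" : "1234567.000000", "unit" : "", "event" : "cycles:u", "runtime" : 501234}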
652 --------
653 linkperf:perf-top[1], linkperf:perf-list[1]