Lines Matching +full:level +full:- +full:detect

1 perf-stat(1)
5 ----
6 perf-stat - Run a command and gather performance counter statistics
9 --------
11 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
12 'perf stat' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>]
13 'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] \-- <command> [<options>]
14 'perf stat' report [-i file]
17 -----------
23 -------
33 -e::
34 --event=::
37 - a symbolic event name (use 'perf list' to list all events)
39 - a raw PMU event in the form of rN where N is a hexadecimal value
44 - a symbolic or raw PMU event followed by an optional colon
45 and a list of event modifiers, e.g., cpu-cycles:p. See the
46 linkperf:perf-list[1] man page for details on event modifiers.
48 - a symbolically formed event like 'pmu/param1=0x3,param2/' where
54 perf stat -A -a -e cpu/event,percore=1/,otherevent ...
56 - a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
69 -i::
70 --no-inherit::
72 -p::
73 --pid=<pid>::
76 -t::
77 --tid=<tid>::
80 -b::
81 --bpf-prog::
83 requiring root rights. bpftool-prog could be used to find program
86 # bpftool prog | head -n 1
89 # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
98 --bpf-counters::
100 allows multiple perf-stat sessions that are counting the same metric (cycles,
103 "perf config stat.bpf-counter-events=<list_of_events>".
105 --bpf-attr-map::
106 With option "--bpf-counters", different perf-stat sessions share
108 Use "--bpf-attr-map" to specify the path of this pinned hashmap.
112 --pfm-events events::
114 including support for event filters. For example '--pfm-events
117 events cannot be mixed together. The latter must be used with the -e
118 option. The -e option and this one can be mixed and matched. Events
122 -a::
123 --all-cpus::
124 system-wide collection from all CPUs (default if no target is specified)
126 --no-scale::
129 -d::
130 --detailed::
133 -d: detailed events, L1 and LLC data cache
134 -d -d: more detailed events, dTLB and iTLB events
135 -d -d -d: very detailed events, adding prefetch events
137 -r::
138 --repeat=<n>::
141 -B::
142 --big-num::
144 Enabled by default. Use "--no-big-num" to disable.
145 Default setting can be changed with "perf config stat.big-num=false".
147 -C::
148 --cpu=::
150 comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
151 In per-thread mode, this option is ignored. The -a option is still necessary
152 to activate system-wide monitoring. Default is to count on all CPUs.
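The list syntax described for -C (comma-separated CPUs, ranges written with a dash, no spaces) can be expanded as in the following sketch. This is not perf's own parser; the function name `parse_cpu_list` is hypothetical and the error handling is an assumption.

```python
def parse_cpu_list(spec: str) -> list:
    """Expand a -C/--cpu style list such as "0,1" or "0-2,5" into CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        # The option text asks for a list "with no space", so reject padding.
        if not part or part != part.strip():
            raise ValueError(f"empty or space-padded entry in {spec!r}")
        if "-" in part:
            first, last = part.split("-", 1)
            lo, hi = int(first), int(last)
            if lo > hi:
                raise ValueError(f"descending range {part!r}")
            cpus.update(range(lo, hi + 1))  # ranges are inclusive: 0-2 is 0,1,2
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    print(parse_cpu_list("0,1"))
    print(parse_cpu_list("0-2"))
```

A helper like this can be used to check a CPU list before passing it to 'perf stat -a -C <list>'.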
154 -A::
155 --no-aggr::
158 -n::
159 --null::
160 null run - Don't start any counters.
162 This can be useful to measure just elapsed wall-clock time - or to assess the
165 -v::
166 --verbose::
169 -x SEP::
170 --field-separator SEP::
171 print counts using a CSV-style output to make it easy to import directly into
174 --table:: Display time for each run (-r option), in a table format, e.g.:
176 $ perf stat --null -r 5 --table perf bench sched pipe
181 5.189 (-0.293) #
182 5.189 (-0.294) #
183 5.186 (-0.296) #
188 5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
190 -G name::
191 --cgroup name::
193 in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
197 an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
200 use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
203 command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
205 --for-each-cgroup name::
208 effect that repeating -e option and -G option for each event x name. This option
209 cannot be used with -G/--cgroup option.
211 -o file::
212 --output file::
215 --append::
216 Append to the output file designated with the -o option. Ignored if -o is not specified.
218 --log-fd::
220 Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive
221 with it. --append may be used here. Examples:
222 3>results perf stat --log-fd 3 \-- $cmd
223 3>>results perf stat --log-fd 3 --append \-- $cmd
225 --control=fifo:ctl-fifo[,ack-fifo]::
226 --control=fd:ctl-fd[,ack-fd]::
227 ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
228 Listen on ctl-fd descriptor for command to control measurement ('enable': enable events,
230 --delay=-1 option. Optionally send control command completion ('ack\n') to ack-fd descriptor
239 test -p ${ctl_fifo} && unlink ${ctl_fifo}
244 test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
248 perf stat -D -1 -e cpu-cycles -a -I 1000 \
249 --control fd:${ctl_fd},${ctl_fd_ack} \
250 \-- sleep 30 &
253 sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
254 sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
256 exec {ctl_fd_ack}>&-
259 exec {ctl_fd}>&-
262 wait -n ${perf_pid}
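The control exchange used in the shell example above (write a command to ctl-fd, then wait for 'ack' on ack-fd) could look roughly like this from Python. The helper names `send_command` and `read_ack` are hypothetical, and the demo uses plain pipes in place of perf, so it only illustrates the message framing, not perf's behaviour.

```python
import os


def send_command(ctl_fd: int, command: str) -> None:
    """Write one newline-terminated control command (e.g. 'enable') to ctl_fd."""
    os.write(ctl_fd, (command + "\n").encode("ascii"))


def read_ack(ack_fd: int) -> bool:
    """Block until data arrives on ack_fd; return True if it is the 'ack' line."""
    data = os.read(ack_fd, 16).decode("ascii")
    # Defensive assumption: tolerate a trailing newline or NUL byte.
    return data.strip("\x00\n") == "ack"


if __name__ == "__main__":
    # Pipes stand in for perf so the sketch runs without perf installed.
    ctl_r, ctl_w = os.pipe()
    ack_r, ack_w = os.pipe()
    send_command(ctl_w, "enable")
    print(os.read(ctl_r, 16))   # the bytes perf would receive
    os.write(ack_w, b"ack\n")   # the reply perf would send
    print(read_ack(ack_r))
```

With a real session, the descriptors would be the ones passed through --control fd:ctl-fd,ack-fd (or opened from the FIFOs given to --control=fifo:...).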
266 --pre::
267 --post::
270 perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' \-- make -s -j64 O=defc…
272 -I msecs::
273 --interval-print msecs::
276 example: 'perf stat -I 1000 -e cycles -a sleep 5'
280 --interval-count times::
282 This option should be used together with "-I" option.
283 example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
285 --interval-clear::
288 --timeout msecs::
290 This option is not supported with the "-I" option.
291 example: 'perf stat --timeout 2000 -e cycles -a'
293 --metric-only::
295 Don't show any raw values. Not supported with --per-thread.
297 --per-socket::
298 Aggregate counts per processor socket for system-wide mode measurements. This
299 is a useful mode to detect imbalance between sockets. To enable this mode,
300 use --per-socket in addition to -a (system-wide). The output includes the
304 --per-die::
305 Aggregate counts per processor die for system-wide mode measurements. This
306 is a useful mode to detect imbalance between dies. To enable this mode,
307 use --per-die in addition to -a (system-wide). The output includes the
311 --per-cluster::
312 Aggregate counts per processor cluster for system-wide mode measurements. This
313 is a useful mode to detect imbalance between clusters. To enable this mode,
314 use --per-cluster in addition to -a (system-wide). The output includes the
319 --per-cache::
320 Aggregate counts per cache instance for system-wide mode measurements. By
321 default, the aggregation happens for the cache level at the highest index
322 in the system. To specify a particular level, mention the cache level
323 alongside the option in the format [Ll][1-9][0-9]*. For example:
324 Using option "--per-cache=l3" or "--per-cache=L3" will aggregate the
325 information at the boundary of the level 3 cache in the system.
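The level format [Ll][1-9][0-9]* quoted above can be checked with a regular expression, as in this sketch. The function name `cache_level` is hypothetical; only the pattern itself comes from the option text.

```python
import re

# Pattern taken from the option description: 'l' or 'L', then a positive
# decimal number with no leading zero.
_LEVEL_RE = re.compile(r"[Ll]([1-9][0-9]*)")


def cache_level(arg: str) -> int:
    """Return the numeric cache level from a --per-cache argument such as 'L3'."""
    match = _LEVEL_RE.fullmatch(arg)
    if match is None:
        raise ValueError(f"not a valid cache level: {arg!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(cache_level("l3"))
    print(cache_level("L3"))
```

Both "--per-cache=l3" and "--per-cache=L3" therefore name the same level, while a value such as "L0" or "3" does not match the documented format.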
327 --per-core::
328 Aggregate counts per physical core for system-wide mode measurements. This
329 is a useful mode to detect imbalance between physical cores. To enable this mode,
330 use --per-core in addition to -a (system-wide). The output includes the
333 --per-thread::
334 Aggregate counts per monitored thread, when monitoring threads (-t option)
335 or processes (-p option).
337 --per-node::
338 Aggregate counts per NUMA node for system-wide mode measurements. This
339 is a useful mode to detect imbalance between NUMA nodes. To enable this
340 mode, use --per-node in addition to -a (system-wide).
342 -D msecs::
343 --delay msecs::
344 After starting the program, wait msecs before measuring (-1: start with events
348 -T::
349 --transaction::
353 --metric-no-group::
356 --metric-no-group option places events outside of groups and may
357 increase the chance of the event being scheduled - leading to more
359 for metrics like instructions per cycle can be lower - as both metrics
362 --metric-no-merge::
372 --metric-no-threshold::
381 --quiet::
386 -----------
389 -o file::
390 --output file::
394 -----------
397 -i file::
398 --input file::
401 --per-socket::
402 Aggregate counts per processor socket for system-wide mode measurements.
404 --per-die::
405 Aggregate counts per processor die for system-wide mode measurements.
407 --per-cluster::
408 Aggregate counts per processor cluster for system-wide mode measurements.
410 --per-cache::
411 Aggregate counts per cache instance for system-wide mode measurements. By
412 default, the aggregation happens for the cache level at the highest index
413 in the system. To specify a particular level, mention the cache level
414 alongside the option in the format [Ll][1-9][0-9]*. For example: Using
415 option "--per-cache=l3" or "--per-cache=L3" will aggregate the
416 information at the boundary of the level 3 cache in the system.
418 --per-core::
419 Aggregate counts per physical core for system-wide mode measurements.
421 -M::
422 --metrics::
434 -A::
435 --no-aggr::
436 --no-merge::
459 --hybrid-merge::
465 --topdown::
466 Print top-down metrics supported by the CPU. This makes it possible to determine
479 mode like -I 1000, as the bottleneck of workloads can change often.
481 This enables --metric-only, unless overridden with --no-metric-only.
488 and -a (global monitoring) is needed, requiring root rights or
489 kernel.perf_event_paranoid=-1.
501 --record-tpebs::
509 --td-level::
510 Print the top-down statistics for the requested level. This lets
511 users print the top-down metrics level of interest instead of the
512 level 1 top-down metrics.
517 gathering all metrics for a level. For example, level 1 analysis may
520 'perf stat -M tma_frontend_bound_group...'.
522 Errors out if the requested level is higher than the maximum supported level.
524 --smi-cost::
530 The cost of SMI can be measured by (aperf - unhalted core cycles).
533 oriented analysis. --metric-only will be applied by default.
534 The output is SMI cycles%, equal to (aperf - unhalted core cycles) / aperf
536 Users who want to get the actual value can apply --no-metric-only.
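The SMI cycles% figure is the ratio given above, (aperf - unhalted core cycles) / aperf, expressed as a percentage. The sketch below works through that arithmetic; the function name and the sample counts are made up for illustration and are not perf output.

```python
def smi_cycles_percent(aperf: int, unhalted_core_cycles: int) -> float:
    """Return (aperf - unhalted core cycles) / aperf as a percentage."""
    if aperf <= 0:
        raise ValueError("aperf must be positive")
    return 100.0 * (aperf - unhalted_core_cycles) / aperf


if __name__ == "__main__":
    # Hypothetical counts: 1000 aperf cycles, 900 unhalted core cycles.
    print(smi_cycles_percent(1000, 900))
```

With --no-metric-only the two raw counts are shown, so the same ratio can be recomputed by hand.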
538 --all-kernel::
541 --all-user::
544 --percore-show-thread::
553 --summary::
554 Print summary for interval mode (-I).
556 --no-csv-summary::
558 This option must be used with -x and --summary.
561 'stat.no-csv-summary'.
563 $ perf config stat.no-csv-summary=true
565 --cputype::
570 --------
572 $ perf stat \-- make
576 83723.452481 task-clock:u (msec) # 1.004 CPUs utilized
577 0 context-switches:u # 0.000 K/sec
578 0 cpu-migrations:u # 0.000 K/sec
579 3,228,188 page-faults:u # 0.039 M/sec
583 2,078,861,393 branch-misses:u # 2.98% of all branches
591 -------
606 ----------
608 With -x, perf stat can produce not-quite-CSV output.
610 it is recommended to use a different character like -x \;
614 - optional usec time stamp in fractions of a second (with -I xxx)
615 - optional CPU, core, or socket identifier
616 - optional number of logical CPUs aggregated
617 - counter value
618 - unit of the counter value or empty
619 - event name
620 - run time of counter
621 - percentage of measurement time the counter was running
622 - optional variance if multiple values are collected with -r
623 - optional metric value
624 - optional unit of metric
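A minimal sketch of reading one such line follows. It assumes the simplest case only: no -I, no per-CPU or per-socket aggregation and no -r, so that the line starts directly with the counter value, and it assumes ';' as the separator (as with -x \;). The `Counter` type, the function name and the sample line are hypothetical.

```python
from typing import NamedTuple, Optional


class Counter(NamedTuple):
    value: Optional[float]  # None when the field is not numeric
    unit: str
    event: str
    run_time: int
    pct_running: float


def parse_line(line: str, sep: str = ";") -> Counter:
    """Split one -x style line into the first five documented fields."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}")
    raw_value, unit, event, run_time, pct = fields[:5]
    try:
        value = float(raw_value)
    except ValueError:
        value = None  # e.g. a placeholder such as "<not counted>"
    return Counter(value, unit, event, int(run_time), float(pct))


if __name__ == "__main__":
    # Invented sample line, not captured from a real run.
    print(parse_line("1234567;;cycles;1000000;100.00;;"))
```

Lines produced with -I or with an aggregation mode carry extra leading fields, so a real parser would need to skip those before applying this layout.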
628 include::intel-hybrid.txt[]
631 -----------
633 With -j, perf stat can print JSON-formatted output
636 - timestamp : optional usec time stamp in fractions of a second (with -I)
637 - optional aggregate options:
638 - core : core identifier (with --per-core)
639 - die : die identifier (with --per-die)
640 - socket : socket identifier (with --per-socket)
641 - node : node identifier (with --per-node)
642 - thread : thread identifier (with --per-thread)
643 - counter-value : counter value
644 - unit : unit of the counter value or empty
645 - event : event name
646 - variance : optional variance if multiple values are collected (with -r)
647 - runtime : run time of counter
648 - metric-value : optional metric value
649 - metric-unit : optional unit of metric
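The fields listed above can be collected with a few lines of Python. This sketch assumes each counter is printed as one JSON object per line and uses only the key names from the list ('event', 'counter-value'); the sample text is invented, and the exact key spelling should be checked against the perf version in use.

```python
import json


def counters_by_event(text: str) -> dict:
    """Map event name to counter value for every JSON object line in text."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        result[record["event"]] = float(record["counter-value"])
    return result


if __name__ == "__main__":
    # Invented sample, not captured from a real run.
    sample = (
        '{"counter-value" : "1200.000000", "unit" : "", "event" : "cycles"}\n'
        '{"counter-value" : "800.000000", "unit" : "", "event" : "instructions"}\n'
    )
    print(counters_by_event(sample))
```

When an aggregation mode such as --per-core is active, the corresponding identifier key (for example 'core') would also need to be part of the dictionary key to keep records apart.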
652 --------
653 linkperf:perf-top[1], linkperf:perf-list[1]