perf/Documentation/topdown.txt

2 ---------------------
11 perf stat --topdown implements this using available metrics that vary
14 % perf stat -a --topdown -I1000
15 #           time      %  tma_retiring %  tma_backend_bound %  tma_frontend_bound %  tma_bad_specula…
38 On Ice Lake, there is a new fixed counter 3: SLOTS, which reports
39 "pipeline SLOTS" (cycles multiplied by core issue width) and a
40 metric register that reports slots ratios for the different bottleneck
52 The application opens a group with fixed counter 3 (SLOTS) and any
76 /* Open slots counter file descriptor for current task. */
77 struct perf_event_attr slots = {
84 int slots_fd = perf_event_open(&slots, 0, -1, -1, 0);
95  * Set slots event as the leader of the group.
104 int metrics_fd = perf_event_open(&metrics, 0, -1, slots_fd, 0);
118 to read slots and the topdown metrics at different points of the program:
147 _rdpmc calls should not be mixed with reading the metrics and slots counters
179 The ratios in the metric accumulate for the time when the counter
183 This can be done by scaling the metrics with the slots counter
184 read at the same time.
186 Then it's possible to take deltas of these slots counts
188 for that time period.
205 	retiring_slots = GET_METRIC(metric_b, 0) * slots_b - retiring_slots_a
206 	bad_spec_slots = GET_METRIC(metric_b, 1) * slots_b - bad_spec_slots_a
207 	fe_bound_slots = GET_METRIC(metric_b, 2) * slots_b - fe_bound_slots_a
208 	be_bound_slots = GET_METRIC(metric_b, 3) * slots_b - be_bound_slots_a
213 	slots_delta = slots_b - slots_a
236 	heavy_ops_slots = GET_METRIC(metric_b, 4) * slots_b - heavy_ops_slots_a
237 	br_mispredict_slots = GET_METRIC(metric_b, 5) * slots_b - br_mispredict_slots_a
238 	fetch_lat_slots = GET_METRIC(metric_b, 6) * slots_b - fetch_lat_slots_a
239 	mem_bound_slots = GET_METRIC(metric_b, 7) * slots_b - mem_bound_slots_a
241 	slots_delta = slots_b - slots_a
243 	light_ops_ratio = retiring_ratio - heavy_ops_ratio
246 	machine_clears_ratio = bad_spec_ratio - br_mispredict_ratio
249 	fetch_bw_ratio = fe_bound_ratio - fetch_lat_ratio
252 	core_bound_ratio = be_bound_ratio - mem_bound_ratio
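
The pseudo code above can be sketched in C roughly as follows. This is a
minimal illustration under stated assumptions, not the code used by perf or
the kernel: struct topdown_snapshot, struct topdown_l2, metric_slots(),
metric_ratio() and derive_level2() are hypothetical names, and the sketch
assumes that each metric occupies one 8-bit field (indexed as in the pseudo
code) in which 0xff stands for 100% of slots.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: metric i lives in the i-th 8-bit field of the metrics value. */
#define GET_METRIC(m, i) (((m) >> ((i) * 8)) & 0xff)

/* Hypothetical container: SLOTS and metrics read at the same point. */
struct topdown_snapshot {
	uint64_t slots;
	uint64_t metrics;
};

/* Hypothetical container for the derived level 2 ratios. */
struct topdown_l2 {
	double light_ops;
	double machine_clears;
	double fetch_bw;
	double core_bound;
};

/* Scale metric field i to a slot count (assumes 0xff == 100% of slots). */
static double metric_slots(const struct topdown_snapshot *s, unsigned int i)
{
	assert(i < 8);
	return (double)GET_METRIC(s->metrics, i) / 0xff * (double)s->slots;
}

/* Ratio of metric field i for the interval between snapshots a and b. */
static double metric_ratio(const struct topdown_snapshot *a,
			   const struct topdown_snapshot *b, unsigned int i)
{
	double slots_delta = (double)b->slots - (double)a->slots;

	/* No slots elapsed (or snapshots swapped): report an empty interval. */
	if (slots_delta <= 0)
		return 0.0;
	return (metric_slots(b, i) - metric_slots(a, i)) / slots_delta;
}

/* Derive the remaining level 2 ratios by subtraction from level 1. */
static struct topdown_l2 derive_level2(const struct topdown_snapshot *a,
				       const struct topdown_snapshot *b)
{
	struct topdown_l2 l2 = {
		.light_ops      = metric_ratio(a, b, 0) - metric_ratio(a, b, 4),
		.machine_clears = metric_ratio(a, b, 1) - metric_ratio(a, b, 5),
		.fetch_bw       = metric_ratio(a, b, 2) - metric_ratio(a, b, 6),
		.core_bound     = metric_ratio(a, b, 3) - metric_ratio(a, b, 7),
	};

	return l2;
}
```

A caller would fill one topdown_snapshot before and one after the region of
interest, then pass both to metric_ratio() or derive_level2(). Dividing by
the slots delta rather than by slots_b is what restricts the result to that
region instead of the whole time the counter was enabled.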
271 short regions over time because the number of cycles covered by each
278 When using perf stat it is recommended to always use the -I option,
281 	perf stat -I 1000 --topdown ...
296 Four pseudo TopDown metric events are exposed to end users:
297 topdown-retiring, topdown-bad-spec, topdown-fe-bound and topdown-be-bound.
300 - All the TopDown metric events must be in a group with the SLOTS event.
301 - The SLOTS event must be the leader of the group.
302 - The PERF_FORMAT_GROUP flag must be applied for each TopDown metric
305 The SLOTS event and the TopDown metric events can be counting members of
306 a sampling read group. Since the SLOTS event must be the leader of a TopDown
308 For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'
314 The upper half is also divided into four 8-bit fields for the new level 2
315 metrics. Four more TopDown metric events are exposed to end users:
316 topdown-heavy-ops, topdown-br-mispredict, topdown-fetch-lat and
317 topdown-mem-bound.
323     Light_Operations = Retiring - Heavy_Operations
324     Machine_Clears = Bad_Speculation - Branch_Mispredicts
325     Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
326     Core_Bound = Backend_Bound - Memory_Bound
340 	perf record -e event_name -W ...
347 time are more accurate. Therefore, new TMA metrics that use TPEBS will provide
355 	perf stat -M metric_name --record-tpebs ...
359 [1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
360 [2] https://sites.google.com/site/analysismethods/yasin-pubs
361 [3] https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
362 [4] https://github.com/andikleen/pmu-tools/tree/master/jevents