1 perf-c2c(1)
5 ----
6 perf-c2c - Shared Data C2C/HITM Analyzer.
9 --------
12 'perf c2c record' [<options>] \-- [<record command options>] <command>
16 -----------
27 required. See linkperf:perf-arm-spe[1] for a setup guide. Due to the
32 - memory address of the access
33 - type of the access (load and store details)
34 - latency (in cycles) of the load access
37 for cachelines with highest contention - highest number of HITM accesses.
39 The basic workflow with this tool follows the standard record/report phase.
45 --------------
46 -e::
47 --event=::
48 Select the PMU event. Use 'perf c2c record -e list'
51 -v::
52 --verbose::
55 -l::
56 --ldlat::
57 Configure mem-loads latency. Supported on Intel and Arm64 processors
60 -k::
61 --all-kernel::
64 -u::
65 --all-user::
69 --------------
70 -k::
71 --vmlinux=<file>::
74 -v::
75 --verbose::
78 -i::
79 --input::
82 -N::
83 --node-info::
86 -c::
87 --coalesce::
92 -g::
93 --call-graph::
95 Please refer to perf-report man page for details.
97 --stdio::
100 --stats::
103 --full-symbols::
106 --no-source::
109 --show-all::
112 -f::
113 --force::
116 -d::
117 --display::
122 --stitch-lbr::
125 perf c2c record --call-graph lbr.
133 --double-cl::
134 Group the detection of shared cacheline events into double cacheline
140 ----------
147 -W,-d,--phys-data,--sample-cpu
149 Unless specified otherwise with the '-e' option, the following events are monitored by
152 cpu/mem-loads,ldlat=30/P
153 cpu/mem-stores/P
161 cpu/mem-loads/
162 cpu/mem-stores/
164 The user can pass any 'perf record' option after the '--' mark, for example (to enable
167 $ perf c2c record -- -g -a
172 ----------
177 - sort all the data based on the cacheline address
178 - store access details for each cacheline
179 - sort all cachelines based on user settings
180 - display data
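
The four report steps listed above can be illustrated with a minimal Python sketch. This is not how perf implements the report; it only mirrors the described order of operations. The `summarize_cachelines` function, the `(address, is_hitm)` sample format and the 64-byte default line size are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize_cachelines(samples, line_size=64):
    """Group samples by cacheline and rank the cachelines by HITM count.

    samples   -- iterable of (address, is_hitm) pairs (hypothetical format)
    line_size -- cacheline size in bytes, must be a power of two (assumed 64)
    """
    mask = ~(line_size - 1)
    lines = defaultdict(lambda: {"accesses": 0, "hitm": 0, "offsets": set()})

    for addr, is_hitm in samples:
        base = addr & mask                 # cacheline address of the access
        entry = lines[base]
        entry["accesses"] += 1             # store access details per cacheline
        entry["hitm"] += int(is_hitm)
        entry["offsets"].add(addr - base)  # offset within the cacheline

    # Rank cachelines: most HITM accesses first, ties broken by address.
    return sorted(lines.items(), key=lambda kv: (-kv[1]["hitm"], kv[0]))


if __name__ == "__main__":
    samples = [
        (0x1000, True), (0x1008, True),
        (0x1040, False), (0x1048, True),
        (0x2000, False),
    ]
    # Display the data with a zero-based index per cacheline.
    for index, (base, stats) in enumerate(summarize_cachelines(samples)):
        print(index, hex(base), stats["accesses"], stats["hitm"],
              sorted(stats["offsets"]))
```

Calling the same function with `line_size=128` groups neighbouring 64-byte lines together, which roughly corresponds to the double cacheline granularity that the '--double-cl' option refers to.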
190 - zero based index to identify the cacheline
193 - cacheline address (hex number)
196 - cacheline percentage of all Remote/Local HITM accesses
199 - cacheline percentage of all peer accesses
201 LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types)
202 - count of Total/Local/Remote load HITMs
204 Load Peer - Total, Local, Remote (For display with peer type)
205 - count of Total/Local/Remote loads from peer cache or DRAM
208 - sum of all cacheline accesses
211 - sum of all load accesses
214 - sum of all store accesses
216 Store Reference - L1Hit, L1Miss, N/A
217 L1Hit - store accesses that hit L1
218 L1Miss - store accesses that missed L1
219 N/A - store accesses for which the memory level is not available
221 Core Load Hit - FB, L1, L2
222 - count of load hits in FB (Fill Buffer), L1 and L2 cache
224 LLC Load Hit - LlcHit, LclHitm
225 - count of LLC load accesses, includes LLC hits and LLC HITMs
227 RMT Load Hit - RmtHit, RmtHitm
228 - count of remote load accesses, includes remote hits and remote HITMs;
232 Load Dram - Lcl, Rmt
233 - count of local and remote DRAM accesses
237 HITM - Rmt, Lcl (Display with HITM types)
238 - % of Remote/Local HITM accesses for given offset within cacheline
240 Peer Snoop - Rmt, Lcl (Display with peer type)
241 - % of Remote/Local peer accesses for given offset within cacheline
243 Store Refs - L1 Hit, L1 Miss, N/A
244 - % of store accesses that hit L1, missed L1 and N/A (not available) memory
247 Data address - Offset
248 - offset address
251 - pid of the process responsible for the accesses
254 - tid of the process responsible for the accesses
257 - code address responsible for the accesses
259 cycles - rmt hitm, lcl hitm, load (Display with HITM types)
260 - sum of cycles for given accesses - Remote/Local HITM and generic load
262 cycles - rmt peer, lcl peer, load (Display with peer type)
263 - sum of cycles for given accesses - Remote/Local peer load and generic load
266 - number of CPUs that participated in the access
269 - code symbol related to the 'Code address' value
272 - shared object name related to the 'Code address' value
275 - source information related to the 'Code address' value
278 - nodes participating in the access (see NODE INFO section)
281 ---------
284 - node IDs separated by ','
285 - node IDs with stats for each ID, in the following format:
288 - node IDs with list of affected CPUs in the following format:
291 The user can switch between the above flavors with the -N option or
295 --------
301 tid - coalesced by process TIDs
302 pid - coalesced by process PIDs
303 iaddr - coalesced by code address, following fields are displayed:
305 dso - coalesced by shared object
310 ------------
315 - overall statistics of memory accesses
318 - overall statistics on shared cachelines
321 - list of most expensive cachelines
324 - list of all accessed offsets for each cacheline
327 ----------
334 -------
340 --------
342 https://joemario.github.io/blog/2016/09/01/c2c-blog/
345 --------
346 linkperf:perf-record[1], linkperf:perf-mem[1], linkperf:perf-arm-spe[1]