Lines Matching +full:max +full:- +full:memory +full:- +full:bandwidth

4 "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC",
11 "MetricExpr": "cstate_core@c3\\-residency@ / TSC",
18 "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC",
25 "MetricExpr": "cstate_core@c6\\-residency@ / TSC",
32 "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC",
39 "MetricExpr": "cstate_core@c7\\-residency@ / TSC",
46 "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC",
59 "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
73 …"BriefDescription": "This metric estimates how often memory load accesses were aliased by precedin…
78 …memory load accesses were aliased by preceding stores (in program order) with a 4K address offset.…
95 …er-cases for operations that cannot be handled natively by the execution pipeline. For example; wh…
100 …"MetricExpr": "1 - tma_frontend_bound - (UOPS_ISSUED.ANY + 4 * (INT_MISC.RECOVERY_CYCLES_ANY / 2 i…
105 …-of-order scheduler dispatches ready uops into their respective execution units; and once complete…
110 …"MetricExpr": "(UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (INT_MISC.RECOVERY_CYCLES_ANY / …
115 …-pipeline was blocked due to recovery from earlier incorrect speculation. For example; wasted work…
126 …etched from an incorrectly speculated program path; or stalls when the out-of-order part of the ma…
135 … corrected path; following all sorts of miss-predicted branches. For example; branchy code with lo…
140 "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)",
144 … as in the case of read-modify-write as an example. Since these instructions require multiple uops…
149 …"MetricExpr": "(1 - BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.…
157 …"BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handl…
163 …"PublicDescription": "This metric estimates fraction of cycles while the memory subsystem was hand…
167 …"BriefDescription": "This metric represents fraction of slots where Core non-memory issues were of…
169 "MetricExpr": "tma_backend_bound - tma_memory_bound",
174 …-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in s…
178 … metric estimates fraction of cycles while the memory subsystem was handling synchronizations due …
184 … metric estimates fraction of cycles while the memory subsystem was handling synchronizations due …
188 …"BriefDescription": "This metric represents fraction of cycles where decoder-0 was the only active…
189 …"MetricExpr": "(cpu@INST_DECODED.DECODERS\\,cmask\\=1@ - cpu@INST_DECODED.DECODERS\\,cmask\\=2@) /…
193 …"PublicDescription": "This metric represents fraction of cycles where decoder-0 was the only activ…
206 …his metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads",
208 …o_thread_clks + (CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_info_thread…
212 …s metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads. B…
217 "MetricExpr": "(IDQ.DSB_CYCLES_ANY - IDQ.DSB_CYCLES_OK) / tma_info_core_core_clks / 2",
230 …-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heav…
236 …SSES.STLB_HIT\\,cmask\\=1@ + DTLB_LOAD_MISSES.WALK_ACTIVE, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYC…
240 …-aside Buffers) are processor caches for recently used entries out of the Page Tables that are use…
244 …: "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store …
249 …-level data TLB store misses.  As with ordinary data caching; focus on improving data locality and…
259 …hreading hiccup; where multiple Logical Processors contend on different data-elements mapped into …
263 … of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to…
269 …memory access requests to proceed. The higher the metric value; the deeper the memory hierarchy le…
273 …": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues",
274 "MetricExpr": "tma_frontend_bound - tma_fetch_latency",
279 …bandwidth issues.  For example; inefficiencies at the instruction decoders; or restrictions for ca…
289 …he CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB mi…
295 "MetricExpr": "tma_heavy_operations - tma_microcode_sequencer",
299 …t are decoder into two or up to ([SNB+] four; [ADL+] five) uops. This highly-correlates with the n…
303 …"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations frac…
309 …-point (FP) operations fraction the CPU has executed (retired). Note this metric's value may excee…
318 …ts. FP Assist may apply when working with very small floating point values (so-called Denormals).",
322 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction …
327 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction…
331 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction …
337 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction…
341 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors",
346 … approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors. May…
350 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors",
355 … approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors. May…
365 …-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into mi…
369 …represents fraction of slots where the CPU was retiring fused instructions -- where one uop can re…
374 …represents fraction of slots where the CPU was retiring fused instructions -- where one uop can re…
378 … slots where the CPU was retiring heavy-weight operations -- instructions that require two or more…
379 …"MetricExpr": "(UOPS_RETIRED.RETIRE_SLOTS + UOPS_RETIRED.MACRO_FUSED - INST_RETIRED.ANY) / tma_inf…
384 …he CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro
397 …"BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative b…
401 …"PublicDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative …
411 …"BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lo…
424 … "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
426 …"MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_ut…
432 …"BriefDescription": "Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetch…
437 …"PublicDescription": "Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetc…
440 …"BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fet…
446 …"PublicDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fe…
449 …"BriefDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bott…
454 …"PublicDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bot…
457 …of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and B…
465 …"BriefDescription": "Total pipeline cost of instructions used for program control-flow - a subset …
470 …"PublicDescription": "Total pipeline cost of instructions used for program control-flow - a subset…
473 …"BriefDescription": "Total pipeline cost of external Memory- or Cache-Bandwidth related bottleneck…
478 …"PublicDescription": "Total pipeline cost of external Memory- or Cache-Bandwidth related bottlenec…
481 …"BriefDescription": "Total pipeline cost of external Memory- or Cache-Latency related bottlenecks",
486 …"PublicDescription": "Total pipeline cost of external Memory- or Cache-Latency related bottlenecks…
489 … "BriefDescription": "Total pipeline cost when the execution is compute-bound - an estimation",
494 …ine cost when the execution is compute-bound - an estimation. Covers Core Bound when High ILP as w…
497 …cost of instruction fetch bandwidth related bottlenecks (when the front-end could not sustain oper…
499 …- (1 - 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_…
510 …"PublicDescription": "Total pipeline cost of irregular execution (e.g. FP-assists in HPC, Wait tim…
513 …"BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-s…
515 …ma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + …
519 …"PublicDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-
522 …"BriefDescription": "Total pipeline cost of Memory Synchronization related bottlenecks (data trans…
523 …alse_sharing + tma_split_stores + tma_store_latency - tma_store_latency)) + tma_machine_clears * (…
527 …"PublicDescription": "Total pipeline cost of Memory Synchronization related bottlenecks (data tran…
532 …"MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispre…
539 "BriefDescription": "Total pipeline cost of remaining bottlenecks in the back-end",
540 …"MetricExpr": "100 - (tma_info_bottleneck_big_code + tma_info_bottleneck_instruction_fetch_bw + tm…
544 …aining bottlenecks in the back-end. Examples include data-dependencies (Core Bound when Low ILP) a…
547 …"BriefDescription": "Total pipeline cost of \"useful operations\" - the portion of Retiring catego…
548 … "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIR…
560 "BriefDescription": "Fraction of branches that are non-taken conditionals",
567 …"MetricExpr": "(BR_INST_RETIRED.CONDITIONAL - BR_INST_RETIRED.NOT_TAKEN) / BR_INST_RETIRED.ALL_BRA…
574 …"MetricExpr": "(BR_INST_RETIRED.NEAR_TAKEN - (BR_INST_RETIRED.COND - BR_INST_RETIRED.NOT_TAKEN) -
585 "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
604 …BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardles…
608 …-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width…
611 …efDescription": "Instruction-Level-Parallelism (average number of uops executed when there is exec…
625 …tion": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_…
631 "BriefDescription": "Average number of Uops issued by front-end when it issued something",
643 …"BriefDescription": "Instructions per non-speculative DSB miss (lower number means higher occurren…
690 …"BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mean…
695 …"PublicDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mea…
698 …"BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means h…
703 …"PublicDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means …
706 …"BriefDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower num…
711 …"PublicDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower nu…
714 …"BriefDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower num…
719 …"PublicDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower nu…
773 "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
779 "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
785 "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
791 "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
797 … instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
803 … "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
821 "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
828 "MetricExpr": "1e3 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST_RETIRED.ANY",
863 "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
869 "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
899 …"BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core…
905 "BriefDescription": "Un-cacheable retired load per kilo instruction",
911 …"BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is…
915 …"PublicDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there i…
918 … level TLB) code speculative misses per kilo instruction (misses of any page-size that complete th…
924 …l TLB) data load speculative misses per kilo instruction (misses of any page-size that complete th…
938 … TLB) data store speculative misses per kilo instruction (misses of any page-size that complete th…
944 …"BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is …
994 "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
998 …"PublicDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]. Relat…
1006 …egate across all supported options of: FP precisions, scalar and vector instructions, vector-width"
1029 "BriefDescription": "Average number of parallel data read requests to external memory",
1033 …"PublicDescription": "Average number of parallel data read requests to external memory. Accounts f…
1036 … "BriefDescription": "Average latency of data read request to external memory (in nanoseconds)",
1040 …of data read request to external memory (in nanoseconds). Accounts for demand loads and L1/L2 pref…
1044 …"MetricExpr": "(1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / (CPU_CLK_UNHALTED.REF_XCLK_ANY / 2) if #S…
1061 … "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
1073 "BriefDescription": "The ratio of Executed- by Issued-Uops",
1077 …"PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio > 1 suggests high rate of uop m…
1086 …"BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor…
1116 …"MetricExpr": "max((CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS) / tma_info_thr…
1120 … TLB. These cases are characterized by execution unit stalls; while some non-completed demand load…
1125 …EM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETIRED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CY…
1129 …che. The short latency of the L1 data cache may be exposed in pointer-chasing memory access patter…
1135 … cpu@L1D_PEND_MISS.FB_FULL\\,cmask\\=1@) * ((CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALL…
1144 …"MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MISS) / tma_info_thread_c…
1170 …slots where the CPU was retiring light-weight operations -- instructions that require no more than…
1171 "MetricExpr": "tma_retiring - tma_heavy_operations",
1176 …-weight operations -- instructions that require no more than one uop (micro-operation). This corre…
1181 …HED_PORT.PORT_2 + UOPS_DISPATCHED_PORT.PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT…
1189 … the (first level) DTLB was missed by load accesses, that later on hit in second-level TLB (STLB)",
1191 "MetricExpr": "tma_dtlb_load - tma_load_stlb_miss",
1198 …"BriefDescription": "This metric estimates the fraction of cycles where the Second-level TLB (STLB…
1207 …"MetricExpr": "(12 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS.ALL_RFO) + MEM_INST_RETIRED.LOC…
1211 …re handling of locks; they are classified as L1_Bound regardless of what memory source satisfied t…
1217 "MetricExpr": "tma_bad_speculation - tma_branch_mispredicts",
1222 …-of-order portion of the machine needs to recover its state after the clear. For example; this can…
1226 …re's performance was likely hurt due to approaching bandwidth limits of external memory - DRAM ([S…
1231 …bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM).  The underlying heuristic assum…
1235 …here the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or …
1236 …EAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / tma_info_thread_clks - tma_mem_bandwidth",
1240 …here the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or …
1244 …"BriefDescription": "This metric represents fraction of slots the Memory subsystem within the Back…
1251 …Memory subsystem within the Backend was a bottleneck.  Memory Bound estimates fraction of slots wh…
1255 …c represents fraction of slots where the CPU was retiring memory operations -- uops for memory loa…
1283 …"MetricExpr": "(IDQ.ALL_MITE_CYCLES_ANY_UOPS - IDQ.ALL_MITE_CYCLES_4_UOPS) / tma_info_core_core_cl…
1287 …re-cached in the DSB or LSD. For example; inefficiencies due to asymmetric decoders; use of long i…
1291 …n terms of percentage of([SKL+] injected blend uops out of all Uops Issued -- the Count Domain; [A…
1296 …n terms of percentage of([SKL+] injected blend uops out of all Uops Issued -- the Count Domain; [A…
1305 … Commonly used instructions are optimized for delivery by the DSB (decoded i-cache) or MITE (legac…
1310 …"MetricExpr": "tma_light_operations * (BR_INST_RETIRED.ALL_BRANCHES - UOPS_RETIRED.MACRO_FUSED) / …
1314 …lots where the CPU was retiring branch instructions that were not fused. Non-conditional branches …
1323 …o op) instructions. Compilers often use NOPs for certain address alignments - e.g. start address o…
1327 …is metric represents the remaining light uops fraction the CPU has executed - remaining means not …
1328 …"MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_memory_operations + tma_fused_ins…
1332 …is metric represents the remaining light uops fraction the CPU has executed - remaining means not …
1336 …action of slots the CPU was stalled due to other cases of misprediction (non-retired x86 branches …
1337 …"MetricExpr": "max(tma_branch_mispredicts * (1 - BR_MISP_RETIRED.ALL_BRANCHES / (INT_MISC.CLEARS_C…
1344 …raction of slots the CPU has wasted due to Nukes (Machine Clears) not related to memory ordering.",
1345 …"MetricExpr": "max(tma_machine_clears * (1 - MACHINE_CLEARS.MEMORY_ORDERING / MACHINE_CLEARS.COUNT…
1370 …ion of cycles CPU dispatched uops on execution port 2 ([SNB+]Loads and Store-address; [ICL+] Loads…
1375 …ion of cycles CPU dispatched uops on execution port 2 ([SNB+]Loads and Store-address; [ICL+] Loads…
1379 …ion of cycles CPU dispatched uops on execution port 3 ([SNB+]Loads and Store-address; [ICL+] Loads…
1384 …ion of cycles CPU dispatched uops on execution port 3 ([SNB+]Loads and Store-address; [ICL+] Loads…
1388 …is metric represents Core fraction of cycles CPU dispatched uops on execution port 4 (Store-data)",
1393 …sents Core fraction of cycles CPU dispatched uops on execution port 4 (Store-data). Sample with: U…
1415 …ents Core fraction of cycles CPU dispatched uops on execution port 7 ([HSW+]simple Store-address)",
1420 …action of cycles CPU dispatched uops on execution port 7 ([HSW+]simple Store-address). Sample with…
1424 … the CPU performance was potentially limited due to Core computation issues (non divider-related)",
1425 … tma_info_thread_clks if ARITH.DIVIDER_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALL…
1429 …-related). Two distinct categories can be attributed into this metric: (1) heavy data-dependency …
1438 …t (Logical Processor cycles since ICL, Physical Core cycles otherwise). Long-latency instructions …
1443 …"MetricExpr": "((UOPS_EXECUTED.CORE_CYCLES_GE_1 - UOPS_EXECUTED.CORE_CYCLES_GE_2) / 2 if #SMT_on e…
1447 …-dependency among software instructions; or over oversubscribing a particular hardware resource. I…
1452 …"MetricExpr": "((UOPS_EXECUTED.CORE_CYCLES_GE_2 - UOPS_EXECUTED.CORE_CYCLES_GE_3) / 2 if #SMT_on e…
1456 …cal Core cycles otherwise). Loop Vectorization -most compilers feature auto-Vectorization options…
1474 …ions-per-cycle (see IPC metric). Note that a high Retiring value does not necessary mean there is …
1478 …"BriefDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled …
1483 …ycles the CPU issue-pipeline was stalled due to serializing operations. Instructions like CPUID; W…
1487 … "This metric estimates fraction of cycles handling memory load split accesses - load that cross 6…
1493 … "This metric estimates fraction of cycles handling memory load split accesses - load that cross 6…
1502 …resents rate of split store accesses. Consider aligning your data to the 64-byte cache line granu…
1506 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
1511 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
1515 …estimates how often CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-
1520 …O store memory accesses; RFO store issue a read-for-ownership request before the write. Even thoug…
1524 …"BriefDescription": "This metric roughly estimates fraction of cycles when the memory subsystem ha…
1529 …memory subsystem had loads blocked since they could not forward data from earlier (in program orde…
1535 …"MetricExpr": "(L2_RQSTS.RFO_HIT * 9 * (1 - MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STO…
1539 …-of-order core performance; however; holding resources for longer time can lead into undesired imp…
1551 …tion of cycles where the TLB was missed by store accesses, hitting in the second-level TLB (STLB)",
1552 "MetricExpr": "tma_dtlb_store - tma_store_stlb_miss",
1586 "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
1593 "MetricExpr": "(cycles\\-t / el\\-start if has_event(el\\-start) else 0)",
1600 "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
1607 "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",