x86/cascadelakex/clx-metrics.json

4         "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC",
5         "MetricGroup": "Power",
11         "MetricExpr": "cstate_core@c3\\-residency@ / TSC",
12         "MetricGroup": "Power",
18         "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC",
19         "MetricGroup": "Power",
25         "MetricExpr": "cstate_core@c6\\-residency@ / TSC",
26         "MetricGroup": "Power",
32         "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC",
33         "MetricGroup": "Power",
39         "MetricExpr": "cstate_core@c7\\-residency@ / TSC",
40         "MetricGroup": "Power",
46         "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC",
47         "MetricGroup": "Power",
70         "BriefDescription": "Percentage of time spent in the active CPU power state C0",
171 …"BriefDescription": "Average latency of a last level cache (LLC) demand and prefetch data read mis…
177 …"BriefDescription": "Average latency of a last level cache (LLC) demand and prefetch data read mis…
183 …"BriefDescription": "Average latency of a last level cache (LLC) demand and prefetch data read mis…
261 …"BriefDescription": "Uops delivered from legacy decode pipeline (Micro-instruction Translation Eng…
292         "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
317 …sible; which incur a few cycles load re-issue. However; the short re-issue duration is often hidde…
334 …-cases for operations that cannot be handled natively by the execution pipeline. For example; when…
339 …"MetricExpr": "1 - tma_frontend_bound - (UOPS_ISSUED.ANY + 4 * (INT_MISC.RECOVERY_CYCLES_ANY / 2 i…
344 …-of-order scheduler dispatches ready uops into their respective execution units; and once complete…
349 …"MetricExpr": "(UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (INT_MISC.RECOVERY_CYCLES_ANY / …
354 …s for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For…
365 …ram path; or stalls when the out-of-order part of the machine needs to recover its state from a sp…
374 …-predicted branches. For example; branchy code with lots of mispredictions might get categorized…
379         "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)",
383 … as in the case of read-modify-write as an example. Since these instructions require multiple uops…
388 …"MetricExpr": "(1 - BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS.…
392 …he CPU was stalled due to Branch Resteers as a result of Machine Clears. Sample with: INT_MISC.CLE…
402 … true data sharing such as modified locked variables; and false sharing. Sample with: MEM_LOAD_L3_…
406 …"BriefDescription": "This metric represents fraction of slots where Core non-memory issues were of…
408         "MetricExpr": "tma_backend_bound - tma_memory_bound",
413 …-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in s…
417 …n of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses",
419 … (MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT + MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM * (1 - OCR.DEMAND_DATA_RD.L…
423 …-sharing accesses. Data shared by multiple Logical Processors (even just read shared) may cause in…
427 …"BriefDescription": "This metric represents fraction of cycles where decoder-0 was the only active…
428 …"MetricExpr": "(cpu@INST_DECODED.DECODERS\\,cmask\\=1@ - cpu@INST_DECODED.DECODERS\\,cmask\\=2@) /…
432 …"PublicDescription": "This metric represents fraction of cycles where decoder-0 was the only activ…
441 …than integer or Floating Point addition; subtraction; or multiplication. Sample with: ARITH.DIVIDE…
447 …- CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_info_thread_clks - tma_l2_bound - tma_pmm_bound if #has_pme…
451 … loads. Better caching can improve the latency and increase performance. Sample with: MEM_LOAD_RET…
456         "MetricExpr": "(IDQ.DSB_CYCLES_ANY - IDQ.DSB_CYCLES_OK) / tma_info_core_core_clks / 2",
469 …-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heav…
475 …mask\\=1@ + DTLB_LOAD_MISSES.WALK_ACTIVE, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLE…
479 …-aside Buffers) are processor caches for recently used entries out of the Page Tables that are use…
483 …: "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store …
488 …-level data TLB store misses.  As with ordinary data caching; focus on improving data locality and…
498 …ultiple Logical Processors contend on different data-elements mapped into the same cache line. Sam…
513         "MetricExpr": "tma_frontend_bound - tma_fetch_latency",
518 …he Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RET…
528 …-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Fron…
534         "MetricExpr": "tma_heavy_operations - tma_microcode_sequencer",
538 …t are decoded into two or up to ([SNB+] four; [ADL+] five) uops. This highly-correlates with the n…
542 …"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations frac…
548 …-point (FP) operations fraction the CPU has executed (retired). Note this metric's value may excee…
557 …ts. FP Assist may apply when working with very small floating point values (so-called Denormals).",
561 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction …
566 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction…
570 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction …
576 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction…
580 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors",
585 … approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors. May…
589 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors",
594 … approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors. May…
598 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 512-bit wide vectors",
603 … approximates arithmetic FP vector uops fraction the CPU has retired for 512-bit wide vectors. May…
613 …-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into mi…
617 …represents fraction of slots where the CPU was retiring fused instructions -- where one uop can re…
622 …represents fraction of slots where the CPU was retiring fused instructions -- where one uop can re…
626 … slots where the CPU was retiring heavy-weight operations -- instructions that require two or more…
627 …"MetricExpr": "(UOPS_RETIRED.RETIRE_SLOTS + UOPS_RETIRED.MACRO_FUSED - INST_RETIRED.ANY) / tma_inf…
632 …he CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro…
641 … fraction of cycles the CPU was stalled due to instruction cache misses. Sample with: FRONTEND_RET…
645 …"BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative b…
649 …"PublicDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative …
659 …"BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lo…
672 …      "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
674 …"MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_ut…
680 …"BriefDescription": "Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetch…
685 …"PublicDescription": "Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetc…
688 …"BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fet…
694 …"PublicDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fe…
697 …"BriefDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bott…
702 …"PublicDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bot…
705 …of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and B…
713 …"BriefDescription": "Total pipeline cost of instructions used for program control-flow - a subset …
718 …"PublicDescription": "Total pipeline cost of instructions used for program control-flow - a subset…
721 …"BriefDescription": "Total pipeline cost of external Memory- or Cache-Bandwidth related bottleneck…
726 …"PublicDescription": "Total pipeline cost of external Memory- or Cache-Bandwidth related bottlenec…
729 …"BriefDescription": "Total pipeline cost of external Memory- or Cache-Latency related bottlenecks",
734 …"PublicDescription": "Total pipeline cost of external Memory- or Cache-Latency related bottlenecks…
737 …     "BriefDescription": "Total pipeline cost when the execution is compute-bound - an estimation",
742 …ine cost when the execution is compute-bound - an estimation. Covers Core Bound when High ILP as w…
745 …tch bandwidth related bottlenecks (when the front-end could not sustain operations delivery to the…
747 …- (1 - 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_…
758 …"PublicDescription": "Total pipeline cost of irregular execution (e.g. FP-assists in HPC, Wait tim…
761 …ription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
767 …"Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs). Related m…
771 …alse_sharing + tma_split_stores + tma_store_latency - tma_store_latency)) + tma_machine_clears * (…
780 …"MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispre…
787         "BriefDescription": "Total pipeline cost of remaining bottlenecks in the back-end",
788 …"MetricExpr": "100 - (tma_info_bottleneck_big_code + tma_info_bottleneck_instruction_fetch_bw + tm…
792 …aining bottlenecks in the back-end. Examples include data-dependencies (Core Bound when Low ILP) a…
795 …"BriefDescription": "Total pipeline cost of \"useful operations\" - the portion of Retiring catego…
796 … "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIR…
808         "BriefDescription": "Fraction of branches that are non-taken conditionals",
815 …"MetricExpr": "(BR_INST_RETIRED.CONDITIONAL - BR_INST_RETIRED.NOT_TAKEN) / BR_INST_RETIRED.ALL_BRA…
822 …"MetricExpr": "(BR_INST_RETIRED.NEAR_TAKEN - (BR_INST_RETIRED.COND - BR_INST_RETIRED.NOT_TAKEN) - …
833         "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
841         "MetricGroup": "Power",
852 …BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardles…
856 …-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width…
859 …efDescription": "Instruction-Level-Parallelism (average number of uops executed when there is exec…
873 …"BriefDescription": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch un…
879         "BriefDescription": "Average number of Uops issued by front-end when it issued something",
885         "BriefDescription": "Average Latency for L1 instruction cache misses",
891 …"BriefDescription": "Instructions per non-speculative DSB miss (lower number means higher occurren…
926 …   "PublicDescription": "Total number of retired Instructions. Sample with: INST_RETIRED.PREC_DIST"
938 …"BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mean…
943 …"PublicDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mea…
946 …"BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means h…
951 …"PublicDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means …
954 …"BriefDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means hi…
959 …"PublicDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means h…
962 …"BriefDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower num…
967 …"PublicDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower nu…
970 …"BriefDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower num…
975 …"PublicDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower nu…
1035         "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
1041         "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
1059         "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
1065         "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
1071 … instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
1077 …      "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
1095         "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
1102         "MetricExpr": "1e3 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST_RETIRED.ANY",
1137         "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
1143         "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
1155         "BriefDescription": "Average Parallel L2 cache miss data reads",
1161         "BriefDescription": "Average Latency for L2 cache miss demand Loads",
1167         "BriefDescription": "Average Parallel L2 cache miss demand Loads",
1173 …"BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core…
1179         "BriefDescription": "Un-cacheable retired load per kilo instruction",
1185 …"BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand loads when there is…
1189 …ublicDescription": "Memory-Level-Parallelism (average number of L1 miss demand loads when there is …
1192 … level TLB) code speculative misses per kilo instruction (misses of any page-size that complete th…
1198 …l TLB) data load speculative misses per kilo instruction (misses of any page-size that complete th…
1212 … TLB) data store speculative misses per kilo instruction (misses of any page-size that complete th…
1218 …"BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is …
1224         "BriefDescription": "Average number of uops fetched from DSB per cycle",
1230         "BriefDescription": "Average number of uops fetched from MITE per cycle",
1244 …"BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
1250         "BriefDescription": "Measured Average Core Frequency for unhalted processors [GHz]",
1252         "MetricGroup": "Power;Summary",
1256         "BriefDescription": "Average CPU Utilization (percentage)",
1262         "BriefDescription": "Average number of utilized CPUs",
1268         "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
1272 …"PublicDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]. Relat…
1280 …egate across all supported options of: FP precisions, scalar and vector instructions, vector-width"
1283         "BriefDescription": "Average IO (network or disk) Bandwidth Use for Reads [GB / sec]",
1287 …"PublicDescription": "Average IO (network or disk) Bandwidth Use for Reads [GB / sec]. Bandwidth o…
1290         "BriefDescription": "Average IO (network or disk) Bandwidth Use for Writes [GB / sec]",
1294 …"PublicDescription": "Average IO (network or disk) Bandwidth Use for Writes [GB / sec]. Bandwidth …
1317 …"BriefDescription": "Average latency of data read request to external DRAM memory [in nanoseconds]…
1321 …cDescription": "Average latency of data read request to external DRAM memory [in nanoseconds]. Acc…
1324         "BriefDescription": "Average number of parallel data read requests to external memory",
1328 …"PublicDescription": "Average number of parallel data read requests to external memory. Accounts f…
1331 …"BriefDescription": "Average latency of data read request to external 3D X-Point memory [in nanose…
1335 …scription": "Average latency of data read request to external 3D X-Point memory [in nanoseconds]. …
1338 …    "BriefDescription": "Average latency of data read request to external memory (in nanoseconds)",
1342 …tion": "Average latency of data read request to external memory (in nanoseconds). Accounts for dem…
1345         "BriefDescription": "Average 3DXP Memory Bandwidth Use for reads [GB / sec]",
1351         "BriefDescription": "Average 3DXP Memory Bandwidth Use for Writes [GB / sec]",
1357 …"BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for ba…
1359         "MetricGroup": "Power",
1361 … was running with power-delivery for baseline license level 0.  This includes non-AVX codes, SSE, …
1364 …"BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for li…
1366         "MetricGroup": "Power",
1369 …as running with power-delivery for license level 1.  This includes high current AVX 256-bit instru…
1372 …"BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for li…
1374         "MetricGroup": "Power",
1377 …ere the core was running with power-delivery for license level 2 (introduced in SKX).  This includ…
1381 …"MetricExpr": "(1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / (CPU_CLK_UNHALTED.REF_XCLK_ANY / 2) if #S…
1392         "BriefDescription": "Average Frequency Utilization relative to nominal frequency",
1394         "MetricGroup": "Power",
1398         "BriefDescription": "Measured Average Uncore Frequency for the SoC [GHz]",
1404 …   "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
1416         "BriefDescription": "The ratio of Executed- by Issued-Uops",
1420 …"PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio > 1 suggests high rate of uop m…
1429 …"BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor…
1454 …tion of cycles the CPU was stalled due to Instruction TLB (ITLB) misses. Sample with: FRONTEND_RET…
1459 …"MetricExpr": "max((CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS) / tma_info_thr…
1463 …t stalls; while some non-completed demand load lives in the machine without having that demand loa…
1468 …EM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETIRED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CY…
1472 …ncy of the L1 data cache may be exposed in pointer-chasing memory access patterns as an example. S…
1478 … cpu@L1D_PEND_MISS.FB_FULL\\,cmask\\=1@) * ((CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALL…
1482 ….e. L1 misses/L2 hits) can improve the latency and increase performance. Sample with: MEM_LOAD_RET…
1487 …"MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MISS) / tma_info_thread_c…
1491 ….e. L2 misses/L3 hits) can improve the latency and increase performance. Sample with: MEM_LOAD_RET…
1500 …performance.  Note the value of this node may overlap with its siblings. Sample with: MEM_LOAD_RET…
1513 …slots where the CPU was retiring light-weight operations -- instructions that require no more than…
1514         "MetricExpr": "tma_retiring - tma_heavy_operations",
1519 …-weight operations -- instructions that require no more than one uop (micro-operation). This corre…
1524 …HED_PORT.PORT_2 + UOPS_DISPATCHED_PORT.PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT…
1528 …ion of cycles CPU dispatched uops on execution port for Load operations. Sample with: UOPS_DISPATC…
1532 … the (first level) DTLB was missed by load accesses, that later on hit in second-level TLB (STLB)",
1534         "MetricExpr": "tma_dtlb_load - tma_load_stlb_miss",
1541 …"BriefDescription": "This metric estimates the fraction of cycles where the Second-level TLB (STLB…
1554 …local memory. Caching will improve the latency and increase performance. Sample with: MEM_LOAD_L3_…
1559 …"MetricExpr": "(12 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS.ALL_RFO) + MEM_INST_RETIRED.LOC…
1563 … classified as L1_Bound regardless of what memory source satisfied them. Sample with: MEM_INST_RET…
1569         "MetricExpr": "tma_bad_speculation - tma_branch_mispredicts",
1574 …-of-order portion of the machine needs to recover its state after the clear. For example; this can…
1578 …as likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM…
1583 …- DRAM ([SPR-HBM] and/or HBM).  The underlying heuristic assumes that a similar off-core traffic i…
1587 …e the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM…
1588 …EAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / tma_info_thread_clks - tma_mem_bandwidth",
1592 …e the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM…
1603 …o demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory d…
1607 … represents fraction of slots where the CPU was retiring memory operations -- uops for memory load…
1621 …odes (like in Floating Point assists). These cases can often be avoided. Sample with: IDQ.MS_UOPS.…
1630 … Branch Resteers as a result of Branch Misprediction at execution stage. Sample with: INT_MISC.CLE…
1635 …"MetricExpr": "(IDQ.ALL_MITE_CYCLES_ANY_UOPS - IDQ.ALL_MITE_CYCLES_4_UOPS) / tma_info_core_core_cl…
1639 …-cached in the DSB or LSD. For example; inefficiencies due to asymmetric decoders; use of long imm…
1643 …n terms of percentage of ([SKL+] injected blend uops out of all Uops Issued -- the Count Domain; [A…
1648 …n terms of percentage of ([SKL+] injected blend uops out of all Uops Issued -- the Count Domain; [A…
1657 …-cache) or MITE (legacy instruction decode) pipelines. Certain operations cannot be handled native…
1662 …"MetricExpr": "tma_light_operations * (BR_INST_RETIRED.ALL_BRANCHES - UOPS_RETIRED.MACRO_FUSED) / …
1666 …lots where the CPU was retiring branch instructions that were not fused. Non-conditional branches …
1675 …rs often use NOPs for certain address alignments - e.g. start address of a function or loop body. …
1679 …is metric represents the remaining light uops fraction the CPU has executed - remaining means not …
1680 …"MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_memory_operations + tma_fused_ins…
1684 …is metric represents the remaining light uops fraction the CPU has executed - remaining means not …
1688 …action of slots the CPU was stalled due to other cases of misprediction (non-retired x86 branches …
1689 …"MetricExpr": "max(tma_branch_mispredicts * (1 - BR_MISP_RETIRED.ALL_BRANCHES / (INT_MISC.CLEARS_C…
1697 …"MetricExpr": "max(tma_machine_clears * (1 - MACHINE_CLEARS.MEMORY_ORDERING / MACHINE_CLEARS.COUNT…
1704 … on idle latencies) how often the CPU was stalled on accesses to external 3D-Xpoint (Crystal Ridge…
1706 …- (19 * (MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1…
1710 … on idle latencies) how often the CPU was stalled on accesses to external 3D-Xpoint (Crystal Ridge…
1719 …atched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch). Sample with: UOPS_DISPATC…
1728 …s Core fraction of cycles CPU dispatched uops on execution port 1 (ALU). Sample with: UOPS_DISPATC…
1732 …ion of cycles CPU dispatched uops on execution port 2 ([SNB+] Loads and Store-address; [ICL+] Loads…
1737 …PU dispatched uops on execution port 2 ([SNB+] Loads and Store-address; [ICL+] Loads). Sample with:…
1741 …ion of cycles CPU dispatched uops on execution port 3 ([SNB+] Loads and Store-address; [ICL+] Loads…
1746 …PU dispatched uops on execution port 3 ([SNB+] Loads and Store-address; [ICL+] Loads). Sample with:…
1750 …is metric represents Core fraction of cycles CPU dispatched uops on execution port 4 (Store-data)",
1755 …ore fraction of cycles CPU dispatched uops on execution port 4 (Store-data). Sample with: UOPS_DIS…
1764 …spatched uops on execution port 5 ([SNB+] Branches and ALU; [HSW+] ALU). Sample with: UOPS_DISPATC…
1773 …patched uops on execution port 6 ([HSW+] Primary Branch and simple ALU). Sample with: UOPS_DISPATC…
1777 …ents Core fraction of cycles CPU dispatched uops on execution port 7 ([HSW+] simple Store-address)",
1782 …f cycles CPU dispatched uops on execution port 7 ([HSW+] simple Store-address). Sample with: UOPS_D…
1786 … the CPU performance was potentially limited due to Core computation issues (non divider-related)",
1787 … tma_info_thread_clks if ARITH.DIVIDER_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALL…
1791 …-related).  Two distinct categories can be attributed to this metric: (1) heavy data-dependency …
1800 …t (Logical Processor cycles since ICL, Physical Core cycles otherwise). Long-latency instructions …
1805 …"MetricExpr": "((UOPS_EXECUTED.CORE_CYCLES_GE_1 - UOPS_EXECUTED.CORE_CYCLES_GE_2) / 2 if #SMT_on e…
1809 …-dependency among software instructions; or oversubscribing a particular hardware resource. I…
1814 …"MetricExpr": "((UOPS_EXECUTED.CORE_CYCLES_GE_2 - UOPS_EXECUTED.CORE_CYCLES_GE_3) / 2 if #SMT_on e…
1818 …cal Core cycles otherwise).  Loop Vectorization - most compilers feature auto-Vectorization options…
1836 …izations issues. This is often caused by non-optimal NUMA allocations. #link to NUMA article. …
1845 …m remote memory. This is often caused by non-optimal NUMA allocations. #link to NUMA article. …
1855 …-per-cycle (see IPC metric). Note that a high Retiring value does not necessarily mean there is no r…
1859 …"BriefDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled …
1864 …-pipeline was stalled due to serializing operations. Instructions like CPUID; WRMSR or LFENCE seri…
1873 …esents fraction of cycles the CPU was stalled due to PAUSE Instructions. Sample with: MISC_RETIRED…
1877 … estimates fraction of cycles handling memory load split accesses - loads that cross a 64-byte cache …
1883 …ion of cycles handling memory load split accesses - loads that cross a 64-byte cache line boundary. S…
1892 …lit store accesses.  Consider aligning your data to the 64-byte cache line granularity. Sample wit…
1896 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
1901 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
1905 … CPU was stalled due to RFO store memory accesses; RFO stores issue a read-for-ownership request b…
1910 …-for-ownership request before the write. Even though store accesses do not typically stall out-of-…
1919 …perations in the pipeline; a load can avoid waiting for memory if a prior in-flight store is writi…
1925 …"MetricExpr": "(L2_RQSTS.RFO_HIT * 11 * (1 - MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_ST…
1929 …-of-order core performance; however; holding resources for a longer time can lead to undesired imp…
1941 …tion of cycles where the TLB was missed by store accesses, hitting in the second-level TLB (STLB)",
1942         "MetricExpr": "tma_dtlb_store - tma_store_stlb_miss",
1962 …is fetched or hitting BTB capacity limit) hence called Unknown Branches. Sample with: BACLEARS.ANY…
1976         "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
1983         "MetricExpr": "(cycles\\-t / el\\-start if has_event(el\\-start) else 0)",
1990         "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
1997         "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",