Lines Matching +full:power +full:- +full:sample +full:- +full:average

4         "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC",
5 "MetricGroup": "Power",
11 "MetricExpr": "cstate_core@c3\\-residency@ / TSC",
12 "MetricGroup": "Power",
18 "MetricExpr": "cstate_pkg@c3\\-residency@ / TSC",
19 "MetricGroup": "Power",
25 "MetricExpr": "cstate_core@c6\\-residency@ / TSC",
26 "MetricGroup": "Power",
32 "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC",
33 "MetricGroup": "Power",
39 "MetricExpr": "cstate_core@c7\\-residency@ / TSC",
40 "MetricGroup": "Power",
46 "MetricExpr": "cstate_pkg@c7\\-residency@ / TSC",
47 "MetricGroup": "Power",
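Every C-state entry above (lines 4 to 47) has the same form: a residency counter from the `cstate_core` or `cstate_pkg` PMU divided by the TSC. The result is the fraction of the measured interval spent in that C-state. Below is a minimal Python sketch of that arithmetic. It assumes the caller already has counter deltas for one interval and that both counters tick at the same rate, which the plain division implies. The function name and sample values are hypothetical.

```python
def cstate_residency(residency_delta: int, tsc_delta: int) -> float:
    """Return residency / TSC, the form shared by the C-state MetricExprs.

    Hypothetical helper (not part of perf). Both arguments are assumed to be
    deltas over the same interval, counted at the same rate.
    """
    if tsc_delta <= 0:
        raise ValueError("TSC delta must be positive")
    return residency_delta / tsc_delta


# Hypothetical sample: 0.6e9 residency ticks out of 2.4e9 TSC ticks.
print(f"{cstate_residency(600_000_000, 2_400_000_000):.1%}")  # prints 25.0%
```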
59 "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
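The expression at line 59 estimates the share of cycles spent in System Management Mode. It depends on the core `cycles` counter being frozen during SMIs while APERF keeps counting. `perf stat --smi-cost` sets this up through the `freeze_on_smi` setting. Under that arrangement, `aperf - cycles` approximates the SMM cycles, and the metric is forced to 0 when no SMI was counted.

The Python transcription below uses a hypothetical function name. It also adds a guard against a zero APERF delta, which the original expression does not have.

```python
def smi_cycles_fraction(aperf: int, cycles: int, smi_count: int) -> float:
    """(aperf - cycles) / aperf if smi > 0 else 0, as in the MetricExpr.

    Assumes `cycles` was frozen during SMM while APERF was not, so the gap
    between the two counters approximates cycles spent handling SMIs.
    """
    if smi_count > 0 and aperf > 0:  # the aperf > 0 guard is an addition
        return (aperf - cycles) / aperf
    return 0.0


print(smi_cycles_fraction(aperf=1_000_000, cycles=990_000, smi_count=3))  # 0.01
print(smi_cycles_fraction(aperf=1_000_000, cycles=990_000, smi_count=0))  # 0.0
```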
78 …sible; which incur a few cycles load re-issue. However; the short re-issue duration is often hidde…
96-cases for operations that cannot be handled natively by the execution pipeline. For example; when…
102 "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma_retiring)",
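The MetricExpr at line 102 shows that Backend Bound is not measured directly. It is the share of issue slots left over after Frontend Bound, Bad Speculation and Retiring, so the four level-1 categories sum to 1 by construction. Here is that bookkeeping in Python, with made-up input fractions:

```python
def tma_level1(frontend_bound: float, bad_speculation: float, retiring: float) -> dict:
    """Derive Backend Bound as the remainder, as in the MetricExpr at line 102."""
    backend_bound = 1 - (frontend_bound + bad_speculation + retiring)
    return {
        "frontend_bound": frontend_bound,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend_bound,
    }


level1 = tma_level1(frontend_bound=0.2, bad_speculation=0.1, retiring=0.4)
print(f"backend_bound = {level1['backend_bound']:.2f}")  # backend_bound = 0.30
```

Because Backend Bound is derived this way, any estimation error in the other three categories flows into it.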
107-of-order scheduler dispatches ready uops into their respective execution units; and once complete…
112 …"MetricExpr": "(UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * (INT_MISC.RECOVERY_CYCLES_ANY / …
117 …s for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For…
128 …ram path; or stalls when the out-of-order part of the machine needs to recover its state from a sp…
137-predicted branches. For example; branchy code with lots of mispredictions might get categorized…
143 "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)",
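Line 143 derives a child metric by subtracting `tma_assists` from `tma_microcode_sequencer` and clamping the result at zero. Our reading is that the clamp is needed because the two terms are estimated from different events, so their difference can come out slightly negative, and a negative fraction of slots would be meaningless. The listing itself does not state this rationale.

A minimal Python version follows. The function name is hypothetical, since the metric's own name is not visible in this listing.

```python
def clamped_remainder(microcode_sequencer: float, assists: float) -> float:
    """max(0, tma_microcode_sequencer - tma_assists), as on line 143."""
    return max(0.0, microcode_sequencer - assists)


print(clamped_remainder(0.05, 0.02))  # about 0.03 (floating-point rounding applies)
print(clamped_remainder(0.01, 0.02))  # 0.0: the negative difference is clamped
```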
147 … as in the case of read-modify-write as an example. Since these instructions require multiple uops…
156 …he CPU was stalled due to Branch Resteers as a result of Machine Clears. Sample with: INT_MISC.CLE…
166 … true data sharing such as modified locked variables; and false sharing. Sample with: MEM_LOAD_L3_…
170 …"BriefDescription": "This metric represents fraction of slots where Core non-memory issues were of…
172 "MetricExpr": "tma_backend_bound - tma_memory_bound",
177-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in s…
181 …n of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses",
187-sharing accesses. Data shared by multiple Logical Processors (even just read shared) may cause in…
196 …than integer or Floating Point addition; subtraction; or multiplication. Sample with: ARITH.DIVIDE…
202 …"MetricExpr": "(1 - MEM_LOAD_UOPS_RETIRED.L3_HIT / (MEM_LOAD_UOPS_RETIRED.L3_HIT + 7 * MEM_LOAD_UO…
206 … loads. Better caching can improve the latency and increase performance. Sample with: MEM_LOAD_RET…
211 …"MetricExpr": "(IDQ.ALL_DSB_CYCLES_ANY_UOPS - IDQ.ALL_DSB_CYCLES_4_UOPS) / tma_info_core_core_clks…
224-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heav…
233-aside Buffers) are processor caches for recently used entries out of the Page Tables that are use…
237 …: "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store …
242-level data TLB store misses. As with ordinary data caching; focus on improving data locality and…
257 "MetricExpr": "tma_frontend_bound - tma_fetch_latency",
262 …he Frontend typically delivers suboptimal amount of uops to the Backend. Sample with: FRONTEND_RET…
272-cache misses; iTLB misses or fetch stalls after a branch misprediction are categorized under Fron…
276 …"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations frac…
281-point (FP) operations fraction the CPU has executed (retired). Note this metric's value may excee…
285 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction …
290 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction…
294 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction …
299 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction…
303 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors",
308 … approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors. May…
312 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors",
317 … approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors. May…
327-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into mi…
331 … slots where the CPU was retiring heavy-weight operations -- instructions that require two or more…
337 …he CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro
346 … fraction of cycles the CPU was stalled due to instruction cache misses. Sample with: FRONTEND_RET…
357 …"BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lo…
370 "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
382 …BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardles…
386-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width…
389 …efDescription": "Instruction-Level-Parallelism (average number of uops executed when there is exec…
419 … "PublicDescription": "Total number of retired Instructions. Sample with: INST_RETIRED.PREC_DIST"
430 …"BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mean…
435 …"PublicDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mea…
438 …"BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means h…
443 …"PublicDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means …
446 …"BriefDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower num…
451 …"PublicDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower nu…
454 …"BriefDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower num…
459 …"PublicDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower nu…
505 "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
511 "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
517 "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
523 … "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
535 "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
542 "MetricExpr": "1e3 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST_RETIRED.ANY",
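The expression at line 542 counts L2 hits (references minus misses) per thousand retired instructions. The Python sketch below uses hypothetical counter values. Its function and argument names are illustrative stand-ins for `L2_RQSTS.REFERENCES`, `L2_RQSTS.MISS` and `INST_RETIRED.ANY`.

```python
def l2_hits_per_kilo_instructions(references: int, misses: int, instructions: int) -> float:
    """1e3 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST_RETIRED.ANY"""
    if instructions <= 0:
        raise ValueError("need a positive instruction count")
    return 1e3 * (references - misses) / instructions


print(l2_hits_per_kilo_instructions(5_000_000, 1_000_000, 200_000_000))  # 20.0
```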
577 "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
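The fill-bandwidth metrics above (lines 505 to 577) are reported in GB / sec, but their MetricExprs are not shown in this listing. The sketch below is an illustration only and makes two assumptions: the fill counter counts 64-byte cache lines (the line size mentioned at line 1043), and GB means decimal gigabytes. Under those assumptions, converting a fill count over an interval to GB / sec looks like this. The function name and numbers are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: one fill event moves one 64-byte line


def fill_bandwidth_gb_per_sec(line_fills: int, seconds: float) -> float:
    """Bytes filled per second, expressed in decimal GB (1e9 bytes)."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return CACHE_LINE_BYTES * line_fills / 1e9 / seconds


print(f"{fill_bandwidth_gb_per_sec(50_000_000, 2.0):.1f} GB/sec")  # 1.6 GB/sec
```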
589 "BriefDescription": "Average Parallel L2 cache miss data reads",
595 "BriefDescription": "Average Latency for L2 cache miss demand Loads",
601 "BriefDescription": "Average Parallel L2 cache miss demand Loads",
607 …"BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core…
614 …"BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is…
619 …ublicDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is …
629 …"BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is …
635 …"BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
641 "BriefDescription": "Measured Average Core Frequency for unhalted processors [GHz]",
643 "MetricGroup": "Power;Summary",
647 "BriefDescription": "Average CPU Utilization (percentage)",
653 "BriefDescription": "Average number of utilized CPUs",
659 "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
663 …"PublicDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]. Relat…
670 …egate across all supported options of: FP precisions, scalar and vector instructions, vector-width"
694 …"MetricExpr": "(1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / (CPU_CLK_UNHALTED.REF_XCLK_ANY / 2) if #S…
705 "BriefDescription": "Average Frequency Utilization relative to nominal frequency",
707 "MetricGroup": "Power",
711 … "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
723 "BriefDescription": "The ratio of Executed- by Issued-Uops",
727 …"PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio > 1 suggests high rate of uop m…
736 …"BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor…
761 …tion of cycles the CPU was stalled due to Instruction TLB (ITLB) misses. Sample with: FRONTEND_RET…
766 …"MetricExpr": "max((CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS) / tma_info_thr…
770 …t stalls; while some non-completed demand load lives in the machine without having that demand loa…
775 …"MetricExpr": "(CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / tma_info_thread_…
779 ….e. L1 misses/L2 hits) can improve the latency and increase performance. Sample with: MEM_LOAD_RET…
789 ….e. L2 misses/L3 hits) can improve the latency and increase performance. Sample with: MEM_LOAD_RET…
799 …performance. Note the value of this node may overlap with its siblings. Sample with: MEM_LOAD_RET…
812 …slots where the CPU was retiring light-weight operations -- instructions that require no more than…
813 "MetricExpr": "tma_retiring - tma_heavy_operations",
818-weight operations -- instructions that require no more than one uop (micro-operation). This corre…
824 …HED_PORT.PORT_2 + UOPS_DISPATCHED_PORT.PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT…
828 …ion of cycles CPU dispatched uops on execution port for Load operations. Sample with: UOPS_DISPATC…
838 … classified as L1_Bound regardless of what memory source satisfied them. Sample with: MEM_INST_RET…
844 "MetricExpr": "tma_bad_speculation - tma_branch_mispredicts",
849-of-order portion of the machine needs to recover its state after the clear. For example; this can…
853 …as likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM…
858- DRAM ([SPR-HBM] and/or HBM). The underlying heuristic assumes that a similar off-core traffic i…
862 …e the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM…
863 …EAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / tma_info_thread_clks - tma_mem_bandwidth",
867 …e the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM…
873 …CYCLES_GE_1_UOP_EXEC - (UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if tma_info_thread_ipc > 1.8 else UOPS…
878 …o demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory d…
887 …odes (like in Floating Point assists). These cases can often be avoided. Sample with: UOPS_RETIRED…
896 … Branch Resteers as a result of Branch Misprediction at execution stage. Sample with: INT_MISC.CLE…
901 …"MetricExpr": "(IDQ.ALL_MITE_CYCLES_ANY_UOPS - IDQ.ALL_MITE_CYCLES_4_UOPS) / tma_info_core_core_cl…
905-cached in the DSB or LSD. For example; inefficiencies due to asymmetric decoders; use of long imm…
914-cache) or MITE (legacy instruction decode) pipelines. Certain operations cannot be handled native…
923 …atched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch). Sample with: UOPS_DISPATC…
932 …s Core fraction of cycles CPU dispatched uops on execution port 1 (ALU). Sample with: UOPS_DISPATC…
936 …ion of cycles CPU dispatched uops on execution port 2 ([SNB+]Loads and Store-address; [ICL+] Loads…
944 …ion of cycles CPU dispatched uops on execution port 3 ([SNB+]Loads and Store-address; [ICL+] Loads…
952 …is metric represents Core fraction of cycles CPU dispatched uops on execution port 4 (Store-data)",
957 …sents Core fraction of cycles CPU dispatched uops on execution port 4 (Store-data). Related metric…
975 …patched uops on execution port 6 ([HSW+] Primary Branch and simple ALU). Sample with: UOPS_DISPATC…
979 …ents Core fraction of cycles CPU dispatched uops on execution port 7 ([HSW+]simple Store-address)",
987 … the CPU performance was potentially limited due to Core computation issues (non divider-related)",
989- (UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if tma_info_thread_ipc > 1.8 else UOPS_EXECUTED.CYCLES_GE_2…
993-related). Two distinct categories can be attributed into this metric: (1) heavy data-dependency …
998 …ED.CORE\\,inv\\,cmask\\=1@ / 2 if #SMT_on else (CYCLE_ACTIVITY.STALLS_TOTAL - (RS_EVENTS.EMPTY_CYC…
1002 …t (Logical Processor cycles since ICL, Physical Core cycles otherwise). Long-latency instructions …
1007 …_EXECUTED.CORE\\,cmask\\=1@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=2@) / 2 if #SMT_on else (UOPS_EXECU…
1011-dependency among software instructions; or oversubscribing a particular hardware resource. I…
1016 …_EXECUTED.CORE\\,cmask\\=2@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=3@) / 2 if #SMT_on else (UOPS_EXECU…
1020-most compilers feature auto-Vectorization options today- reduces pressure on the execution ports …
1029 …ts (Logical Processor cycles since ICL, Physical Core cycles otherwise). Sample with: UOPS_EXECUTE…
1039-per-cycle (see IPC metric). Note that a high Retiring value does not necessarily mean there is no r…
1043 … estimates fraction of cycles handling memory load split accesses - load that cross 64-byte cache …
1049 …ion of cycles handling memory load split accesses - load that cross 64-byte cache line boundary. S…
1058 …lit store accesses. Consider aligning your data to the 64-byte cache line granularity. Sample wit…
1062 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
1067 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
1071 … CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request b…
1076-for-ownership request before the write. Even though store accesses do not typically stall out-of-
1085 …perations in the pipeline; a load can avoid waiting for memory if a prior in-flight store is writi…
1091 …"MetricExpr": "(L2_RQSTS.RFO_HIT * 9 * (1 - MEM_UOPS_RETIRED.LOCK_LOADS / MEM_UOPS_RETIRED.ALL_STO…
1095-of-order core performance; however; holding resources for a longer time can lead to undesired imp…
1104 …on of cycles CPU dispatched uops on execution port for Store operations. Sample with: UOPS_DISPATC…
1109 "MetricExpr": "tma_branch_resteers - tma_mispredicts_resteers - tma_clears_resteers",
1113 …is fetched or hitting BTB capacity limit) hence called Unknown Branches. Sample with: FRONTEND_RET…
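Lines 1109 and 1113 define the Unknown Branches share as the part of Branch Resteers that is not explained by mispredictions or machine clears. Unlike the expression at line 143, this subtraction is not clamped with max(0, ...). Noisy inputs could therefore, in principle, produce a small negative value. Here is a Python transcription with made-up inputs:

```python
def unknown_branches(branch_resteers: float,
                     mispredicts_resteers: float,
                     clears_resteers: float) -> float:
    """tma_branch_resteers - tma_mispredicts_resteers - tma_clears_resteers (line 1109).

    No clamping, matching the original expression.
    """
    return branch_resteers - mispredicts_resteers - clears_resteers


print(f"{unknown_branches(0.10, 0.06, 0.01):.2f}")  # 0.03
```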