1 .. SPDX-License-Identifier: GPL-2.0
47 There are many real-world cases of performance regressions caused by
83 The following 'mitigation' section provides real-world examples.
86 checked, and it is valuable to run specific tools on performance-critical
87 workloads to detect cases where false sharing affects performance
93 perf record/report/stat are widely used for performance tuning, and
94 once hotspots are detected, tools like 'perf-c2c' and 'pahole' can
99 perf-c2c can capture the cache lines with the most false sharing hits,
101 and in-line offset of the data. Simple commands are::
103 $ perf c2c record -ag sleep 3
104 $ perf c2c report --call-graph none -k vmlinux
106 When running the above while testing will-it-scale's tlb_flush1 case,
115 #----------------------------------------------------------------------
117 #----------------------------------------------------------------------
124 A nice introduction to perf-c2c is [3]_.
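The in-line offset that perf-c2c reports can be related to structure
fields by hand. What follows is a minimal userspace sketch, not kernel
code: ``struct demo_stats``, the ``demo_*`` helpers and the 64-byte
``DEMO_CACHELINE`` are hypothetical names and assumptions chosen for
illustration, and the structure is assumed to start on a cache line
boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is common but not universal. */
#define DEMO_CACHELINE 64

/* Hypothetical structure, not taken from the kernel. */
struct demo_stats {
	uint64_t lookups;	/* read-mostly */
	char     name[48];	/* read-mostly */
	uint64_t hits;		/* written frequently by one CPU */
	uint64_t misses;	/* written frequently by another CPU */
};

/*
 * Index of the cache line that holds a given byte offset, counted from
 * the start of the structure (assumed to be cache line aligned).
 */
static size_t demo_line_of(size_t offset)
{
	return offset / DEMO_CACHELINE;
}

/* Offset inside that cache line, comparable to an in-line offset. */
static size_t demo_offset_in_line(size_t offset)
{
	return offset % DEMO_CACHELINE;
}

/* Nonzero when two field offsets fall into the same cache line. */
static int demo_same_line(size_t a, size_t b)
{
	return demo_line_of(a) == demo_line_of(b);
}
```

With this layout, ``hits`` sits at offset 56 and shares the first cache
line with the read-mostly fields, while ``misses`` starts the second
line. Frequent writes to ``hits`` would therefore keep invalidating the
line that readers of ``lookups`` and ``name`` depend on.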
127 granularity. Users can match the offset in perf-c2c output with
135 mitigations should balance performance gains with complexity and
136 space consumption. Sometimes, lower performance is OK, and it's
137 unnecessary to hyper-optimize every rarely used data structure or
140 Cases of false sharing hurting performance are seen more frequently with
150 - Commit 91b6d3256356 ("net: cache align tcp_memory_allocated, tcp_sockets_allocated")
156 - Commit 802f1d522d5f ("mm: page_counter: re-layout structure to reduce false sharing")
159 For example, for some global variables, use compare(read)-then-write instead
170 …- Commit 7b1002f7cfe5 ("bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false shari…
171 - Commit 292648ac5cf1 ("mm: gup: allow FOLL_PIN to scale in SMP")
173 * Turn hot global data into 'per-cpu data + global data' when possible,
174 or reasonably increase the threshold for syncing per-cpu data to
177 - Commit 520f897a3554 ("ext4: use percpu_counters for extent_status cache hits/misses")
178 - Commit 56f3547bfa4d ("mm: adjust vm_committed_as_batch according to vm overcommit policy")
185 * Group mostly read-only fields together
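Three of the mitigations listed above (cache line alignment,
compare(read)-then-write, and local data synced to global data in
batches) can be sketched in userspace C11. This is an illustration
under stated assumptions, not how any of the cited commits is
implemented: all ``demo_*`` names and the 64-byte ``DEMO_CACHELINE``
are hypothetical, and a thread-local variable stands in for per-cpu
data.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed cache line size; 64 bytes is common but not universal. */
#define DEMO_CACHELINE 64

/* 1. Alignment: give each hot counter its own cache line. */
struct demo_counters {
	_Alignas(DEMO_CACHELINE) long allocated;
	_Alignas(DEMO_CACHELINE) long sockets;
};

/*
 * 2. compare(read)-then-write: skip the store when the value is already
 * the wanted one, so the line does not have to be taken exclusive.
 * Returns true only when a store was actually issued.
 */
static bool demo_set_flag(atomic_int *flag, int value)
{
	if (atomic_load_explicit(flag, memory_order_relaxed) == value)
		return false;
	atomic_store_explicit(flag, value, memory_order_relaxed);
	return true;
}

/*
 * 3. Local + global data: accumulate in a thread-local delta (standing
 * in for per-cpu data) and fold it into the shared counter only once
 * its magnitude reaches 'batch'. A larger batch means fewer writes to
 * the shared line, at the cost of a less precise global value.
 * Simplification: a single delta, so this serves one counter only.
 */
static _Thread_local long demo_local_delta;

static void demo_counter_add(atomic_long *global, long n, long batch)
{
	demo_local_delta += n;
	if (demo_local_delta >= batch || demo_local_delta <= -batch) {
		atomic_fetch_add_explicit(global, demo_local_delta,
					  memory_order_relaxed);
		demo_local_delta = 0;
	}
}
```

The trade-offs are visible in the sketch: the aligned structure grows
to two full cache lines for two ``long`` values, and the batched
counter's global value can lag behind by up to ``batch - 1`` per thread.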
193 and solved, performance may still show no obvious improvement as
205 .. [2] https://lore.kernel.org/lkml/CAHk-=whoqV=cX5VC80mmR9rr+Z+yQ6fiQZm36Fb-izsanHg23w@mail.gmail.…
206 .. [3] https://joemario.github.io/blog/2016/09/01/c2c-blog/