.. SPDX-License-Identifier: GPL-2.0
coherence of one cache line stored in multiple CPUs' caches; then
Members 'refcount' (A) and 'name' (B) _share_ one cache line like below::
   +-----------+                          +-----------+
   |   CPU 0   |                          |   CPU 1   |
   +-----------+                          +-----------+
         |                                      |
         V                                      V
   +----------------------+           +----------------------+
   | A          B         | (cached)  | A          B         | (cached)
   +----------------------+           +----------------------+
  ---------------------------+------------------+-----------------------------
                             +----------------------+
                             | A          B         |  (in memory)
                             +----------------------+
creation time and is never modified. When many CPUs access 'foo' at
the same time, with 'refcount' being bumped frequently by only one CPU
and 'name' being read by the others, all those reading CPUs have to
reload the whole cache line over and over due to the 'sharing', even
though 'name' is never changed.
There are many real-world cases of performance regressions caused by
false sharing. One of these is the rw_semaphore 'mmap_lock' inside
mm_struct, whose cache line layout changes have triggered regressions.
* A global datum accessed (shared) by many CPUs
* In the concurrent accesses to the data, there is at least one write
  operation (write/write or write/read)
Back when a platform had only one or a few CPUs, hot data
members could be purposely put in the same cache line to make them
cache hot, like a lock and the data it protects. But on recent
systems with hundreds of CPUs, this backfires when the lock is
heavily contended, as only the lock owner
could write to the data, while other CPUs are busy spinning on the lock.
* A lock and the data it protects being purposely put in one cache line.
* Global data being put together in one cache line. Some kernel
  subsystems have many global parameters of small size (e.g. 4 bytes),
  which can easily be grouped together and put into one cache line.
The following 'mitigation' section provides real-world examples.
checked, and it is valuable to run dedicated tools against performance
critical workloads to detect cases where false sharing affects
performance, and mitigate them.
How to detect and analyze False Sharing
=======================================
once hotspots are detected, tools like 'perf-c2c' and 'pahole' can
be further used to detect and pinpoint the possible false sharing
data structures and members.
perf-c2c can capture the cache lines with the most false sharing hits,
the functions accessing those lines, and the in-line offset of the
data. Simple commands are::

  $ perf c2c record -ag sleep 3
  $ perf c2c report --call-graph none -k vmlinux
When running the above while testing will-it-scale's tlb_flush1 case,
perf reports the hottest contended cache lines together with the
load/store statistics for each in-line offset (the full report table
is omitted here).
A nice introduction to perf-c2c is [3]_.
granularity. Users can match the offset in perf-c2c output with
pahole's decoding to locate the exact data members. For global
data, users can search the data address in System.map.
False sharing does not always need to be mitigated. False sharing
in a cold path may have little impact, and it is usually
unnecessary to hyper-optimize every rarely used data structure or
cold data path. But the cost of hot-path false sharing grows with the
core count increasing. Because of these detrimental effects, many
patches have been merged across kernel subsystems to mitigate it.
Possible Mitigations
====================

* Cache align the hot data so it has a cache line of its own, e.g.:

  - Commit 91b6d3256356 ("net: cache align tcp_memory_allocated, tcp_sockets_allocated")
* Reorganize the data structure, separating the interfering members
  into different cache lines. One downside is that it may introduce
  new false sharing with other members, e.g.:

  - Commit 802f1d522d5f ("mm: page_counter: re-layout structure to reduce false sharing")
Like for some global variable, use compare(read)-then-write instead
of unconditional write, so the cache line stays clean when the value
is already up to date, e.g.:

  - Commit 7b1002f7cfe5 ("bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing")
  - Commit 292648ac5cf1 ("mm: gup: allow FOLL_PIN to scale in SMP")
* Turn hot global data into 'per-cpu data + global data' when possible,
  or reasonably increase the threshold for syncing per-cpu data to
  global data, to reduce or postpone the 'write' to that global data, e.g.:
  - Commit 520f897a3554 ("ext4: use percpu_counters for extent_status cache hits/misses")
  - Commit 56f3547bfa4d ("mm: adjust vm_committed_as_batch according to vm overcommit policy")
Of course, all mitigations should be carefully verified to not cause
side effects. To avoid introducing false sharing when coding, it's
better to:
* Group mostly read-only fields together
One note is, sometimes even after a severe false sharing is detected
and mitigated, performance may not improve noticeably, as
the hotspot switches to a new place.
One open issue is that the kernel has an optional data structure
randomization mechanism, which also randomizes whether data members
share a cache line.
.. [2] https://lore.kernel.org/lkml/CAHk-=whoqV=cX5VC80mmR9rr+Z+yQ6fiQZm36Fb-izsanHg23w@mail.gmail.…
.. [3] https://joemario.github.io/blog/2016/09/01/c2c-blog/