Lines Matching +full:sense +full:- +full:bitfield +full:- +full:width
19 documentation at tools/memory-model/. Nevertheless, even this memory
37 Note also that it is possible that a barrier may be a no-op for an
48 - Device operations.
49 - Guarantees.
53 - Varieties of memory barrier.
54 - What may not be assumed about memory barriers?
55 - Address-dependency barriers (historical).
56 - Control dependencies.
57 - SMP barrier pairing.
58 - Examples of memory barrier sequences.
59 - Read memory barriers vs load speculation.
60 - Multicopy atomicity.
64 - Compiler barrier.
65 - CPU memory barriers.
69 - Lock acquisition functions.
70 - Interrupt disabling functions.
71 - Sleep and wake-up functions.
72 - Miscellaneous functions.
74 (*) Inter-CPU acquiring barrier effects.
76 - Acquires vs memory accesses.
80 - Interprocessor interaction.
81 - Atomic operations.
82 - Accessing devices.
83 - Interrupts.
91 - Cache coherency vs DMA.
92 - Cache coherency vs MMIO.
96 - And then there's the Alpha.
97 - Virtual Machine Guests.
101 - Circular buffers.
115 +-------+ : +--------+ : +-------+
118 | CPU 1 |<----->| Memory |<----->| CPU 2 |
121 +-------+ : +--------+ : +-------+
126 | : +--------+ : |
129 +---------->| Device |<----------+
132 : +--------+ :
158 STORE A=3, STORE B=4, y=LOAD A->3, x=LOAD B->4
159 STORE A=3, STORE B=4, x=LOAD B->4, y=LOAD A->3
160 STORE A=3, y=LOAD A->3, STORE B=4, x=LOAD B->4
161 STORE A=3, y=LOAD A->3, x=LOAD B->2, STORE B=4
162 STORE A=3, x=LOAD B->2, STORE B=4, y=LOAD A->3
163 STORE A=3, x=LOAD B->2, y=LOAD A->3, STORE B=4
164 STORE B=4, STORE A=3, y=LOAD A->3, x=LOAD B->4
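The lines above enumerate orderings of the four memory operations from the two-CPU example these lines come from (CPU 1: A = 3; B = 4; CPU 2: x = B; y = A; initially A == 1 and B == 2). As a rough illustration, the sketch below replays every permutation of those four events against a single memory and counts the distinct (x, y) results, optionally discarding orderings that break either CPU's program order. The enum names and count_outcomes() are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* The four events: CPU 1 stores A=3 then B=4; CPU 2 loads B into x,
   then loads A into y.  Initially A == 1 and B == 2. */
enum event { STORE_A, STORE_B, LOAD_B_X, LOAD_A_Y, NEVENTS };

/* Replay one ordering against a single shared memory. */
static void run_order(const int order[NEVENTS], int *x, int *y)
{
    int A = 1, B = 2;
    for (int i = 0; i < NEVENTS; i++) {
        switch (order[i]) {
        case STORE_A:  A = 3;  break;
        case STORE_B:  B = 4;  break;
        case LOAD_B_X: *x = B; break;
        case LOAD_A_Y: *y = A; break;
        }
    }
}

/* Position of event e within order[]. */
static int pos(const int order[NEVENTS], int e)
{
    for (int i = 0; i < NEVENTS; i++)
        if (order[i] == e)
            return i;
    return -1;
}

/* Count distinct (x, y) outcomes over all 24 orderings.  If
   keep_program_order is true, skip orderings in which either CPU's
   own two accesses appear swapped. */
int count_outcomes(bool keep_program_order)
{
    bool seen[5][5] = {{false}};   /* indexed by the x and y values */
    int count = 0;

    /* Decode p as four 2-bit event numbers; keep only permutations. */
    for (int p = 0; p < 256; p++) {
        int order[NEVENTS] = { p & 3, (p >> 2) & 3, (p >> 4) & 3, (p >> 6) & 3 };
        unsigned used = 0;
        for (int i = 0; i < NEVENTS; i++)
            used |= 1u << order[i];
        if (used != 0xFu)
            continue;
        if (keep_program_order &&
            (pos(order, STORE_A) > pos(order, STORE_B) ||
             pos(order, LOAD_B_X) > pos(order, LOAD_A_Y)))
            continue;

        int x = 0, y = 0;
        run_order(order, &x, &y);
        if (!seen[x][y]) {
            seen[x][y] = true;
            count++;
        }
    }
    return count;
}
```

count_outcomes(false) yields the four combinations possible for that example, while count_outcomes(true) yields only three: the x == 4, y == 1 result requires at least one CPU's accesses to appear reordered.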
202 -----------------
224 ----------
238 emits a memory-barrier instruction, so that a DEC Alpha CPU will
309 And there are anti-guarantees:
312 generate code to modify these using non-atomic read-modify-write
317 in a given bitfield must be protected by one lock. If two fields
318 in a given bitfield are protected by different locks, the compiler's
319 non-atomic read-modify-write sequences can cause an update to one
326 "char", two-byte alignment for "short", four-byte alignment for
327 "int", and either four-byte or eight-byte alignment for "long",
328 on 32-bit and 64-bit systems, respectively. Note that these
330 using older pre-C11 compilers (for example, gcc 4.6). The portion
336 of adjacent bit-fields all having nonzero width
342 NOTE 2: A bit-field and an adjacent non-bit-field member
344 to two bit-fields, if one is declared inside a nested
346 are separated by a zero-length bit-field declaration,
347 or if they are separated by a non-bit-field member
349 bit-fields in the same structure if all members declared
350 between them are also bit-fields, no matter what the
351 sizes of those intervening bit-fields happen to be.
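To make the bit-field hazard concrete, the sketch below hand-simulates the non-atomic read-modify-write sequences a compiler may generate for two bit-fields sharing one memory location, and contrasts that layout with one where a zero-length bit-field starts a new allocation unit. It is a single-threaded model of one unlucky interleaving, not a real race; interleaved_rmw() and the struct names are hypothetical.

```c
#include <assert.h>

/* Two bit-fields in one memory location: updating either may be done
   as a load/modify/store of the whole containing unit. */
struct shared_unit {
    unsigned int a : 4;
    unsigned int b : 4;
};

/* The zero-length bit-field starts a new unit, so 'a' and 'b' are
   separate memory locations under the rules quoted above. */
struct separate_units {
    unsigned int a : 4;
    unsigned int   : 0;
    unsigned int b : 4;
};

/* Simulate one interleaving of two such read-modify-write sequences:
   CPU 1 wants a = 5, CPU 2 wants b = 7, each copying the whole unit. */
struct shared_unit interleaved_rmw(void)
{
    struct shared_unit mem = { 0, 0 };
    struct shared_unit cpu1 = mem;   /* CPU 1: load whole unit */
    struct shared_unit cpu2 = mem;   /* CPU 2: load whole unit */
    cpu1.a = 5;                      /* CPU 1: modify its copy */
    cpu2.b = 7;                      /* CPU 2: modify its copy */
    mem = cpu1;                      /* CPU 1: store unit back */
    mem = cpu2;                      /* CPU 2: store back; 'a' reverts to 0 */
    return mem;
}
```

interleaved_rmw() returns a == 0 and b == 7: CPU 1's update to 'a' is lost even though no code ever wrote 'a' back to zero. With GCC and Clang on common ABIs, struct separate_units is also larger than struct shared_unit because 'b' starts a new unsigned int unit; the exact sizes are implementation-defined.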
359 in random order, but this can be a problem for CPU-CPU interaction and for I/O.
375 ---------------------------
394 address-dependency barriers; see the "SMP barrier pairing" subsection.
397 (2) Address-dependency barriers (historical).
398 [!] This section is marked as HISTORICAL: it covers the long-obsolete
400 implicit in all marked accesses. For more up-to-date information,
404 An address-dependency barrier is a weaker form of read barrier. In the
407 the second load will be directed), an address-dependency barrier would
411 An address-dependency barrier is a partial ordering on interdependent
417 considered can then perceive. An address-dependency barrier issued by
422 the address-dependency barrier.
434 [!] Note that address-dependency barriers should normally be paired with
437 [!] Kernel release v5.9 removed kernel APIs for explicit address-
440 address-dependency barriers.
444 A read barrier is an address-dependency barrier plus a guarantee that all
452 Read memory barriers imply address-dependency barriers, and so can
476 This acts as a one-way permeable barrier. It guarantees that all memory
491 This also acts as a one-way permeable barrier. It guarantees that all
502 -not- guaranteed to act as a full memory barrier. However, after an
513 RELEASE variants in addition to fully-ordered and relaxed (no barrier
530 ----------------------------------------------
549 (*) There is no guarantee that some intervening piece of off-the-CPU
556 Documentation/driver-api/pci/pci.rst
557 Documentation/core-api/dma-api-howto.rst
558 Documentation/core-api/dma-api.rst
561 ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
562 ----------------------------------------
563 [!] This section is marked as HISTORICAL: it covers the long-obsolete
565 in all marked accesses. For more up-to-date information, including
571 to this section are those working on DEC Alpha architecture-specific code
574 address-dependency barriers.
576 [!] While address dependencies are observed in both load-to-load and
577 load-to-store relations, address-dependency barriers are not necessary
578 for load-to-store situations.
580 The requirement of address-dependency barriers is a little subtle, and
593 [!] READ_ONCE_OLD() corresponds to READ_ONCE() in pre-4.15 kernels, which
594 doesn't imply an address-dependency barrier.
611 To deal with this, READ_ONCE() provides an implicit address-dependency barrier
621 <implicit address-dependency barrier>
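Outside the kernel, the closest C11 counterpart of this pointer-publication pattern is a release store of the pointer paired with a dependency-ordered or acquire load. The following userspace sketch assumes C11 <stdatomic.h> and POSIX threads; struct foo, gp and the thread functions are hypothetical, and acquire is used on the reader side because GCC and Clang currently implement memory_order_consume as memory_order_acquire.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo { int a; };

static struct foo the_foo;
static _Atomic(struct foo *) gp = NULL;   /* the published pointer */

/* Writer: initialize the structure, then publish it with a release store. */
static void *writer(void *unused)
{
    (void)unused;
    the_foo.a = 42;
    atomic_store_explicit(&gp, &the_foo, memory_order_release);
    return NULL;
}

/* Reader: wait for the pointer, then dereference it.  The acquire load
   guarantees the dereference sees the initialized contents. */
static void *reader(void *out)
{
    struct foo *p;
    while ((p = atomic_load_explicit(&gp, memory_order_acquire)) == NULL)
        ;   /* spin until published */
    *(int *)out = p->a;
    return NULL;
}

int publish_and_read(void)
{
    pthread_t w, r;
    int seen = 0;
    atomic_store_explicit(&gp, NULL, memory_order_relaxed);
    the_foo.a = 0;
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```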
630 even-numbered cache lines and the other bank processes odd-numbered cache
631 lines. The pointer P might be stored in an odd-numbered cache line, and the
632 variable B might be stored in an even-numbered cache line. Then, if the
633 even-numbered bank of the reading CPU's cache is extremely busy while the
634 odd-numbered bank is idle, one can see the new value of the pointer P (&B),
638 An address-dependency barrier is not required to order dependent writes
655 Therefore, no address-dependency barrier is required to order the read into
657 even without an implicit address-dependency barrier of modern READ_ONCE():
662 of dependency ordering is to -prevent- writes to the data structure, along
673 The address-dependency barrier is very important to the RCU system,
681 --------------------
687 A load-load control dependency requires a full read memory barrier, not
688 simply an (implicit) address-dependency barrier to make it work correctly.
692 <implicit address-dependency barrier>
699 dependency, but rather a control dependency that the CPU may short-circuit
710 However, stores are not speculated. This means that ordering -is- provided
711 for load-store control dependencies, as in the following example:
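A minimal userspace sketch of that load-store control dependency follows. It assumes simplified volatile-based stand-ins for READ_ONCE() and WRITE_ONCE() built on the GCC/Clang __typeof__ extension; the real kernel macros do more. The C11 memory model itself does not promise control-dependency ordering, so this shows only the shape of the code the kernel relies on, not a portable guarantee.

```c
#include <assert.h>

/* Simplified stand-ins (assumed, not the kernel's definitions): a
   volatile access makes the compiler emit exactly one load or store. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

int a, b;

/* The store to b happens only if the load from a returned nonzero.
   Since CPUs do not make speculative stores visible to other CPUs,
   the store cannot become visible before the load completes. */
void load_store_ctrl_dep(void)
{
    int q = READ_ONCE(a);
    if (q)
        WRITE_ONCE(b, 1);
}
```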
726 variable 'a' is always non-zero, it would be well within its rights
756 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
759 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
779 In contrast, without explicit memory barriers, two-legged-if control
836 You must also be careful not to rely too much on boolean short-circuit
851 out-guess your code. More generally, although READ_ONCE() does force
855 In addition, control dependencies apply only to the then-clause and
856 else-clause of the if-statement in question. In particular, it does
857 not necessarily apply to code following the if-statement:
871 conditional-move instructions, as in this fanciful pseudo-assembly
884 In short, control dependencies apply only to the stores in the then-clause
885 and else-clause of the if-statement in question (including functions
886 invoked by those two clauses), not to code following that if-statement.
897 However, they do -not- guarantee any other sort of ordering:
906 to carry out the stores. Please note that it is -not- sufficient
912 (*) Control dependencies require at least one run-time conditional
924 (*) Control dependencies apply only to the then-clause and else-clause
925 of the if-statement containing the control dependency, including
927 do -not- apply to code following the if-statement containing the
932 (*) Control dependencies do -not- provide multicopy atomicity. If you
940 -------------------
942 When dealing with CPU-CPU interactions, certain types of memory barrier should
949 with an address-dependency barrier, a control dependency, an acquire barrier,
951 read barrier, control dependency, or an address-dependency barrier pairs
970 <implicit address-dependency barrier>
990 match the loads after the read barrier or the address-dependency barrier, and
995 WRITE_ONCE(a, 1); }---- --->{ v = READ_ONCE(c);
999 WRITE_ONCE(d, 4); }---- --->{ y = READ_ONCE(b);
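The pairing shown above can be approximated in userspace C11 with fences: a release fence standing in for the write barrier on CPU 1 and an acquire fence standing in for the read barrier on CPU 2. This is a sketch under that assumption (C11 atomics plus POSIX threads; the function names are hypothetical). It checks the pairing guarantee: if CPU 2 sees either c == 3 or d == 4, it also sees a == 1 and b == 2.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_int a, b, c, d;

/* CPU 1: two stores, a write barrier, two more stores. */
static void *cpu1(void *unused)
{
    (void)unused;
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    atomic_store_explicit(&b, 2, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* ~ <write barrier> */
    atomic_store_explicit(&c, 3, memory_order_relaxed);
    atomic_store_explicit(&d, 4, memory_order_relaxed);
    return NULL;
}

struct cpu2_result { int v, w, x, y; };

/* CPU 2: two loads, a read barrier, two more loads. */
static void *cpu2(void *out)
{
    struct cpu2_result *r = out;
    r->v = atomic_load_explicit(&c, memory_order_relaxed);
    r->w = atomic_load_explicit(&d, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);   /* ~ <read barrier> */
    r->x = atomic_load_explicit(&a, memory_order_relaxed);
    r->y = atomic_load_explicit(&b, memory_order_relaxed);
    return NULL;
}

/* Run the pair once; false would mean the pairing guarantee failed. */
bool pairing_holds_once(void)
{
    pthread_t t1, t2;
    struct cpu2_result r;
    atomic_store(&a, 0); atomic_store(&b, 0);
    atomic_store(&c, 0); atomic_store(&d, 0);
    pthread_create(&t2, NULL, cpu2, &r);
    pthread_create(&t1, NULL, cpu1, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    if (r.v == 3 || r.w == 4)
        return r.x == 1 && r.y == 2;
    return true;   /* saw neither later store: nothing is promised */
}
```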
1003 ------------------------------------
1022 +-------+ : :
1023 | | +------+
1024 | |------>| C=3 | } /\
1025 | | : +------+ }----- \ -----> Events perceptible to
1027 | | : +------+ }
1029 | | +------+ }
1030 | | wwwwwwwwwwwwwwww } <--- At this point the write barrier
1031 | | +------+ } requires all stores prior to the
1033 | | : +------+ } further stores may take place
1034 | |------>| D=4 | }
1035 | | +------+
1036 +-------+ : :
1043 Secondly, address-dependency barriers act as partial orderings on address-
1059 +-------+ : : : :
1060 | | +------+ +-------+ | Sequence of update
1061 | |------>| B=2 |----- --->| Y->8 | | of perception on
1062 | | : +------+ \ +-------+ | CPU 2
1063 | CPU 1 | : | A=1 | \ --->| C->&Y | V
1064 | | +------+ | +-------+
1066 | | +------+ | : :
1067 | | : | C=&B |--- | : : +-------+
1068 | | : +------+ \ | +-------+ | |
1069 | |------>| D=4 | ----------->| C->&B |------>| |
1070 | | +------+ | +-------+ | |
1071 +-------+ : : | : : | |
1074 | +-------+ | |
1075 Apparently incorrect ---> | | B->7 |------>| |
1076 perception of B (!) | +-------+ | |
1078 | +-------+ | |
1079 The load of X holds ---> \ | X->9 |------>| |
1080 up the maintenance \ +-------+ | |
1081 of coherence of B ----->| B->2 | +-------+
1082 +-------+
1089 If, however, an address-dependency barrier were to be placed between the load
1100 <address-dependency barrier>
1105 +-------+ : : : :
1106 | | +------+ +-------+
1107 | |------>| B=2 |----- --->| Y->8 |
1108 | | : +------+ \ +-------+
1109 | CPU 1 | : | A=1 | \ --->| C->&Y |
1110 | | +------+ | +-------+
1112 | | +------+ | : :
1113 | | : | C=&B |--- | : : +-------+
1114 | | : +------+ \ | +-------+ | |
1115 | |------>| D=4 | ----------->| C->&B |------>| |
1116 | | +------+ | +-------+ | |
1117 +-------+ : : | : : | |
1120 | +-------+ | |
1121 | | X->9 |------>| |
1122 | +-------+ | |
1123 Makes sure all effects ---> \ aaaaaaaaaaaaaaaaa | |
1124 prior to the store of C \ +-------+ | |
1125 are perceptible to ----->| B->2 |------>| |
1126 subsequent loads +-------+ | |
1127 : : +-------+
1145 +-------+ : : : :
1146 | | +------+ +-------+
1147 | |------>| A=1 |------ --->| A->0 |
1148 | | +------+ \ +-------+
1149 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1150 | | +------+ | +-------+
1151 | |------>| B=2 |--- | : :
1152 | | +------+ \ | : : +-------+
1153 +-------+ : : \ | +-------+ | |
1154 ---------->| B->2 |------>| |
1155 | +-------+ | CPU 2 |
1156 | | A->0 |------>| |
1157 | +-------+ | |
1158 | : : +-------+
1160 \ +-------+
1161 ---->| A->1 |
1162 +-------+
1182 +-------+ : : : :
1183 | | +------+ +-------+
1184 | |------>| A=1 |------ --->| A->0 |
1185 | | +------+ \ +-------+
1186 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1187 | | +------+ | +-------+
1188 | |------>| B=2 |--- | : :
1189 | | +------+ \ | : : +-------+
1190 +-------+ : : \ | +-------+ | |
1191 ---------->| B->2 |------>| |
1192 | +-------+ | CPU 2 |
1195 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
1196 barrier causes all effects \ +-------+ | |
1197 prior to the storage of B ---->| A->1 |------>| |
1198 to be perceptible to CPU 2 +-------+ | |
1199 : : +-------+
1219 +-------+ : : : :
1220 | | +------+ +-------+
1221 | |------>| A=1 |------ --->| A->0 |
1222 | | +------+ \ +-------+
1223 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1224 | | +------+ | +-------+
1225 | |------>| B=2 |--- | : :
1226 | | +------+ \ | : : +-------+
1227 +-------+ : : \ | +-------+ | |
1228 ---------->| B->2 |------>| |
1229 | +-------+ | CPU 2 |
1232 | +-------+ | |
1233 | | A->0 |------>| 1st |
1234 | +-------+ | |
1235 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
1236 barrier causes all effects \ +-------+ | |
1237 prior to the storage of B ---->| A->1 |------>| 2nd |
1238 to be perceptible to CPU 2 +-------+ | |
1239 : : +-------+
1245 +-------+ : : : :
1246 | | +------+ +-------+
1247 | |------>| A=1 |------ --->| A->0 |
1248 | | +------+ \ +-------+
1249 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1250 | | +------+ | +-------+
1251 | |------>| B=2 |--- | : :
1252 | | +------+ \ | : : +-------+
1253 +-------+ : : \ | +-------+ | |
1254 ---------->| B->2 |------>| |
1255 | +-------+ | CPU 2 |
1258 \ +-------+ | |
1259 ---->| A->1 |------>| 1st |
1260 +-------+ | |
1262 +-------+ | |
1263 | A->1 |------>| 2nd |
1264 +-------+ | |
1265 : : +-------+
1274 ----------------------------------------
1278 other loads, and so do the load in advance - even though they haven't actually
1283 It may turn out that the CPU didn't actually need the value - perhaps because a
1284 branch circumvented the load - in which case it can discard the value or just
1298 : : +-------+
1299 +-------+ | |
1300 --->| B->2 |------>| |
1301 +-------+ | CPU 2 |
1303 +-------+ | |
1304 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1305 division speculates on the +-------+ ~ | |
1309 Once the divisions are complete --> : : ~-->| |
1311 LOAD with immediate effect : : +-------+
1314 Placing a read barrier or an address-dependency barrier just before the second
1329 : : +-------+
1330 +-------+ | |
1331 --->| B->2 |------>| |
1332 +-------+ | CPU 2 |
1334 +-------+ | |
1335 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1336 division speculates on the +-------+ ~ | |
1343 : : ~-->| |
1345 : : +-------+
1351 : : +-------+
1352 +-------+ | |
1353 --->| B->2 |------>| |
1354 +-------+ | CPU 2 |
1356 +-------+ | |
1357 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1358 division speculates on the +-------+ ~ | |
1364 +-------+ | |
1365 The speculation is discarded ---> --->| A->1 |------>| |
1366 and an updated value is +-------+ | |
1367 retrieved : : +-------+
1371 --------------------
1380 time to all -other- CPUs. The remainder of this document discusses this
1399 Because CPU 3's load from X in some sense comes after CPU 2's load, it
1404 multicopy-atomic systems, CPU B's load must return either the same value
1414 able to compensate for non-multicopy atomicity. For example, suppose
1425 This substitution allows non-multicopy atomicity to run rampant: in
1431 example runs on a non-multicopy-atomic system where CPUs 1 and 2 share a
1436 General barriers can compensate not only for non-multicopy atomicity,
1437 but can also generate additional ordering that can ensure that -all-
1438 CPUs will perceive the same order of -all- operations. In contrast, a
1439 chain of release-acquire pairs do not provide this additional ordering,
1480 Furthermore, because of the release-acquire relationship between cpu0()
1486 However, the ordering provided by a release-acquire chain is local
1497 writes in order, CPUs not involved in the release-acquire chain might
1499 the weak memory-barrier instructions used to implement smp_load_acquire()
1502 store to u as happening -after- cpu1()'s load from v, even though
1508 -not- ensure that any particular value will be read. Therefore, the
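A release-acquire chain can be sketched with C11 atomics and POSIX threads as below. The functions cpu0(), cpu1() and cpu2() are hypothetical and simplified (they spin rather than sample once). Because each link is a release store read by an acquire load, cpu0()'s store to u must be visible to cpu2() at the end of the chain.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int u, y, z;

/* First link: write u, then release-store y. */
static void *cpu0(void *unused)
{
    (void)unused;
    atomic_store_explicit(&u, 1, memory_order_relaxed);
    atomic_store_explicit(&y, 1, memory_order_release);
    return NULL;
}

/* Middle link: acquire y, then release-store z. */
static void *cpu1(void *unused)
{
    (void)unused;
    while (atomic_load_explicit(&y, memory_order_acquire) == 0)
        ;
    atomic_store_explicit(&z, 1, memory_order_release);
    return NULL;
}

/* End of chain: after acquiring z, the store to u must be visible. */
static void *cpu2(void *out)
{
    while (atomic_load_explicit(&z, memory_order_acquire) == 0)
        ;
    *(int *)out = atomic_load_explicit(&u, memory_order_relaxed);
    return NULL;
}

int chain_end_sees_u(void)
{
    pthread_t t0, t1, t2;
    int r = -1;
    atomic_store(&u, 0); atomic_store(&y, 0); atomic_store(&z, 0);
    pthread_create(&t2, NULL, cpu2, &r);
    pthread_create(&t1, NULL, cpu1, NULL);
    pthread_create(&t0, NULL, cpu0, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return r;
}
```

chain_end_sees_u() always returns 1. The sketch cannot show the other half of the argument: whether a CPU that is not on the chain observes the same order depends on the hardware, and the text calls for general barriers such as smp_mb() when that matters.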
1533 ----------------
1540 This is a general barrier -- there are no read-read or write-write
1550 interrupt-handler code and the code that was interrupted.
1556 optimizations that, while perfectly safe in single-threaded code, can
1584 into the following code, which, although in some sense legitimate
1585 for single-threaded code, is almost certainly not what the developer
1606 single-threaded code, but can be fatal in concurrent code:
1624 single-threaded code, so you need to tell the compiler about cases
1638 This transformation is a win for single-threaded code because it
1657 the code into near-nonexistence. (It will still load from the
1685 between process-level code and an interrupt handler:
1701 win for single-threaded code:
1762 In single-threaded code, this is not only safe, but also saves
1764 could cause some other CPU to see a spurious value of 42 -- even
1765 if variable 'a' was never zero -- when loading variable 'b'.
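The fix for this invented-store problem is to mark the stores, so that the compiler must emit exactly the stores that appear in the source. The sketch below assumes a simplified volatile-based WRITE_ONCE() (GCC/Clang __typeof__ extension) rather than the kernel's own definition; set_b() is a hypothetical name.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's WRITE_ONCE() (assumed). */
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

int a, b;

/* With plain stores, the compiler could emit "b = 42; if (a) b = a;",
   briefly exposing 42 to other CPUs.  Volatile stores may not be
   invented, so b only ever receives the value chosen by the branch. */
void set_b(void)
{
    if (a)
        WRITE_ONCE(b, a);
    else
        WRITE_ONCE(b, 42);
}
```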
1774 damaging, but they can result in cache-line bouncing and thus in
1779 with a single memory-reference instruction, prevents "load tearing"
1782 16-bit store instructions with 7-bit immediate fields, the compiler
1783 might be tempted to use two 16-bit store-immediate instructions to
1784 implement the following 32-bit store:
1791 This optimization can therefore be a win in single-threaded code.
1815 implement these three assignment statements as a pair of 32-bit
1816 loads followed by a pair of 32-bit stores. This would result in
1836 -------------------
1848 All memory barriers except the address-dependency barriers imply a compiler
1862 systems because it is assumed that a CPU will appear to be self-consistent,
1873 windows. These barriers are required even on non-SMP systems as they affect
1904 obj->dead = 1;
1906 atomic_dec(&obj->ref_count);
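In C11 terms, the obj->dead pattern above might be sketched as follows, with a sequentially consistent fence standing in for smp_mb__before_atomic() ahead of a relaxed decrement. struct obj and kill_and_put() are hypothetical; this assumes C11 <stdatomic.h> and is not how the kernel implements these primitives.

```c
#include <assert.h>
#include <stdatomic.h>

struct obj {
    atomic_int dead;
    atomic_int ref_count;
};

/* Mark the object dead, then drop a reference.  The fence orders the
   store to 'dead' before the relaxed decrement, as the kernel's
   smp_mb__before_atomic() does before atomic_dec(). */
void kill_and_put(struct obj *obj)
{
    atomic_store_explicit(&obj->dead, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* ~ smp_mb__before_atomic() */
    atomic_fetch_sub_explicit(&obj->ref_count, 1, memory_order_relaxed);
}
```

A plain atomic_fetch_sub() (seq_cst by default) is itself a release operation and would also order the earlier store for readers that acquire the count; the explicit fence mirrors the kernel's split between a relaxed atomic and a separate barrier.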
1920 DMA capable device. See Documentation/core-api/dma-api.rst file for more
1928 if (desc->status != DEVICE_OWN) {
1933 read_data = desc->data;
1934 desc->data = write_data;
1940 desc->status = DEVICE_OWN;
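The descriptor-ownership handoff above can be modelled, loosely, with C11 atomics between two CPUs: an acquire load of the status in place of the dma_rmb() before reading the data, and a release store in place of the dma_wmb() before handing ownership back. This is an analogy only, assuming a hypothetical CPU_OWN value and struct layout; real device memory needs the kernel's DMA API and barriers, which C11 atomics do not model.

```c
#include <assert.h>
#include <stdatomic.h>

enum { CPU_OWN = 0, DEVICE_OWN = 1 };   /* CPU_OWN is hypothetical */

/* Hypothetical descriptor: 'status' says who may touch 'data'. */
struct desc {
    atomic_int status;
    int data;
};

/* If the CPU owns the descriptor, read the old data, write new data
   and return ownership.  Returns 1 if processed, 0 if device-owned. */
int process_desc(struct desc *desc, int write_data, int *read_data)
{
    /* Acquire: data reads cannot be satisfied before the status read. */
    if (atomic_load_explicit(&desc->status, memory_order_acquire) == DEVICE_OWN)
        return 0;
    *read_data = desc->data;
    desc->data = write_data;
    /* Release: the data write is visible before ownership changes. */
    atomic_store_explicit(&desc->status, DEVICE_OWN, memory_order_release);
    return 1;
}
```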
1964 For example, after a non-temporal write to a pmem region, we use pmem_wmb()
1975 For memory accesses with write-combining attributes (e.g. those returned
1978 write-combining memory accesses before this macro with those after it when
1994 --------------------------
2041 one-way barriers is that the effects of instructions outside of a critical
2062 RELEASE may -not- be assumed to be a full memory barrier.
2087 -could- occur.
2102 a sleep-unlock race, but the locking primitive needs to resolve
2107 anything at all - especially with respect to I/O accesses - unless combined
2110 See also the section on "Inter-CPU acquiring barrier effects".
2140 -----------------------------
2148 SLEEP AND WAKE-UP FUNCTIONS
2149 ---------------------------
2174 STORE current->state
2217 STORE current->state ...
2219 LOAD event_indicated if ((LOAD task->state) & TASK_NORMAL)
2220 STORE task->state
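At its core, the sleep/wake-up race above is a store-buffering pattern: each side stores its own flag and then loads the other's, and a full barrier on each side guarantees that at least one of them sees the other. A userspace sketch, assuming C11 sequentially consistent fences in place of the kernel's barriers and POSIX threads (all names hypothetical):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_int task_state;        /* nonzero: the task is going to sleep */
static atomic_int event_indicated;
static int sleeper_saw_event, waker_saw_sleeper;

/* Sleeper: STORE current->state; full barrier; LOAD event_indicated. */
static void *sleeper(void *unused)
{
    (void)unused;
    atomic_store_explicit(&task_state, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* ~ set_current_state() */
    sleeper_saw_event =
        atomic_load_explicit(&event_indicated, memory_order_relaxed);
    return NULL;
}

/* Waker: STORE event_indicated; full barrier; LOAD task->state. */
static void *waker(void *unused)
{
    (void)unused;
    atomic_store_explicit(&event_indicated, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* ~ wake-up side barrier */
    waker_saw_sleeper =
        atomic_load_explicit(&task_state, memory_order_relaxed);
    return NULL;
}

/* Returns false only if both sides missed each other (a lost wake-up). */
bool no_lost_wakeup_once(void)
{
    pthread_t s, w;
    atomic_store(&task_state, 0);
    atomic_store(&event_indicated, 0);
    pthread_create(&s, NULL, sleeper, NULL);
    pthread_create(&w, NULL, waker, NULL);
    pthread_join(s, NULL);
    pthread_join(w, NULL);
    return sleeper_saw_event || waker_saw_sleeper;
}
```

Removing either fence would allow both loads to return zero, which is exactly the missed wake-up the barriers exist to prevent.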
2265 order multiple stores before the wake-up with respect to loads of those stored
2301 -----------------------
2309 INTER-CPU ACQUIRING BARRIER EFFECTS
2318 ---------------------------
2351 be a problem as a single-threaded linear piece of code will still appear to
2365 --------------------------
2405 LOAD waiter->list.next;
2406 LOAD waiter->task;
2407 STORE waiter->task;
2429 LOAD waiter->task;
2430 STORE waiter->task;
2438 LOAD waiter->list.next;
2439 --- OOPS ---
2446 LOAD waiter->list.next;
2447 LOAD waiter->task;
2449 STORE waiter->task;
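The waiter hand-off above can be sketched in userspace as follows. Here a C11 release store of NULL to ->task stands in for the smp_mb() before STORE waiter->task, since a release store keeps the waker's earlier loads ahead of it. The struct and function names are hypothetical, and the sleeper's reuse of the waiter's memory is simulated by overwriting ->next.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct waiter {
    struct waiter *next;     /* stands in for waiter->list.next */
    _Atomic(void *) task;    /* non-NULL while the task is still waiting */
};

struct wake_args { struct waiter *w; struct waiter *next_seen; };

/* Waker: load everything needed from the waiter, then clear ->task. */
static void *waker(void *arg)
{
    struct wake_args *wa = arg;
    struct waiter *w = wa->w;
    wa->next_seen = w->next;                                 /* LOAD list.next */
    void *task = atomic_load_explicit(&w->task, memory_order_relaxed);
    atomic_store_explicit(&w->task, NULL, memory_order_release);
    (void)task;   /* a real waker would now wake this task */
    return NULL;
}

/* Sleeper: once ->task is NULL, its waiter may be reused at once. */
static void *sleeper(void *arg)
{
    struct waiter *w = arg;
    while (atomic_load_explicit(&w->task, memory_order_acquire) != NULL)
        ;
    w->next = NULL;   /* simulated reuse of the waiter's stack frame */
    return NULL;
}

int wake_one(void)
{
    static int dummy_task;
    struct waiter next_w, w;
    pthread_t s, k;
    next_w.next = NULL;
    atomic_init(&next_w.task, NULL);
    w.next = &next_w;
    atomic_init(&w.task, &dummy_task);
    struct wake_args wa = { &w, NULL };
    pthread_create(&s, NULL, sleeper, &w);
    pthread_create(&k, NULL, waker, &wa);
    pthread_join(k, NULL);
    pthread_join(s, NULL);
    return wa.next_seen == &next_w;
}
```

wake_one() returns 1: the waker always reads the original ->next, because the sleeper cannot observe ->task == NULL until the waker's earlier loads have completed.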
2459 On a UP system - where this wouldn't be a problem - the smp_mb() is just a
2466 -----------------
2477 -----------------
2486 efficient to reorder, combine or merge accesses - something that would cause
2490 routines - such as inb() or writel() - which know how to make such accesses
2496 See Documentation/driver-api/device-io.rst for more information.
2500 ----------
2506 This may be alleviated - at least in part - by disabling local interrupts (a
2508 the interrupt-disabled section in the driver. While the driver's interrupt
2515 under interrupt-disablement and then the driver's interrupt handler is invoked:
2534 accesses performed in an interrupt - and vice versa - unless implicit or
2544 likely, then interrupt-disabling locks should be used to guarantee ordering.
2552 specific. Therefore, drivers which are inherently non-portable may rely on
2604 The ordering properties of __iomem pointers obtained with non-default
2614 bullets 2-5 above) but they are still guaranteed to be ordered with
2622 register-based, memory-mapped FIFOs residing on peripherals that are not
2628 The inX() and outX() accessors are intended to access legacy port-mapped
2639 Device drivers may expect outX() to emit a non-posted write transaction
2657 little-endian and will therefore perform byte-swapping operations on big-endian
2665 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2669 of arch-specific code.
2672 stream in any order it feels like - or even in parallel - provided that if an
2678 [*] Some instructions have more than one effect - such as changing the
2679 condition codes, changing registers or changing memory - and different
2705 <--- CPU ---> : <----------- Memory ----------->
2707 +--------+ +--------+ : +--------+ +-----------+
2708 | | | | : | | | | +--------+
2710 | Core |--->| Access |----->| Cache |<-->| | | |
2711 | | | Queue | : | | | |--->| Memory |
2713 +--------+ +--------+ : +--------+ | | | |
2714 : | Cache | +--------+
2716 : | Mechanism | +--------+
2717 +--------+ +--------+ : +--------+ | | | |
2719 | CPU | | Memory | : | CPU | | |--->| Device |
2720 | Core |--->| Access |----->| Cache |<-->| | | |
2722 | | | | : | | | | +--------+
2723 +--------+ +--------+ : +--------+ +-----------+
2754 ----------------------
2771 See Documentation/core-api/cachetlb.rst for more information on cache
2776 -----------------------
2832 (*) the CPU's data cache may affect the ordering, and while cache-coherency
2833 mechanisms may alleviate this - once the store has actually hit the cache
2834 - there's no guarantee that the coherency management will be propagated in
2845 However, it is guaranteed that a CPU will be self-consistent: it will see its
2872 are -not- optional in the above example, as there are architectures
2907 --------------------------
2911 two semantically-related cache lines updated at separate times. This is where
2912 the address-dependency barrier really becomes necessary as this synchronises
2922 ----------------------
2927 barriers for this use-case would be possible but is often suboptimal.
2929 To handle this case optimally, low-level virt_mb() etc. macros are available.
2931 identical code for SMP and non-SMP systems. For example, virtual machine guests
2945 ----------------
2950 Documentation/core-api/circular-buffers.rst
2967 Chapter 7.1: Memory-Access Ordering
2970 ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile)
2973 IA-32 Intel Architecture Software Developer's Manual, Volume 3:
2988 Chapter 15: Sparc-V9 Memory Models
3004 Solaris Internals, Core Kernel Architecture, p63-68: