Lines Matching full:load
59 - Read memory barriers vs load speculation.
158 STORE A=3, STORE B=4, y=LOAD A->3, x=LOAD B->4
159 STORE A=3, STORE B=4, x=LOAD B->4, y=LOAD A->3
160 STORE A=3, y=LOAD A->3, STORE B=4, x=LOAD B->4
161 STORE A=3, y=LOAD A->3, x=LOAD B->2, STORE B=4
162 STORE A=3, x=LOAD B->2, STORE B=4, y=LOAD A->3
163 STORE A=3, x=LOAD B->2, y=LOAD A->3, STORE B=4
164 STORE B=4, STORE A=3, y=LOAD A->3, x=LOAD B->4
197 Note that CPU 2 will never try to load C into D because the CPU will load P
198 into Q before issuing the load of *Q.
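
A minimal sketch of the sequence those two lines describe (initial values
and names follow the surrounding example in the source document):

	/* { A == 1, B == 2, C == 3, P == &A, Q == &C } */

	CPU 1			CPU 2
	===============		===============
	WRITE_ONCE(B, 4);
	WRITE_ONCE(P, &B);
				Q = READ_ONCE(P);
				D = READ_ONCE(*Q);

Because the first load supplies the address consumed by the second, D can
only end up holding the value of a location that P actually pointed to
(A or B), never C.
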
216 STORE *A = 5, x = LOAD *D
217 x = LOAD *D, STORE *A = 5
235 Q = LOAD P, D = LOAD *Q
241 Q = LOAD P, MEMORY_BARRIER, D = LOAD *Q, MEMORY_BARRIER
253 a = LOAD *X, STORE *X = b
261 STORE *X = c, d = LOAD *X
281 X = LOAD *A, Y = LOAD *B, STORE *D = Z
282 X = LOAD *A, STORE *D = Z, Y = LOAD *B
283 Y = LOAD *B, X = LOAD *A, STORE *D = Z
284 Y = LOAD *B, STORE *D = Z, X = LOAD *A
285 STORE *D = Z, X = LOAD *A, Y = LOAD *B
286 STORE *D = Z, Y = LOAD *B, X = LOAD *A
295 X = LOAD *A; Y = LOAD *(A + 4);
296 Y = LOAD *(A + 4); X = LOAD *A;
297 {X, Y} = LOAD {*A, *(A + 4)};
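
If the two loads must stay distinct rather than being combined into a
single wider access, marked accesses are the usual remedy; a minimal
sketch:

	X = READ_ONCE(*A);
	Y = READ_ONCE(*(A + 4));

The volatile semantics of READ_ONCE() forbid the compiler from fusing the
two loads into one combined {X, Y} access.
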
406 In the case where two loads are performed such that the second depends on the result of the first (eg: the first load retrieves the address to which
407 the second load will be directed), an address-dependency barrier would
408 be required to make sure that the target of the second load is updated
409 after the address obtained by the first load is accessed.
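
In current kernels the address-dependency barrier is supplied implicitly
by READ_ONCE() (and by rcu_dereference()), so a pointer-publication
pattern along the following lines suffices; the names here are purely
illustrative:

	CPU 1				CPU 2
	===============			===============
	WRITE_ONCE(data, 42);
	smp_store_release(&ptr, &data);
					p = READ_ONCE(ptr);
					if (p)
						d = READ_ONCE(*p);

The load of ptr yields the address on which the load of *p depends, so a
reader that observes the new pointer is also guaranteed to observe the
initialised data.
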
418 the CPU under consideration guarantees that for any load preceding it,
419 if that load touches one of a sequence of stores from another CPU, then by the time the barrier completes, the effects of all the stores prior to
421 that touched by the load will be perceptible to any loads issued after the address-dependency barrier.
427 [!] Note that the first load really has to have an _address_ dependency and
428 not a control dependency. If the address for the second load is dependent
429 on the first load, but the dependency is through a conditional rather than actually loading the address itself, then it's a _control_ dependency and a full read barrier or better is required.
442 (3) Read (or load) memory barriers.
445 A read barrier is an address-dependency barrier plus a guarantee that all the LOAD operations specified before the barrier will appear to happen
446 before all the LOAD operations specified after the barrier with respect to the other components of the system.
461 A general memory barrier gives a guarantee that all the LOAD and STORE operations specified before the barrier will appear to happen before all
463 the LOAD and STORE operations specified after the barrier with respect to the other components of the system.
514 semantics) definitions. For compound atomics performing both a load and a
515 store, ACQUIRE semantics apply only to the load and RELEASE semantics apply only to the store portion of the operation.
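
For example (a sketch against the kernel's atomic_t API, where the
_acquire/_release suffixes select the ordering variant):

	atomic_t v = ATOMIC_INIT(0);
	int old;

	old = atomic_fetch_add_acquire(1, &v);	/* ACQUIRE orders only the
						 * load half of the RMW */
	old = atomic_fetch_sub_release(1, &v);	/* RELEASE orders only the
						 * store half of the RMW */
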
576 [!] While address dependencies are observed in both load-to-load and
577 load-to-store relations, address-dependency barriers are not necessary
578 for load-to-store situations.
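
A minimal sketch of a load-to-store address dependency (names
illustrative):

	struct item *p;

	p = READ_ONCE(head);		/* this load produces the address... */
	WRITE_ONCE(p->count, 1);	/* ...this store depends on, so no
					 * address-dependency barrier is
					 * needed between the two */
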
687 A load-load control dependency requires a full read memory barrier, not simply an (implicit) address-dependency barrier, to make it work correctly.
701 the load from b as having happened before the load from a. In such a case
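what's actually required is a full read barrier between the two loads;
sketched here with smp_rmb() as the concrete read barrier:

	q = READ_ONCE(a);
	if (q) {
		smp_rmb();	/* a full read barrier, not merely an
				 * (implicit) address-dependency barrier */
		p = READ_ONCE(b);
	}
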
711 for load-store control dependencies, as in the following example:
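
The example in question is, in essence:

	q = READ_ONCE(a);
	if (q) {
		WRITE_ONCE(b, 1);
	}

The conditional on q is what orders the load from 'a' before the store to
'b'.  Note that neither the READ_ONCE() nor the WRITE_ONCE() is optional:
without the READ_ONCE(), the compiler might combine the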
721 load from 'a' with other loads from 'a'. Without the WRITE_ONCE(),
754 WRITE_ONCE(b, 1); /* BUG: No ordering vs. load from a!!! */
763 Now there is no conditional between the load from 'a' and the store to
816 between the load from variable 'a' and the store to variable 'b'. It is
823 BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
852 the compiler to actually emit code for a given load, it does not force
881 A weakly ordered CPU would have no dependency of any sort between the load
913 between the prior load and the subsequent store, and this
914 conditional must involve the prior load. If the compiler is able
1052 STORE C = &B LOAD X
1053 STORE D = 4 LOAD C (gets &B)
1054 LOAD *C (reads B)
1079 [diagram: the load of X holds up the maintenance of coherence of B, so CPU 2 still perceives the stale B->7]
1086 In the above example, CPU 2 perceives that B is 7, despite the load of *C
1087 (which would be B) coming after the load of C.
1089 If, however, an address-dependency barrier were to be placed between the load
1090 of C and the load of *C (ie: B) on CPU 2:
1098 STORE C = &B LOAD X
1099 STORE D = 4 LOAD C (gets &B)
1101 LOAD *C (reads B)
1139 LOAD B
1140 LOAD A
1166 If, however, a read barrier were to be placed between the load of B and the
1167 load of A on CPU 2:
1175 LOAD B
1177 LOAD A
1203 contained a load of A on either side of the read barrier:
1211 LOAD B
1212 LOAD A [first load of A]
1214 LOAD A [second load of A]
1216 Even though the two loads of A both occur after the load of B, they may both
1268 The guarantee is that the second load will always come up with A == 1 if the
1269 load of B came up with B == 2. No such guarantee exists for the first load of A; that may come up with either A == 0 or A == 1.
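
Rendered as kernel code, the pairing in question looks roughly like this:

	CPU 1				CPU 2
	===============			===============
	WRITE_ONCE(A, 1);
	smp_wmb();
	WRITE_ONCE(B, 2);
					b = READ_ONCE(B);
					smp_rmb();
					a = READ_ONCE(A);

If b turns out to be 2, the smp_wmb()/smp_rmb() pairing guarantees that a
is 1.
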
1273 READ MEMORY BARRIERS VS LOAD SPECULATION
1276 Many CPUs speculate with loads: that is, they see that they will need to load an
1278 other loads, and so do the load in advance - even though they haven't actually
1280 actual load instruction to potentially complete immediately because the CPU
1284 branch circumvented the load - in which case it can discard the value or just
1291 LOAD B
1294 LOAD A
1306-1311 [diagram: while busy with a division, the CPU speculates on the LOAD of A; once the divisions complete, the explicit LOAD takes immediate effect]
1315 load:
1319 LOAD B
1323 LOAD A
1337 [diagram: the barrier rechecks the speculated LOAD of A; the location is unchanged, so the speculated value is simply used]
1359 [diagram: a pending update from another CPU cancels the speculation and the LOAD of A retrieves the updated value]
1388 STORE X=1 r1=LOAD X (reads 1) LOAD Y (reads 1)
1389 <general barrier> <read barrier>
1390 STORE Y=r1 LOAD X
1392 Suppose that CPU 2's load from X returns 1, which it then stores to Y,
1393 and CPU 3's load from Y returns 1. This indicates that CPU 1's store
1394 to X precedes CPU 2's load from X and that CPU 2's store to Y precedes
1395 CPU 3's load from Y. In addition, the memory barriers guarantee that
1396 CPU 2 executes its load before its store, and CPU 3 loads from Y before
1397 it loads from X. The question is then "Can CPU 3's load from X return 0?"
1399 Because CPU 3's load from X in some sense comes after CPU 2's load, it
1400 is natural to expect that CPU 3's load from X must therefore return 1.
1401 This expectation follows from multicopy atomicity: if a load executing
1402 on CPU B follows a load from the same variable executing on CPU A (and CPU A did not originally store the value which it read), then on
1404 multicopy-atomic systems, CPU B's load must return either the same value
1405 that CPU A's load did or some later value. However, the Linux kernel does not require systems to be multicopy atomic.
1409 The use of a general memory barrier in the example above compensates for any lack of multicopy atomicity. In the example, if CPU 2's load
1410 from X returns 1 and CPU 3's load from Y returns 1, then CPU 3's load from X must indeed return 1.
1421 STORE X=1 r1=LOAD X (reads 1) LOAD Y (reads 1)
1422 <data dependency> <read barrier>
1423 STORE Y=r1 LOAD X (reads 0)
1426 this example, it is perfectly legal for CPU 2's load from X to return 1,
1427 CPU 3's load from Y to return 1, and its load from X to return 0.
1429 The key point is that although CPU 2's data dependency orders its load and store, it does not guarantee to order CPU 1's store.
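
General barriers are what compensate.  A deliberately conservative
rendering of the example with general barriers throughout:

	CPU 1			CPU 2				CPU 3
	===============		===============			===============
	WRITE_ONCE(X, 1);
				r1 = READ_ONCE(X); /* 1 */
				smp_mb();
				WRITE_ONCE(Y, r1);
							r2 = READ_ONCE(Y); /* 1 */
							smp_mb();
							r3 = READ_ONCE(X);

With smp_mb() on both CPU 2 and CPU 3, r3 can no longer be 0: the chain of
general barriers forces all CPUs to agree on the combined order of the
accesses.
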
1502 store to u as happening -after- cpu1()'s load from v, even though
1552 (*) Within a loop, forces the compiler to load the variables used
1627 (*) The compiler is within its rights to omit a load entirely if it knows
1639 gets rid of a load and a branch. The problem is that the compiler
1657 the code into near-nonexistence. (It will still load from the
1779 with a single memory-reference instruction, prevents "load tearing"
1798 Use of packed structures can also result in load and store tearing,
1817 load tearing on 'foo1.b' and store tearing on 'foo2.b'. READ_ONCE() and WRITE_ONCE() again prevent tearing in this example.
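
The packed-structure example those lines refer to is, in outline:

	struct __attribute__((__packed__)) foo {
		short a;
		int b;
		short c;
	};
	struct foo foo1, foo2;

	foo2.a = foo1.a;
	foo2.b = foo1.b;
	foo2.c = foo1.c;

Since nothing here is marked volatile, the compiler may implement the
three assignments as a pair of 32-bit loads followed by a pair of 32-bit
stores, tearing the 'b' fields; READ_ONCE(foo1.b) and
WRITE_ONCE(foo2.b, ...) keep each access to 'b' whole.
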
1852 to issue the loads in the correct order (eg. `a[b]` would have to load
1855 (eg. is equal to 1) and load a[b] before b (eg. tmp = a[1]; if (b != 1)
1970 For load from persistent memory, existing read memory barriers are sufficient to ensure read ordering.
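
On the store side the document prescribes pmem_wmb(); a hypothetical
producer publishing data in a pmem mapping might therefore look like:

	memcpy_flushcache(pmem_buf, data, len);	/* non-temporal copy to pmem */
	pmem_wmb();	/* the stores above reach the durability domain
			 * before the index below is published */
	WRITE_ONCE(*pmem_idx, new_idx);

Readers, per the line above, need nothing stronger than the ordinary read
barriers already discussed.
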
2176 LOAD event_indicated
2219 LOAD event_indicated if ((LOAD task->state) & TASK_NORMAL)
2233 LOAD Y LOAD X
2405 LOAD waiter->list.next;
2406 LOAD waiter->task;
2429 LOAD waiter->task;
2438 LOAD waiter->list.next;
2446 LOAD waiter->list.next;
2447 LOAD waiter->task;
2529 STORE *ADDR = 3, STORE *ADDR = 4, STORE *DATA = y, q = LOAD *DATA
2538 sections will include synchronous load operations on strictly ordered I/O
2683 ultimate effect. For example, if two adjacent instructions both load an
2727 Although any particular load or store may not actually appear outside of the
2735 generate load and store operations which then go into the queue of memory
2808 LOAD *A, STORE *B, LOAD *C, LOAD *D, STORE *E.
2840 LOAD *A, ..., LOAD {*C,*D}, STORE *E, STORE *B
2842 (Where "LOAD {*C,*D}" is a combined load)
2867 U=LOAD *A, STORE *A=V, STORE *A=W, X=LOAD *A, STORE *A=Y, Z=LOAD *A
2903 and the LOAD operation never appear outside of the CPU.
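
The reduction that last line refers to has this shape:

	*A = Y;
	Z = *A;

may, in the absence of a memory barrier or a READ_ONCE(), be reduced by
the CPU to:

	*A = Y;
	Z = Y;

so that the LOAD never appears outside the CPU at all.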