arch/arm/vlocks.rst

2 vlocks for Bare-Metal Mutual Exclusion
5 Voting Locks, or "vlocks" provide a simple low-level mutual exclusion
9 These are intended to be used to coordinate critical activity among CPUs
10 which are otherwise non-coherent, in situations where the hardware
11 provides no other mechanism to support this and ordinary spinlocks
16 writes to a single memory location.  To arbitrate, every CPU "votes for
17 itself", by storing a unique number to a common memory location.  The
21 In order to make sure that the election produces an unambiguous result
23 no winner has been chosen and the election does not appear to have
28 ---------
30 The easiest way to explain the vlocks algorithm is with some pseudo-code::
34 	int last_vote = -1; /* no votes yet */
38 		/* signal our desire to vote */
40 		if (last_vote != -1) {
64 		last_vote = -1;
68 The currently_voting[] array provides a way for the CPUs to determine
69 whether an election is in progress, and plays a role analogous to the
73 atomicity is used to pick the winner.  This avoids the need for a static
74 priority rule to act as a tie-breaker, or any counters which could
77 As long as the last_vote variable is globally visible to all CPUs, it
78 will contain only one value that won't change once every CPU has cleared
83 ------------------------
85  * vlocks are not intended to be fair.  In the contended case, it is the
86    _last_ CPU which attempts to get the lock which will be most likely
87    to win.
89    vlocks are therefore best suited to situations where it is necessary
90    to pick a unique winner, but it does not matter which CPU actually
93  * Like other similar mechanisms, vlocks will not scale well to a large
96    vlocks can be cascaded in a voting hierarchy to permit better scaling
120 ------------------
126    we can read the whole array in one transaction (providing the number
128    reduces the number of round-trips required to external memory.
136    ...in place of code equivalent to::
147    This cuts down on the fast-path latency, as well as potentially
152    different sizes, similarly to many other architectures.  Note that
154    bits of Rt, so there is no need to worry about endianness in this
157    If there are too many CPUs to read the currently_voting array in
158    one transaction then multiple transactions are still required.  The
159    implementation uses a simple loop of word-sized loads for this
165    to keep the code simple this was not attempted in the initial
169  * vlocks are currently only used to coordinate between CPUs which are
170    unable to enable their caches yet.  This means that the
171    implementation removes many of the barriers which would be required
175    memory unless all CPUs contending the lock are cache-coherent, due
176    to cache writebacks from one CPU clobbering values written by other
177    CPUs.  (Though if all the CPUs are cache-coherent, you should be
182    -1 as in the pseudocode).  This allows statically-allocated vlocks
183    to be implicitly initialised to an unlocked state simply by putting
186    An offset is added to each CPU's ID for the purpose of setting this
191 --------
194 use in ARM-based big.LITTLE platforms, with review and input gratefully
195 received from Nicolas Pitre and Achin Gupta.  Thanks to Nicolas for
199 Copyright (C) 2012-2013  Linaro Limited
205 ----------
208     Problem", Communications of the ACM 17, 8 (August 1974), 453-455.