1 ======================================
2 Sequence counters and sequential locks
3 ======================================
4 
5 Introduction
6 ============
7 
8 Sequence counters are a reader-writer consistency mechanism with
9 lockless readers (read-only retry loops), and no writer starvation. They
10 are used for data that's rarely written to (e.g. system time), where the
11 reader wants a consistent set of information and is willing to retry if
12 that information changes.
13 
14 A data set is consistent when the sequence count at the beginning of the
15 read side critical section is even and the same sequence count value is
16 read again at the end of the critical section. The data in the set must
17 be copied out inside the read side critical section. If the sequence
18 count has changed between the start and the end of the critical section,
19 the reader must retry.
20 
21 Writers increment the sequence count at the start and the end of their
22 critical section. After starting the critical section the sequence count
23 is odd and indicates to the readers that an update is in progress. At
24 the end of the write side critical section the sequence count becomes
25 even again which lets readers make progress.
26 
27 A sequence counter write side critical section must never be preempted
28 or interrupted by read side sections. Otherwise the reader will spin for
29 the entire scheduler tick due to the odd sequence count value and the
30 interrupted writer. If that reader belongs to a real-time scheduling
31 class, it can spin forever and the kernel will livelock.
32 
33 This mechanism cannot be used if the protected data contains pointers,
34 as the writer can invalidate a pointer that the reader is following.
35 
36 
37 .. _seqcount_t:
38 
39 Sequence counters (``seqcount_t``)
40 ==================================
41 
42 This is the raw counting mechanism, which does not protect against
43 multiple writers.  Write side critical sections must thus be serialized
44 by an external lock.
45 
46 If the write serialization primitive is not implicitly disabling
47 preemption, preemption must be explicitly disabled before entering the
48 write side section. If the read section can be invoked from hardirq or
49 softirq contexts, interrupts or bottom halves must also be respectively
50 disabled before entering the write section.
51 
52 If it's desired to automatically handle the sequence counter
53 requirements of writer serialization and non-preemptibility, use
54 :ref:`seqlock_t` instead.
55 
56 Initialization::
57 
58 	/* dynamic */
59 	seqcount_t foo_seqcount;
60 	seqcount_init(&foo_seqcount);
61 
62 	/* static */
63 	static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
64 
65 	/* C99 struct init */
66 	struct {
67 		.seq   = SEQCNT_ZERO(foo.seq),
68 	} foo;
69 
70 Write path::
71 
72 	/* Serialized context with disabled preemption */
73 
74 	write_seqcount_begin(&foo_seqcount);
75 
76 	/* ... [[write-side critical section]] ... */
77 
78 	write_seqcount_end(&foo_seqcount);
79 
80 Read path::
81 
82 	do {
83 		seq = read_seqcount_begin(&foo_seqcount);
84 
85 		/* ... [[read-side critical section]] ... */
86 
87 	} while (read_seqcount_retry(&foo_seqcount, seq));
88 
89 
90 .. _seqcount_locktype_t:
91 
92 Sequence counters with associated locks (``seqcount_LOCKNAME_t``)
93 -----------------------------------------------------------------
94 
95 As discussed at :ref:`seqcount_t`, sequence count write side critical
96 sections must be serialized and non-preemptible. This variant of
97 sequence counters associate the lock used for writer serialization at
98 initialization time, which enables lockdep to validate that the write
99 side critical sections are properly serialized.
100 
101 This lock association is a NOOP if lockdep is disabled and has neither
102 storage nor runtime overhead. If lockdep is enabled, the lock pointer is
103 stored in struct seqcount and lockdep's "lock is held" assertions are
104 injected at the beginning of the write side critical section to validate
105 that it is properly protected.
106 
107 For lock types which do not implicitly disable preemption, preemption
108 protection is enforced in the write side function.
109 
110 The following sequence counters with associated locks are defined:
111 
112   - ``seqcount_spinlock_t``
113   - ``seqcount_raw_spinlock_t``
114   - ``seqcount_rwlock_t``
115   - ``seqcount_mutex_t``
116   - ``seqcount_ww_mutex_t``
117 
118 The sequence counter read and write APIs can take either a plain
119 seqcount_t or any of the seqcount_LOCKNAME_t variants above.
120 
121 Initialization (replace "LOCKNAME" with one of the supported locks)::
122 
123 	/* dynamic */
124 	seqcount_LOCKNAME_t foo_seqcount;
125 	seqcount_LOCKNAME_init(&foo_seqcount, &lock);
126 
127 	/* static */
128 	static seqcount_LOCKNAME_t foo_seqcount =
129 		SEQCNT_LOCKNAME_ZERO(foo_seqcount, &lock);
130 
131 	/* C99 struct init */
132 	struct {
133 		.seq   = SEQCNT_LOCKNAME_ZERO(foo.seq, &lock),
134 	} foo;
135 
136 Write path: same as in :ref:`seqcount_t`, while running from a context
137 with the associated write serialization lock acquired.
138 
139 Read path: same as in :ref:`seqcount_t`.
140 
141 
142 .. _seqcount_latch_t:
143 
144 Latch sequence counters (``seqcount_latch_t``)
145 ----------------------------------------------
146 
147 Latch sequence counters are a multiversion concurrency control mechanism
148 where the embedded seqcount_t counter even/odd value is used to switch
149 between two copies of protected data. This allows the sequence counter
150 read path to safely interrupt its own write side critical section.
151 
152 Use seqcount_latch_t when the write side sections cannot be protected
153 from interruption by readers. This is typically the case when the read
154 side can be invoked from NMI handlers.
155 
156 Check `raw_write_seqcount_latch()` for more information.
157 
158 
159 .. _seqlock_t:
160 
161 Sequential locks (``seqlock_t``)
162 ================================
163 
164 This contains the :ref:`seqcount_t` mechanism earlier discussed, plus an
165 embedded spinlock for writer serialization and non-preemptibility.
166 
167 If the read side section can be invoked from hardirq or softirq context,
168 use the write side function variants which disable interrupts or bottom
169 halves respectively.
170 
171 Initialization::
172 
173 	/* dynamic */
174 	seqlock_t foo_seqlock;
175 	seqlock_init(&foo_seqlock);
176 
177 	/* static */
178 	static DEFINE_SEQLOCK(foo_seqlock);
179 
180 	/* C99 struct init */
181 	struct {
182 		.seql   = __SEQLOCK_UNLOCKED(foo.seql)
183 	} foo;
184 
185 Write path::
186 
187 	write_seqlock(&foo_seqlock);
188 
189 	/* ... [[write-side critical section]] ... */
190 
191 	write_sequnlock(&foo_seqlock);
192 
193 Read path, three categories:
194 
195 1. Normal Sequence readers which never block a writer but they must
196    retry if a writer is in progress by detecting change in the sequence
197    number.  Writers do not wait for a sequence reader::
198 
199 	do {
200 		seq = read_seqbegin(&foo_seqlock);
201 
202 		/* ... [[read-side critical section]] ... */
203 
204 	} while (read_seqretry(&foo_seqlock, seq));
205 
206 2. Locking readers which will wait if a writer or another locking reader
207    is in progress. A locking reader in progress will also block a writer
208    from entering its critical section. This read lock is
209    exclusive. Unlike rwlock_t, only one locking reader can acquire it::
210 
211 	read_seqlock_excl(&foo_seqlock);
212 
213 	/* ... [[read-side critical section]] ... */
214 
215 	read_sequnlock_excl(&foo_seqlock);
216 
217 3. Conditional lockless reader (as in 1), or locking reader (as in 2),
218    according to a passed marker. This is used to avoid lockless readers
219    starvation (too much retry loops) in case of a sharp spike in write
220    activity. First, a lockless read is tried (even marker passed). If
221    that trial fails (odd sequence counter is returned, which is used as
222    the next iteration marker), the lockless read is transformed to a
223    full locking read and no retry loop is necessary::
224 
225 	/* marker; even initialization */
226 	int seq = 0;
227 	do {
228 		read_seqbegin_or_lock(&foo_seqlock, &seq);
229 
230 		/* ... [[read-side critical section]] ... */
231 
232 	} while (need_seqretry(&foo_seqlock, seq));
233 	done_seqretry(&foo_seqlock, seq);
234 
235 
236 API documentation
237 =================
238 
239 .. kernel-doc:: include/linux/seqlock.h
240