1.. SPDX-License-Identifier: GPL-2.0
2.. Copyright (C) 2022, Google LLC.
3
4===============================
5Kernel Memory Sanitizer (KMSAN)
6===============================
7
8KMSAN is a dynamic error detector aimed at finding uses of uninitialized
9values. It is based on compiler instrumentation, and is quite similar to the
10userspace `MemorySanitizer tool`_.
11
KMSAN is not intended for production use, because it drastically increases the
kernel memory footprint and slows the whole system down.
14
15Usage
16=====
17
18Building the kernel
19-------------------
20
To build a kernel with KMSAN you need a recent Clang (14.0.6 or newer).
Please refer to the `LLVM documentation`_ for instructions on how to build Clang.
23
Then configure and build the kernel with ``CONFIG_KMSAN`` enabled.
25
26Example report
27--------------
28
29Here is an example of a KMSAN report::
30
31  =====================================================
32  BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
33   test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
34   kunit_run_case_internal lib/kunit/test.c:333
35   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
36   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
37   kthread+0x721/0x850 kernel/kthread.c:327
38   ret_from_fork+0x1f/0x30 ??:?
39
40  Uninit was stored to memory at:
41   do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
42   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
43   kunit_run_case_internal lib/kunit/test.c:333
44   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
45   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
46   kthread+0x721/0x850 kernel/kthread.c:327
47   ret_from_fork+0x1f/0x30 ??:?
48
49  Local variable uninit created at:
50   do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
51   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
52
53  Bytes 4-7 of 8 are uninitialized
54  Memory access of size 8 starts at ffff888083fe3da0
55
56  CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G    B       E     5.16.0-rc3+ #104
57  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
58  =====================================================
59
60The report says that the local variable ``uninit`` was created uninitialized in
61``do_uninit_local_array()``. The third stack trace corresponds to the place
62where this variable was created.
63
64The first stack trace shows where the uninit value was used (in
65``test_uninit_kmsan_check_memory()``). The tool shows the bytes which were left
66uninitialized in the local variable, as well as the stack where the value was
67copied to another memory location before use.
68
69A use of uninitialized value ``v`` is reported by KMSAN in the following cases:
70
71 - in a condition, e.g. ``if (v) { ... }``;
 - when it is used as an index or dereferenced as a pointer, e.g. ``array[v]``
   or ``*v``;
73 - when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``;
74 - when it is passed as an argument to a function, and
75   ``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below).
76
All of these cases, apart from copying data to userspace or hardware (which is
a security issue), are undefined behavior according to the C11 standard.
80
81Disabling the instrumentation
82-----------------------------
83
84A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
85ignore uninitialized values in that function and mark its output as initialized.
86As a result, the user will not get KMSAN reports related to that function.
87
Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
Applying this attribute to a function makes KMSAN skip instrumenting it, which
can be helpful if we do not want the compiler to interfere with some low-level
code (e.g. code marked with ``noinstr``, which implicitly adds
``__no_sanitize_memory``).
93
94This however comes at a cost: stack allocations from such functions will have
95incorrect shadow/origin values, likely leading to false positives. Functions
96called from non-instrumented code may also receive incorrect metadata for their
97parameters.
98
99As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.
100
101It is also possible to disable KMSAN for a single file (e.g. main.o)::
102
103  KMSAN_SANITIZE_main.o := n
104
105or for the whole directory::
106
107  KMSAN_SANITIZE := n
108
in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
function in the file or directory. Most users won't need ``KMSAN_SANITIZE``,
unless KMSAN breaks their code (e.g. because it runs at early boot time).
112
113KMSAN checks can also be temporarily disabled for the current task using
114``kmsan_disable_current()`` and ``kmsan_enable_current()`` calls. Each
115``kmsan_enable_current()`` call must be preceded by a
116``kmsan_disable_current()`` call; these call pairs may be nested. One needs to
117be careful with these calls, keeping the regions short and preferring other
118ways to disable instrumentation, where possible.
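
The nesting rule can be illustrated with a small userspace model. This is only
a sketch: ``model_ctx``, ``model_disable()``, ``model_enable()`` and
``model_reports_allowed()`` are hypothetical names, not kernel code, and the
model assumes nothing more than a per-task depth counter that gates reporting:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state: reports are suppressed while depth > 0. */
struct model_ctx {
	unsigned int depth;
};

/* Models kmsan_disable_current(): enter a region without reports. */
static void model_disable(struct model_ctx *ctx)
{
	ctx->depth++;
}

/* Models kmsan_enable_current(): must pair with an earlier disable. */
static void model_enable(struct model_ctx *ctx)
{
	assert(ctx->depth > 0);	/* an unpaired enable is a bug */
	ctx->depth--;
}

/* Reports are only allowed once every disable has been matched. */
static bool model_reports_allowed(const struct model_ctx *ctx)
{
	return ctx->depth == 0;
}
```

In this model, two nested ``model_disable()`` calls require two
``model_enable()`` calls before reports are allowed again.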
119
120Support
121=======
122
123In order for KMSAN to work the kernel must be built with Clang, which so far is
124the only compiler that has KMSAN support. The kernel instrumentation pass is
125based on the userspace `MemorySanitizer tool`_.
126
127The runtime library only supports x86_64 at the moment.
128
129How KMSAN works
130===============
131
132KMSAN shadow memory
133-------------------
134
135KMSAN associates a metadata byte (also called shadow byte) with every byte of
136kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
setting its shadow bytes to ``0xff``) is called poisoning; marking it
initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
140
141When a new variable is allocated on the stack, it is poisoned by default by
142instrumentation code inserted by the compiler (unless it is a stack variable
143that is immediately initialized). Any new heap allocation done without
144``__GFP_ZERO`` is also poisoned.
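
A userspace sketch of poisoning and unpoisoning may make this concrete. All
names below (``model_mem``, ``model_shadow``, ``model_poison()``,
``model_unpoison()``, ``model_alloc()``) are hypothetical, and the ``zeroed``
flag merely stands in for ``__GFP_ZERO``; the real runtime is considerably more
involved:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MEM_SIZE 64

/* Toy "kernel memory" and its shadow: one shadow byte per data byte. */
static uint8_t model_mem[MODEL_MEM_SIZE];
static uint8_t model_shadow[MODEL_MEM_SIZE];

/* Poison: mark [off, off + len) as uninitialized (all shadow bits set). */
static void model_poison(size_t off, size_t len)
{
	assert(off <= MODEL_MEM_SIZE && len <= MODEL_MEM_SIZE - off);
	memset(&model_shadow[off], 0xff, len);
}

/* Unpoison: mark the range as initialized (all shadow bits cleared). */
static void model_unpoison(size_t off, size_t len)
{
	assert(off <= MODEL_MEM_SIZE && len <= MODEL_MEM_SIZE - off);
	memset(&model_shadow[off], 0x00, len);
}

/* Toy allocation hook: poison unless the caller asked for zeroed memory. */
static void model_alloc(size_t off, size_t len, int zeroed)
{
	if (zeroed) {
		assert(off <= MODEL_MEM_SIZE && len <= MODEL_MEM_SIZE - off);
		memset(&model_mem[off], 0, len);
		model_unpoison(off, len);
	} else {
		model_poison(off, len);
	}
}
```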
145
146Compiler instrumentation also tracks the shadow values as they are used along
147the code. When needed, instrumentation code invokes the runtime library in
148``mm/kmsan/`` to persist shadow values.
149
150The shadow value of a basic or compound type is an array of bytes of the same
151length. When a constant value is written into memory, that memory is unpoisoned.
152When a value is read from memory, its shadow memory is also obtained and
153propagated into all the operations which use that value. For every instruction
154that takes one or more values the compiler generates code that calculates the
155shadow of the result depending on those values and their shadows.
156
157Example::
158
159  int a = 0xff;  // i.e. 0x000000ff
160  int b;
161  int c = a | b;
162
163In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
164shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
165``c`` are uninitialized, while the lower byte is initialized.
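
The result for bitwise OR can be reproduced with a short function. This is a
sketch of one common way to express the propagation rule, not necessarily the
exact code Clang emits; ``model_or_shadow()`` is a hypothetical name. A result
bit is treated as initialized if both input bits are initialized, or if either
input has an initialized ``1`` in that position:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shadow of (a | b). sa and sb are the shadows of a and b; a set shadow bit
 * means "uninitialized". An initialized 1 in either input forces the result
 * bit to 1, so that bit is initialized regardless of the other input.
 */
static uint32_t model_or_shadow(uint32_t a, uint32_t sa, uint32_t b, uint32_t sb)
{
	return (sa & sb) | (sa & ~b) | (sb & ~a);
}

/* Reproduces the example above: a = 0xff is initialized, b is not. */
static void model_or_shadow_selfcheck(void)
{
	uint32_t b = 0x12345678;	/* arbitrary: b's value must not matter */

	assert(model_or_shadow(0xff, 0, b, 0xffffffff) == 0xffffff00);
}
```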
166
167Origin tracking
168---------------
169
170Every four bytes of kernel memory also have a so-called origin mapped to them.
171This origin describes the point in program execution at which the uninitialized
172value was created. Every origin is associated with either the full allocation
173stack (for heap-allocated memory), or the function containing the uninitialized
174variable (for locals).
175
176When an uninitialized variable is allocated on stack or heap, a new origin
177value is created, and that variable's origin is filled with that value. When a
178value is read from memory, its origin is also read and kept together with the
179shadow. For every instruction that takes one or more values, the origin of the
180result is one of the origins corresponding to any of the uninitialized inputs.
181If a poisoned value is written into memory, its origin is written to the
182corresponding storage as well.
183
184Example 1::
185
186  int a = 42;
187  int b;
188  int c = a + b;
189
190In this case the origin of ``b`` is generated upon function entry, and is
191stored to the origin of ``c`` right before the addition result is written into
192memory.
193
194Several variables may share the same origin address, if they are stored in the
195same four-byte chunk. In this case every write to either variable updates the
196origin for all of them. We have to sacrifice precision in this case, because
197storing origins for individual bits (and even bytes) would be too costly.
198
199Example 2::
200
201  int combine(short a, short b) {
202    union ret_t {
203      int i;
204      short s[2];
205    } ret;
206    ret.s[0] = a;
207    ret.s[1] = b;
208    return ret.i;
209  }
210
If ``a`` is initialized and ``b`` is not, the shadow of the result is
``0xffff0000``, and the origin of the result is the origin of ``b``.
``ret.s[0]`` has the same origin, but it is never used, because that variable
is initialized.
215
216If both function arguments are uninitialized, only the origin of the second
217argument is preserved.
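
The loss of precision can be modelled in userspace. In this sketch (all names
are hypothetical) each four-byte chunk of toy memory has a single origin slot,
and a store only writes the origin when the stored value is poisoned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SIZE 16	/* bytes of toy memory; a multiple of 4 */

static uint8_t model_shadow[MODEL_SIZE];	/* one shadow byte per byte */
static uint32_t model_origin[MODEL_SIZE / 4];	/* one origin per 4 bytes */

/*
 * Store a 2-byte value at offset off. shadow16 holds the value's shadow
 * bits; the origin slot is only updated if the value is (partly) poisoned.
 */
static void model_store16(size_t off, uint16_t shadow16, uint32_t origin)
{
	assert(off % 2 == 0 && off + 2 <= MODEL_SIZE);
	model_shadow[off] = shadow16 & 0xff;
	model_shadow[off + 1] = shadow16 >> 8;
	if (shadow16)
		model_origin[off / 4] = origin;
}
```

Storing two poisoned halves into the same chunk leaves only the origin of the
second store, which mirrors the behaviour of ``combine()`` described above.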
218
219Origin chaining
220~~~~~~~~~~~~~~~
221
222To ease debugging, KMSAN creates a new origin for every store of an
223uninitialized value to memory. The new origin references both its creation stack
224and the previous origin the value had. This may cause increased memory
225consumption, so we limit the length of origin chains in the runtime.
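
A chain of origins with a length limit might be modelled as follows. This is a
hedged sketch: the names, the value of ``MODEL_MAX_CHAIN_DEPTH`` and the choice
to reuse the old origin once the limit is reached are assumptions made for
illustration, not a description of the actual runtime:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_CHAIN_DEPTH 3	/* hypothetical limit, chosen for the demo */
#define MODEL_MAX_ORIGINS 32

/* A toy origin: where it was created plus a link to the previous origin. */
struct model_origin {
	int stack_id;	/* stands in for a saved stack trace */
	int prev;	/* index of the previous origin, or -1 */
	int depth;	/* number of links behind this origin */
};

static struct model_origin model_origins[MODEL_MAX_ORIGINS];
static int model_nr_origins;

/* Create the first origin for a freshly poisoned variable or allocation. */
static int model_new_origin(int stack_id)
{
	assert(model_nr_origins < MODEL_MAX_ORIGINS);
	model_origins[model_nr_origins] = (struct model_origin){ stack_id, -1, 0 };
	return model_nr_origins++;
}

/*
 * Called on every store of a poisoned value: extend the chain, unless it is
 * already at the limit, in which case the old origin is reused (assumption).
 */
static int model_chain_origin(int prev, int stack_id)
{
	assert(prev >= 0 && prev < model_nr_origins);
	if (model_origins[prev].depth >= MODEL_MAX_CHAIN_DEPTH)
		return prev;
	assert(model_nr_origins < MODEL_MAX_ORIGINS);
	model_origins[model_nr_origins] = (struct model_origin){
		stack_id, prev, model_origins[prev].depth + 1
	};
	return model_nr_origins++;
}
```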
226
227Clang instrumentation API
228-------------------------
229
The Clang instrumentation pass inserts calls to functions defined in
``mm/kmsan/instrumentation.c`` into the kernel code.
232
233Shadow manipulation
234~~~~~~~~~~~~~~~~~~~
235
236For every memory access the compiler emits a call to a function that returns a
237pair of pointers to the shadow and origin addresses of the given memory::
238
  typedef struct {
    void *shadow, *origin;
  } shadow_origin_ptr_t;

  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr);
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr);
  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size);
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size);
247
248The function name depends on the memory access size.
249
250The compiler makes sure that for every loaded value its shadow and origin
251values are read from memory. When a value is stored to memory, its shadow and
252origin are also stored using the metadata pointers.
253
254Handling locals
255~~~~~~~~~~~~~~~
256
257A special function is used to create a new origin value for a local variable and
258set the origin of that variable to that value::
259
260  void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)
261
262Access to per-task data
263~~~~~~~~~~~~~~~~~~~~~~~
264
265At the beginning of every instrumented function KMSAN inserts a call to
266``__msan_get_context_state()``::
267
  struct kmsan_context_state *__msan_get_context_state(void)
269
270``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
271
272  struct kmsan_context_state {
273    char param_tls[KMSAN_PARAM_SIZE];
274    char retval_tls[KMSAN_RETVAL_SIZE];
275    char va_arg_tls[KMSAN_PARAM_SIZE];
276    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
277    u64 va_arg_overflow_size_tls;
278    char param_origin_tls[KMSAN_PARAM_SIZE];
279    depot_stack_handle_t retval_origin_tls;
280  };
281
282This structure is used by KMSAN to pass parameter shadows and origins between
283instrumented functions (unless the parameters are checked immediately by
284``CONFIG_KMSAN_CHECK_PARAM_RETVAL``).
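
The hand-off between two instrumented functions can be sketched in userspace.
The model below is deliberately cut down (shadows only, no origins or
varargs), and every name in it, including ``MODEL_PARAM_SIZE`` and
``MODEL_RETVAL_SIZE``, is a hypothetical stand-in rather than the kernel's
definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PARAM_SIZE 16	/* hypothetical; stands in for KMSAN_PARAM_SIZE */
#define MODEL_RETVAL_SIZE 16	/* hypothetical; stands in for KMSAN_RETVAL_SIZE */

/* Cut-down model of the context state: parameter and return shadows only. */
struct model_context_state {
	uint8_t param_tls[MODEL_PARAM_SIZE];
	uint8_t retval_tls[MODEL_RETVAL_SIZE];
};

static struct model_context_state model_state;

/* "Instrumented" callee: returns its argument and forwards its shadow. */
static uint32_t model_identity(uint32_t x)
{
	uint32_t shadow_x;

	/* Prologue: fetch the shadow of parameter 0 left by the caller. */
	memcpy(&shadow_x, &model_state.param_tls[0], sizeof(shadow_x));
	/* Epilogue: the return value is x, so it inherits x's shadow. */
	memcpy(&model_state.retval_tls[0], &shadow_x, sizeof(shadow_x));
	return x;
}

/* "Instrumented" caller: publishes the argument shadow around the call. */
static uint32_t model_call_identity(uint32_t x, uint32_t shadow_x,
				    uint32_t *shadow_ret)
{
	uint32_t ret;

	assert(shadow_ret != NULL);
	memcpy(&model_state.param_tls[0], &shadow_x, sizeof(shadow_x));
	ret = model_identity(x);
	memcpy(shadow_ret, &model_state.retval_tls[0], sizeof(*shadow_ret));
	return ret;
}
```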
285
286Passing uninitialized values to functions
287~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
288
289Clang's MemorySanitizer instrumentation has an option,
290``-fsanitize-memory-param-retval``, which makes the compiler check function
291parameters passed by value, as well as function return values.
292
293The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is
294enabled by default to let KMSAN report uninitialized values earlier.
295Please refer to the `LKML discussion`_ for more details.
296
297Because of the way the checks are implemented in LLVM (they are only applied to
298parameters marked as ``noundef``), not all parameters are guaranteed to be
299checked, so we cannot give up the metadata storage in ``kmsan_context_state``.
300
301String functions
302~~~~~~~~~~~~~~~~
303
The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
following functions. These functions are also called when data structures are
initialized or copied, making sure shadow and origin values are copied alongside
the data::
308
309  void *__msan_memcpy(void *dst, void *src, uintptr_t n)
310  void *__msan_memmove(void *dst, void *src, uintptr_t n)
311  void *__msan_memset(void *dst, int c, uintptr_t n)
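
The idea of copying metadata alongside the data can be sketched in userspace.
The model below tracks shadow only and leaves origins out; ``model_data``,
``model_shadow`` and ``model_memcpy()`` are hypothetical names, not the
runtime's implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_SIZE 32

static uint8_t model_data[MODEL_SIZE];
static uint8_t model_shadow[MODEL_SIZE];	/* 0xff = uninitialized byte */

/* Copy n bytes between offsets of the toy memory, shadow included. */
static void model_memcpy(size_t dst, size_t src, size_t n)
{
	assert(dst <= MODEL_SIZE && n <= MODEL_SIZE - dst);
	assert(src <= MODEL_SIZE && n <= MODEL_SIZE - src);
	/* memmove() keeps the model well-defined if the ranges overlap. */
	memmove(&model_data[dst], &model_data[src], n);
	memmove(&model_shadow[dst], &model_shadow[src], n);
}
```

After such a copy, a poisoned source byte yields a poisoned destination byte,
so a later use of the copy can still be reported.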
312
313Error reporting
314~~~~~~~~~~~~~~~
315
For each use of a value the compiler emits a shadow check that calls
``__msan_warning()`` if that value is poisoned::
318
319  void __msan_warning(u32 origin)
320
321``__msan_warning()`` causes KMSAN runtime to print an error report.
322
323Inline assembly instrumentation
324~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
325
KMSAN instruments every inline assembly output with a call to the following
function, which unpoisons the memory region::

  void __msan_instrument_asm_store(void *addr, uintptr_t size)
331
332This approach may mask certain errors, but it also helps to avoid a lot of
333false positives in bitwise operations, atomics etc.
334
335Sometimes the pointers passed into inline assembly do not point to valid memory.
336In such cases they are ignored at runtime.
337
338
339Runtime library
340---------------
341
342The code is located in ``mm/kmsan/``.
343
344Per-task KMSAN state
345~~~~~~~~~~~~~~~~~~~~
346
Every ``task_struct`` has an associated KMSAN task state that holds the KMSAN
context (see above) and a per-task counter disallowing KMSAN reports::

  struct kmsan_ctx {
    ...
    unsigned int depth;
    struct kmsan_context_state cstate;
    ...
  };

  struct task_struct {
    ...
    struct kmsan_ctx kmsan;
    ...
  };
362
363KMSAN contexts
364~~~~~~~~~~~~~~
365
366When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
367hold the metadata for function parameters and return values.
368
When the kernel is running in interrupt, softirq or NMI context, where
``current`` is unavailable, KMSAN switches to per-CPU interrupt state::
371
372  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
373
374Metadata allocation
375~~~~~~~~~~~~~~~~~~~
376
Where the metadata is stored depends on the kind of kernel memory.
378
3791. Each ``struct page`` instance contains two pointers to its shadow and
380origin pages::
381
382  struct page {
383    ...
384    struct page *shadow, *origin;
385    ...
386  };
387
388At boot-time, the kernel allocates shadow and origin pages for every available
389kernel page. This is done quite late, when the kernel address space is already
390fragmented, so normal data pages may arbitrarily interleave with the metadata
391pages.
392
393This means that in general for two contiguous memory pages their shadow/origin
394pages may not be contiguous. Consequently, if a memory access crosses the
395boundary of a memory block, accesses to shadow/origin memory may potentially
396corrupt other pages or read incorrect values from them.
397
398In practice, contiguous memory pages returned by the same ``alloc_pages()``
399call will have contiguous metadata, whereas if these pages belong to two
400different allocations their metadata pages can be fragmented.
401
402For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
403there also are no guarantees on metadata contiguity.
404
If ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two pages with
non-contiguous metadata, it returns pointers to fake shadow/origin regions::
407
408  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
409  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
410
411``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
412All stores to ``dummy_store_page`` are ignored.
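
The fallback to dummy pages can be modelled in userspace. The sketch below
covers shadow only and uses tiny "pages" to stay readable; the page table
``model_shadow_of_page`` and all other names are hypothetical, and the
contiguity test is a simplification of what the runtime has to check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 16	/* tiny pages keep the demo readable */
#define MODEL_NR_PAGES 4

/* Shadow pages, deliberately not laid out in data-page order. */
static uint8_t model_shadow_pool[MODEL_NR_PAGES][MODEL_PAGE_SIZE];
static const int model_shadow_of_page[MODEL_NR_PAGES] = { 0, 1, 3, 2 };

static uint8_t model_dummy_load_page[MODEL_PAGE_SIZE];	/* never written */
static uint8_t model_dummy_store_page[MODEL_PAGE_SIZE];	/* write sink */

/* Shadow pointer for an access of size bytes at toy address addr. */
static uint8_t *model_shadow_ptr(size_t addr, size_t size, bool is_store)
{
	size_t first, last;

	assert(size > 0 && size <= MODEL_PAGE_SIZE);
	first = addr / MODEL_PAGE_SIZE;
	last = (addr + size - 1) / MODEL_PAGE_SIZE;
	assert(last < MODEL_NR_PAGES);

	/* Crossing into a page whose shadow is not adjacent: use a dummy. */
	if (last != first &&
	    model_shadow_of_page[last] != model_shadow_of_page[first] + 1)
		return is_store ? model_dummy_store_page : model_dummy_load_page;

	return &model_shadow_pool[model_shadow_of_page[first]]
				 [addr % MODEL_PAGE_SIZE];
}
```

Here pages 0 and 1 have adjacent shadow, so an access spanning them gets real
metadata, while an access spanning pages 1 and 2 is redirected to a dummy page.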
413
4142. For vmalloc memory and modules, there is a direct mapping between the memory
415range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
416the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
417area contains shadow memory for the first quarter, the third one holds the
418origins. A small part of the fourth quarter contains shadow and origins for the
419kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
420more details.
421
422When an array of pages is mapped into a contiguous virtual memory space, their
423shadow and origin pages are similarly mapped into contiguous regions.
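
The direct mapping for vmalloc memory amounts to simple address arithmetic.
The sketch below uses a made-up toy layout (``MODEL_VMALLOC_START`` and
``MODEL_QUARTER_SIZE`` are hypothetical values, not the x86_64 constants) and
rounds the address down to four bytes before computing the origin address,
since origins cover four-byte chunks:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy layout: a vmalloc area split into four equal quarters. */
#define MODEL_VMALLOC_START	0x100000ULL
#define MODEL_QUARTER_SIZE	0x040000ULL
#define MODEL_VMALLOC_END	(MODEL_VMALLOC_START + MODEL_QUARTER_SIZE)

/* Shadow for a vmalloc address lives one quarter above it. */
static uint64_t model_vmalloc_shadow(uint64_t addr)
{
	assert(addr >= MODEL_VMALLOC_START && addr < MODEL_VMALLOC_END);
	return addr + MODEL_QUARTER_SIZE;
}

/* Origins live two quarters above, one slot per aligned 4-byte chunk. */
static uint64_t model_vmalloc_origin(uint64_t addr)
{
	assert(addr >= MODEL_VMALLOC_START && addr < MODEL_VMALLOC_END);
	return (addr & ~3ULL) + 2 * MODEL_QUARTER_SIZE;
}
```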
424
425References
426==========
427
428E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
429memory use in C++
430<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
431In Proceedings of CGO 2015.
432
433.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
434.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
435.. _LKML discussion: https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/
436