Lines Matching +full:in +full:- +full:kernel
1 .. SPDX-License-Identifier: GPL-2.0
11 countermeasure against attacks on the shared user/kernel address
16 the kernel is entered via syscalls, interrupts or exceptions, the
17 page tables are switched to the full "kernel" copy. When the system
20 The userspace page tables contain only a minimal amount of kernel
21 data: only what is needed to enter/exit the kernel such as the
25 comments in pti.c).
27 This approach helps to ensure that side-channel attacks leveraging
30 time. Once enabled at compile-time, it can be disabled at boot with
31 the 'nopti' or 'pti=' kernel parameters (see kernel-parameters.txt).
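
After booting with or without these parameters, it can be useful to confirm what the running kernel reports. The following is a minimal sketch, not part of the original text. It assumes the sysfs file /sys/devices/system/cpu/vulnerabilities/meltdown is present and that an active PTI mitigation is reported with the string "Mitigation: PTI". The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption of this sketch: the path is not named in the text above. */
#define MELTDOWN_STATUS_PATH "/sys/devices/system/cpu/vulnerabilities/meltdown"

/* Return 1 if a status line reports PTI as the active mitigation, else 0. */
int status_reports_pti(const char *line)
{
	return line != NULL && strstr(line, "Mitigation: PTI") != NULL;
}

/*
 * Read the first line of 'path' into 'buf' (at most len - 1 characters).
 * Returns 0 on success, -1 if the file cannot be opened or is empty.
 */
int read_status_line(const char *path, char *buf, size_t len)
{
	FILE *f;
	int ok;

	assert(buf != NULL && len > 0);

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = fgets(buf, (int)len, f) != NULL;
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would pass MELTDOWN_STATUS_PATH to read_status_line() and hand the resulting buffer to status_reports_pti(). On an affected CPU booted with 'nopti', the check is expected to return 0.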
36 When PTI is enabled, the kernel manages two sets of page tables.
37 The first set is very similar to the single set which is present in
39 that the kernel can use for things like copy_to_user().
41 Although _complete_, the user portion of the kernel page tables is
42 crippled by setting the NX bit in the top level. This ensures
43 that any missed kernel->user CR3 switch will immediately crash
46 The userspace page tables map only the kernel data needed to enter
47 and exit the kernel. This data is entirely contained in the 'struct
48 cpu_entry_area' structure which is placed in the fixmap which gives
49 each CPU's copy of the area a compile-time-fixed virtual address.
51 For new userspace mappings, the kernel makes the entries in its
52 page tables like normal. The only difference is when the kernel
53 makes entries in the top (PGD) level. In addition to setting the
54 entry in the main kernel PGD, a copy of the entry is made in the
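
The top-level handling described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's implementation: it assumes the two PGDs occupy one 8 KiB-aligned block with the kernel copy in the first 4 KiB page and the user copy in the second, that the lower 256 of 512 slots map userspace, and that the x86-64 present, user and NX bits are bits 0, 2 and 63. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE   4096u
#define MODEL_PGD_ENTRIES 512u          /* 8-byte entries in one 4 KiB page */
#define MODEL_USER_SLOTS  256u          /* lower half maps userspace */
#define MODEL_PRESENT     (1ull << 0)
#define MODEL_USER        (1ull << 2)
#define MODEL_NX          (1ull << 63)

typedef uint64_t model_pgd_t;

/* Allocate both copies as one zeroed 8 KiB block, aligned to 8 KiB. */
model_pgd_t *model_pgd_alloc(void)
{
	void *p = aligned_alloc(2 * MODEL_PAGE_SIZE, 2 * MODEL_PAGE_SIZE);

	if (p)
		memset(p, 0, 2 * MODEL_PAGE_SIZE);
	return p;
}

/* Map a slot in the kernel copy to the matching slot in the user copy. */
model_pgd_t *model_kernel_to_user(model_pgd_t *kernel_slot)
{
	return (model_pgd_t *)((uintptr_t)kernel_slot | MODEL_PAGE_SIZE);
}

/*
 * Set a top-level entry. 'pgd' must point at the start of the kernel copy.
 * Entries in the userspace range are also written to the user copy, and
 * the kernel copy of a present user mapping is marked NX so that running
 * userspace on the kernel page tables would fault.
 */
void model_set_pgd(model_pgd_t *pgd, unsigned int index, model_pgd_t value)
{
	model_pgd_t *slot;

	assert(index < MODEL_PGD_ENTRIES);
	assert(((uintptr_t)pgd & (2 * MODEL_PAGE_SIZE - 1)) == 0);

	slot = &pgd[index];
	if (index < MODEL_USER_SLOTS) {
		*model_kernel_to_user(slot) = value;
		if ((value & (MODEL_PRESENT | MODEL_USER)) ==
		    (MODEL_PRESENT | MODEL_USER))
			value |= MODEL_NX;
	}
	*slot = value;
}
```

In this model, an entry written to a kernel-range slot stays in the kernel copy only, which mirrors the idea that the userspace page tables see only a minimal set of kernel mappings. Pairing the two copies in one aligned block lets the user slot be found by setting a single address bit, at the cost of the larger allocation.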
65 Protection against side-channel attacks is important. But,
70 a. Each process now needs an order-1 PGD instead of order-0.
72 b. The 'cpu_entry_area' structure must be 2MB in size and 2MB
74 entry. This consumes nearly 2MB of RAM once the kernel
75 is decompressed, but no space in the kernel image itself.
81 and exit (it can be skipped when the kernel is interrupted,
87 c. Global pages are disabled for all kernel structures not
88 mapped into both kernel and userspace page tables. This
90 entries mapping the kernel. Losing the feature means more
95 tables by setting a special bit in CR3 when the page tables
97 switch, or kernel entry/exit) cheaper. But, on systems with
99 and kernel entries out of the TLB. The user PCID TLB flush is
103 process. Even without PTI, the shared kernel mappings
104 are created by copying top-level (PGD) entries into each
105 new process. But, with PTI, there are now *two* kernel
106 mappings: one in the kernel page tables that maps everything
109 f. In addition to the fork()-time copying, there must also
111 on a PGD used to map userspace. This ensures that the kernel
117 h. INVPCID is a TLB-flushing instruction which allows flushing
118 of TLB entries for non-current PCIDs. Some systems support
121 flushing a kernel address, we need to flush all PCIDs, so a
122 single kernel address flush will require a TLB-flushing CR3
129 2. Allow PTI to be enabled/disabled at runtime in addition to the
130 boot-time switching.
136 ideally doing all of these in parallel:
140 (excluding MPX and protection_keys) in a loop on multiple CPUs for
141 several minutes. These tests frequently uncover corner cases in the
142 kernel entry code. In general, old kernels might cause these tests
143 themselves to crash, but they should never crash the kernel.
144 3. Run the 'perf' tool in a mode (top or record) that generates many
145 frequent performance monitoring non-maskable interrupts (see "NMI"
146 in /proc/interrupts). This exercises the NMI entry/exit code which
147 is known to trigger bugs in code paths that did not expect to be
148 interrupted, including nested NMIs. Using "-c" boosts the rate of
149 NMIs, and using two -c with separate counters encourages nested NMIs
153 while true; do perf record -c 10000 -e instructions,cycles -a sleep 10; done
156 5. Run 32-bit binaries on systems supporting the SYSCALL instruction.
157 This has been a lightly-tested code path and needs extra scrutiny.
162 Bugs in PTI cause a few different signatures of crashes
165 * Failures of the selftests/x86 code. Usually a bug in one of the
167 * Crashes in early boot, especially around CPU bringup. Bugs
168 in the mappings cause these.
169 * Crashes at the first interrupt. Caused by bugs in entry_64.S,
178 * Kernel crashes at the first exit to userspace. entry_64.S
181 in entry_64.S that return to userspace are sometimes separate
182 from the ones that return to the kernel.
183 * Double faults: overflowing the kernel stack because of page
184 faults upon page faults. Caused by touching non-pti-mapped
185 data in the entry code, or forgetting to switch to kernel
186 CR3 before calling into C functions which are not pti-mapped.
187 * Userspace segfaults early in boot, sometimes manifesting