==============================
Memory Layout on AArch64 Linux
==============================

Author: Catalin Marinas <catalin.marinas@arm.com>

This document describes the virtual memory layout used by the AArch64
Linux kernel. The architecture allows up to 4 levels of translation
tables with a 4KB page size and up to 3 levels with a 64KB page size.

AArch64 Linux uses either 3 levels or 4 levels of translation tables
with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
(256TB) virtual addresses, respectively, for both user and kernel. With
64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB)
virtual addresses, are used but the memory layout is the same.
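
The level counts and sizes quoted here follow from the page size. As a
minimal sketch, assuming 8-byte descriptors (so one table level resolves
``page_shift - 3`` bits), the arithmetic can be written as below. The
helper names are hypothetical and are not kernel APIs.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical helper: translation levels needed to resolve va_bits
    * with pages of (1 << page_shift) bytes. Assumes 8-byte descriptors,
    * so each level resolves page_shift - 3 bits. */
   static unsigned int translation_levels(unsigned int va_bits,
                                          unsigned int page_shift)
   {
       unsigned int bits_per_level = page_shift - 3;

       assert(va_bits > page_shift);
       /* Round up: a partially used top-level table still costs a level. */
       return (va_bits - page_shift + bits_per_level - 1) / bits_per_level;
   }

   /* Hypothetical helper: size in bytes of a va_bits-wide address range. */
   static uint64_t va_range_bytes(unsigned int va_bits)
   {
       assert(va_bits < 64);
       return UINT64_C(1) << va_bits;
   }

With these helpers, ``translation_levels(39, 12)`` gives 3,
``translation_levels(48, 12)`` gives 4 and ``translation_levels(42, 16)``
gives 2, while ``va_range_bytes(39)`` is 512GB.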

ARMv8.2 adds optional support for Large Virtual Address space. This is
only available when running with a 64KB page size and expands the
number of descriptors in the first level of translation.

TTBRx selection is given by bit 55 of the virtual address. The
swapper_pg_dir contains only kernel (global) mappings while the user pgd
contains only user (non-global) mappings.  The swapper_pg_dir address is
written to TTBR1 and never written to TTBR0.
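
As an illustration of the selection rule, the following sketch tests
bit 55 of an address to decide which translation table base register
applies. The helper name is hypothetical and is not a kernel API.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical helper: true if the address is translated via TTBR1
    * (kernel mappings), false if via TTBR0 (user mappings). */
   static bool va_selects_ttbr1(uint64_t va)
   {
       return (va >> 55) & 1;
   }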


AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::

  Start			End			Size		Use
  -----------------------------------------------------------------------
  0000000000000000	0000ffffffffffff	 256TB		user
  ffff000000000000	ffff7fffffffffff	 128TB		kernel logical memory map
 [ffff600000000000	ffff7fffffffffff]	  32TB		[kasan shadow region]
  ffff800000000000	ffff80007fffffff	   2GB		modules
  ffff800080000000	fffffbffefffffff	 124TB		vmalloc
  fffffbfff0000000	fffffbfffdffffff	 224MB		fixed mappings (top down)
  fffffbfffe000000	fffffbfffe7fffff	   8MB		[guard region]
  fffffbfffe800000	fffffbffff7fffff	  16MB		PCI I/O space
  fffffbffff800000	fffffbffffffffff	   8MB		[guard region]
  fffffc0000000000	fffffdffffffffff	   2TB		vmemmap
  fffffe0000000000	ffffffffffffffff	   2TB		[guard region]


AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support)::

  Start			End			Size		Use
  -----------------------------------------------------------------------
  0000000000000000	000fffffffffffff	   4PB		user
  fff0000000000000	ffff7fffffffffff	  ~4PB		kernel logical memory map
 [fffd800000000000	ffff7fffffffffff]	 512TB		[kasan shadow region]
  ffff800000000000	ffff80007fffffff	   2GB		modules
  ffff800080000000	fffffbffefffffff	 124TB		vmalloc
  fffffbfff0000000	fffffbfffdffffff	 224MB		fixed mappings (top down)
  fffffbfffe000000	fffffbfffe7fffff	   8MB		[guard region]
  fffffbfffe800000	fffffbffff7fffff	  16MB		PCI I/O space
  fffffbffff800000	fffffbffffffffff	   8MB		[guard region]
  fffffc0000000000	ffffffdfffffffff	  ~4TB		vmemmap
  ffffffe000000000	ffffffffffffffff	 128GB		[guard region]


Translation table lookup with 4KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
            |        |         |         |         |         |
            |        |         |         |         |         v
            |        |         |         |         |   [11:0]  in-page offset
            |        |         |         |         +-> [20:12] L3 index
            |        |         |         +-----------> [29:21] L2 index
            |        |         +---------------------> [38:30] L1 index
            |        +-------------------------------> [47:39] L0 index
            +----------------------------------------> [55] TTBR0/1
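
The 4KB diagram can be read as a set of shifts and masks. The sketch
below extracts each field from a virtual address using the bit ranges
shown in the diagram. The macro and function names are hypothetical,
chosen only for this example.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define PAGE_SHIFT_4K   12
   #define LEVEL_BITS_4K   9   /* 512 descriptors per table */

   /* Hypothetical helper: index into the table at the given level
    * (0 = L0 ... 3 = L3) for 4KB pages with 4 levels. */
   static unsigned int table_index_4k(uint64_t va, unsigned int level)
   {
       unsigned int shift;

       assert(level <= 3);
       /* L3 starts at bit 12; each higher level sits 9 bits further up. */
       shift = PAGE_SHIFT_4K + LEVEL_BITS_4K * (3 - level);

       return (va >> shift) & ((1u << LEVEL_BITS_4K) - 1);
   }

   /* Hypothetical helper: byte offset within the 4KB page. */
   static unsigned int page_offset_4k(uint64_t va)
   {
       return va & ((1u << PAGE_SHIFT_4K) - 1);
   }

For the address ``0xffff800080001234``, this yields L0 index 256,
L1 index 2, L2 index 0, L3 index 1 and an in-page offset of ``0x234``.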


Translation table lookup with 64KB pages::

  +--------+--------+--------+--------+--------+--------+--------+--------+
  |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
  +--------+--------+--------+--------+--------+--------+--------+--------+
            |        |    |               |              |
            |        |    |               |              v
            |        |    |               |            [15:0]  in-page offset
            |        |    |               +----------> [28:16] L3 index
            |        |    +--------------------------> [41:29] L2 index
            |        +-------------------------------> [47:42] L1 index (48-bit)
            |                                          [51:42] L1 index (52-bit)
            +----------------------------------------> [55] TTBR0/1
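
With 64KB pages the only field that depends on the VA size is the L1
index, which is 6 bits wide for 48-bit and 10 bits wide for 52-bit. A
small sketch of that difference, using hypothetical names and the bit
ranges from the diagram:

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define PAGE_SHIFT_64K  16
   #define LEVEL_BITS_64K  13  /* 8192 descriptors per full table */

   /* Hypothetical helper: table index at level 1..3 for 64KB pages.
    * va_bits is 48 or 52 and only changes the width of the L1 index. */
   static unsigned int table_index_64k(uint64_t va, unsigned int level,
                                       unsigned int va_bits)
   {
       unsigned int shift, width;

       assert(level >= 1 && level <= 3);
       assert(va_bits == 48 || va_bits == 52);

       shift = PAGE_SHIFT_64K + LEVEL_BITS_64K * (3 - level);
       /* L1 covers [va_bits-1:42]; L2 and L3 are always 13 bits wide. */
       width = (level == 1) ? va_bits - shift : LEVEL_BITS_64K;

       return (va >> shift) & ((1u << width) - 1);
   }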


When using KVM without the Virtualization Host Extensions, the
hypervisor maps kernel pages in EL2 at a fixed (and potentially
random) offset from the linear mapping. See the kern_hyp_va macro and
kvm_update_va_mask function for more details. MMIO devices such as
GICv2 get mapped next to the HYP idmap page, as do vectors when
ARM64_SPECTRE_V3A is enabled for particular CPUs.

When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.

52-bit VA support in the kernel
-------------------------------
If the ARMv8.2-LVA optional feature is present, and we are running
with a 64KB page size, then it is possible to use 52 bits of address
space for both userspace and kernel addresses. However, any kernel
binary that supports 52-bit must also be able to fall back to 48-bit
at early boot time if the hardware feature is not present.

This fallback mechanism requires the kernel .text to be placed in the
higher addresses, so that its addresses are invariant to 48/52-bit VAs.
Due to the kasan shadow being a fraction of the entire kernel VA space,
the end of the kasan shadow must also be in the higher half of the
kernel VA space for both 48/52-bit. (Switching from 48-bit to 52-bit,
the end of the kasan shadow is invariant and dependent on ~0UL,
whilst the start address will "grow" towards the lower addresses).
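
To make the invariant concrete, the sketch below recomputes the shadow
start addresses listed in the layout tables. It assumes the shadow
covers one eighth of the kernel VA space (which is what the 32TB and
512TB sizes in the tables imply) and that the shadow ends just below
``0xffff800000000000`` as shown there. The names are illustrative and
are not kernel symbols.

.. code-block:: c

   #include <stdint.h>

   /* Assumption: shadow is 1/8 of the kernel VA space. */
   #define SKETCH_SHADOW_SCALE_SHIFT  3
   /* Exclusive end of the shadow, taken from the layout tables. */
   #define SKETCH_SHADOW_END          UINT64_C(0xffff800000000000)

   /* Illustrative helper: first address of the shadow region for a
    * kernel VA space of va_bits. Only the start moves with va_bits. */
   static uint64_t sketch_shadow_start(unsigned int va_bits)
   {
       uint64_t shadow_size =
           UINT64_C(1) << (va_bits - SKETCH_SHADOW_SCALE_SHIFT);

       return SKETCH_SHADOW_END - shadow_size;
   }

Under these assumptions ``sketch_shadow_start(48)`` is
``0xffff600000000000`` and ``sketch_shadow_start(52)`` is
``0xfffd800000000000``, matching the two tables.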

In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET
is kept constant at 0xFFF0000000000000 (corresponding to 52-bit).
This obviates the need for an extra variable read. The physvirt
offset and vmemmap offsets are computed at early boot to enable
this logic.
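
One plausible reading of this scheme, as a simplified sketch rather
than the kernel's implementation: the base of the kernel half is a
compile-time constant, and a single offset computed once at boot does
the rest of the conversion. All ``sketch_``/``SKETCH_`` names and the
``phys_start`` parameter are hypothetical.

.. code-block:: c

   #include <stdint.h>

   /* Base of the kernel half for a VA size: -(1 << va_bits) in 64 bits. */
   #define SKETCH_PAGE_OFFSET(va_bits)  (-(UINT64_C(1) << (va_bits)))

   /* Hypothetical stand-in for the offset computed at early boot. */
   static uint64_t sketch_physvirt_offset;

   /* Hypothetical boot step: phys_start is the first physical address of
    * RAM, vabits_actual is the VA size the hardware supports. Unsigned
    * wrap-around is intentional here. */
   static void sketch_boot_init(uint64_t phys_start,
                                unsigned int vabits_actual)
   {
       sketch_physvirt_offset =
           phys_start - SKETCH_PAGE_OFFSET(vabits_actual);
   }

   static uint64_t sketch_phys_to_virt(uint64_t pa)
   {
       return pa - sketch_physvirt_offset;
   }

   static uint64_t sketch_virt_to_phys(uint64_t va)
   {
       return va + sketch_physvirt_offset;
   }

Here ``SKETCH_PAGE_OFFSET(52)`` evaluates to ``0xfff0000000000000``, the
constant quoted above, and ``SKETCH_PAGE_OFFSET(48)`` evaluates to
``0xffff000000000000``.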

As a single binary will need to support both 48-bit and 52-bit VA
spaces, the VMEMMAP must be sized large enough for 52-bit VAs and
also must be sized large enough to accommodate a fixed PAGE_OFFSET.

Most code in the kernel should not need to consider the VA_BITS. For
code that does need to know the VA size, the variables are
defined as follows:

VA_BITS		constant	the *maximum* VA space size

VA_BITS_MIN	constant	the *minimum* VA space size

vabits_actual	variable	the *actual* VA space size
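
One way to picture the relationship between the three names, as a
sketch only: the two constants bracket what a single binary may end up
with, and the variable records what was chosen at boot. The
``SKETCH_``-prefixed names, the values 52 and 48 (matching a 64KB-page
build with 52-bit support as described earlier) and the ``hw_has_lva``
parameter are assumptions made for illustration.

.. code-block:: c

   #include <stdbool.h>

   #define SKETCH_VA_BITS      52  /* maximum this build supports */
   #define SKETCH_VA_BITS_MIN  48  /* fallback when LVA is absent */

   /* Decided once at early boot, read-only afterwards. */
   static unsigned int sketch_vabits_actual;

   /* Hypothetical boot step: pick the actual VA size from a probe of
    * the hardware feature. */
   static void sketch_select_vabits(bool hw_has_lva)
   {
       sketch_vabits_actual = hw_has_lva ? SKETCH_VA_BITS
                                         : SKETCH_VA_BITS_MIN;
   }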


Maximum and minimum sizes can be useful to ensure that buffers are
sized large enough or that addresses are positioned close enough for
the "worst" case.

52-bit userspace VAs
--------------------
To maintain compatibility with software that relies on the ARMv8.0
VA space maximum size of 48 bits, the kernel will, by default,
return virtual addresses to userspace from a 48-bit range.

Software can "opt-in" to receiving VAs from a 52-bit space by
specifying an mmap hint parameter that is larger than 48 bits.

For example:

.. code-block:: c

   maybe_high_address = mmap((void *)~0UL, size, prot, flags,...);
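
A fuller version of the same call, as a hedged sketch: it assumes an
anonymous private mapping on Linux, and the helper names are
hypothetical. The hint only widens the range the kernel may choose
from, so the returned address can still lie below the 48-bit boundary.

.. code-block:: c

   #ifndef _DEFAULT_SOURCE
   #define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS on glibc */
   #endif
   #include <stddef.h>
   #include <stdint.h>
   #include <sys/mman.h>

   /* Hypothetical helper: request 'size' bytes with a hint above
    * 48 bits. Returns MAP_FAILED on error, like mmap() itself. */
   static void *map_with_high_hint(size_t size)
   {
       return mmap((void *)~0UL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   }

   /* True if the address needs more than 48 bits to represent. */
   static int is_above_48bit(const void *p)
   {
       return ((uintptr_t)p >> 48) != 0;
   }

A caller would check the result against ``MAP_FAILED`` first and only
then, if it matters, use ``is_above_48bit()`` to see which range the
kernel picked.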

It is also possible to build a debug kernel that returns addresses
from a 52-bit space by enabling the following kernel config options:

.. code-block:: sh

   CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y

Note that this option is only intended for debugging applications
and should not be used in production.