.. SPDX-License-Identifier: GPL-2.0

===============
DMA and swiotlb
===============

swiotlb is a memory buffer allocator used by the Linux kernel DMA layer. It is
typically used when a device doing DMA can't directly access the target memory
buffer because of hardware limitations or other requirements. In such a case,
the DMA layer calls swiotlb to allocate a temporary memory buffer that conforms
to the limitations. The DMA is done to/from this temporary memory buffer, and
the CPU copies the data between the temporary buffer and the original target
memory buffer. This approach is generically called "bounce buffering", and the
temporary memory buffer is called a "bounce buffer".

Device drivers don't interact directly with swiotlb. Instead, drivers inform
the DMA layer of the DMA attributes of the devices they are managing, and use
the normal DMA map, unmap, and sync APIs when programming a device to do DMA.
These APIs use the device DMA attributes and kernel-wide settings to determine
whether bounce buffering is necessary for a particular I/O. If so, the DMA
layer manages the allocation and freeing of bounce buffers, and the CPU copies
into and out of them, without any involvement from the driver.

Because the CPU copies data between the bounce buffer and the original target
memory buffer, doing bounce buffering is slower than doing DMA directly to the
original memory buffer, and it consumes more CPU resources. So it is used only
when necessary for providing DMA functionality.
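
To make the driver-visible flow concrete, here is a minimal sketch (not taken
from this document; the helper name and the buf/len parameters are
hypothetical) of a streaming DMA mapping. Whether swiotlb bounce buffering
happens underneath is entirely the DMA layer's decision::

	#include <linux/dma-mapping.h>

	/* Hypothetical helper: one DMA_TO_DEVICE transfer of @len bytes at @buf */
	static int example_dma_to_device(struct device *dev, void *buf, size_t len)
	{
		dma_addr_t handle;

		handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
		if (dma_mapping_error(dev, handle))
			return -ENOMEM;	/* may reflect a "swiotlb full" condition */

		/* ... program the device with 'handle' and run the I/O ... */

		dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
		return 0;
	}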
30 ---------------
33 only provide 32-bit DMA addresses. By allocating bounce buffer memory below
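
A driver for such a device declares the limitation through the standard DMA
API, as in this sketch (error handling abbreviated); the DMA layer then bounce
buffers any target memory that lies above the 4 GiB line::

	#include <linux/dma-mapping.h>

	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)))
		dev_warn(dev, "32-bit DMA addressing not available\n");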

More recently, Confidential Computing (CoCo) VMs have the guest VM's memory
encrypted by default, and the memory is not accessible by the host hypervisor
and VMM. For the host to do I/O on behalf of the guest, the I/O must be
directed to guest memory that is unencrypted. CoCo VMs set a kernel-wide option
to force all DMA I/O to use bounce buffers, and the bounce buffer memory is set
up as unencrypted. The host does DMA I/O to/from the bounce buffer memory, and
the Linux kernel DMA layer does "sync" operations to cause the CPU to copy the
data to/from the original target memory buffer. The CPU copying bridges between
the unencrypted and the encrypted memory.

swiotlb is also used by "untrusted" devices, which must not be able to observe
or modify memory beyond what they have been granted. An IOMMU restricts what
such a device can access, but because
IOMMU access control is per-granule, the untrusted device can gain access to
unrelated data that happens to share a granule with the mapped buffer. Bounce
buffering solves this by placing the data in bounce buffer memory padded out
to full granules, so the granules mapped for the device contain nothing else.
60 ------------------
62 swiotlb_tbl_unmap_single(). The "map" API allocates a bounce buffer of a
63 specified size in bytes and returns the physical address of the buffer. The
64 buffer memory is physically contiguous. The expectation is that the DMA layer
67 multiple memory buffer segments, a separate bounce buffer must be allocated for
69 CPU copy) to initialize the bounce buffer to match the contents of the original
70 buffer.
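
For orientation, the dma-direct path wraps the "map" step roughly as in this
sketch (simplified; exact helper signatures vary across kernel versions, so
treat the details as an assumption)::

	/* Bounce 'paddr'/'size' through swiotlb; returns a device-usable address */
	dma_addr_t dma_addr = swiotlb_map(dev, paddr, size, DMA_TO_DEVICE, 0);

	if (dma_addr == DMA_MAPPING_ERROR)
		return DMA_MAPPING_ERROR;	/* e.g. "swiotlb full" */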

swiotlb_tbl_unmap_single() does the reverse. If the DMA operation might have
updated the bounce buffer memory and DMA_ATTR_SKIP_CPU_SYNC is not set, the
unmap does a "sync" operation to cause a CPU copy of the data from the bounce
buffer back to the original buffer. Then the bounce buffer memory is freed.

swiotlb also provides "sync" APIs that correspond to the dma_sync_*() APIs that
a driver may use when control of a buffer transitions between the CPU and the
device. The swiotlb "sync" APIs cause a CPU copy of the data between the
original buffer and the bounce buffer. Like the dma_sync_*() APIs, the swiotlb
"sync" APIs support doing a partial sync, where only a subset of the bounce
buffer is copied to/from the original buffer.
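
At the driver level these are reached through the generic DMA sync calls, as
in this sketch (the 'handle' and 'len' names are hypothetical); when the
mapping was bounced, the DMA layer turns each call into a swiotlb "sync"::

	/* Give the CPU a coherent view of a DMA_FROM_DEVICE buffer */
	dma_sync_single_for_cpu(dev, handle, len, DMA_FROM_DEVICE);

	/* ... the CPU examines the received data ... */

	/* Hand the buffer back to the device for more DMA */
	dma_sync_single_for_device(dev, handle, len, DMA_FROM_DEVICE);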
85 ------------------------------
89 pre-allocated at boot time (but see Dynamic swiotlb below). Because swiotlb

The need to pre-allocate the default swiotlb pool creates a boot-time tradeoff.
The pool should be large enough to ensure that bounce buffer requests can
always be satisfied, as the non-blocking requirement means requests can't wait
for space to become available. But a large pool potentially wastes memory, as
this pre-allocated memory is not available for other uses in the system. The
tradeoff is particularly acute in CoCo VMs, which use bounce buffers for all DMA
I/O. These VMs use a heuristic to set the default pool size to ~6% of memory,
with a maximum of 1 GiB, which has the potential to be very wasteful.
Conversely, the heuristic might produce a size that is insufficient, depending
on the I/O patterns of the workload running in the VM. The dynamic swiotlb
mechanism described below can help, but has limitations of its own. Better
management of the swiotlb
default memory pool size remains an open issue.
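
To make the heuristic concrete (simple arithmetic, not measured behavior)::

	8 GiB guest:   6% of memory = ~492 MiB default pool
	32 GiB guest:  6% of memory = ~1.9 GiB, capped at 1 GiB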

A single allocation from swiotlb is limited to IO_TLB_SIZE * IO_TLB_SEGSIZE
bytes, which is 256 KiB with current definitions. When a device's DMA settings
are such that the device might use swiotlb, the maximum size of a DMA segment
must be limited to that 256 KiB. This value is communicated to higher-level
kernel code through dma_max_mapping_size() and swiotlb_max_mapping_size(). If
the
higher-level code fails to account for this limit, it may make requests that
are too large for swiotlb and get a "swiotlb full" error.
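
Callers can query the limit rather than hard-coding 256 KiB, as in this sketch
(the clamping target is hypothetical)::

	#include <linux/dma-mapping.h>

	size_t max_seg = dma_max_mapping_size(dev);

	/* ... clamp the subsystem's per-segment I/O size to max_seg ... */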

A key device DMA setting is "min_align_mask", which is either zero or a power
of 2 minus 1, so that it selects some number of low order address bits.
swiotlb ensures that the selected low order bits of the physical address of the
bounce buffer match the same bits in the address of the original buffer. When
min_align_mask is non-zero, it may produce an "alignment offset" in the address
of the bounce buffer that slightly reduces the maximum size of an allocation.
The reduced limit is reflected in the value returned by
swiotlb_max_mapping_size() and shows up in places like
/sys/block/<device>/queue/max_sectors_kb. For a device that might use
swiotlb, max_sectors_kb will be 256 KiB. When min_align_mask is non-zero,
max_sectors_kb might be even smaller, such as 252 KiB.
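
For example, the NVMe driver requires DMA addresses to preserve the offset
within a controller page. A sketch of how a driver states such a requirement
(a 4 KiB page is assumed here for illustration)::

	/* low 12 bits of the bounce buffer address must match the original */
	dma_set_min_align_mask(dev, SZ_4K - 1);

With this mask, an original buffer at physical address 0x12345600 must bounce
to an address whose low 12 bits are also 0x600.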

swiotlb_tbl_map_single() also takes an "alloc_align_mask" parameter. This
parameter specifies that the allocation of bounce buffer space must start at a
physical address with the alloc_align_mask bits set to zero. But the actual
bounce buffer might start at a larger address if min_align_mask is non-zero.
Hence there may be pre-padding space that is allocated prior to the start of
the bounce buffer. Similarly, the end of the bounce buffer is rounded up to an
alloc_align_mask boundary, potentially resulting in post-padding space. Any
pre-padding or post-padding space is not initialized by swiotlb code. The
alloc_align_mask parameter is used by IOMMU code when mapping for untrusted
devices. It is set to the granule size - 1 so that the bounce buffer occupies
its granules exclusively, with nothing else sharing them.
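
A worked example of the padding, with assumed values (a 4 KiB IOMMU granule,
so alloc_align_mask = 0xFFF, min_align_mask = 0xFFF, and an original buffer
that starts at offset 0x800 within its page)::

	allocation start:  a granule-aligned address   (alloc_align_mask honored)
	bounce buffer:     starts 0x800 bytes later    (min_align_mask honored)
	pre-padding:       0x800 bytes = one 2 KiB slot, left uninitialized
	buffer end:        rounded up to the next granule boundary (post-padding)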
140 ------------------------
143 default size of 64 MiB. The default pool size may be modified with the
144 "swiotlb=" kernel boot line parameter. The default size may also be adjusted
149 it works for devices that can only address 32-bits of physical memory (unless
150 architecture-specific code provides the SWIOTLB_ANY flag). In a CoCo VM, the
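
The "swiotlb=" value is given in slots (described below) rather than bytes.
For example, with the usual 2 KiB slot size, the following requests 65536
slots, i.e. 65536 x 2 KiB = 128 MiB, for the default pool::

	swiotlb=65536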

Each pool is divided into "slots" of size IO_TLB_SIZE, which is 2 KiB with
current definitions. IO_TLB_SEGSIZE contiguous slots (128 slots currently) form
what might be called a "slot set". When a bounce buffer is allocated, it
occupies one or more contiguous slots, and a slot is never shared by multiple
bounce buffers. Furthermore, a bounce buffer must be allocated from a single
slot set, which leads to the maximum bounce buffer size being IO_TLB_SIZE *
IO_TLB_SEGSIZE. Multiple smaller bounce buffers may co-exist in a single slot
set if the alignment and size constraints can be met.
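
Putting the current definitions together (simple arithmetic on the constants
named above)::

	IO_TLB_SIZE          = 2 KiB          (slot size)
	IO_TLB_SEGSIZE       = 128            (slots per slot set)
	max bounce buffer    = 2 KiB * 128    = 256 KiB
	default 64 MiB pool  = 32768 slots    = 256 slot sets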

Each pool is also divided into "areas", by default one per CPU, with each CPU
allocating from its own area to reduce lock contention when multiple CPUs
allocate concurrently. When allocating a bounce buffer, if the area associated
with the calling CPU has no slots that satisfy the request, the other areas
are searched as a fallback.

The slot-set geometry interacts with alignment: if alignment requirements
prevent using the initial slots of a slot set, then because a
bounce buffer allocation can't cross a slot set boundary, eliminating
those initial slots effectively reduces the max size of a bounce buffer.
Currently this is not a problem, because the alignment in question comes from
the IOMMU
granule size, and granules cannot be larger than PAGE_SIZE. But if that were to
change, the maximum bounce buffer size for untrusted devices would shrink
accordingly.
194 ---------------
195 When CONFIG_SWIOTLB_DYNAMIC is enabled, swiotlb can do on-demand expansion of
197 buffer request fails due to lack of available space, an asynchronous background
202 buffer request creates a "transient pool" to avoid returning an "swiotlb full"
203 error. A transient pool has the size of the bounce buffer request, and is
204 deleted when the bounce buffer is freed. Memory for this transient pool comes
208 background task can add another non-transient pool.
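
Dynamic expansion is a build-time choice; a configuration fragment such as the
following (illustrative) enables it, and can be paired with a smaller
"swiotlb=" boot value because later peaks in demand can be absorbed by
dynamically added pools::

	CONFIG_SWIOTLB_DYNAMIC=y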

The memory for a new non-transient pool
must be physically contiguous, so the size is limited to MAX_PAGE_ORDER pages
(e.g., 4 MiB on a typical x86 system). Due to memory fragmentation, a max size
allocation may fail, in which case the allocation is retried with halved sizes
until it succeeds, but with a minimum size of 1 MiB. Given sufficient system
memory, expansion can repeat until bounce buffer demand is met.

New pools are divided into areas, nominally matching the number of areas
in the default pool. Because the new pool size is typically a few MiB at most,
the number of areas will likely be smaller. For example, with a new pool size
of 4 MiB and the 256 KiB minimum area size, only 16 areas can be created. If
the system has more CPUs than areas, multiple CPUs share an area, with some
loss of parallelism.
236 ----------------------
251 entry for each area, and is accessed using a 0-based area index derived from the

io_tlb_slot describes an individual memory slot in the pool, with size
IO_TLB_SIZE (2 KiB currently). The io_tlb_pool likewise anchors an array of
io_tlb_slot structures, one per slot, accessed using a 0-based slot
index computed from the bounce buffer address relative to the starting memory
address of the pool. The size of struct io_tlb_slot is 24 bytes, so the
overhead is about 1% of the slot size.
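
For reference, the per-slot bookkeeping corresponds to a structure of roughly
this shape (paraphrased from kernel/dma/swiotlb.c; the exact layout may vary
by kernel version)::

	struct io_tlb_slot {
		phys_addr_t orig_addr;    /* original buffer address, for syncs */
		size_t alloc_size;        /* adjusted size, for sanity checks */
		unsigned short list;      /* free-slot search bookkeeping */
		unsigned short pad_slots; /* pre-padding count, first slot only */
	};

On a 64-bit system the fields total 8 + 8 + 2 + 2 = 20 bytes, padded to 24,
matching the roughly 1% overhead noted above.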

The DMA
APIs and the corresponding swiotlb APIs use the bounce buffer address as the
identifier for a bounce buffer. This address is returned by
swiotlb_tbl_map_single() and is passed as an argument to
swiotlb_tbl_unmap_single() and the swiotlb "sync" APIs. The original
memory buffer address obviously must be passed as an argument to
swiotlb_tbl_map_single(), but it is not passed to the other APIs. Consequently,
the
swiotlb data structures must save the original memory buffer address so that it
is available when sync operations do their CPU copies.

A complication is that a "sync" operation may specify not the start of a bounce
buffer but an address somewhere in the middle of the bounce buffer, and the
address of the start of the bounce buffer isn't known to swiotlb code. But
swiotlb code must be able to calculate the corresponding original memory buffer
address to do the CPU copy. For this reason, the original
memory buffer address is populated into the struct io_tlb_slot for each slot
occupied by the bounce buffer. An adjusted "alloc_size" of the bounce buffer is
also recorded in each struct io_tlb_slot so that sanity checks can be done on
the size of the "sync" operation. The "alloc_size" field is not used except for
these sanity checks.
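
The underlying address arithmetic is simple (a sketch that ignores the
min_align_mask offset handling; 'pool' and 'tlb_addr' are hypothetical local
names)::

	/* which slot does a (possibly mid-buffer) bounce address fall in? */
	unsigned int index = (tlb_addr - pool->start) >> IO_TLB_SHIFT;

	/* the matching address in the original buffer, for the CPU copy */
	phys_addr_t orig = pool->slots[index].orig_addr +
			   (tlb_addr & (IO_TLB_SIZE - 1));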

The "list" field in struct io_tlb_slot is used to manage the search for
available slots to use for a new bounce buffer. The "list" values are updated
when allocating
a new bounce buffer and when freeing a bounce buffer. At pool creation time, the
"list" field is initialized to count down from IO_TLB_SEGSIZE to 1 across each
slot set, giving the number of contiguous free slots from that slot to the end
of its slot set.

The "pad_slots" field in struct io_tlb_slot accounts for padding. When
swiotlb_tbl_map_single() allocates bounce buffer space to meet alloc_align_mask
requirements, it may allocate pre-padding space across zero or more slots. But
when swiotlb_tbl_unmap_single() is called with the bounce buffer address, the
pre-padding slots can't be derived from that address alone. The "pad_slots"
field records how many pre-padding slots were allocated so that unmap can free
them along with the slots containing the bounce buffer itself.
The "pad_slots" value is recorded only in the first non-padding slot allocated
to the bounce buffer.
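
At unmap time the padding is recovered with the same kind of address
arithmetic (a simplified sketch of the release path)::

	unsigned int index = (tlb_addr - pool->start) >> IO_TLB_SHIFT;

	/* step back over the pre-padding recorded at map time */
	index -= pool->slots[index].pad_slots;

	/* ... free the padding slots and the bounce buffer slots together ... */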
308 ----------------