.. SPDX-License-Identifier: GPL-2.0-only

dm-vdo
======

The dm-vdo (virtual data optimizer) device mapper target provides
block-level deduplication, compression, and thin provisioning. As a device
mapper target, it can add these features to the storage stack, compatible
with any file system. The vdo target does not protect against data
corruption, relying instead on integrity protection of the storage below
it. It is strongly recommended that lvm be used to manage vdo volumes. See
lvmvdo(7).

Userspace component
===================

Formatting a vdo volume requires the use of the 'vdoformat' tool, available
at:

https://github.com/dm-vdo/vdo/

In most cases, a vdo target will recover from a crash automatically the
next time it is started. In cases where it encountered an unrecoverable
error (either during normal operation or crash recovery), the target will
enter or come up in read-only mode. Because read-only mode is indicative of
data loss, a positive action must be taken to bring vdo out of read-only
mode. The 'vdoforcerebuild' tool, available from the same repo, is used to
prepare a read-only vdo to exit read-only mode. After running this tool,
the vdo target will rebuild its metadata the next time it is
started. Although some data may be lost, the rebuilt vdo's metadata will be
internally consistent and the target will be writable again.

The repo also contains additional userspace tools which can be used to
inspect a vdo target's on-disk metadata. Fortunately, these tools are
rarely needed except by dm-vdo developers.

Metadata requirements
=====================

Each vdo volume reserves 3 GB of space for metadata, or more depending on
its configuration. It is helpful to check that the space saved by
deduplication and compression is not cancelled out by the metadata
requirements. An estimation of the space saved for a specific dataset can
be computed with the vdo estimator tool, which is available at:

https://github.com/dm-vdo/vdoestimator/
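
As a rough illustration of that check, the Python sketch below compares the
space a dataset is expected to save against the metadata reservation. It is
not part of the vdo tools: 'net_savings_gb' is a hypothetical helper, its
'savings_fraction' input (the share of the data expected to be eliminated by
deduplication and compression, however that was estimated) is an assumption,
and the 3 GB default is only the minimum reservation stated above.

::

    def net_savings_gb(data_gb, savings_fraction, metadata_gb=3.0):
        """Space saved by dedupe/compression minus vdo's metadata reservation.

        Hypothetical helper: savings_fraction is the share of data_gb expected
        to be eliminated (0.0 to 1.0); metadata_gb defaults to the 3 GB minimum.
        """
        if not 0.0 <= savings_fraction <= 1.0:
            raise ValueError("savings_fraction must be between 0 and 1")
        return data_gb * savings_fraction - metadata_gb


    # A 10 GB dataset that shrinks by 20% saves less than the metadata costs.
    print(net_savings_gb(10, 0.2))
    # A 100 GB dataset that shrinks by half comes out well ahead.
    print(net_savings_gb(100, 0.5))

A negative result means the volume would consume more space than it saves.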

Target interface
================

Table line
----------

::

	<offset> <logical device size> vdo V4 <storage device>
	<storage device size> <minimum I/O size> <block map cache size>
	<block map era length> [optional arguments]


Required parameters:

	offset:
		The offset, in sectors, at which the vdo volume's logical
		space begins.

	logical device size:
		The size of the device which the vdo volume will service,
		in sectors. Must match the current logical size of the vdo
		volume.

	storage device:
		The device holding the vdo volume's data and metadata.

	storage device size:
		The size of the device holding the vdo volume, as a number
		of 4096-byte blocks. Must match the current size of the vdo
		volume.

	minimum I/O size:
		The minimum I/O size for this vdo volume to accept, in
		bytes. Valid values are 512 or 4096. The recommended value
		is 4096.

	block map cache size:
		The size of the block map cache, as a number of 4096-byte
		blocks. The minimum and recommended value is 32768 blocks.
		If the logical thread count is non-zero, the cache size
		must be at least 4096 blocks per logical thread.

	block map era length:
		The speed with which the block map cache writes out
		modified block map pages. A smaller era length is likely to
		reduce the amount of time spent rebuilding, at the cost of
		increased block map writes during normal operation. The
		maximum and recommended value is 16380; the minimum value
		is 1.
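
The required parameters mix units: the offset and logical size are in
512-byte sectors, the storage device size and block map cache size are in
4096-byte blocks, and the minimum I/O size is in bytes. The Python sketch
below converts byte counts into a table line and applies the limits listed
above. It is illustrative only: 'vdo_table_line' is a hypothetical helper,
not part of the vdo tools or lvm, and it treats 1 GB as 2^30 bytes, which is
how the sizes in the examples below work out.

::

    SECTOR = 512   # offsets and logical sizes are in 512-byte sectors
    BLOCK = 4096   # storage and cache sizes are in 4096-byte blocks


    def vdo_table_line(logical_bytes, storage_dev, storage_bytes, min_io=4096,
                       cache_blocks=32768, era_length=16380, offset=0,
                       optional=None):
        """Assemble a 'vdo V4' table line; optional maps <key> to <value>."""
        if logical_bytes % SECTOR:
            raise ValueError("logical size must be a whole number of sectors")
        if storage_bytes % BLOCK:
            raise ValueError("storage size must be whole 4096-byte blocks")
        if min_io not in (512, 4096):
            raise ValueError("minimum I/O size must be 512 or 4096")
        if cache_blocks < 32768:
            raise ValueError("block map cache must be at least 32768 blocks")
        if not 1 <= era_length <= 16380:
            raise ValueError("block map era length must be in 1..16380")
        fields = [offset, logical_bytes // SECTOR, "vdo", "V4", storage_dev,
                  storage_bytes // BLOCK, min_io, cache_blocks, era_length]
        for key, value in (optional or {}).items():
            fields += [key, value]
        return " ".join(str(field) for field in fields)


    GiB = 1 << 30
    # Prints: 0 2097152 vdo V4 /dev/dm-1 262144 4096 32768 16380
    print(vdo_table_line(1 * GiB, "/dev/dm-1", 1 * GiB))

Passing optional={"maxDiscard": 8} appends that pair after the era length,
in the same position the optional arguments take in the table line.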

Optional parameters:
--------------------
Some or all of these parameters may be specified as <key> <value> pairs.

Thread-related parameters:

Different categories of work are assigned to separate thread groups, and
the number of threads in each group can be configured separately.

If <hash>, <logical>, and <physical> are all set to 0, the work handled by
all three thread types will be handled by a single thread. If any of these
values are non-zero, all of them must be non-zero.

	ack:
		The number of threads used to complete bios. Since
		completing a bio calls an arbitrary completion function
		outside the vdo volume, threads of this type allow the vdo
		volume to continue processing requests even when bio
		completion is slow. The default is 1.

	bio:
		The number of threads used to issue bios to the underlying
		storage. Threads of this type allow the vdo volume to
		continue processing requests even when bio submission is
		slow. The default is 4.

	bioRotationInterval:
		The number of bios to enqueue on each bio thread before
		switching to the next thread. The value must be greater
		than 0 and not more than 1024; the default is 64.

	cpu:
		The number of threads used to do CPU-intensive work, such
		as hashing and compression. The default is 1.

	hash:
		The number of threads used to manage data comparisons for
		deduplication based on the hash value of data blocks. The
		default is 0.

	logical:
		The number of threads used to manage caching and locking
		based on the logical address of incoming bios. The default
		is 0; the maximum is 60.

	physical:
		The number of threads used to manage administration of the
		underlying storage device. At format time, a slab size for
		the vdo is chosen; the vdo storage device must be large
		enough to have at least 1 slab per physical thread. The
		default is 0; the maximum is 16.
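
The thread rules above, together with the block map cache requirement of
4096 blocks per logical thread, can be checked before a table is loaded. In
the Python sketch below, 'thread_config_problems' and 'bio_thread_for' are
hypothetical helpers. The second one is only a model of the documented
bioRotationInterval behaviour: it assumes a simple round-robin that starts at
bio thread 0, which may not match the driver's exact bookkeeping.

::

    DEFAULTS = {"ack": 1, "bio": 4, "bioRotationInterval": 64, "cpu": 1,
                "hash": 0, "logical": 0, "physical": 0}


    def thread_config_problems(options, cache_blocks=32768):
        """Return human-readable problems with the thread options (empty if OK)."""
        cfg = {**DEFAULTS, **options}
        problems = []
        zones = [cfg["hash"], cfg["logical"], cfg["physical"]]
        if any(zones) and not all(zones):
            problems.append("hash, logical and physical must be all 0 or all non-zero")
        if cfg["logical"] > 60:
            problems.append("logical may not exceed 60")
        if cfg["physical"] > 16:
            problems.append("physical may not exceed 16")
        if not 0 < cfg["bioRotationInterval"] <= 1024:
            problems.append("bioRotationInterval must be in 1..1024")
        if cfg["logical"] and cache_blocks < 4096 * cfg["logical"]:
            problems.append("block map cache needs 4096 blocks per logical thread")
        return problems


    def bio_thread_for(bio_number, bio_threads=4, rotation_interval=64):
        """Model: each bio thread takes rotation_interval bios, then the next one."""
        return (bio_number // rotation_interval) % bio_threads


    # The thread settings used when restarting the volume in the examples.
    print(thread_config_problems({"hash": 1, "logical": 3, "physical": 2},
                                 cache_blocks=65550))
    # With the defaults, bios 0-63 go to thread 0 and bio 64 to thread 1.
    print(bio_thread_for(63), bio_thread_for(64))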

Miscellaneous parameters:

	maxDiscard:
		The maximum size of a discard bio accepted, in 4096-byte
		blocks. I/O requests to a vdo volume are normally split
		into 4096-byte blocks, and processed up to 2048 at a time.
		However, discard requests to a vdo volume can be
		automatically split to a larger size, up to <maxDiscard>
		4096-byte blocks in a single bio, and are limited to 1500
		at a time. Increasing this value may provide better overall
		performance, at the cost of increased latency for the
		individual discard requests. The default and minimum is 1;
		the maximum is UINT_MAX / 4096.

	deduplication:
		Whether deduplication is enabled. The default is 'on'; the
		acceptable values are 'on' and 'off'.

	compression:
		Whether compression is enabled. The default is 'off'; the
		acceptable values are 'on' and 'off'.
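
Because <maxDiscard> counts 4096-byte blocks rather than bytes or sectors, a
desired discard size has to be converted before it goes into the table line.
The Python sketch below uses hypothetical helpers, 'max_discard_blocks' and
'misc_options', to perform that conversion and enforce the documented range
of 1 to UINT_MAX / 4096, assuming the usual 32-bit UINT_MAX. It also renders
the two on/off switches as optional <key> <value> pairs.

::

    UINT_MAX = 2**32 - 1                  # assumes a 32-bit unsigned int
    MAX_DISCARD_LIMIT = UINT_MAX // 4096  # 1048575 blocks


    def max_discard_blocks(discard_bytes):
        """Convert a discard size in bytes to maxDiscard (4096-byte blocks)."""
        blocks, remainder = divmod(discard_bytes, 4096)
        if remainder or not 1 <= blocks <= MAX_DISCARD_LIMIT:
            raise ValueError("discard size must be 1..%d whole 4096-byte blocks"
                             % MAX_DISCARD_LIMIT)
        return blocks


    def misc_options(discard_bytes=4096, deduplication=True, compression=False):
        """Render the miscellaneous parameters as table-line tokens."""
        def on_off(flag):
            return "on" if flag else "off"
        return ["maxDiscard", str(max_discard_blocks(discard_bytes)),
                "deduplication", on_off(deduplication),
                "compression", on_off(compression)]


    # 32 KiB discards correspond to "maxDiscard 8"; compression is enabled too.
    print(" ".join(misc_options(discard_bytes=32768, compression=True)))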

Device modification
-------------------

A modified table may be loaded into a running, non-suspended vdo volume.
The modifications will take effect when the device is next resumed. The
modifiable parameters are <logical device size>, <storage device size>
(the physical device size), <maxDiscard>, <compression>, and
<deduplication>.

If the logical device size or physical device size is changed, upon
successful resume vdo will store the new values and require them on future
startups. These two parameters may not be decreased. The logical device
size may not exceed 4 PB. The physical device size must increase by at
least 32832 4096-byte blocks if at all, and must not exceed the size of the
underlying storage device. Additionally, when formatting the vdo device, a
slab size is chosen: the physical device size may never increase above the
size which provides 8192 slabs, and each increase must be large enough to
add at least one new slab.
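
Before reloading a grown table, the simpler of these rules can be checked
ahead of time. The Python sketch below ('resize_problems' is a hypothetical
helper) rejects decreases, a logical size above 4 PB (assumed here to mean
4 PiB, that is 2^52 bytes), physical growth of fewer than 32832 blocks, and a
physical size larger than the underlying device. The 8192-slab ceiling and
the one-new-slab requirement depend on the slab size and metadata layout
chosen at format time, so the sketch leaves those checks to vdo itself.

::

    SECTOR = 512
    MIN_PHYSICAL_GROWTH = 32832          # 4096-byte blocks
    LOGICAL_LIMIT_BYTES = 4 * 2**50      # "4 PB", assumed here to be 4 PiB


    def resize_problems(old_logical_sectors, new_logical_sectors,
                        old_physical_blocks, new_physical_blocks,
                        underlying_blocks):
        """List rule violations for a proposed resize (slab rules not checked)."""
        problems = []
        if new_logical_sectors < old_logical_sectors:
            problems.append("logical size may not decrease")
        if new_logical_sectors * SECTOR > LOGICAL_LIMIT_BYTES:
            problems.append("logical size may not exceed 4 PB")
        growth = new_physical_blocks - old_physical_blocks
        if growth < 0:
            problems.append("physical size may not decrease")
        elif 0 < growth < MIN_PHYSICAL_GROWTH:
            problems.append("physical size must grow by at least 32832 blocks")
        if new_physical_blocks > underlying_blocks:
            problems.append("physical size exceeds the underlying device")
        return problems


    # Growing from 1 GB to 2 GB of physical space, assuming a 3 GB device.
    print(resize_problems(8388608, 8388608, 262144, 524288,
                          underlying_blocks=786432))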

Examples:

Start a previously formatted vdo volume with 1 GB logical space and 1 GB
physical space, storing to /dev/dm-1 which has more than 1 GB of space.

::

	dmsetup create vdo0 --table \
	"0 2097152 vdo V4 /dev/dm-1 262144 4096 32768 16380"

Grow the logical size to 4 GB.

::

	dmsetup reload vdo0 --table \
	"0 8388608 vdo V4 /dev/dm-1 262144 4096 32768 16380"
	dmsetup resume vdo0

Grow the physical size to 2 GB.

::

	dmsetup reload vdo0 --table \
	"0 8388608 vdo V4 /dev/dm-1 524288 4096 32768 16380"
	dmsetup resume vdo0

Grow the physical size by 1 GB more (to 3 GB), grow the logical size to
5 GB, and increase maxDiscard to 8 blocks.

::

	dmsetup reload vdo0 --table \
	"0 10485760 vdo V4 /dev/dm-1 786432 4096 32768 16380 maxDiscard 8"
	dmsetup resume vdo0

Stop the vdo volume.

::

	dmsetup remove vdo0

Start the vdo volume again. Note that the logical and physical device sizes
must still match, but other parameters can change.

::

	dmsetup create vdo1 --table \
	"0 10485760 vdo V4 /dev/dm-1 786432 512 65550 5000 hash 1 logical 3 physical 2"

Messages
--------
All vdo devices accept messages in the form:

::

	dmsetup message <target-name> 0 <message-name> <message-parameters>

The messages are:

	stats:
		Outputs the current view of the vdo statistics. Mostly used
		by the vdostats userspace program to interpret the output
		buffer.

	config:
		Outputs useful vdo configuration information. Mostly used
		by users who want to recreate a similar VDO volume and
		want to know the creation configuration used.

	dump:
		Dumps many internal structures to the system log. This is
		not always safe to run, so it should only be used to debug
		a hung vdo. Optional parameters to specify structures to
		dump are:

			- viopool: The pool of I/O requests for incoming bios
			- pools: A synonym of 'viopool'
			- vdo: Most of the structures managing on-disk data
			- queues: Basic information about each vdo thread
			- threads: A synonym of 'queues'
			- default: Equivalent to 'queues vdo'
			- all: All of the above

	dump-on-shutdown:
		Performs a default dump the next time vdo shuts down.
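
For scripting, the message syntax above can be wrapped so that only the
documented messages and dump options are sent. In the Python sketch below,
'vdo_message_argv' and 'send_vdo_message' are hypothetical helpers, and the
sketch assumes that only 'dump' takes parameters, since none are listed for
the other messages. Building the argument list needs nothing special;
actually running it requires dmsetup and sufficient privileges.

::

    import subprocess

    MESSAGES = {"stats", "config", "dump", "dump-on-shutdown"}
    DUMP_OPTIONS = {"viopool", "pools", "vdo", "queues", "threads",
                    "default", "all"}


    def vdo_message_argv(target, message, *params):
        """Build the dmsetup command line for a documented vdo message."""
        if message not in MESSAGES:
            raise ValueError(f"unknown vdo message: {message}")
        if message == "dump":
            unknown = [p for p in params if p not in DUMP_OPTIONS]
            if unknown:
                raise ValueError(f"unknown dump options: {unknown}")
        elif params:
            raise ValueError(f"'{message}' takes no parameters in this sketch")
        # dmsetup message takes a sector argument; the documented form uses 0.
        return ["dmsetup", "message", target, "0", message, *params]


    def send_vdo_message(target, message, *params):
        """Run the message and return whatever dmsetup prints on stdout."""
        argv = vdo_message_argv(target, message, *params)
        return subprocess.run(argv, check=True, capture_output=True,
                              text=True).stdout


    print(vdo_message_argv("vdo0", "dump", "queues", "vdo"))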


Status
------

::

    <device> <operating mode> <in recovery> <index state>
    <compression state> <physical blocks used> <total physical blocks>

The fields are:

	device:
		The name of the vdo volume.

	operating mode:
		The current operating mode of the vdo volume; values may be
		'normal', 'recovering' (the volume has detected an issue
		with its metadata and is attempting to repair itself), or
		'read-only' (an error has occurred that forces the vdo
		volume to only support read operations and not writes).

	in recovery:
		Whether the vdo volume is currently in recovery mode;
		values may be 'recovering' or '-', which indicates not
		recovering.

	index state:
		The current state of the deduplication index in the vdo
		volume; values may be 'closed', 'closing', 'error',
		'offline', 'online', 'opening', or 'unknown'.

	compression state:
		The current state of compression in the vdo volume; values
		may be 'offline' or 'online'.

	physical blocks used:
		The number of physical blocks in use by the vdo volume.

	total physical blocks:
		The total number of physical blocks the vdo volume may use;
		the difference between this value and the
		<physical blocks used> is the number of blocks the vdo
		volume has left before being full.
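
For monitoring, the status fields can be split into named values, and the
number of blocks left can be computed from the last two. In the Python sketch
below, 'parse_vdo_status' and 'VdoStatus' are hypothetical names and the
sample line is made up. It is also an assumption that the line may arrive
with the '<start> <length> vdo' prefix that dmsetup status prints before a
target's own fields; the sketch strips that prefix when present.

::

    from typing import NamedTuple


    class VdoStatus(NamedTuple):
        device: str
        operating_mode: str          # 'normal', 'recovering' or 'read-only'
        in_recovery: bool            # True when the field is 'recovering'
        index_state: str
        compression_state: str
        physical_blocks_used: int
        total_physical_blocks: int

        @property
        def blocks_left(self) -> int:
            """Blocks the volume can still use before it is full."""
            return self.total_physical_blocks - self.physical_blocks_used


    def parse_vdo_status(line: str) -> VdoStatus:
        fields = line.split()
        # Assumed dmsetup status prefix: "<start> <length> vdo".
        if len(fields) == 10 and fields[2] == "vdo":
            fields = fields[3:]
        if len(fields) != 7:
            raise ValueError(f"expected 7 vdo status fields, got {len(fields)}")
        device, mode, recovery, index, compression, used, total = fields
        return VdoStatus(device, mode, recovery == "recovering", index,
                         compression, int(used), int(total))


    status = parse_vdo_status("0 2097152 vdo vdo0 normal - online online 1000 262144")
    print(status.operating_mode, status.in_recovery, status.blocks_left)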

Memory Requirements
===================

A vdo target requires a fixed 38 MB of RAM along with the following amounts
that scale with the target:

- 1.15 MB of RAM for each 1 MB of configured block map cache size. The
  block map cache requires a minimum of 150 MB of RAM.
- 1.6 MB of RAM for each 1 TB of logical space.
- 268 MB of RAM for each 1 TB of physical storage managed by the volume.

The deduplication index requires additional memory which scales with the
size of the deduplication window. For dense indexes, the index requires 1
GB of RAM per 1 TB of window. For sparse indexes, the index requires 1 GB
of RAM per 10 TB of window. The index configuration is set when the target
is formatted and may not be modified.
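
Adding these figures gives a rough RAM budget. The Python sketch below
('vdo_ram_mb' is a hypothetical helper) simply totals the stated rates and
applies the 150 MB floor to the block map cache term. It treats 1 GB of index
memory as 1024 MB, which is an assumption, since the text does not say
whether the units are decimal or binary.

::

    def vdo_ram_mb(cache_mb, logical_tb, physical_tb, window_tb=0.0, sparse=False):
        """Rough RAM estimate in MB for a vdo target plus its dedupe index."""
        fixed = 38
        cache = max(1.15 * cache_mb, 150)      # 1.15 MB per MB, 150 MB minimum
        logical = 1.6 * logical_tb             # 1.6 MB per TB of logical space
        physical = 268 * physical_tb           # 268 MB per TB of physical storage
        # Dense index: 1 GB per TB of window; sparse: 1 GB per 10 TB of window.
        index_gb = window_tb / 10 if sparse else window_tb
        return fixed + cache + logical + physical + index_gb * 1024


    # Default 128 MB cache, 1 TB logical, 1 TB physical, dense 1 TB window.
    print(round(vdo_ram_mb(128, 1, 1, window_tb=1)))

In this configuration the index dominates: switching to a sparse index with
the same 1 TB window would cut its share from about 1 GB to about 0.1 GB.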

Module Parameters
=================

The vdo driver has a numeric parameter 'log_level' which controls the
verbosity of logging from the driver. The default setting is 6
(LOGLEVEL_INFO and more severe messages).

Run-time Usage
==============

When using dm-vdo, it is important to be aware of the ways in which its
behavior differs from other storage targets.

- There is no guarantee that overwrites of existing blocks will succeed.
  Because the underlying storage may be multiply referenced, overwriting
  an existing block generally requires a vdo to have a free block
  available.

- When blocks are no longer in use, sending a discard request for those
  blocks lets the vdo release references for those blocks. If the vdo is
  thinly provisioned, discarding unused blocks is essential to prevent the
  target from running out of space. However, due to the sharing of
  duplicate blocks, no discard request for any given logical block is
  guaranteed to reclaim space.

- Assuming the underlying storage properly implements flush requests, vdo
  is resilient against crashes; however, unflushed writes may or may not
  persist after a crash.

- Each write to a vdo target entails a significant amount of processing.
  However, much of the work is parallelizable. Therefore, vdo targets
  achieve better throughput at higher I/O depths, and can support up to
  2048 requests in parallel.

Tuning
======

The vdo device has many options, and it can be difficult to make optimal
choices without perfect knowledge of the workload. Additionally, most
configuration options must be set when a vdo target is started, and cannot
be changed without shutting it down completely. Ideally, tuning with
simulated workloads should be performed before deploying vdo in production
environments.

The most important value to adjust is the block map cache size. In order to
service a request for any logical address, a vdo must load the portion of
the block map which holds the relevant mapping. These mappings are cached.
Performance will suffer when the working set does not fit in the cache. By
default, a vdo allocates 128 MB of metadata cache in RAM to support
efficient access to 100 GB of logical space at a time. The cache size
should be scaled up proportionally for larger working sets.
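
Scaling proportionally means roughly 32768 blocks (128 MB) of cache per
100 GB of logical working set. The Python sketch below uses a hypothetical
helper, 'cache_blocks_for', to turn a working-set estimate into a <block map
cache size> value. The result never goes below the 32768-block minimum or
the 4096-blocks-per-logical-thread requirement from the table line
parameters. The working-set size is an input the administrator has to
estimate.

::

    import math

    MIN_CACHE_BLOCKS = 32768   # 128 MB of 4096-byte blocks, the default
    GB_PER_MIN_CACHE = 100     # logical working set served by the default cache


    def cache_blocks_for(working_set_gb, logical_threads=0):
        """Suggest a block map cache size (4096-byte blocks) for a working set."""
        scaled = math.ceil(working_set_gb * MIN_CACHE_BLOCKS / GB_PER_MIN_CACHE)
        return max(MIN_CACHE_BLOCKS, scaled, 4096 * logical_threads)


    blocks = cache_blocks_for(1000)          # a 1000 GB working set
    print(blocks, blocks * 4096 // 2**20, "MB of cache")

Going by the memory requirements above, a 1280 MB cache would itself need
about 1.15 x 1280 = 1472 MB of RAM.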

The logical and physical thread counts should also be adjusted. A logical
thread controls a disjoint section of the block map, so additional logical
threads increase parallelism and can increase throughput. Physical threads
control a disjoint section of the data blocks, so additional physical
threads can also increase throughput. However, excess threads can waste
resources and increase contention.

Bio submission threads control the parallelism involved in sending I/O to
the underlying storage; fewer threads mean there is more opportunity to
reorder I/O requests for performance benefit, but also that each I/O
request has to wait longer before being submitted.

Bio acknowledgment threads are used for finishing I/O requests. This is
done on dedicated threads since the amount of work required to execute a
bio's callback cannot be controlled by the vdo itself. Usually one thread
is sufficient, but additional threads may be beneficial, particularly when
bios have CPU-heavy callbacks.

CPU threads are used for hashing and for compression; in workloads with
compression enabled, more threads may result in higher throughput.

Hash threads are used to sort active requests by hash and determine whether
they should deduplicate; the most CPU-intensive actions done by these
threads are comparisons of 4096-byte data blocks. In most cases, a single
hash thread is sufficient.