Documentation/filesystems/idmappings.rst

1 .. SPDX-License-Identifier: GPL-2.0
12 ------------
16 in userspace is::
20 ``u`` indicates the first element in the upper idmapset ``U`` and ``k``
21 indicates the first element in the lower idmapset ``K``. The ``r`` parameter
24 we're talking about an id in the upper or lower idmapset.
26 To see what this looks like in practice, let's take the following idmapping::
32  u22 -> k10000
33  u23 -> k10001
34  u24 -> k10002
36 From a mathematical viewpoint ``U`` and ``K`` are well-ordered sets and an
38 order isomorphic. In fact, ``U`` and ``K`` are always well-ordered subsets of
45  k10000 -> u22
46  k10001 -> u23
47  k10002 -> u24
59 ``u1000`` from the upper idmapset down to ``k11000`` in the lower idmapset.
62 what id ``k11000`` corresponds to in the second or third idmapping. The
78 contain ``u1000`` in the upper idmapset ``U``. This is equivalent to not having
79 an id mapped. We can simply say that ``u1000`` is unmapped in the second and
80 third idmapping. The kernel represents unmapped ids internally as ``(uid_t)-1`` or
81 ``(gid_t)-1`` and reports them to userspace as the overflowuid or overflowgid (``65534`` by default).
88 - If we want to map from left to right::
91    id - u + k = n
93 - If we want to map from right to left::
96    id - k + u = n
107 Assume we are given ``k21000`` in the lower idmapset of the first idmapping. We
108 want to know what id this was mapped from in the upper idmapset of the first
109 idmapping. So we're mapping up in the first idmapping::
111  id     - k      + u  = n
112  k21000 - k20000 + u0 = u1000
114 Now assume we are given the id ``u1100`` in the upper idmapset of the second
115 idmapping and we want to know what this id maps down to in the lower idmapset
116 of the second idmapping. This means we're mapping down in the second
119  id    - u    + k      = n
120  u1100 - u500 + k30000 = k30600
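The two formulas and the worked examples can be checked with a small userspace model in C. This is a sketch, not kernel code, and it makes several assumptions:

* ``struct idmap``, ``map_down()`` and ``map_up()`` are hypothetical names.
* Every idmapping consists of a single ``u:k:r`` extent.
* ``(uint32_t)-1`` stands in for an unmapped id.
* The ``r10000`` ranges used for the two worked examples are assumed. The results only require that the ids fall inside the range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a single-extent idmapping written u<first>:k<lower>:r<count>. */
struct idmap {
    uint32_t first; /* u: first element of the upper idmapset */
    uint32_t lower; /* k: first element of the lower idmapset */
    uint32_t count; /* r: number of ids in the range */
};

/* Stand-in for an unmapped id, written u-1 or k-1 in this document. */
#define ID_INVALID ((uint32_t)-1)

/* Left to right (mapping down): n = id - u + k, defined only for id in [u, u + r). */
uint32_t map_down(struct idmap m, uint32_t id)
{
    if (id < m.first || id - m.first >= m.count)
        return ID_INVALID;
    return id - m.first + m.lower;
}

/* Right to left (mapping up): n = id - k + u, defined only for id in [k, k + r). */
uint32_t map_up(struct idmap m, uint32_t id)
{
    if (id < m.lower || id - m.lower >= m.count)
        return ID_INVALID;
    return id - m.lower + m.first;
}

/* The examples from the text; the r10000 ranges are assumed. */
void check_examples(void)
{
    struct idmap simple = { 22, 10000, 3 };      /* u22:k10000:r3 */
    struct idmap first = { 0, 20000, 10000 };    /* u0:k20000:r10000 */
    struct idmap second = { 500, 30000, 10000 }; /* u500:k30000:r10000 */

    assert(map_down(simple, 24) == 10002);
    assert(map_up(simple, 10001) == 23);
    assert(map_down(simple, 25) == ID_INVALID); /* u25 is not in the range */
    assert(map_up(first, 21000) == 1000);       /* k21000 - k20000 + u0 */
    assert(map_down(second, 1100) == 30600);    /* u1100 - u500 + k30000 */
}
```

The explicit range check turns an id outside ``[u, u + r)`` into an unmapped id. Without it, the formula would silently produce an unrelated one.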
123 -------------
125 In the context of the kernel an idmapping can be interpreted as mapping a range
126 of userspace ids into a range of kernel ids::
128  userspace-id:kernel-id:range
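This document writes the ``userspace-id:kernel-id:range`` triple as, for example, ``u0:k10000:r10000``. The kernel's ``/proc/<pid>/uid_map`` files use a similar three-field layout, separated by whitespace. The sketch below parses the document's notation into a hypothetical ``struct idmap``. The function name and the validation rules (non-zero range, no wrap past the 32-bit id space) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct idmap {
    uint32_t first; /* userspace id (u) */
    uint32_t lower; /* kernel id (k) */
    uint32_t count; /* range (r) */
};

/* Hypothetical parser for this document's "u<first>:k<lower>:r<count>" notation. */
bool parse_idmap(const char *s, struct idmap *m)
{
    unsigned long long u, k, r;
    int end = -1;

    /* %n stores the number of characters consumed, so trailing junk can be rejected. */
    if (sscanf(s, "u%llu:k%llu:r%llu%n", &u, &k, &r, &end) != 3 || end < 0 || s[end] != '\0')
        return false;

    /* Reject out-of-range values, empty ranges and ranges running past 2^32. */
    if (u > UINT32_MAX || k > UINT32_MAX || r == 0 || r > UINT32_MAX)
        return false;
    if (u + r > 0x100000000ULL || k + r > 0x100000000ULL)
        return false;

    m->first = (uint32_t)u;
    m->lower = (uint32_t)k;
    m->count = (uint32_t)r;
    return true;
}

void check_examples(void)
{
    struct idmap m;

    assert(parse_idmap("u0:k10000:r10000", &m));
    assert(m.first == 0 && m.lower == 10000 && m.count == 10000);
    assert(parse_idmap("u22:k10000:r3", &m) && m.first == 22 && m.count == 3);
}
```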
130 A userspace id is always an element in the upper idmapset of an idmapping of
131 type ``uid_t`` or ``gid_t`` and a kernel id is always an element in the lower
134 types and "kernel id" will be used to refer to ``kuid_t`` and ``kgid_t``.
136 The kernel is mostly concerned with kernel ids. They are used when performing
137 permission checks and are stored in an inode's ``i_uid`` and ``i_gid`` fields.
139 kernel, or is passed by userspace to the kernel, or a raw device id that is
142 Note that we are only concerned with idmappings as the kernel stores them, not
146 all kernel ids with ``k``. Ranges of idmappings will be prefixed with ``r``. So
149 For example, within this idmapping, the id ``u1000`` is an id in the upper
151 ``k11000`` which is a kernel id in the lower idmapset or "kernel idmapset"
154 A kernel id is always created by an idmapping. Such idmappings are associated
168 Other user namespaces usually have non-identity idmappings such as::
174 immediately translated into a kernel id according to the idmapping associated
180 - If a filesystem were to be mounted in the initial user namespace (as most
186 - If a filesystem were to be mounted with an idmapping of ``u0:k10000:r10000``
191 ----------------------
199 This translation algorithm is used by the kernel in quite a few places. For
203 If we've been given ``k11000`` from one idmapping we can map that id up in
204 another idmapping. In order for this to work both idmappings need to contain
205 the same kernel id in their kernel idmapsets. For example, consider the
211 and we are mapping ``u1000`` down to ``k11000`` in the first idmapping. We can
212 then translate ``k11000`` into a userspace id in the second idmapping using the
213 kernel idmapset of the second idmapping::
215  /* Map the kernel id up into a userspace id in the second idmapping. */
218 Note how we can get back to the kernel id in the first idmapping by inverting
221  /* Map the userspace id down into a kernel id in the second idmapping. */
224  /* Map the kernel id up into a userspace id in the first idmapping. */
228 kernel id corresponds to in a given idmapping. In order to be able to answer
229 this question both idmappings need to contain the same kernel id in their
230 respective kernel idmapsets.
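A minimal sketch of this crossmapping, using the formulas from the previous section: map a userspace id down in one idmapping, then map the resulting kernel id up in another. The names and the single-extent ``struct idmap`` model are hypothetical. The idmappings ``u0:k10000:r10000`` and ``u20000:k10000:r10000`` are assumed for illustration; any two idmappings whose kernel idmapsets both contain ``k11000`` would do.

```c
#include <assert.h>
#include <stdint.h>

struct idmap { uint32_t first, lower, count; }; /* u<first>:k<lower>:r<count> */
#define ID_INVALID ((uint32_t)-1)               /* unmapped id */

/* Userspace id down into a kernel id: id - u + k. */
static uint32_t map_down(struct idmap m, uint32_t id)
{
    if (id < m.first || id - m.first >= m.count)
        return ID_INVALID;
    return id - m.first + m.lower;
}

/* Kernel id up into a userspace id: id - k + u. */
static uint32_t map_up(struct idmap m, uint32_t id)
{
    if (id < m.lower || id - m.lower >= m.count)
        return ID_INVALID;
    return id - m.lower + m.first;
}

/*
 * Crossmap a userspace id of idmapping @a into a userspace id of idmapping @b.
 * This only succeeds if the kernel id is contained in both kernel idmapsets.
 */
uint32_t crossmap(struct idmap a, struct idmap b, uint32_t uid)
{
    uint32_t kid = map_down(a, uid);

    if (kid == ID_INVALID)
        return ID_INVALID;
    return map_up(b, kid);
}

void check_examples(void)
{
    struct idmap first = { 0, 10000, 10000 };      /* u0:k10000:r10000 (assumed) */
    struct idmap second = { 20000, 10000, 10000 }; /* u20000:k10000:r10000 (assumed) */

    assert(crossmap(first, second, 1000) == 21000); /* u1000 -> k11000 -> u21000 */
    assert(crossmap(second, first, 21000) == 1000); /* and back again */
}
```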
232 For example, when the kernel reads a raw userspace id from disk it maps it down
233 into a kernel id according to the idmapping associated with the filesystem.
236 means ``u1000`` will be mapped to ``k21000`` which is what will be stored in
239 When someone in userspace calls ``stat()`` or a related function to get
240 ownership information about the file the kernel can't simply map the id back up
244 So the kernel will map the id back up in the idmapping of the caller. Let's
252 It is possible to translate a kernel id from one idmapping to another one via
254 a kernel id.
261 and we are given ``k11000`` in the first idmapping. In order to translate this
262 kernel id in the first idmapping into a kernel id in the second idmapping we
265 1. Map the kernel id up into a userspace id in the first idmapping::
267     /* Map the kernel id up into a userspace id in the first idmapping. */
270 2. Map the userspace id down into a kernel id in the second idmapping::
272     /* Map the userspace id down into a kernel id in the second idmapping. */
275 As you can see we used the userspace idmapset in both idmappings to translate
276 the kernel id in one idmapping to a kernel id in another idmapping.
278 This allows us to answer the question of what kernel id we would need to use to
279 get the same userspace id in another idmapping. In order to be able to answer
280 this question both idmappings need to contain the same userspace id in their
283 Note how we can easily get back to the kernel id in the first idmapping by
286 1. Map the kernel id up into a userspace id in the second idmapping::
288     /* Map the kernel id up into a userspace id in the second idmapping. */
291 2. Map the userspace id down into a kernel id in the first idmapping::
293     /* Map the userspace id down into a kernel id in the first idmapping. */
298 userspace id mapped. This will come in handy when working with idmapped mounts.
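The two translation steps can be combined into one helper, again under the single-extent model. Assumptions:

* ``remap_kid()`` is a hypothetical name.
* The first and second idmappings are ``u0:k10000:r10000`` and ``u0:k20000:r10000``, chosen for illustration.

The helper fails as soon as the intermediate userspace id is missing from either userspace idmapset. That is the precondition stated above.

```c
#include <assert.h>
#include <stdint.h>

struct idmap { uint32_t first, lower, count; }; /* u<first>:k<lower>:r<count> */
#define ID_INVALID ((uint32_t)-1)               /* unmapped id */

static uint32_t map_down(struct idmap m, uint32_t id)
{
    if (id < m.first || id - m.first >= m.count)
        return ID_INVALID;
    return id - m.first + m.lower;
}

static uint32_t map_up(struct idmap m, uint32_t id)
{
    if (id < m.lower || id - m.lower >= m.count)
        return ID_INVALID;
    return id - m.lower + m.first;
}

/* Translate a kernel id of idmapping @from into a kernel id of idmapping @to. */
uint32_t remap_kid(struct idmap from, struct idmap to, uint32_t kid)
{
    uint32_t uid = map_up(from, kid); /* 1. kernel id up into a userspace id */

    if (uid == ID_INVALID)
        return ID_INVALID;
    return map_down(to, uid);         /* 2. userspace id down into a kernel id */
}

void check_examples(void)
{
    struct idmap first = { 0, 10000, 10000 };  /* u0:k10000:r10000 (assumed) */
    struct idmap second = { 0, 20000, 10000 }; /* u0:k20000:r10000 (assumed) */

    assert(remap_kid(first, second, 11000) == 21000); /* k11000 -> u1000 -> k21000 */
    assert(remap_kid(second, first, 21000) == 11000); /* the inverse */
}
```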
303 It is never valid to use an id in the kernel idmapset of one idmapping as the
304 id in the userspace idmapset of another or the same idmapping. While the kernel
305 idmapset always indicates an idmapset in the kernel id space the userspace
308  /* Map the userspace id down into a kernel id in the first idmapping. */
311  /* INVALID: Map the kernel id down into a kernel id in the second idmapping. */
317  /* Map the kernel id up into a userspace id in the first idmapping. */
320  /* INVALID: Map the userspace id up into a userspace id in the second idmapping. */
324 Since userspace ids have type ``uid_t`` and ``gid_t`` and kernel ids have type
329 -------------------------------------------
331 The concepts of mapping an id down or mapping an id up are expressed in the two
332 kernel functions filesystem developers are rather familiar with and which we've
333 already used in this document::
335  /* Map the userspace id down into a kernel id. */
338  /* Map the kernel id up into a userspace id. */
346 objects in is readable and writable for everyone.
353 get lost in too many details.
355 When the caller enters the kernel two things happen:
357 1. Map the caller's userspace ids down into kernel ids in the caller's
359    (To be precise, the kernel will simply look at the kernel ids stashed in the
361    translation happens just in time.)
362 2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
366 the kernel id back up into a userspace id when writing to disk.
367 So with the second step the kernel guarantees that a valid userspace id can be
368 written to disk. If it can't, the kernel will refuse the creation request to not
372 crossmapping algorithm we mentioned in a previous section. First, the
373 kernel maps the caller's userspace id down into a kernel id according to the
374 caller's idmapping and then maps that kernel id up according to the
380     - caller's idmapping (usually taken from ``current_user_ns()``)
381     - filesystem's idmapping (``sb->s_user_ns``)
382     - mount's idmapping (``mnt_idmap(vfsmnt)``)
400 1. Map the caller's userspace ids into kernel ids in the caller's idmapping::
404 2. Verify that the caller's kernel ids can be mapped to userspace ids in the
407    For this second step the kernel will call the function
413 In this example both idmappings are the same so there's nothing exciting going
425 1. Map the caller's userspace ids down into kernel ids in the caller's
430 2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
433     from_kuid(u0:k20000:r10000, k11000) = u-1
436 successfully mapped down into kernel ids in the caller's idmapping the kernel
438 kernel will deny this creation request.
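The two creation steps can be modelled the same way. Assumptions:

* ``may_create()`` is a hypothetical name, not the kernel helper used for the second step.
* The caller idmapping ``u0:k10000:r10000`` and the caller id ``u1000`` are chosen for illustration.

The failing case corresponds to the ``from_kuid(u0:k20000:r10000, k11000) = u-1`` step above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct idmap { uint32_t first, lower, count; }; /* u<first>:k<lower>:r<count> */
#define ID_INVALID ((uint32_t)-1)               /* unmapped id */

static uint32_t map_down(struct idmap m, uint32_t id)
{
    if (id < m.first || id - m.first >= m.count)
        return ID_INVALID;
    return id - m.first + m.lower;
}

static uint32_t map_up(struct idmap m, uint32_t id)
{
    if (id < m.lower || id - m.lower >= m.count)
        return ID_INVALID;
    return id - m.lower + m.first;
}

/*
 * Hypothetical creation check: map the caller's userspace id down in the
 * caller's idmapping, then require that the kernel id maps up in the
 * filesystem's idmapping. On success *disk_id holds the id that would be stored.
 */
bool may_create(struct idmap caller, struct idmap fs, uint32_t uid, uint32_t *disk_id)
{
    uint32_t kid = map_down(caller, uid);   /* step 1 */

    if (kid == ID_INVALID)
        return false;
    *disk_id = map_up(fs, kid);             /* step 2 */
    return *disk_id != ID_INVALID;
}

void check_examples(void)
{
    struct idmap caller = { 0, 10000, 10000 }; /* u0:k10000:r10000 (assumed) */
    struct idmap fs = { 0, 20000, 10000 };     /* u0:k20000:r10000 */
    uint32_t disk_id = 0;

    assert(may_create(caller, caller, 1000, &disk_id) && disk_id == 1000);
    assert(!may_create(caller, fs, 1000, &disk_id)); /* k11000 has no mapping */
}
```

Returning the would-be on-disk id also makes the effect of a non-initial caller on a filesystem with the initial idmapping visible. The check passes, but ``u11000`` rather than ``u1000`` would be written.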
441 mounted with non-initial idmappings this is a general problem as we can see in
453 1. Map the caller's userspace ids down into kernel ids in the caller's
458 2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
465 the kernel id that was created in the caller's idmapping. This has mainly two
471 filesystems and not very flexible. But this is a use-case that is pretty
472 important in containerized workloads.
476 kernel ids map up into valid userspace ids in the caller's idmapping
478 1. Map raw userspace ids down to kernel ids in the filesystem's idmapping::
482 2. Map kernel ids up to userspace ids in the caller's idmapping::
484     from_kuid(u0:k10000:r10000, k1000) = u-1
495 In order to report ownership to userspace the kernel uses the crossmapping
496 algorithm introduced in a previous section:
498 1. Map the userspace id on disk down into a kernel id in the filesystem's
503 2. Map the kernel id up into a userspace id in the caller's idmapping::
505     from_kuid(u0:k10000:r10000, k1000) = u-1
507 The crossmapping algorithm fails in this case because the kernel id in the
508 filesystem idmapping cannot be mapped up to a userspace id in the caller's
509 idmapping. Thus, the kernel will report the ownership of this file as the
521 In order to report ownership to userspace the kernel uses the crossmapping
522 algorithm introduced in a previous section:
524 1. Map the userspace id on disk down into a kernel id in the filesystem's
529 2. Map the kernel id up into a userspace id in the caller's idmapping::
531     from_kuid(u0:k10000:r10000, k21000) = u-1
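Ownership reporting is the crossmapping from the filesystem's idmapping to the caller's, and the single-extent model covers it as well. Assumptions:

* ``report_owner()`` is a hypothetical name.
* The caller idmapping ``u0:k10000:r10000`` appears in the listing.
* The filesystem idmapping ``u0:k20000:r10000`` is assumed.
* The initial idmapping is written here as ``u0:k0:r4294967295``.

Unmapped results come back as ``(uint32_t)-1``, the ``u-1`` of the listing. Userspace would see the overflowuid instead.

```c
#include <assert.h>
#include <stdint.h>

struct idmap { uint32_t first, lower, count; }; /* u<first>:k<lower>:r<count> */
#define ID_INVALID ((uint32_t)-1)               /* unmapped id */

static uint32_t map_down(struct idmap m, uint32_t id)
{
    if (id < m.first || id - m.first >= m.count)
        return ID_INVALID;
    return id - m.first + m.lower;
}

static uint32_t map_up(struct idmap m, uint32_t id)
{
    if (id < m.lower || id - m.lower >= m.count)
        return ID_INVALID;
    return id - m.lower + m.first;
}

/* Raw id on disk -> kernel id (filesystem's idmapping) -> caller's userspace id. */
uint32_t report_owner(struct idmap fs, struct idmap caller, uint32_t disk_id)
{
    uint32_t kid = map_down(fs, disk_id);

    if (kid == ID_INVALID)
        return ID_INVALID;
    return map_up(caller, kid);
}

void check_examples(void)
{
    struct idmap initial = { 0, 0, 4294967295u }; /* u0:k0:r4294967295 */
    struct idmap caller = { 0, 10000, 10000 };    /* u0:k10000:r10000 */
    struct idmap fs = { 0, 20000, 10000 };        /* u0:k20000:r10000 (assumed) */

    assert(report_owner(initial, caller, 1000) == ID_INVALID); /* k1000 */
    assert(report_owner(fs, caller, 1000) == ID_INVALID);      /* k21000 */
    assert(report_owner(fs, initial, 1000) == 21000);          /* initial caller */
}
```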
533 Again, the crossmapping algorithm fails in this case because the kernel id in
534 the filesystem idmapping cannot be mapped to a userspace id in the caller's
535 idmapping. Thus, the kernel will report the ownership of this file as the
538 Note how in the last two examples things would be simple if the caller were
543 1. Map the userspace id on disk down into a kernel id in the filesystem's
548 2. Map the kernel id up into a userspace id in the caller's idmapping::
553 -----------------------------
555 The examples we've seen in the previous section where the caller's idmapping
559 each other, an administrator may often use different non-overlapping idmappings
566 An administrator wanting to provide easy read-write access to the following set
585 This would still leave ``dir`` rather useless to the second container. In fact,
593 on their machine at home and all files in their home directory will usually be
598 In both cases changing ownership recursively has grave implications. The most
599 obvious one is that ownership is changed globally and permanently. In the home
600 directory case this change in ownership would even need to happen every time the
606 change in ownership is tied to the lifetime of the filesystem mount, i.e. the
608 filesystem and mount it again in another user namespace. This is usually
624 Idmapped mounts make it possible to change ownership in a temporary and
633 privileged users in the initial user namespace.
645 conceptual distinctions should almost always be clearly expressed in the code.
662  uid_t <--> kuid_t <--> vfsuid_t
663  gid_t <--> kgid_t <--> vfsgid_t
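The kernel keeps kernel ids apart from userspace ids by wrapping them in single-member structs; ``kuid_t``, for instance, is a struct holding a ``uid_t``. As a result, passing a raw integer where a kernel id is expected fails to compile.

The sketch below imitates that pattern in userspace for all three kinds of ids. The ``*_model`` types and helpers are hypothetical and are not the kernel's definitions. The filesystem and mount idmappings are chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace ids stay plain integers (like uid_t); kernel and VFS ids are wrapped. */
typedef struct { uint32_t val; } kuid_model;   /* imitates kuid_t */
typedef struct { uint32_t val; } vfsuid_model; /* imitates a VFS id */

struct idmap { uint32_t first, lower, count; }; /* u<first>:k<lower>:r<count> */
#define ID_INVALID ((uint32_t)-1)               /* unmapped id */

static uint32_t map_down(struct idmap m, uint32_t id)
{
    if (id < m.first || id - m.first >= m.count)
        return ID_INVALID;
    return id - m.first + m.lower;
}

static uint32_t map_up(struct idmap m, uint32_t id)
{
    if (id < m.lower || id - m.lower >= m.count)
        return ID_INVALID;
    return id - m.lower + m.first;
}

/* Userspace id down into a kernel id (hypothetical analogue of make_kuid()). */
kuid_model make_kuid_model(struct idmap m, uint32_t uid)
{
    kuid_model k = { map_down(m, uid) };
    return k;
}

/* Kernel id up into a userspace id (hypothetical analogue of from_kuid()). */
uint32_t from_kuid_model(struct idmap m, kuid_model kuid)
{
    return map_up(m, kuid.val);
}

/* Userspace id down into a VFS id in a mount's idmapping. */
vfsuid_model make_vfsuid_model(struct idmap mnt, uint32_t uid)
{
    vfsuid_model v = { map_down(mnt, uid) };
    return v;
}

void check_examples(void)
{
    struct idmap fs = { 0, 20000, 10000 };  /* u0:k20000:r10000 (assumed) */
    struct idmap mnt = { 0, 10000, 10000 }; /* u0:v10000:r10000 (assumed) */

    assert(from_kuid_model(fs, make_kuid_model(fs, 1000)) == 1000);
    assert(make_vfsuid_model(mnt, 1000).val == 11000);

    /*
     * Neither of these would compile, which is the point of the wrappers:
     *   from_kuid_model(fs, 21000);                         (plain integer)
     *   from_kuid_model(fs, make_vfsuid_model(mnt, 1000));  (VFS id)
     */
}
```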
666 e.g., during ``stat()``, or store ownership information in a shared VFS object
694 Similar to how we prefix all userspace ids in this document with ``u`` and all
695 kernel ids with ``k`` we will prefix all VFS ids with ``v``. So a mount
704 - ``i_uid_into_vfsuid()`` and ``i_gid_into_vfsgid()``
706   The ``i_*id_into_vfs*id()`` functions translate the filesystem's kernel ids into
707   VFS ids in the mount's idmapping::
709    /* Map the filesystem's kernel id up into a userspace id in the filesystem's idmapping. */
712    /* Map the filesystem's userspace id down into a VFS id in the mount's idmapping. */
715 - ``mapped_fsuid()`` and ``mapped_fsgid()``
717   The ``mapped_fs*id()`` functions translate the caller's kernel ids into
718   kernel ids in the filesystem's idmapping. This translation is achieved by
721    /* Map the caller's VFS id up into a userspace id in the mount's idmapping. */
724    /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
727 - ``vfsuid_into_kuid()`` and ``vfsgid_into_kgid()``
739 to ``k21000`` according to its idmapping. This is what is stored in the
742 When the caller queries the ownership of this file via ``stat()`` the kernel
744 kernel id up to a userspace id in the caller's idmapping.
746 But when the caller is accessing the file on an idmapped mount the kernel will
747 first call ``i_uid_into_vfsuid()`` thereby translating the filesystem's kernel
748 id into a VFS id in the mount's idmapping::
751    /* Map the filesystem's kernel id up into a userspace id. */
754    /* Map the filesystem's userspace id down into a VFS id in the mount's idmapping. */
757 Finally, when the kernel reports the owner to the caller it will turn the
758 VFS id in the mount's idmapping into a userspace id in the caller's
767 The kernel maps this to ``k11000`` in the caller's idmapping. Usually the
768 kernel would now apply the crossmapping, verifying that ``k11000`` can be
769 mapped to a userspace id in the filesystem's idmapping. Since ``k11000`` can't
770 be mapped up in the filesystem's idmapping directly this creation request
773 But when the caller is accessing the file on an idmapped mount the kernel will
774 first call ``mapped_fs*id()`` thereby translating the caller's kernel id into
778     /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
781     /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
784 When finally writing to disk the kernel will then map ``k21000`` up into a
785 userspace id in the filesystem's idmapping::
794 Let's now briefly reconsider the failing examples from earlier in the context
807 When the caller is using a non-initial idmapping the common case is to attach
810 1. Map the caller's userspace ids into kernel ids in the caller's idmapping::
814 2. Translate the caller's VFS id into a kernel id in the filesystem's
818       /* Map the VFS id up into a userspace id in the mount's idmapping. */
821       /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
824 3. Verify that the caller's kernel ids can be mapped to userspace ids in the
843 1. Map the caller's userspace ids into kernel ids in the caller's idmapping::
847 2. Translate the caller's VFS id into a kernel id in the filesystem's
851        /* Map the VFS id up into a userspace id in the mount's idmapping. */
854        /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
857 3. Verify that the caller's kernel ids can be mapped to userspace ids in the
874 In order to report ownership to userspace the kernel now does three steps using
877 1. Map the userspace id on disk down into a kernel id in the filesystem's
882 2. Translate the kernel id into a VFS id in the mount's idmapping::
885       /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
888       /* Map the userspace id down into a VFS id in the mount's idmapping. */
891 3. Map the VFS id up into a userspace id in the caller's idmapping::
896 Earlier, the caller's kernel id couldn't be crossmapped in the filesystem's
897 idmapping. With the idmapped mount in place it now can be crossmapped into the
911 Again, in order to report ownership to userspace the kernel now does three
914 1. Map the userspace id on disk down into a kernel id in the filesystem's
919 2. Translate the kernel id into a VFS id in the mount's idmapping::
922       /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
925       /* Map the userspace id down into a VFS id in the mount's idmapping. */
928 3. Map the VFS id up into a userspace id in the caller's idmapping::
933 Earlier, the file's kernel id couldn't be crossmapped in the filesystem's
934 idmapping. With the idmapped mount in place it now can be crossmapped into the
942 idmappings when either the caller, the filesystem or both use a non-initial
944 a non-initial idmapping. This mostly happens in the context of containerized
946 mounted with the initial idmapping and filesystems mounted with non-initial
947 idmappings, access to the filesystem isn't working because the kernel ids can't
956 and files on a per-mount basis.
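With all three idmappings in play, the two translations described earlier (``i_*id_into_vfs*id()`` and ``mapped_fs*id()``) reduce to the same single-extent arithmetic. This is a sketch, not the kernel implementation. Assumptions:

* The ``*_model()`` names are hypothetical, and VFS ids are plain integers.
* The caller idmapping is ``u0:k10000:r10000``.
* The mount idmapping is ``u0:v10000:r10000``.
* The filesystem idmapping is ``u0:k20000:r10000``.

```c
#include <assert.h>
#include <stdint.h>

struct idmap { uint32_t first, lower, count; }; /* u<first>:k<lower>:r<count> */
#define ID_INVALID ((uint32_t)-1)               /* unmapped id */

static uint32_t map_down(struct idmap m, uint32_t id)
{
    if (id < m.first || id - m.first >= m.count)
        return ID_INVALID;
    return id - m.first + m.lower;
}

static uint32_t map_up(struct idmap m, uint32_t id)
{
    if (id < m.lower || id - m.lower >= m.count)
        return ID_INVALID;
    return id - m.lower + m.first;
}

/* Filesystem kernel id -> VFS id: up in the filesystem's, down in the mount's idmapping. */
uint32_t i_uid_into_vfsuid_model(struct idmap mnt, struct idmap fs, uint32_t kid)
{
    uint32_t uid = map_up(fs, kid);
    return uid == ID_INVALID ? ID_INVALID : map_down(mnt, uid);
}

/* Caller's id (as a VFS id) -> filesystem kernel id: up in the mount's, down in the filesystem's. */
uint32_t mapped_fsuid_model(struct idmap mnt, struct idmap fs, uint32_t caller_id)
{
    uint32_t uid = map_up(mnt, caller_id);
    return uid == ID_INVALID ? ID_INVALID : map_down(fs, uid);
}

void check_examples(void)
{
    struct idmap caller = { 0, 10000, 10000 }; /* u0:k10000:r10000 */
    struct idmap mnt = { 0, 10000, 10000 };    /* u0:v10000:r10000 (assumed) */
    struct idmap fs = { 0, 20000, 10000 };     /* u0:k20000:r10000 */

    /* stat(): u1000 on disk -> k21000 -> v11000 -> u1000 for the caller. */
    uint32_t kid = map_down(fs, 1000);
    uint32_t vid = i_uid_into_vfsuid_model(mnt, fs, kid);
    assert(kid == 21000 && vid == 11000 && map_up(caller, vid) == 1000);

    /* create: caller u1000 -> k11000 -> k21000 -> u1000 written to disk. */
    uint32_t fs_kid = mapped_fsuid_model(mnt, fs, map_down(caller, 1000));
    assert(fs_kid == 21000 && map_up(fs, fs_kid) == 1000);
}
```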
959 storage. At home they have id ``u1000`` and all files in their home directory
973 plugs in their portable storage at their workstation they can set up a job that
975 when they create a file the kernel performs the following steps we already know
983 1. Map the caller's userspace ids into kernel ids in the caller's idmapping::
987 2. Translate the caller's VFS id into a kernel id in the filesystem's
991       /* Map the VFS id up into a userspace id in the mount's idmapping. */
994       /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
997 3. Verify that the caller's filesystem ids can be mapped to userspace ids in the
1014 1. Map the userspace id on disk down into a kernel id in the filesystem's
1019 2. Translate the kernel id into a VFS id in the mount's idmapping::
1022       /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
1025       /* Map the userspace id down into a VFS id in the mounts's idmapping. */
1028 3. Map the VFS id up into a userspace id in the caller's idmapping::
1034 which is the caller's userspace id on their workstation in our example.
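The portable-storage scenario can be traced with the same model under the following assumptions, all chosen for illustration:

* At work the caller has id ``u1125`` in the initial idmapping.
* The filesystem is mounted with the initial idmapping.
* The job creates an idmapped mount with ``u1000:v1125:r1``.

The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct idmap { uint32_t first, lower, count; }; /* u<first>:k<lower>:r<count> */
#define ID_INVALID ((uint32_t)-1)               /* unmapped id */

static uint32_t map_down(struct idmap m, uint32_t id)
{
    if (id < m.first || id - m.first >= m.count)
        return ID_INVALID;
    return id - m.first + m.lower;
}

static uint32_t map_up(struct idmap m, uint32_t id)
{
    if (id < m.lower || id - m.lower >= m.count)
        return ID_INVALID;
    return id - m.lower + m.first;
}

/* Model of what mapped_fs*id() computes: caller's id -> filesystem kernel id. */
uint32_t to_fs_kid(struct idmap mnt, struct idmap fs, uint32_t id)
{
    uint32_t uid = map_up(mnt, id);
    return uid == ID_INVALID ? ID_INVALID : map_down(fs, uid);
}

/* Model of what i_*id_into_vfs*id() computes: filesystem kernel id -> VFS id. */
uint32_t to_vfs_id(struct idmap mnt, struct idmap fs, uint32_t kid)
{
    uint32_t uid = map_up(fs, kid);
    return uid == ID_INVALID ? ID_INVALID : map_down(mnt, uid);
}

void check_examples(void)
{
    struct idmap initial = { 0, 0, 4294967295u }; /* caller and filesystem */
    struct idmap mnt = { 1000, 1125, 1 };         /* u1000:v1125:r1 (assumed) */

    /* Creating a file: u1125 -> k1125 -> u1000 (mount) -> k1000 -> u1000 on disk. */
    uint32_t fs_kid = to_fs_kid(mnt, initial, map_down(initial, 1125));
    assert(fs_kid == 1000 && map_up(initial, fs_kid) == 1000);

    /* stat(): u1000 on disk -> k1000 -> v1125 -> u1125 for the caller. */
    uint32_t vid = to_vfs_id(mnt, initial, map_down(initial, 1000));
    assert(vid == 1125 && map_up(initial, vid) == 1125);
}
```

Because the mount's range is ``r1``, only that single id is translated. Every other id on the storage stays unmapped on this mount.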