Lines Matching full:that
22 exploration is needed to discover, is that it is complex. There are
23 many rules, special cases, and implementation alternatives that all
26 tool that we will make extensive use of is "divide and conquer". For
41 of elements: "slashes" that are sequences of one or more "``/``"
42 characters, and "components" that are sequences of one or more
43 non-"``/``" characters. These form two kinds of paths. Those that
52 component, but that isn't always accurate: a pathname can lack both
62 it must identify a directory that already exists, otherwise an error
68 pathname that is just slashes have a final component. If it does
75 tempting to consider that to have an empty final component. In many
76 ways that would lead to correct results, but not always. In
81 A pathname that contains at least one non-<slash> character and
82 that ends with one or more trailing <slash> characters shall not
85 directory entry that is to be created for a directory immediately
91 checking that the trailing slash is not used where it isn't
96 changes that affect that lookup. One fairly extreme case is that if
98 "a/b/..", that process might successfully resolve on "a/c".
102 "dcache" and an understanding of that is central to understanding
112 contains further information about the object in that parent with
113 the given name. The inode pointer can be ``NULL`` indicating that the
115 dentry of a directory to the dentries of the children, that linkage is
119 that will be particularly relevant is that it is closely integrated
120 with the mount table that records which filesystem is mounted where.
127 Some filesystems ensure that the information in the dcache is always
130 without checking with the filesystem, and means that the VFS can
134 Other filesystems don't provide that guarantee because they cannot.
135 These are typically filesystems that are shared across a network,
150 you ignore all the places that only run when "``LOOKUP_RCU``"
168 reference count. The special-sauce of this primitive is that the
172 Holding a reference on a dentry ensures that the dentry won't suddenly
199 ``d_lock`` is a synonym for the spinlock that is part of ``d_lockref`` above.
206 each candidate dentry that it finds in the hash table and then checks
207 that the parent and name are correct. So it doesn't lock the parent
222 accessing that slot in a hash table, and searching the linked list
223 that is found there.
228 happened to be looking at a dentry that was moved in this way,
234 ``rename_lock`` is a seqlock that is updated whenever any dentry is
235 renamed. If ``d_lookup`` finds that a rename happened while it
249 ``i_rwsem`` is a read/write semaphore that serializes all changes to a particular
250 directory. This ensures that, for example, an ``unlink()`` and a ``rename()``
252 stable while the filesystem is asked to look up a name that is not
256 This has a complementary role to that of ``d_lock``: ``i_rwsem`` on a
257 directory protects all of the names in that directory, while ``d_lock``
268 falls back to ``lookup_slow()`` which takes a shared lock on ``i_rwsem``, checks again that
275 that the required exclusion can be achieved. How path lookup chooses
280 name that is not yet in the dcache - the shared lock on ``i_rwsem`` will
293 If a matching dentry was found in the primary hash table then that is
294 returned and the caller can know that it lost a race with some other
298 knows that it has won any race and now is responsible for asking the
303 added to the primary hash table already. Note that a ``struct
310 ``DCACHE_PAR_LOOKUP`` to be cleared, using a wait_queue that was passed
311 to the instance of ``d_alloc_parallel()`` that won the race and that
314 has, the dentry is returned and the caller just sees that it lost any
316 likely explanation is that some other dentry was added instead using
325 Per-CPU here means that incrementing the count is cheap as it only
330 ``mnt_count`` doesn't ensure that the mount remains in the namespace and,
332 does, however, ensure that the ``mount`` data structure remains coherent,
344 crossing a mount point to check that the crossing was safe. That is,
345 the value in the seqlock is read, then the code finds the mount that
383 all the way back to `First Edition Unix`_ - of the function that
402 that is the "next" component in the pathname.
414 filesystem. Often that reference won't be needed, so this field is
416 is requested. Keeping a reference in the ``nameidata`` ensures that
420 It should be noted that in the case of ``LOOKUP_IN_ROOT`` or
432 escape that subtree. It works a bit like a local ``chroot()``.
438 Given a path (``name``) and a nameidata structure (``nd``), check that the
440 over one component while updating ``last_type`` and ``last``. If that
448 filesystem to revalidate the result if it is that sort of filesystem.
449 If that doesn't get a good result, it calls "``lookup_slow()``" which
465 seem obvious, but is worth pointing out so that we will recognize its
473 not call ``walk_component()`` that last time. Handling that final
491 It is worth noting that when flag ``LOOKUP_MOUNTPOINT`` is set,
492 path_lookupat() will unset LOOKUP_JUMPED in nameidata so that in the
494 This is important when unmounting a filesystem that is inaccessible, such as
506 the possibility that the final component is not ``LAST_NORM``. If the
510 won't try to create that name. They also check for trailing slashes
521 On filesystems that require it, the lookup routines will call the
522 ``->d_revalidate()`` dentry method to ensure that the cached information
524 from a server. In some cases it may find that there has been a change
525 further up the path and that something that was thought to be valid
532 look up a name can trigger changes to how that lookup should be
540 to three different flags that might be set in ``dentry->d_flags``:
545 If this flag has been set, then the filesystem has requested that the
550 unmounted, the ``d_manage()`` function will usually wait for that
557 processing. That server process can identify itself to the ``autofs``
564 This flag is set on every dentry that is mounted on. As Linux
565 supports multiple filesystem namespaces, it is possible that the
583 report that there was an error, that there was nothing to mount, or
589 There is no new locking of import here and it is important that no
603 We noted that REF-walk is complex because there are numerous details
615 thread from changing the data structures that a given thread is
618 same time, this can be very costly. Even when using locks that permit
621 goal when reading a shared data structure that no other process is
631 other parts it is important that RCU-walk can quickly fall back to
638 notices that something has changed or is changing, or if something
643 ``vfsmount`` and ``dentry``, and ensuring that these are still valid -
644 that a path walk with REF-walk would have found the same entries.
645 This is an invariant that RCU-walk must guarantee. It can only make
646 decisions, such as selecting the next step, that are decisions which
653 This pattern of "try RCU-walk, if that fails try REF-walk" can be
661 that fails with the error ``ECHILD`` they are called again with no
664 ``LOOKUP_RCU``) to ensure that entries found in the cache are forcibly
666 determines that they are too old to trust.
668 The ``LOOKUP_RCU`` attempt may drop that flag internally and switch to
670 that trip up RCU-walk are much more likely to be near the leaves and
671 so it is very unlikely that there will be much, if any, benefit from
678 ``rcu_read_lock()`` is held for the entire time that RCU-walk is walking
679 down a path. The particular guarantee it provides is that the key
684 is the only guarantee that RCU provides; everything else is done using
696 To preserve the invariant mentioned above (that RCU-walk may only make
697 decisions that REF-walk could have made), it must make the checks at
698 or near the same places that REF-walk holds the references. So, when
705 However, there is a little bit more to seqlocks than that. If
710 use ``read_seqcount_retry()`` to validate that copy.
713 imposes a memory barrier so that no memory-read instruction from
725 sufficient to catch any problem that could occur at this point.
727 With that little refresher on seqlocks out of the way we can look at
734 ensure that crossing a mount point is performed safely. RCU-walk uses
735 it for that too, but for quite a bit more.
744 that any "mount" or "unmount" happens.
754 If RCU-walk finds that ``mount_lock`` hasn't changed then it can be sure
755 that, had REF-walk taken counted references on each vfsmount, the
777 check if we have landed on a mount point and, if so, must find that
780 starting point of the path lookup was in part of the filesystem that
791 ``lookup_fast()`` is the only lookup routine that is used in RCU-mode,
793 ``lookup_fast()`` that we find the important "hand over hand" tracking
803 getting a counted reference to the new dentry before dropping that for
809 A semaphore is a fairly heavyweight lock that can only be taken when it is
812 take ``i_rwsem`` and modifies the directory in a way that RCU-walk needs
813 to notice, the result will be either that RCU-walk fails to find the
814 dentry that it is looking for, or it will find a dentry which
822 something that actually is there. When RCU-walk fails to find
831 That "dropping down to REF-walk" typically involves a call to
844 Other reasons for dropping out of RCU-walk that do not trigger a call
845 to ``unlazy_walk()`` are when some inconsistency is found that cannot be
852 takes a reference on each of the pointers that it holds (vfsmount,
853 dentry, and possibly some symbolic links) and then verifies that the
859 incrementing a counter. That works to take a second reference if you
868 ``mount_lock`` is then used to validate the reference. If that
869 validation fails, it may *not* be safe to just drop that reference in
872 finds that the reference it got might not be safe, checks the
889 In this case an extra "``MAY_NOT_BLOCK``" flag is passed so that it
913 the big picture, there are a couple of related patterns that are worth
916 The first is "try quickly and check, if that fails try slowly". We
917 can see that in the high-level approach of first trying RCU-walk and
923 The second pattern is "try quickly and check, if that fails try
930 "try quickly *and carefully*, then check". The fact that checking is
931 needed is a reminder that the system is dynamic and only a limited
940 There are several basic issues that we will examine to understand the
950 There are only two sorts of filesystem objects that can usefully
958 a component name refers to a symbolic link, then that component is
959 replaced by the body of the link and, if that body starts with a '/',
996 a further limit of eight on the maximum depth of recursion, but that was
1000 The ``nameidata`` structure that we met in an earlier article contains a
1001 small stack that can be used to store the remaining part of up to two
1004 lookup will never exceed that stack as, once the 40th symlink is
1007 It might seem that the name remnants are all that needs to be stored on
1008 this stack, but we need a bit more. To see that, we need to move on to
1017 able to find and temporarily hold onto these cached entries, so that
1029 pathname in a symlink can be seen as the content of that symlink and
1033 that the filesystem will allocate some temporary memory and copy or
1034 construct the symlink content into that memory whenever it is needed.
1038 on the dentry. This means that the mechanisms that pathname lookup
1046 on an inode does not imply any reference on cached pages of that
1047 inode, and even an ``rcu_read_lock()`` is not sufficient to ensure that
1050 significantly, needs to release that reference when it is finished
1055 but that isn't necessarily a big cost and it is better than dropping
1056 out of RCU-walk mode completely. Even filesystems that allocate
1067 looked at previously, ``->get_link()`` would need to be careful that
1073 do_delayed_call() to invoke that callback function with the argument.
1084 This means that each entry in the symlink stack needs to hold five
1091 Note that, in a given stack frame, the path remnant (``name``) is not
1092 part of the symlink that the other fields refer to. It is the remnant
1093 to be followed once that symlink has been fully parsed.
1101 symlink, or is restored from the stack, so that much of the loop
1108 Providing that operation is successful, the old path ``name`` is placed on the
1117 the symlink-just-found to avoid leaving empty path remnants that would
1122 ``walk_component()`` is also the last piece of code that needs to look at the
1123 old symlink as it walks that last component. So it is quite
1128 which indicates that it is yet too early to release the
1129 current symlink, and ``WALK_TRAILING`` which indicates that it is on the final
1145 so ``NULL`` is returned to indicate that the symlink can be released and
1148 The other case involves things in ``/proc`` that look like symlinks but
1155 something that looks like a symlink. It is really a reference to the
1157 objects you get a name that might refer to the same file - unless it
1161 ``nameidata`` in place to point to that target. ``->get_link()`` then
1171 For some callers, this is all they need; they want to create that
1174 apply special handling to the last component of that symlink, rather
1177 successive symlinks until one is found that doesn't point to another
1181 path_lookupat(), path_openat() using a loop that calls link_path_walk(),
1183 lookup_last(). If it is a symlink that needs to be followed,
1185 return the path so that the loop repeats, calling
1189 Of the various functions that examine the final component,
1223 open process continues on the symlink that was found.
1228 We previously said of RCU-walk that it would "take no locks, increment
1229 no counts, leave no footprints." We have since seen that some
1235 footprints in a way that doesn't affect directories is in updating access times.
1242 update the atime on that symlink.
1247 subject. The `clearest statement`_ is that, if a particular implementation
1249 documented "except that any changes caused by pathname resolution need
1250 not be documented". This seems to imply that POSIX doesn't really
1255 An examination of history shows that prior to `Linux 1.3.87`_, the ext2
1257 Unfortunately we have no record of why that behavior was changed.
1259 In any case, access time must now be updated and that operation can be
1264 limits the updates of ``atime`` to once per day on files that aren't
1279 the various flags that can be stored in the ``nameidata`` to guide the
1296 ``LOOKUP_PARENT`` indicates that the final component hasn't been reached
1300 ``ND_ROOT_PRESET`` indicates that the ``root`` field in the ``nameidata`` was
1304 ``ND_JUMPED`` means that the current dentry was chosen not because
1327 ensure that they return errors from ``nd_jump_link()``, because that is how
1331 bind-mounts and ordinary mounts). Note that the ``vfsmount`` which contains the
1353 considered. Others are only checked for when considering that final
1356 ``LOOKUP_AUTOMOUNT`` ensures that, if the final component is an automount
1367 ``WALK_GET`` that we already met, but it is used in a different way.
1369 ``LOOKUP_DIRECTORY`` insists that the final component is a directory.
1377 if it knows that it will be asked to open or create the file soon.
1386 than even a couple of releases ago. But that doesn't mean it is
1388 symlinks that are stored in the inode so, while it handles many ext4
1389 symlinks, it doesn't help with NFS, XFS, or Btrfs. That support