admin-guide/pm/cpuidle.rst

1 .. SPDX-License-Identifier: GPL-2.0
8 CPU Idle Time Management
27 CPU idle time management is an energy-efficiency feature concerned with using
31 ------------
33 CPU idle time management operates on CPUs as seen by the *CPU scheduler* (that
37 software as individual single-core processors.  In other words, a CPU is an
43 program) at a time, it is a CPU.  In that case, if the hardware is asked to
46 Second, if the processor is multi-core, each core in it is able to follow at
47 least one program at a time.  The cores need not be entirely independent of each
48 other (for example, they may share caches), but still most of the time they
50 one program, those programs run mostly independently of each other at the same
51 time.  The entire cores are CPUs in that case and if the hardware is asked to
61 Finally, each core in a multi-core processor may be able to follow more than one
62 program in the same time frame (that is, each core may be able to fetch
63 instructions from multiple locations in memory and execute them in the same time
66 multiple individual single-core "processors", referred to as *hardware threads*
67 (or hyper-threads specifically on Intel hardware), that each can follow one
69 time management perspective and if the processor is asked to enter an idle state
78 ---------
81 *idle* by the Linux kernel when there are no tasks to run on them except for the
87 processor every time the task's code is run by a CPU.  The CPU scheduler
88 distributes work by assigning tasks to run to the CPUs present in the system.
91 no specific conditions preventing their code from being run by a CPU as long as
94 assigns it to one of the available CPUs to run and if there are no more runnable
95 tasks assigned to it, the CPU will load the given task's context and run its
98 simultaneously, they will be subject to prioritization and time sharing in order
99 to allow them to make some progress over time.]
103 in Linux idle CPUs run the code of the "idle" task called *the idle loop*.  That
106 idle states, or there is not enough time to spend in an idle state before the
109 useless instructions in a loop until it is assigned a new task to run.
112 .. _idle-loop:
119 idle time management subsystem called ``CPUIdle`` to select an idle state for
127 the platform or the processor architecture and organized in a one-dimensional
130 time.  This allows ``CPUIdle`` governors to be independent of the underlying
131 hardware and to work with any platform that the Linux kernel can run on.
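
As a rough illustration of why a one-dimensional array of idle states lets a governor stay independent of the hardware, the following is a minimal Python sketch, not kernel code. It assumes the array is ordered from the shallowest to the deepest state and that each entry carries the two parameters this document describes (target residency and exit latency). The names ``IdleState``, ``select_state`` and ``STATES``, as well as all numeric values, are hypothetical and do not correspond to any real driver or governor.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class IdleState:
    """Hypothetical abstract description of one idle state."""
    name: str
    target_residency_us: int  # minimum worthwhile time in the state
    exit_latency_us: int      # worst-case time to resume execution


def select_state(states: Sequence[IdleState],
                 predicted_idle_us: int,
                 latency_limit_us: int) -> int:
    """Return the index of the deepest state that fits both limits.

    Assumes `states` is ordered from shallowest to deepest and that
    index 0 is always usable as a fallback.
    """
    chosen = 0
    for index, state in enumerate(states):
        fits_residency = state.target_residency_us <= predicted_idle_us
        fits_latency = state.exit_latency_us <= latency_limit_us
        if fits_residency and fits_latency:
            chosen = index
    return chosen


# Made-up example table; real values come from the platform driver.
STATES = [
    IdleState("poll", 0, 0),
    IdleState("shallow", 2, 2),
    IdleState("medium", 20, 10),
    IdleState("deep", 600, 130),
]

if __name__ == "__main__":
    index = select_state(STATES, predicted_idle_us=100, latency_limit_us=50)
    print(STATES[index].name)
```

The selection logic reads only the two abstract parameters, so the same function works for any table a driver might supply. Actual governors use considerably more elaborate prediction and selection logic than this sketch.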
134 taken into account by the governor, the *target residency* and the (worst-case)
135 *exit latency*.  The target residency is the minimum time the hardware must
136 spend in the given state, including the time needed to enter it (which may be
140 latency, in turn, is the maximum time it will take a CPU asking the processor
143 the time needed to enter the given state in case the wakeup occurs when the
148 First of all, the governor knows the time until the closest timer event.  That
149 time is known exactly, because the kernel programs timers and it knows exactly
150 when they will trigger, and it is the maximum time the hardware that the given
151 CPU depends on can spend in an idle state, including the time necessary to enter
152 and exit it.  However, the CPU may be woken up by a non-timer event at any time
154 when that may happen.  The governor can only see how much time the CPU actually
155 was idle after it has been woken up (that time will be referred to as the *idle
157 time until the closest timer to estimate the idle duration in the future.  How the
162 There are four ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_,
165 tick can be `stopped by the idle loop <idle-cpus-and-tick_>`_.  Available
178 driver chosen at the system initialization time cannot be replaced later, so the
186 .. _idle-cpus-and-tick:
192 the time sharing strategy of the CPU scheduler.  Of course, if there are
193 multiple runnable tasks assigned to one CPU at the same time, the only way to
194 allow them to make reasonable progress in a given time frame is to make them
195 share the available CPU time.  Namely, in rough approximation, each task is
196 given a slice of the CPU time to run its code, subject to the scheduling class,
197 prioritization and so on and when that time slice is used up, the CPU should be
203 The scheduler tick is problematic from the CPU idle time management perspective,
213 CPUs, because (by definition) they have no tasks to run except for the special
215 of the CPU time on them is the idle loop.  Since the time of an idle CPU need
223 (non-tick) timer due to trigger within the tick range, stopping the tick clearly
224 would be a waste of time, even though the timer hardware may not need to be
225 reprogrammed in that case.  Second, if the governor is expecting a non-timer
228 the target residency within the time until the expected wakeup, so that state is
232 waste of time and in this case the timer hardware would need to be reprogrammed,
234 does not occur any time soon, the hardware may spend an indefinite amount of time
247 loop altogether.  That can be done through the build-time configuration of it
253 The systems that run kernels configured to allow the scheduler tick to be
255 generally regarded as more energy-efficient than the systems running kernels in
261 .. _menu-gov:
272 It first obtains the time until the closest timer event with the assumption
273 that the scheduler tick will be stopped.  That time, referred to as the *sleep
274 length* in what follows, is the upper bound on the time before the next CPU
294 values and, when predicting the idle duration next time, it computes the average
310 idle state is comparable with the predicted idle duration, the total time spent
319 from the power management quality of service, or `PM QoS <cpu-pm-qos_>`_,
331 if it has not decided to `stop the scheduler tick <idle-cpus-and-tick_>`_.  That
335 the real time until the closest timer event and if it really is greater than
336 that time, the governor may need to select a shallower state with a suitable
340 .. _teo-gov:
347 <menu-gov_>`_: it always tries to find the deepest idle state suitable for the
350 .. kernel-doc:: drivers/cpuidle/governors/teo.c
351    :doc: teo-description
353 .. _idle-states-representation:
358 For CPU idle time management purposes, all of the physical idle states
359 supported by the processor have to be represented as a one-dimensional array of
365 of it <idle-loop_>`_, must reflect the properties of the idle state at the
378 idle state "X" must reflect the minimum time to spend in idle state "MX" of
379 the module (including the time needed to enter it), because that is the minimum
380 time the CPU needs to be idle to save any energy in case the hardware enters
382 the exit time of idle state "MX" of the module (and usually its entry time too),
383 because that is the maximum delay between a wakeup signal and the time the CPU
401 parameters describing the idle state and a pointer to the function to run in
410 CPU at the initialization time.  That directory contains a set of subdirectories
450 ``time``
451 	Total time spent in this idle state by the given CPU (as measured by the
479 CPUs in the system at the same time.  Writing 1 to it causes the idle state to
492 The number in the :file:`time` file generally may be greater than the total time
496 enter any idle state at all).  The kernel can only measure the time span between
503 much time has been spent by the hardware in different idle states supported by
512 .. _cpu-pm-qos:
519 energy-efficiency features of the kernel to prevent performance from dropping
522 CPU idle time management can be affected by PM QoS in two ways, through the
528 signed 32-bit integer) to it.  In turn, the resume latency constraint for a CPU
530 32-bit integer) to the :file:`power/pm_qos_resume_latency_us` file under
532 ``<N>`` is allocated at the system initialization time.  Negative values
579 CPU in question every time the list of requests is updated one way or another
582 CPU idle time governors are expected to regard the minimum of the global
593 `disabled for individual CPUs <idle-states-representation_>`_, there are kernel
594 command line parameters affecting CPU idle time management.
597 CPU idle time management entirely.  It does not prevent the idle loop from
598 running on idle CPUs, but it prevents the CPU idle time governors and drivers
604 however, so it is rather crude and not very energy-efficient.  For this reason,
614 The other kernel command line parameters controlling CPU idle time management
619 options related to CPU idle time management: ``idle=poll``, ``idle=halt``,
633 P-states (see |cpufreq|) that require any number of CPUs in a package to be
634 idle, so it very well may hurt single-thread computation performance as well as
635 energy-efficiency.  Thus using it for performance reasons may not be a good idea
646 In addition to the architecture-level kernel command line options affecting CPU
647 idle time management, there are parameters affecting individual ``CPUIdle``
652 `Representation of Idle States <idle-states-representation_>`_), causes the