Lines Matching full:the
7 An overview of concepts and the Linux kernel's interfaces related to PCI power
11 This document only covers the aspects of power management specific to PCI
12 devices. For general description of the kernel's interfaces related to device
31 devices into states in which they draw less power (low-power states) at the
35 completely inactive. However, when it is necessary to use the device once
36 again, it has to be put back into the "fully functional" state (full-power
37 state). This may happen when there are some data for the device to handle or
38 as a result of an external event requiring the device to be active, which may
39 be signaled by the device itself.
41 PCI devices may be put into low-power states in two ways, by using the device
42 capabilities introduced by the PCI Bus Power Management Interface Specification,
43 or with the help of platform firmware, such as an ACPI BIOS. In the first
44 approach, which is referred to as native PCI power management (native PCI PM)
45 in what follows, the device power state is changed as a result of writing a
46 specific value into one of its standard configuration registers. The second
47 approach requires the platform firmware to provide special methods that may be
48 used by the kernel to change the device's power state.
50 Devices supporting native PCI PM can usually generate wakeup signals called
51 Power Management Events (PMEs) to let the kernel know about external events
52 requiring the device to be active. After receiving a PME the kernel is supposed
53 to put the device that sent it into the full-power state. However, the PCI Bus
55 delivering the PME from the device to the CPU and the operating system kernel.
56 It is assumed that the platform firmware will perform this task and therefore,
58 prepare the platform firmware for notifying the CPU of the PMEs coming from the
61 In turn, if the methods provided by the platform firmware are used for changing
62 the power state of a device, usually the platform also provides a method for
63 preparing the device to generate wakeup signals. In that case, however, it
64 is often also necessary to prepare the device for generating PMEs using the
65 native PCI PM mechanism, because the method provided by the platform depends on
68 Thus in many situations both the native and the platform-based power management
69 mechanisms have to be used simultaneously to obtain the desired result.
74 The PCI Bus Power Management Interface Specification (PCI PM Spec) was
75 introduced between the PCI 2.1 and PCI 2.2 Specifications. It defined a
79 The implementation of the PCI PM Spec is optional for conventional PCI devices,
80 but it is mandatory for PCI Express devices. If a device supports the PCI PM
82 configuration space. This field is used to describe and control the standard
83 features related to the native PCI power management.
85 The PCI PM Spec defines four operating states for devices (D0-D3) and for buses
86 (B0-B3). The higher the number, the less power is drawn by the device or bus
87 in that state. However, the higher the number, the longer the latency for
88 the device or bus to return to the full-power state (D0 or B0, respectively).
90 There are two variants of the D3 state defined by the specification. The first
91 one is D3hot, referred to as the software accessible D3, because devices can be
92 programmed to go into it. The second one, D3cold, is the state that PCI devices
93 are in when the supply voltage (Vcc) is removed from them. It is not possible
95 interface for putting the bus the device is on into a state in which Vcc is
96 removed from all devices on the bus.
98 PCI bus power management, however, is not supported by the Linux kernel at the
101 Note that every PCI device can be in the full-power state (D0) or in D3cold,
102 regardless of whether or not it implements the PCI PM Spec. In addition to
103 that, if the PCI PM Spec is implemented by the device, it must support D3hot
104 as well as D0. The support for the D1 and D2 power states is optional.
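The support rules above can be condensed into a small predicate. What follows is a user-space sketch for illustration, not kernel code: the enum and `d_state_supported()` are hypothetical names, and the two masks are assumed to mirror the D1 and D2 support bits of the Power Management Capabilities register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names used only for this illustration. */
enum d_state { D0, D1, D2, D3HOT, D3COLD };

/* Assumed to mirror the D1/D2 support bits of the PM Capabilities register. */
#define PMC_D1_SUPPORT 0x0200u
#define PMC_D2_SUPPORT 0x0400u

/*
 * has_pm_cap: the device implements the PCI PM Spec
 * pmc:        value of its PM Capabilities register (ignored otherwise)
 */
bool d_state_supported(bool has_pm_cap, uint16_t pmc, enum d_state state)
{
	switch (state) {
	case D0:
	case D3COLD:
		/* Available to every PCI device, with or without the PM Spec. */
		return true;
	case D3HOT:
		/* Mandatory once the PM Spec is implemented. */
		return has_pm_cap;
	case D1:
		/* Optional: advertised by a capability bit. */
		return has_pm_cap && (pmc & PMC_D1_SUPPORT) != 0;
	case D2:
		/* Optional: advertised by a capability bit. */
		return has_pm_cap && (pmc & PMC_D2_SUPPORT) != 0;
	}
	return false;
}
```

A caller would pass the register value read from the device's configuration space; the sketch deliberately leaves that read out.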
106 PCI devices supporting the PCI PM Spec can be programmed to go to any of the
107 supported low-power states (except for D3cold). While in D1-D3hot the
108 standard configuration registers of the device must be accessible to software
109 (i.e. the device is required to respond to PCI configuration accesses), although
110 its I/O and memory spaces are then disabled. This allows the device to be
111 programmatically put into D0. Thus the kernel can switch the device back and
112 forth between D0 and the supported low-power states (except for D3cold) and the
113 possible power state transitions the device can undergo are the following:
127 The transition from D3cold to D0 occurs when the supply voltage is provided to
128 the device (i.e. power is restored). In that case the device returns to D0 with
129 a full power-on reset sequence and the power-on defaults are restored to the
132 PCI devices supporting the PCI PM Spec can be programmed to generate PMEs
134 of generating PMEs from all supported power states. In particular, the
135 capability of generating PMEs from D3cold is optional and depends on the
136 presence of additional voltage (3.3Vaux) allowing the device to remain
142 The platform firmware support for the power management of PCI devices is
143 system-specific. However, if the system in question is compliant with the
144 Advanced Configuration and Power Interface (ACPI) Specification, like the
146 management interfaces defined by the ACPI standard.
148 For this purpose the ACPI BIOS provides special functions called "control
149 methods" that may be executed by the kernel to perform specific tasks, such as
151 using a special byte-code language called the ACPI Machine Language (AML) and
152 stored in the machine's BIOS. The kernel loads them from the BIOS and executes
153 them as needed using an AML interpreter that translates the AML byte code into
155 writer can provide the kernel with a means to perform actions depending
156 on the system design in a system-specific fashion.
160 to be defined separately for each device supposed to be handled with the help of
161 the platform. This means, in particular, that ACPI device control methods can
162 only be used to handle devices that the BIOS writer knew about in advance. The
165 The ACPI specification assumes that devices can be in one of four power states
166 labeled as D0, D1, D2, and D3 that roughly correspond to the native PCI PM
167 D0-D3 states (although the difference between D3hot and D3cold is not taken
169 set of power resources that have to be enabled for the device to be put into
171 with the help of their own control methods, _ON and _OFF, that have to be
174 To put a device into the ACPI power state Dx (where x is a number between 0 and
175 3 inclusive) the kernel is supposed to (1) enable the power resources required
176 by the device in this state using their _ON control methods and (2) execute the
177 _PSx control method defined for the device. In addition to that, if the device
179 wakeup signals from that state, the _DSW (or _PSW, replaced with _DSW by ACPI
181 resources that are not required by the device in the target power state and are
183 _OFF control methods). If the current power state of the device is D3, it can
186 However, quite often the power states of devices are changed during a
187 system-wide transition into a sleep state or back into the working state. ACPI
188 defines four system sleep states, S1, S2, S3, and S4, and denotes the system
189 working state as S0. In general, the target system sleep (or working) state
190 determines the highest power (lowest number) state the device can be put
191 into and the kernel is supposed to obtain this information by executing the
193 If the device is required to wake up the system from the target sleep state, the
194 lowest power (highest number) state it can be put into is also determined by the
195 target state of the system. The kernel is then supposed to use the device's
196 _SxW control method to obtain the number of that state. It also is supposed to
197 use the device's _PRW control method to learn which power resources need to be
198 enabled for the device to be able to generate wakeup signals.
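The two constraints described above bound the device state from both sides. The sketch below shows one way to combine them in user space; `pick_sleep_d_state()` and its parameters are hypothetical, and returning -1 for contradictory limits is an assumption of this example rather than documented kernel behavior.

```c
#include <stdbool.h>

#define D_STATE_MAX 3	/* D3, the deepest device state ACPI distinguishes */

/*
 * d_min:         highest-power (lowest number) state allowed for the device
 *                in the target system state
 * d_wake:        lowest-power (highest number) state from which the device
 *                can still wake up the system, e.g. as reported by _SxW
 * wakeup_needed: whether the device has to be able to wake up the system
 *
 * Returns the deepest state satisfying both limits, or -1 if the limits
 * contradict each other (an assumption made for this sketch).
 */
int pick_sleep_d_state(int d_min, int d_wake, bool wakeup_needed)
{
	int d_max = D_STATE_MAX;

	/* The wakeup limit only applies if wakeup is actually required. */
	if (wakeup_needed && d_wake < d_max)
		d_max = d_wake;

	if (d_max < d_min)
		return -1;

	return d_max;
}
```

For example, a device limited to D2 for wakeup ends up in D2 when wakeup is required and in D3 when it is not.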
204 a result of the execution of the _DSW (or _PSW) ACPI control method before
205 putting the device into a low-power state, have to be caught and handled as
206 appropriate. If they are sent while the system is in the working state
207 (ACPI S0), they should be translated into interrupts so that the kernel can
208 put the devices generating them into the full-power state and take care of the
209 events that triggered them. In turn, if they are sent while the system is
210 sleeping, they should cause the system's core logic to trigger wakeup.
214 from the system core logic generated in response to various events that need to
217 capable of signaling wakeup. The information on the connections between GPEs
218 and event sources is recorded in the system's ACPI BIOS from where it can be
219 read by the kernel.
221 If a PCI device known to the system's ACPI BIOS signals wakeup, the GPE
222 associated with it (if there is one) is triggered. The GPEs associated with PCI
223 bridges may also be triggered in response to a wakeup signal from one of the
224 devices below the bridge (this also is the case for root bridges) and, for
225 example, native PCI PMEs from devices unknown to the system's ACPI BIOS may be
228 A GPE may be triggered when the system is sleeping (i.e. when it is in one of
229 the ACPI S1-S4 states), in which case system wakeup is started by its core logic
230 (the device that was the source of the signal causing the system wakeup to occur
231 may be identified later). The GPEs used in such situations are referred to as
234 Usually, however, GPEs are also triggered when the system is in the working
235 state (ACPI S0) and in that case the system's core logic generates a System
236 Control Interrupt (SCI) to notify the kernel of the event. Then, the SCI
237 handler identifies the GPE that caused the interrupt to be generated which,
238 in turn, allows the kernel to identify the source of the event (that may be
239 a PCI device signaling wakeup). The GPEs used for notifying the kernel of
240 events occurring while the system is in the working state are referred to as
245 for PCI Express devices. Namely, the PCI Express Base Specification introduced
249 may be routed directly to the system's core logic), but for PCI Express devices
250 they are in-band messages that have to pass through the PCI Express hierarchy,
251 including the root port on the path from the device to the Root Complex. Thus
253 interrupt whenever it receives a PME message from one of the devices below it.
254 The PCI Express Requester ID of the device that sent the PME message is then
255 recorded in one of the root port's configuration registers from where it may be
256 read by the interrupt handler allowing the device to be identified. [PME
257 messages sent by PCI Express endpoints integrated with the Root Complex don't
261 In principle the native PCI Express PME signaling may also be used on ACPI-based
262 systems along with the GPEs, but to use it the kernel has to ask the system's
263 ACPI BIOS to release control of root port configuration registers. The ACPI
264 BIOS, however, is not required to allow the kernel to control these registers
265 and if it doesn't do that, the kernel must not modify their contents. Of course
266 the native PCI Express PME signaling cannot be used by the kernel in that case.
275 The PCI Subsystem participates in the power management of PCI devices in a
277 the device power management core (PM core) and PCI device drivers.
278 Specifically, the pm field of the PCI subsystem's struct bus_type object,
302 These callbacks are executed by the PM core in various situations related to
308 The structure representing a PCI device, struct pci_dev, contains several fields
314 int pm_cap; /* PM capability offset in the
327 They also indirectly use some fields of the struct device that is embedded in
333 The PCI subsystem's first task related to device power management is to
334 prepare the device for power management and initialize the fields of struct
338 The first of these functions checks if the device supports native PCI PM
339 and if that's the case the offset of its power management capability structure
340 in the configuration space is stored in the pm_cap field of the device's struct
341 pci_dev object. Next, the function checks which PCI low-power states are
342 supported by the device and from which low-power states the device can generate
343 native PCI PMEs. The power management fields of the device's struct pci_dev and
344 the struct device embedded in it are updated accordingly and the generation of
345 PMEs by the device is disabled.
347 The second function checks if the device can be prepared to signal wakeup with
348 the help of the platform firmware, such as the ACPI BIOS. If that is the case,
349 the function updates the wakeup fields in struct device embedded in the
350 device's struct pci_dev and uses the firmware-provided method to prevent the
353 At this point the device is ready for power management. For driverless devices,
355 during system-wide transitions to a sleep state and back to the working state.
360 The PCI subsystem plays a vital role in the runtime power management of PCI
361 devices. For this purpose it uses the general runtime power management
369 that are executed by the core runtime PM routines. It also implements the
371 in low-power states, which at the time of this writing works for both the native
372 PCI Express PME signaling and the ACPI GPE-based wakeup signaling described in
375 First, a PCI device is put into a low-power state, or suspended, with the help
377 pci_pm_runtime_suspend() to do the actual job. For this to work, the device's
379 run by pci_pm_runtime_suspend() as the first action. If the driver's callback
380 returns successfully, the device's standard configuration registers are saved,
381 the device is prepared to generate wakeup signals and, finally, it is put into
382 the target low-power state.
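The order of operations described above can be modeled in user space. This is only a sketch of the sequence, not the kernel's implementation: `struct mock_pci_dev`, `mock_pci_runtime_suspend()` and the two sample driver callbacks are hypothetical, and the error values are placeholders.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the PCI subsystem tracks per device. */
struct mock_pci_dev {
	int (*runtime_suspend)(struct mock_pci_dev *dev);	/* driver callback */
	bool state_saved;	/* standard configuration registers saved */
	bool wakeup_armed;	/* device prepared to signal wakeup */
	int d_state;		/* current state, 0 meaning D0 */
	int target_state;	/* low-power state chosen by the subsystem */
};

/* Sample driver callback that quiesces the device successfully. */
int mock_driver_suspend_ok(struct mock_pci_dev *dev)
{
	(void)dev;
	return 0;
}

/* Sample driver callback that refuses to suspend (placeholder error). */
int mock_driver_suspend_busy(struct mock_pci_dev *dev)
{
	(void)dev;
	return -16;
}

int mock_pci_runtime_suspend(struct mock_pci_dev *dev)
{
	int error;

	/* Without a driver callback nothing is done (placeholder error). */
	if (dev->runtime_suspend == NULL)
		return -1;

	/* 1. The driver's callback runs first and may veto the suspend. */
	error = dev->runtime_suspend(dev);
	if (error)
		return error;

	/* 2. Save state, 3. arm wakeup, 4. enter the target state. */
	dev->state_saved = true;
	dev->wakeup_armed = true;
	dev->d_state = dev->target_state;
	return 0;
}
```

Note that the sample driver callbacks touch neither the wakeup setting nor the power state; in this model those steps belong to the subsystem-level routine.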
384 The low-power state to put the device into is the lowest-power (highest number)
385 state from which it can signal wakeup. The exact method of signaling wakeup is
386 system-dependent and is determined by the PCI subsystem on the basis of the
387 reported capabilities of the device and the platform firmware. To prepare the
388 device for signaling wakeup and put it into the selected low-power state, the
389 PCI subsystem can use the platform firmware as well as the device's native PCI
392 It is expected that the device driver's pm->runtime_suspend() callback will
393 not attempt to prepare the device for signaling wakeup or to put it into a
394 low-power state. The driver ought to leave these tasks to the PCI subsystem
395 that has all of the information necessary to perform them.
397 A suspended device is brought back into the "active" state, or resumed,
398 with the help of pm_request_resume() or pm_runtime_resume() which both call
399 pci_pm_runtime_resume() for PCI devices. Again, this only works if the device's
401 the driver's callback is executed, pci_pm_runtime_resume() brings the device
402 back into the full-power state, prevents it from signaling wakeup while in that
403 state and restores its standard configuration registers. Thus the driver's
404 callback need not worry about the PCI-specific aspects of the device resume.
407 situations. First, it may be called at the request of the device's driver, for
409 as a result of a wakeup signal from the device itself (this sometimes is
410 referred to as "remote wakeup"). Of course, for this purpose the wakeup signal
411 is handled in one of the ways described in Section 1 and finally converted into
412 a notification for the PCI subsystem after the source device has been
415 The pci_pm_runtime_idle() function, called for PCI devices by pm_runtime_idle()
416 and pm_request_idle(), executes the device driver's pm->runtime_idle()
418 present at all), suspends the device with the help of pm_runtime_suspend().
419 Sometimes pci_pm_runtime_idle() is called automatically by the PM core (for
420 example, it is called right after the device has just been resumed), in which
421 cases it is expected to suspend the device if that makes sense. Usually,
422 however, the PCI subsystem does not know whether the device can really be
423 suspended, so it lets the device's driver decide by running its
430 handled in a specific way and the PM core executes subsystem-level power
432 each phase involves executing the same subsystem-level callback for every device
433 belonging to the given subsystem before the next phase begins. These phases
439 When the system is going into a sleep state in which the contents of memory will
440 be preserved, such as one of the ACPI sleep states S1-S3, the phases are:
444 The following PCI bus type's callbacks, respectively, are used in these phases::
450 The pci_pm_prepare() routine first puts the device into the "fully functional"
451 state with the help of pm_runtime_resume(). Then, it executes the device
452 driver's pm->prepare() callback if defined (i.e. if the driver's struct
453 dev_pm_ops object is present and the prepare pointer in that object is valid).
455 The pci_pm_suspend() routine first checks if the device's driver implements
456 legacy PCI suspend routines (see Section 3), in which case the driver's legacy
458 the device's driver doesn't provide a struct dev_pm_ops object (containing
459 pointers to the driver's callbacks), pci_pm_default_suspend() is called, which
460 simply turns off the device's bus master capability and runs
461 pcibios_disable_device() to disable it, unless the device is a bridge (PCI
462 bridges are ignored by this routine). Next, the device driver's pm->suspend()
465 to the device if necessary.
467 Note that the suspend phase is carried out asynchronously for PCI devices, so
468 the pci_pm_suspend() callback may be executed in parallel for any pair of PCI
469 devices that don't depend on each other in a known way (i.e. none of the paths
470 in the device tree from the root bridge to a leaf device contains both of them).
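The independence condition stated above amounts to checking that neither device is an ancestor of the other. The following user-space sketch expresses that check with parent pointers; `struct mock_node` and both functions are hypothetical and say nothing about how the PM core actually tracks dependencies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device-tree node: only the parent link matters here. */
struct mock_node {
	const struct mock_node *parent;	/* NULL for the root bridge */
};

/* True if 'ancestor' lies on the path from 'dev' up to the root. */
static bool on_path_to_root(const struct mock_node *ancestor,
			    const struct mock_node *dev)
{
	for (; dev != NULL; dev = dev->parent)
		if (dev == ancestor)
			return true;
	return false;
}

/*
 * Two devices may be handled in parallel if no root-to-leaf path
 * contains both of them, i.e. neither is an ancestor of the other.
 */
bool may_suspend_in_parallel(const struct mock_node *a,
			     const struct mock_node *b)
{
	return !on_path_to_root(a, b) && !on_path_to_root(b, a);
}
```

With a root bridge that has a bridge and a second device below it, an endpoint behind the bridge is independent of the second device but not of the bridge or the root.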
472 The pci_pm_suspend_noirq() routine is executed after suspend_device_irqs() has
473 been called, which means that the device driver's interrupt handler won't be
474 invoked while this routine is running. It first checks if the device's driver
475 implements legacy PCI suspend routines (Section 3), in which case the legacy
476 late suspend routine is called and its result is returned (the standard
477 configuration registers of the device are saved if the driver's callback hasn't
478 done that). Second, if the device driver's struct dev_pm_ops object is not
479 present, the device's standard configuration registers are saved and the routine
480 returns success. Otherwise the device driver's pm->suspend_noirq() callback is
481 executed, if present, and its result is returned if it fails. Next, if the
482 device's standard configuration registers haven't been saved yet (one of the
484 saves them, prepares the device to signal wakeup (if necessary) and puts it into
487 The low-power state to put the device into is the lowest-power (highest number)
488 state from which it can signal wakeup while the system is in the target sleep
489 state. Just like in the runtime PM case described above, the mechanism of
490 signaling wakeup is system-dependent and determined by the PCI subsystem, which
491 is also responsible for preparing the device to signal wakeup from the system's
496 into low-power states. However, if one of the driver's suspend callbacks
497 (pm->suspend() or pm->suspend_noirq()) saves the device's standard configuration
498 registers, pci_pm_suspend_noirq() will assume that the device has been prepared
499 to signal wakeup and put into a low-power state by the driver (the driver is
500 then assumed to have used the helper functions provided by the PCI subsystem for
502 rare cases doing that in the driver may be the optimum approach.
507 When the system is undergoing a transition from a sleep state in which the
508 contents of memory have been preserved, such as one of the ACPI sleep states
509 S1-S3, into the working state (ACPI S0), the phases are:
513 The following PCI bus type's callbacks, respectively, are executed in these
520 The pci_pm_resume_noirq() routine first puts the device into the full-power
522 hardware quirks related to the device, if necessary. This is done
523 unconditionally, regardless of whether or not the device's driver implements
524 legacy PCI power management callbacks (this way all PCI devices are in the
526 when their interrupt handlers are invoked for the first time during resume,
527 which allows the kernel to avoid problems with the handling of shared interrupts
529 callbacks (see Section 3) are implemented by the device's driver, the legacy
530 early resume callback is executed and its result is returned. Otherwise, the
534 The pci_pm_resume() routine first checks if the device's standard configuration
535 registers have been restored and restores them if that's not the case (this
536 only is necessary in the error path during a failing suspend). Next, resume
537 hardware quirks related to the device are applied, if necessary, and if the
539 Section 3), the driver's legacy resume callback is executed and its result is
540 returned. Otherwise, the device's wakeup signaling mechanisms are blocked and
541 its driver's pm->resume() callback is executed, if defined (the callback's
544 The resume phase is carried out asynchronously for PCI devices, like the
546 on each other in a known way, the pci_pm_resume() routine may be executed for
547 both of them in parallel.
549 The pci_pm_complete() routine only executes the device driver's pm->complete()
556 a system image to be created and written into a persistent storage medium. The
560 The freezing of devices is carried out after enough memory has been freed (at
561 the time of this writing the image creation requires at least 50% of system RAM
562 to be free) in the following three phases:
566 that correspond to the PCI bus type's callbacks::
572 This means that the prepare phase is exactly the same as for system suspend.
573 The other two phases, however, are different.
575 The pci_pm_freeze() routine is quite similar to pci_pm_suspend(), but it runs
576 the device driver's pm->freeze() callback, if defined, instead of pm->suspend(),
577 and it doesn't apply the suspend-related hardware quirks. It is executed
581 The pci_pm_freeze_noirq() routine, in turn, is similar to
582 pci_pm_suspend_noirq(), but it calls the device driver's pm->freeze_noirq()
583 routine instead of pm->suspend_noirq(). It also doesn't attempt to prepare the
585 the device's standard configuration registers if they haven't been saved by one
586 of the driver's callbacks.
588 Once the image has been created, it has to be saved. However, at this point all
590 I/O is obviously necessary for the image saving. Thus they have to be brought
591 back to the fully functional state and this is done in the following phases:
595 using the following PCI bus type's callbacks::
603 The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq().
604 It puts the device into the full-power state and restores its standard
605 configuration registers. It also executes the device driver's pm->thaw_noirq()
608 The pci_pm_thaw() routine is similar to pci_pm_resume(), but it runs the device
613 The complete phase is the same as for system resume.
615 After saving the image, devices need to be powered down before the system can
616 enter the target sleep state (ACPI S4 for ACPI-based systems). This is done in
621 where the prepare phase is exactly the same as for system suspend. The other
622 two phases are analogous to the suspend and suspend_noirq phases, respectively.
623 The PCI subsystem-level callbacks they correspond to::
629 although they don't attempt to save the device's standard configuration
635 System restore requires a hibernation image to be loaded into memory and the
636 pre-hibernation memory contents to be restored before the pre-hibernation system
639 As described in Documentation/driver-api/pm/devices.rst, the hibernation image
640 is loaded into memory by a fresh instance of the kernel, called the boot kernel,
641 which in turn is loaded and run by a boot loader in the usual way. After the
642 boot kernel has loaded the image, it needs to replace its own code and data with
643 the code and data of the "hibernated" kernel stored within the image, called the
645 the image during hibernation, in the
649 phases described above. However, the devices affected by these phases are only
650 those having drivers in the boot kernel; other devices will still be in whatever
651 state the boot loader left them.
653 Should the restoration of the pre-hibernation memory contents fail, the boot
654 kernel would go through the "thawing" procedure described above, using the
655 thaw_noirq, thaw, and complete phases (that will only affect the devices having
656 drivers in the boot kernel), and then continue running normally.
658 If the pre-hibernation memory contents are restored successfully, which is the
659 usual situation, control is passed to the image kernel, which then becomes
660 responsible for bringing the system back to the working state. To achieve this,
661 it must restore the devices' pre-hibernation functionality, which is done much
662 like waking up from the memory sleep state, although it involves different
667 The first two of these are analogous to the resume_noirq and resume phases
668 described above, respectively, and correspond to the following PCI subsystem
675 respectively, but they execute the device driver's pm->restore_noirq() and
678 The complete phase is carried out in exactly the same way as during system
689 executed by the PCI subsystem's power management routines described above and by
690 controlling the runtime power management of their devices.
692 At the time of this writing there are two ways to define power management
693 callbacks for a PCI device driver: the recommended one, based on using a
695 the "legacy" one, in which the .suspend() and .resume() callbacks from struct
696 pci_driver are used. The legacy approach, however, doesn't allow one to define
698 drivers. Therefore it is not covered by this document (refer to the source code
703 the PCI subsystem's PM routines in various circumstances. A pointer to the
704 driver's struct dev_pm_ops object has to be assigned to the driver.pm field in
705 its struct pci_driver object. Once that has happened, the "legacy" PM callbacks
708 The PM callbacks in struct dev_pm_ops are not mandatory and if they are not
709 defined (i.e. the respective fields of struct dev_pm_ops are unset) the PCI
710 subsystem will handle the device in a simplified default manner. If they are
711 defined, though, they are expected to behave as described in the following
717 The prepare() callback is executed during system suspend, during hibernation
722 This callback is only necessary if the driver's device has children that in
723 general may be registered at any time. In that case the role of the prepare()
724 callback is to prevent new children of the device from being registered until
725 one of the resume_noirq(), thaw_noirq(), or restore_noirq() callbacks is run.
727 In addition to that the prepare() callback may carry out some operations
728 preparing the device to be suspended, although it should not allocate memory
729 (if additional memory is required to suspend the device, it has to be
736 The suspend() callback is only executed during system suspend, after prepare()
737 callbacks have been executed for all devices in the system.
739 This callback is expected to quiesce the device and prepare it to be put into a
740 low-power state by the PCI subsystem. It is not required (in fact, it is not
741 even recommended) that a PCI driver's suspend() callback save the standard
742 configuration registers of the device, prepare it for waking up the system, or
744 care of by the PCI subsystem, without the driver's participation.
748 pci_set_power_state() should be used to save the device's standard configuration
750 low-power state, respectively. Moreover, if the driver calls pci_save_state(),
751 the PCI subsystem will execute neither pci_prepare_to_sleep() nor
752 pci_set_power_state() for its device, so the driver is then responsible for
753 handling the device as appropriate.
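The hand-over rule above hinges on a single question: were the standard configuration registers already saved by the driver? The user-space sketch below models just that decision; `struct mock_susp_dev` and `mock_subsystem_finish_suspend()` are hypothetical, and the flags merely stand in for the effects of pci_save_state(), pci_prepare_to_sleep() and pci_set_power_state().

```c
#include <stdbool.h>

/* Hypothetical per-device bookkeeping for this illustration. */
struct mock_susp_dev {
	bool state_saved;	/* effect of pci_save_state() */
	bool wakeup_prepared;	/* effect of pci_prepare_to_sleep() */
	bool in_low_power;	/* effect of pci_set_power_state() */
};

/*
 * Models the subsystem's final suspend step. Returns true if the
 * subsystem handled the device itself, false if it left the device
 * to the driver because the driver had already saved its state.
 */
bool mock_subsystem_finish_suspend(struct mock_susp_dev *dev)
{
	if (dev->state_saved)
		return false;	/* the driver took over: do not touch the device */

	dev->state_saved = true;
	dev->wakeup_prepared = true;
	dev->in_low_power = true;
	return true;
}
```

In this model, a driver that saves the registers but does nothing else leaves the device in D0 with wakeup unprepared, which is the situation the text warns about.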
755 While the suspend() callback is being executed, the driver's interrupt handler
756 can be invoked to handle an interrupt from the device, so all suspend-related
757 operations relying on the driver's ability to handle interrupts should be
763 The suspend_noirq() callback is only executed during system suspend, after
764 suspend() callbacks have been executed for all devices in the system and
765 after device interrupts have been disabled by the PM core.
767 The difference between suspend_noirq() and suspend() is that the driver's
775 The freeze() callback is hibernation-specific and is executed in two situations,
777 in preparation for the creation of a system image, and during restore,
778 after a system image has been loaded into memory from persistent storage and the
781 The role of this callback is analogous to the role of the suspend() callback
782 described above. In fact, they only need to be different in the rare cases when
783 the driver takes the responsibility for putting the device into a low-power
786 In those cases the freeze() callback should not prepare the device for system wakeup
788 save the device's standard configuration registers using pci_save_state().
793 The freeze_noirq() callback is hibernation-specific. It is executed during
795 devices in preparation for the creation of a system image, and during restore,
798 after device interrupts have been disabled by the PM core.
800 The role of this callback is analogous to the role of the suspend_noirq()
804 The difference between freeze_noirq() and freeze() is analogous to the
810 The poweroff() callback is hibernation-specific. It is executed when the system
815 The role of this callback is analogous to the role of the suspend() and freeze()
816 callbacks described above, although it does not need to save the contents of
817 the device's registers. In particular, if the driver wants to put the device
818 into a low-power state itself instead of allowing the PCI subsystem to do that,
819 the poweroff() callback should use pci_prepare_to_sleep() and
820 pci_set_power_state() to prepare the device for system wakeup and to put it
821 into a low-power state, respectively, but it need not save the device's standard
827 The poweroff_noirq() callback is hibernation-specific. It is executed after
828 poweroff() callbacks have been executed for all devices in the system.
830 The role of this callback is analogous to the role of the suspend_noirq() and
831 freeze_noirq() callbacks described above, but it does not need to save the
832 contents of the device's registers.
834 The difference between poweroff_noirq() and poweroff() is analogous to the
840 The resume_noirq() callback is only executed during system resume, after the
841 PM core has enabled the non-boot CPUs. The driver's interrupt handler will not
843 operations that might race with the interrupt handler.
845 Since the PCI subsystem unconditionally puts all devices into the full-power
846 state in the resume_noirq phase of system resume and restores their standard
854 The resume() callback is only executed during system resume, after
855 resume_noirq() callbacks have been executed for all devices in the system and
856 device interrupts have been enabled by the PM core.
858 This callback is responsible for restoring the pre-suspend configuration of the
859 device and bringing it back to the fully functional state. The device should be
865 The thaw_noirq() callback is hibernation-specific. It is executed after a
866 system image has been created and the non-boot CPUs have been enabled by the PM
867 core, in the thaw_noirq phase of hibernation. It also may be executed if the
869 after enabling the non-boot CPUs). The driver's interrupt handler will not be
872 The role of this callback is analogous to the role of resume_noirq(). The
874 freeze() and freeze_noirq(), so in general it does not need to modify the
875 contents of the device's registers.
880 The thaw() callback is hibernation-specific. It is executed after thaw_noirq()
881 callbacks have been executed for all devices in the system and after device
882 interrupts have been enabled by the PM core.
884 This callback is responsible for restoring the pre-freeze configuration of
885 the device, so that it will work in the usual way after thaw() has returned.
890 The restore_noirq() callback is hibernation-specific. It is executed in the
891 restore_noirq phase of hibernation, when the boot kernel has passed control to
892 the image kernel and the non-boot CPUs have been enabled by the image kernel's
895 This callback is analogous to resume_noirq() with the exception that it cannot
896 make any assumptions about the previous state of the device, even if the BIOS (or
897 generally the platform firmware) is known to preserve that state over a
900 For the vast majority of PCI device drivers there is no difference between
906 The restore() callback is hibernation-specific. It is executed after
907 restore_noirq() callbacks have been executed for all devices in the system and
908 after the PM core has enabled device drivers' interrupt handlers to be invoked.
911 to resume_noirq(). Consequently, the difference between restore_noirq() and
912 restore() is analogous to the difference between resume_noirq() and resume().
914 For the vast majority of PCI device drivers there is no difference between
920 The complete() callback is executed in the following situations:
924 - during hibernation, before saving the system image, after thaw() callbacks
926 - during system restore, when the system is going back to its pre-hibernation
929 It also may be executed if the loading of a hibernation image into memory fails
931 devices that have drivers in the boot kernel).
933 This callback is entirely optional, although it may be necessary if the
939 The runtime_suspend() callback is specific to device runtime power management
940 (runtime PM). It is executed by the PM core's runtime PM framework when the
944 This callback is responsible for freezing the device and preparing it to be
945 put into a low-power state, but it must allow the PCI subsystem to perform all
946 of the PCI-specific actions necessary for suspending the device.
951 The runtime_resume() callback is specific to device runtime PM. It is executed
952 by the PM core's runtime PM framework when the device is about to be resumed
953 (i.e. put into the full-power state and programmed to process I/O normally) at
956 This callback is responsible for restoring the normal functionality of the
957 device after it has been put into the full-power state by the PCI subsystem.
958 The device is expected to be able to process I/O in the usual way after
964 The runtime_idle() callback is specific to device runtime PM. It is executed
965 by the PM core's runtime PM framework whenever it may be desirable to suspend
966 the device according to the PM core's information. In particular, it is
967 automatically executed right after runtime_resume() has returned in case the
968 resume of the device has happened as a result of a spurious event.
970 This callback is optional, but if it is not implemented or if it returns 0, the
971 PCI subsystem will call pm_runtime_suspend() for the device, which in turn will
972 cause the driver's runtime_suspend() callback to be executed.
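The runtime_idle() rule described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every idle_* name below is hypothetical and none of them is a kernel interface, the "cable attached" condition is an invented example of a device being in use, and the real PCI and PM core code does considerably more than this.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct idle_dev {
	/* Optional driver callback; NULL means "not implemented". */
	int (*runtime_idle)(struct idle_dev *dev);
	bool cable_attached;	/* invented "device is in use" condition */
	bool suspended;		/* set when the modeled suspend path runs */
};

/* Stand-in for the path that ends in the driver's runtime_suspend(). */
static void idle_model_suspend(struct idle_dev *dev)
{
	dev->suspended = true;
}

/*
 * Models the rule from the text: a suspend follows when the idle callback
 * is missing or returns 0; a nonzero return value vetoes the suspend.
 */
static void idle_model_bus_idle(struct idle_dev *dev)
{
	if (dev->runtime_idle && dev->runtime_idle(dev) != 0)
		return;
	idle_model_suspend(dev);
}

/* Example driver callback: refuse to suspend while the cable is attached. */
static int idle_model_driver_idle(struct idle_dev *dev)
{
	return dev->cable_attached ? -EBUSY : 0;
}
```

In this model a device without a runtime_idle() callback is suspended on every idle notification, while one whose callback returns a nonzero value stays active until the callback reports that the device is really unused.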
977 Although in principle each of the callbacks described in the previous
979 point two or more members of struct dev_pm_ops to the same routine. There are
982 The DEFINE_SIMPLE_DEV_PM_OPS() macro declares a struct dev_pm_ops object with one
983 suspend routine pointed to by the .suspend(), .freeze(), and .poweroff()
984 members and one resume routine pointed to by the .resume(), .thaw(), and
985 .restore() members. The other function pointers in this struct dev_pm_ops are
988 The DEFINE_RUNTIME_DEV_PM_OPS() macro is similar to DEFINE_SIMPLE_DEV_PM_OPS(), but it
989 additionally sets the .runtime_resume() pointer to pm_runtime_force_resume()
990 and the .runtime_suspend() pointer to pm_runtime_force_suspend().
992 The SYSTEM_SLEEP_PM_OPS() macro can be used inside a declaration of struct
993 dev_pm_ops to indicate that one suspend routine is to be pointed to by the
995 be pointed to by the .resume(), .thaw(), and .restore() members.
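The way these macros fan one suspend routine and one resume routine out to several members can be sketched in user space. The block below is a reduced model, not the kernel headers: struct ops_pm_ops, OPS_SYSTEM_SLEEP_PM_OPS(), OPS_DEFINE_SIMPLE_PM_OPS() and the foo_* routines are all hypothetical names chosen for illustration, and the structure only carries a subset of the callbacks discussed in this document.

```c
#include <stddef.h>

struct ops_dev;	/* opaque device type, for illustration only */

/* Reduced, hypothetical stand-in for struct dev_pm_ops. */
struct ops_pm_ops {
	int (*suspend)(struct ops_dev *dev);
	int (*resume)(struct ops_dev *dev);
	int (*freeze)(struct ops_dev *dev);
	int (*thaw)(struct ops_dev *dev);
	int (*poweroff)(struct ops_dev *dev);
	int (*restore)(struct ops_dev *dev);
	int (*suspend_noirq)(struct ops_dev *dev);
	int (*resume_noirq)(struct ops_dev *dev);
};

/* Hypothetical analogue of SYSTEM_SLEEP_PM_OPS(): one routine per direction. */
#define OPS_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
	.suspend = (suspend_fn),  \
	.resume = (resume_fn),    \
	.freeze = (suspend_fn),   \
	.thaw = (resume_fn),      \
	.poweroff = (suspend_fn), \
	.restore = (resume_fn)

/* Hypothetical analogue of DEFINE_SIMPLE_DEV_PM_OPS(). */
#define OPS_DEFINE_SIMPLE_PM_OPS(name, suspend_fn, resume_fn) \
	static const struct ops_pm_ops name = { \
		OPS_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
	}

/* Placeholder driver routines; a real driver would quiesce/restart I/O here. */
static int foo_suspend(struct ops_dev *dev) { (void)dev; return 0; }
static int foo_resume(struct ops_dev *dev) { (void)dev; return 0; }

/* The members not named by the macro stay NULL. */
OPS_DEFINE_SIMPLE_PM_OPS(foo_pm_ops, foo_suspend, foo_resume);
```

The design point the model tries to capture is that a driver with no phase-specific needs writes two routines and lets the macro populate the six system-sleep members, leaving the "noirq" members unset.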
1000 The PM core allows device drivers to set flags that influence the handling of
1001 power management for the devices by the core itself and by middle layer code
1002 including the PCI bus type. The flags should be set once at the driver probe
1003 time with the help of the dev_pm_set_driver_flags() function and they should not
1006 The DPM_FLAG_NO_DIRECT_COMPLETE flag prevents the PM core from using the
1008 if the device is in runtime suspend when the system suspend starts. That also
1009 affects all of the ancestors of the device, so this flag should only be used if
1012 The DPM_FLAG_SMART_PREPARE flag causes the PCI bus type to return a positive
1013 value from pci_pm_prepare() only if the ->prepare callback provided by the
1014 driver of the device returns a positive value. That allows the driver to opt
1015 out of using the direct-complete mechanism dynamically (whereas setting
1018 The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's
1019 perspective the device can be safely left in runtime suspend during system
1021 to avoid resuming the device from runtime suspend unless there are PCI-specific
1023 pci_pm_poweroff_late/noirq() to return early if the device remains in runtime
1024 suspend during the "late" phase of the system-wide transition under way.
1025 Moreover, if the device is in runtime suspend in pci_pm_resume_noirq() or
1029 Setting the DPM_FLAG_MAY_SKIP_RESUME flag means that the driver allows its
1030 "noirq" and "early" resume callbacks to be skipped if the device can be left
1031 in suspend after a system-wide transition into the working state. This flag is
1032 taken into consideration by the PM core along with the power.may_skip_resume
1033 status bit of the device which is set by pci_pm_suspend_noirq() in certain
1034 situations. If the PM core determines that the driver's "noirq" and "early"
1035 resume callbacks should be skipped, the dev_pm_skip_resume() helper function
1037 pci_pm_resume_early() to return upfront without touching the device and
1038 executing the driver callbacks.
1044 are responsible for controlling the runtime power management (runtime PM) of
1047 The PCI device runtime PM is optional, but it is recommended that PCI device
1048 drivers implement it at least in the cases where there is a reliable way of
1049 verifying that the device is not used (like when the network cable is detached
1052 To support the PCI runtime PM the driver first needs to implement the
1054 the runtime_idle() callback to prevent the device from being suspended again
1055 every time right after the runtime_resume() callback has returned
1056 (alternatively, the runtime_suspend() callback will have to check if the
1057 device should really be suspended and return -EAGAIN if that is not the case).
1059 The runtime PM of PCI devices is enabled by default by the PCI core. PCI
1061 However, it is blocked by pci_pm_init() that runs the pm_runtime_forbid()
1062 helper function. In addition to that, the runtime PM usage counter of
1063 each PCI device is incremented by local_pci_probe() before executing the
1064 probe callback provided by the device's driver.
1066 If a PCI driver implements the runtime PM callbacks and intends to use the
1067 runtime PM framework provided by the PM core and the PCI subsystem, it needs
1068 to decrement the device's runtime PM usage counter in its probe callback
1069 function. If it doesn't do that, the counter will always be different from
1070 zero for the device and it will never be runtime-suspended. The simplest
1071 way to do that is by calling pm_runtime_put_noidle(), but if the driver
1074 just needs to call a function that decrements the device's usage counter
1075 from its probe routine to make runtime PM work for the device.
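The effect of the usage counter at probe time can be followed in a small user-space model. This is a sketch under simplifying assumptions: all cnt_* names are hypothetical, the counter is a plain integer, and the model ignores everything else that decides whether a device may be runtime-suspended (including the pm_runtime_forbid() block mentioned above). It only shows why a driver that never drops the reference taken before its probe callback leaves the counter nonzero.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the runtime PM usage counter. */
struct cnt_dev {
	int usage_count;
};

/* Models pm_runtime_get_noresume(): bump the counter, nothing else. */
static void cnt_get_noresume(struct cnt_dev *dev)
{
	dev->usage_count++;
}

/* Models pm_runtime_put_noidle(): drop the counter, no idle check. */
static void cnt_put_noidle(struct cnt_dev *dev)
{
	if (dev->usage_count > 0)
		dev->usage_count--;
}

/* A nonzero usage counter keeps the device from being runtime-suspended. */
static bool cnt_counter_allows_suspend(const struct cnt_dev *dev)
{
	return dev->usage_count == 0;
}

/* Driver probe that opts in to runtime PM by dropping the core's reference. */
static int cnt_driver_probe_with_rpm(struct cnt_dev *dev)
{
	cnt_put_noidle(dev);
	return 0;
}

/* Driver probe that leaves the counter alone. */
static int cnt_driver_probe_plain(struct cnt_dev *dev)
{
	(void)dev;
	return 0;
}

/* Models the core's step: increment the counter, then call the probe. */
static int cnt_core_probe(struct cnt_dev *dev,
			  int (*probe)(struct cnt_dev *dev))
{
	cnt_get_noresume(dev);
	return probe(dev);
}
```

With cnt_driver_probe_with_rpm() the counter returns to zero once probing is done, so the counter no longer stands in the way of a suspend; with cnt_driver_probe_plain() it stays at one indefinitely.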
1077 It is important to remember that the driver's runtime_suspend() callback
1078 may be executed right after the usage counter has been decremented, because
1079 user space may already have caused the pm_runtime_allow() helper function
1080 unblocking the runtime PM of the device to run via sysfs, so the driver must
1083 The driver itself should not call pm_runtime_allow(), though. Instead, it
1085 do it via sysfs as stated above), but it must be prepared to handle the
1086 runtime PM of the device correctly as soon as pm_runtime_allow() is called
1087 (which may happen at any time, even before the driver is loaded).
1089 When the driver's remove callback runs, it has to balance the decrementation
1090 of the device's runtime PM usage counter at the probe time. For this reason,
1091 if it has decremented the counter in its probe callback, it must run
1092 pm_runtime_get_noresume() in its remove callback. [Since the core carries
1093 out a runtime resume of the device and bumps up the device's usage counter
1094 before running the driver's remove callback, the runtime PM of the device
1095 is effectively disabled for the duration of the remove execution and all
1096 runtime PM helper functions incrementing the device's usage counter are
1099 The runtime PM framework works by processing requests to suspend or resume
1102 by work items put into the power management workqueue, pm_wq. Although there
1104 queued by the PM core (for example, after processing a request to resume a
1105 device the PM core automatically queues a request to check if the device is
1107 requests for their devices. For this purpose they should use the runtime PM
1108 helper functions provided by the PM core, discussed in
1112 request into pm_wq. In the majority of cases this also is done by their
1113 drivers that use helper functions provided by the PM core for this purpose.
1115 For more information on the runtime PM of devices refer to