In a Hyper-V guest VM, PCI pass-thru devices (also called
virtual PCI devices, or vPCI devices) are physical PCI devices
that are mapped directly into the VM's physical address space.
Guest device drivers can interact directly with the hardware
without intermediation by the host hypervisor. This approach
provides higher bandwidth access to the device with lower
latency, compared with devices that are virtualized by the
hypervisor. The device should appear to the guest just as it
would when running on bare metal, so no changes are required
to the Linux device drivers for the device.

Hyper-V terminology for vPCI devices is "Discrete Device
Assignment" (DDA); see the Hyper-V DDA documentation for
details. DDA is typically used for storage controllers, such
as NVMe, and for GPUs. A similar mechanism for NICs is called
SR-IOV and produces the same benefits by allowing a guest
device driver to interact directly with the hardware. See the
Hyper-V documentation on SR-IOV support for NICs. This
discussion of vPCI devices is largely applicable to SR-IOV
devices.

Hyper-V provides full PCI functionality for a vPCI device when
it is operating, so the Linux device driver for the device can
be used unchanged, provided it uses the correct Linux kernel
APIs for accessing PCI config space and for other integration
with Linux. But the initial detection of the PCI device and
its integration with the Linux PCI subsystem must use Hyper-V
specific mechanisms. Consequently, vPCI devices on Hyper-V
have a dual identity. They are initially presented to Linux
guests as VMBus devices via the standard VMBus "offer"
mechanism, so they have a VMBus identity and appear under
/sys/bus/vmbus/devices. The VMBus vPCI driver in Linux at
drivers/pci/controller/pci-hyperv.c handles a newly introduced
vPCI device by fabricating a PCI bus topology and creating all
the normal PCI device data structures in Linux that would
exist if the PCI device were discovered via ACPI on a bare-
metal system. Once those data structures are set up, the
device also has a normal PCI identity in Linux, and the normal
Linux device driver for the vPCI device can function as if it
were running in Linux on bare metal. Because vPCI devices are
presented dynamically through the VMBus offer mechanism, they
do not appear in the Linux guest's ACPI tables. vPCI devices
may be added to a VM or removed from a VM at any time during
the life of the VM, and not just during initial boot.

With this approach, the vPCI device is a VMBus device and a
PCI device at the same time. In response to the VMBus offer
message, the hv_pci_probe() function runs and establishes a
VMBus connection to the vPCI VSP on the Hyper-V host. That
connection has a single VMBus channel. The channel is used to
exchange messages with the vPCI VSP for the purpose of setting
up and configuring the vPCI device in Linux. Once the device
is fully configured in Linux as a PCI device, the VMBus
channel is used only if Linux changes the vCPU to be
interrupted in the guest, or if the vPCI device is removed
from the VM while the VM is running. The ongoing operation of
the device happens directly between the Linux device driver
for the device and the hardware, with VMBus and the VMBus
channel playing no role.

PCI device setup follows a sequence that Hyper-V originally
created for Windows guests, and that can be ill-suited for
Linux guests due to differences in the overall structure of
the Linux PCI subsystem compared with Windows. Nonetheless,
with a bit of hackery in the Hyper-V virtual PCI driver for
Linux, the virtual PCI device is set up in Linux so that
generic Linux PCI subsystem code and the Linux driver for the
device "just work".

Each vPCI device is set up in Linux to be in its own PCI
domain with a host bridge. The PCI domainID is derived from
bytes 4 and 5 of the instance GUID assigned to the VMBus vPCI
device. The Hyper-V host does not guarantee that these bytes
are unique, so hv_pci_probe() has an algorithm to resolve
collisions. The collision resolution is intended to be stable
across reboots of the same VM so that the PCI domainIDs don't
change, as the domainID appears in the user space
configuration of some NICs.
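
For illustration, a minimal sketch of this derivation might
look like the following; it is not the pci-hyperv.c code
itself and omits the collision-resolution step::

  #include <linux/types.h>
  #include <linux/uuid.h>

  /*
   * Illustrative only: form a candidate 16-bit PCI domain ID
   * from bytes 4 and 5 of the VMBus instance GUID. The real
   * driver additionally resolves collisions with domain IDs
   * that are already in use.
   */
  static u16 example_vpci_domain_id(const guid_t *instance)
  {
          return (u16)instance->b[4] | ((u16)instance->b[5] << 8);
  }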

hv_pci_probe() allocates a guest MMIO range to be used as PCI
config space for the device. This MMIO range is communicated
to the Hyper-V host over the VMBus channel as part of telling
the host that the device is ready to enter d0. See
hv_pci_enter_d0(). When the guest subsequently accesses this
MMIO range, the Hyper-V host intercepts the accesses and maps
them to the physical device's PCI config space.

hv_pci_probe() also gets BAR information for the device from
the Hyper-V host, and uses this information to allocate MMIO
space for the BARs. That MMIO space is then set up to be
associated with the host bridge so that it works when generic
PCI subsystem code in Linux processes the BARs.
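
A sketch of such an allocation is shown below, assuming the
vmbus_allocate_mmio() helper exported by the VMBus driver; the
size and alignment values are illustrative rather than what
pci-hyperv.c actually requests::

  #include <linux/hyperv.h>
  #include <linux/ioport.h>

  /*
   * Illustrative only: reserve a 4 KiB guest MMIO range for a
   * vPCI device (e.g. a config window) from the VMBus MMIO
   * allocator. The real driver sizes and aligns the ranges
   * based on what the host reports for the device.
   */
  static int example_alloc_mmio(struct hv_device *hdev,
                                struct resource **res)
  {
          return vmbus_allocate_mmio(res, hdev, 0, -1,
                                     0x1000, 0x1000,
                                     false /* fb_overlap_ok */);
  }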

Finally, hv_pci_probe() creates the root PCI bus. At this
point the Hyper-V virtual PCI driver hackery is done, and the
normal Linux PCI machinery for scanning the root bus works to
detect the device, to perform driver matching, and to
initialize the driver and device.
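
As a sketch of what that generic machinery amounts to, a
sequence along the following lines (using the standard PCI
core helpers, not the exact pci-hyperv.c code path) scans the
bus, assigns BAR resources, and triggers driver matching::

  #include <linux/pci.h>

  /*
   * Illustrative only: once a root bus exists, generic PCI
   * code enumerates the devices on it, assigns their BAR
   * resources, and binds matching drivers.
   */
  static void example_scan_root_bus(struct pci_bus *bus)
  {
          pci_lock_rescan_remove();
          pci_scan_child_bus(bus);        /* enumerate devices */
          pci_bus_assign_resources(bus);  /* program the BARs  */
          pci_bus_add_devices(bus);       /* driver matching   */
          pci_unlock_rescan_remove();
  }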

A Hyper-V host may initiate removal of a vPCI device from a
guest VM at any time during the life of the VM. The removal
is instigated by an admin action taken on the Hyper-V host and
is not under the control of the guest OS.

A guest VM is notified of the removal by an unsolicited
"Eject" message sent from the host to the guest over the VMBus
channel associated with the vPCI device. Upon receipt of such
a message, the Hyper-V virtual PCI driver in Linux
asynchronously invokes Linux kernel PCI subsystem calls to
shut down and remove the device. When those calls are
complete, an "Ejection Complete" message is sent back to
Hyper-V over the VMBus channel indicating that the device has
been removed. At this point, Hyper-V sends a VMBus rescind
message to the Linux guest, which the VMBus driver in Linux
processes by removing the VMBus identity for the device. Once
that processing is complete, all vestiges of the device having
been present are gone from the Linux kernel. The rescind
message also indicates to the guest that Hyper-V has stopped
providing support for the vPCI device in the guest. If the
guest were to attempt to access that device's MMIO space, it
would be an invalid reference. Hypercalls affecting the device
return errors, and any further messages sent in the VMBus
channel are ignored.
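
A simplified sketch of the guest-side teardown is shown below,
using the generic PCI removal helpers; the real
hv_eject_device_work() also manages driver state, reference
counts, and the exact reply messages::

  #include <linux/pci.h>

  /*
   * Illustrative only: remove the PCI identity of an ejected
   * vPCI device, then (not shown) send "Ejection Complete"
   * back to the host over the VMBus channel.
   */
  static void example_handle_eject(int domain, unsigned int devfn)
  {
          struct pci_dev *pdev;

          pdev = pci_get_domain_bus_and_slot(domain, 0, devfn);
          if (pdev) {
                  pci_lock_rescan_remove();
                  pci_stop_and_remove_bus_device(pdev);
                  pci_unlock_rescan_remove();
                  pci_dev_put(pdev);
          }
  }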

After sending the Eject message, Hyper-V allows the guest VM
60 seconds to cleanly shut down the device and respond with
Ejection Complete before sending the VMBus rescind
message. If for any reason the Eject steps don't complete
within the allowed 60 seconds, the Hyper-V host forcibly
performs the rescind steps, which will likely result in
cascading errors in the guest because the device is now no
longer present from the guest standpoint and accessing the
device MMIO space will fail.

Because a device can be ejected at essentially any point
during the guest VM lifecycle, proper synchronization in the
Hyper-V virtual PCI driver is very tricky. Ejection has been
observed even before a newly offered vPCI device has been
fully set up. The Hyper-V virtual PCI driver has been updated
several times over the years to fix race conditions when
ejections happen at inopportune times. Care must be taken when
modifying this code to avoid re-introducing such problems.
See comments in the code.

The Hyper-V virtual PCI driver supports vPCI devices using
MSI, multi-MSI, or MSI-X. Assigning the guest vCPU that will
receive the interrupt for a particular MSI or MSI-X message is
complex because of the way the Linux setup of IRQs maps onto
the Hyper-V interfaces. For the single-MSI and MSI-X cases,
Linux calls hv_compose_msi_msg() twice, with the first call
containing a dummy vCPU and the second call containing the
real vCPU. Furthermore, hv_irq_unmask() is finally called
(on x86) or the GICD registers are set (on arm64) to specify
the real vCPU again. Each of these three calls interacts
with Hyper-V, which must decide which physical CPU should
receive the interrupt before it is forwarded to the guest VM.
Unfortunately, the Hyper-V decision-making process is a bit
limited, and can result in concentrating the physical
interrupts on a single CPU, causing a performance bottleneck.
See details about how this is resolved in the extensive
comment above the function hv_compose_msi_req_get_cpu().

The Hyper-V virtual PCI driver implements the
irq_chip.irq_compose_msi_msg function as hv_compose_msi_msg().
Unfortunately, on Hyper-V the implementation requires sending
a VMBus message to the Hyper-V host and awaiting an interrupt
indicating receipt of a reply message. Since
irq_chip.irq_compose_msi_msg can be called with IRQ locks
held, it doesn't work to do the normal sleep until awakened by
the interrupt. Instead hv_compose_msi_msg() must send the
VMBus message, and then poll for the completion message. As
further complexity, the vPCI device could be ejected/rescinded
while the polling is in progress, so this scenario must be
detected as well. See comments in the code regarding this
very tricky area.
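
The following sketch shows the general send-then-poll shape
under these constraints; device_gone() is a hypothetical
stand-in for the driver's actual eject/rescind detection, and
the real code is considerably more involved::

  #include <linux/hyperv.h>
  #include <linux/completion.h>
  #include <linux/delay.h>
  #include <linux/errno.h>

  /*
   * Illustrative only: send a request and busy-poll for the
   * reply because sleeping is not allowed here. Assume the
   * channel callback calls complete(comp) when the reply
   * arrives. device_gone() is a hypothetical callback that
   * reports whether the device has been ejected/rescinded.
   */
  static int example_send_and_poll(struct vmbus_channel *chan,
                                   void *msg, u32 len, u64 id,
                                   struct completion *comp,
                                   bool (*device_gone)(void))
  {
          int ret;

          ret = vmbus_sendpacket(chan, msg, len, id,
                                 VM_PKT_DATA_INBAND,
                                 VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
          if (ret)
                  return ret;

          while (!try_wait_for_completion(comp)) {
                  if (device_gone())      /* ejected/rescinded? */
                          return -ENODEV;
                  udelay(100);
          }
          return 0;
  }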

Most of the code in the Hyper-V virtual PCI driver (pci-
hyperv.c) applies to Hyper-V and Linux guests running on x86
and on arm64 architectures. But there are differences in how
interrupt assignments are managed. On x86, the Hyper-V
virtual PCI driver in the guest must make a hypercall to tell
Hyper-V which guest vCPU should be interrupted by each
MSI/MSI-X interrupt, and the x86 interrupt vector number that
the x86_vector IRQ domain has picked for the interrupt. This
hypercall is made by hv_arch_irq_unmask(). On arm64, the
Hyper-V virtual PCI driver manages the allocation of an SPI
for each MSI/MSI-X interrupt. The Hyper-V virtual PCI driver
stores the allocated SPI in the architectural GICD registers,
which Hyper-V emulates, so no hypercall is needed as on x86.
Hyper-V does not support using LPIs for vPCI devices in arm64
guests, so the GICv3 ITS is not emulated.

The Hyper-V virtual PCI driver in Linux supports vPCI devices
whose drivers create managed or unmanaged Linux IRQs. If the
smp_affinity for an unmanaged IRQ is updated via the /proc/irq
interface, the Hyper-V virtual PCI driver is called to tell
the Hyper-V host to change the interrupt targeting and
everything works properly. However, on x86, if the x86_vector
IRQ domain needs to reassign an interrupt vector due to
running out of vectors on a CPU, there's no path to inform the
Hyper-V host of the change, and things break. Fortunately,
guest VMs operate in a constrained device environment where
using all the vectors on a CPU doesn't happen. Since such a
problem is only theoretical, it is left unaddressed.

By default, Hyper-V pins all guest VM memory in the host
when the VM is created, and programs the physical IOMMU to
allow the VM to have DMA access to all its memory. Hence
it is safe to assign PCI devices to the VM, and allow the
guest operating system to program the DMA transfers. The
physical IOMMU prevents a malicious guest from initiating
DMA to memory belonging to the host or to other VMs on the
host. From the Linux guest standpoint, such DMA transfers
are in "direct" mode since Hyper-V does not provide a virtual
IOMMU in the guest.
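
For example, a guest driver maps DMA buffers in the usual way,
and with no virtual IOMMU present the dma-direct path is used,
as in this sketch::

  #include <linux/dma-mapping.h>

  /*
   * Illustrative only: map a buffer for device DMA. With no
   * virtual IOMMU in the guest, the dma-direct path is used
   * and the returned DMA address is normally just the buffer's
   * guest physical address.
   */
  static dma_addr_t example_map_for_dma(struct device *dev,
                                        void *buf, size_t len)
  {
          dma_addr_t addr;

          addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
          if (dma_mapping_error(dev, addr))
                  return DMA_MAPPING_ERROR;
          return addr;
  }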

Hyper-V assumes that physical PCI devices always perform
cache-coherent DMA. When running on x86, this behavior is
required by the architecture. When running on arm64, the
architecture allows for both cache-coherent and
non-cache-coherent devices, with the behavior of each device
specified in the ACPI DSDT. But when a PCI device is assigned
to a guest VM, that device does not appear in the DSDT, so the
Hyper-V VMBus driver propagates cache-coherency information
from the VMBus node in the ACPI DSDT to all VMBus devices,
including vPCI devices. See vmbus_dma_configure().
Current Hyper-V versions always indicate that the VMBus is
cache coherent, so vPCI devices on arm64 always get marked as
cache coherent and the CPU does not perform any sync
operations as part of dma_map/unmap_*() calls.

As previously described, during vPCI device setup and teardown,
messages are passed over a VMBus channel between the Hyper-V
host and the Hyper-V vPCI driver in the Linux guest. Some
messages have been revised in newer versions of Hyper-V, so
the guest and host must agree on the vPCI protocol version to
be used. The version is negotiated when communication over
the VMBus channel is first established. See
hv_pci_protocol_negotiation(). Newer versions of the protocol
extend support to VMs with more than 64 vCPUs, and provide
additional information about the vPCI device, such as the
guest virtual NUMA node to which it is most closely affined in
the underlying hardware.
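
The negotiation amounts to offering versions from newest to
oldest and using the first one the host accepts, roughly as in
this sketch; the version array and exchange helper are
placeholders, not the driver's actual definitions::

  #include <linux/types.h>
  #include <linux/errno.h>

  /*
   * Illustrative only: offer protocol versions in preference
   * order (newest first) and settle on the first one the host
   * accepts. try_version() is a placeholder for the actual
   * request/response exchange over the VMBus channel.
   */
  static int example_negotiate(const u32 *versions, int n,
                               int (*try_version)(u32 ver),
                               u32 *agreed)
  {
          int i;

          for (i = 0; i < n; i++) {
                  if (try_version(versions[i]) == 0) {
                          *agreed = versions[i];
                          return 0;
                  }
          }
          return -EPROTONOSUPPORT;
  }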

When the vPCI protocol version provides it, the guest NUMA
node affinity of the vPCI device is stored as part of the Linux
device information for subsequent use by the Linux driver. See
hv_pci_assign_numa_node(). If the negotiated protocol version
does not support the host providing NUMA affinity information,
the Linux guest defaults the device NUMA node to 0. But even
when the negotiated protocol version includes NUMA affinity
information, the ability of the host to provide such
information depends on certain host configuration options. If
the guest receives NUMA node value "0", it could mean NUMA
node 0, or it could mean "no information is available".
Unfortunately it is not possible to distinguish the two cases
from the guest side.
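
A sketch of how the reported node might be applied is shown
below; set_dev_node() is the generic Linux helper, while the
validity check is a placeholder for the protocol-version-
dependent logic in hv_pci_assign_numa_node()::

  #include <linux/device.h>
  #include <linux/nodemask.h>

  /*
   * Illustrative only: record the host-reported NUMA node on
   * the Linux device so driver allocations can be node-local.
   * "numa_valid" is a placeholder for the check of whether the
   * negotiated protocol version carries NUMA information.
   */
  static void example_assign_numa_node(struct device *dev,
                                       bool numa_valid, u16 node)
  {
          if (!numa_valid || node >= num_possible_nodes())
                  node = 0;       /* default; may also mean "unknown" */
          set_dev_node(dev, node);
  }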

Linux PCI device drivers access PCI config space using a
standard set of functions provided by the Linux PCI subsystem.
In Hyper-V guests, these standard functions map to the
functions hv_pcifront_read_config() and
hv_pcifront_write_config() in the Hyper-V virtual PCI driver.
In normal VMs, these hv_pcifront_*() functions directly access
the PCI config space, and the accesses trap to Hyper-V to be
handled. But in CoCo (Confidential Computing) VMs, memory
encryption prevents Hyper-V from reading the guest instruction
stream to emulate the access, so the hv_pcifront_*() functions
must invoke hypercalls with explicit arguments describing the
access to be made.
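
The difference between the two paths can be sketched as
follows; the hypercall helper named here is hypothetical,
standing in for the explicit-argument hypercall used in CoCo
VMs::

  #include <linux/io.h>
  #include <linux/types.h>

  /*
   * Hypothetical helper, not a real kernel interface: ask the
   * host via hypercall to read the given config space offset,
   * passing the guest physical address explicitly.
   */
  u32 example_hv_cfg_read_hypercall(u64 cfg_gpa, int where);

  /*
   * Illustrative only: in a normal VM, a plain MMIO read of
   * the config window traps to Hyper-V, which emulates the
   * access. In a CoCo VM the guest must describe the access
   * explicitly instead.
   */
  static u32 example_read_cfg(void __iomem *cfg_base, u64 cfg_gpa,
                              int where, bool coco_vm)
  {
          if (!coco_vm)
                  return readl(cfg_base + where);

          return example_hv_cfg_read_hypercall(cfg_gpa, where);
  }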

The Hyper-V host and Hyper-V virtual PCI driver in Linux
together implement a non-standard back-channel communication
path between the host and guest. The back-channel path uses
messages sent over the VMBus channel associated with the vPCI
device. The functions hyperv_read_cfg_blk() and
hyperv_write_cfg_blk() are the primary interfaces provided to
other parts of the Linux kernel. As of this writing, these
interfaces are used only by the Mellanox mlx5 driver to pass
diagnostic data to a Hyper-V host running in the Azure public
cloud. The functions hyperv_read_cfg_blk() and
hyperv_write_cfg_blk() are implemented in a separate module
(pci-hyperv-intf.c, under CONFIG_PCI_HYPERV_INTERFACE) that
effectively stubs them out when running in non-Hyper-V
environments.
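
A sketch of a caller is shown below, assuming the
hyperv_write_cfg_blk() interface takes the PCI device, a
buffer, its length, and a block ID; the block ID and payload
are purely illustrative::

  #include <linux/pci.h>
  #include <linux/pci-hyperv-intf.h>

  /*
   * Illustrative only: push a small diagnostic blob to the
   * host over the vPCI back-channel. The block ID and payload
   * are made up for this example; real consumers such as mlx5
   * agree on their meaning with the host.
   */
  static int example_send_diag(struct pci_dev *pdev,
                               void *blob, unsigned int len)
  {
          const unsigned int diag_block_id = 1;   /* illustrative */

          return hyperv_write_cfg_blk(pdev, blob, len, diag_block_id);
  }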