From Fedora Project Wiki
Revision as of 16:30, 15 June 2009

KVM Stable PCI Addresses

Summary

Allow devices in KVM guest virtual machines to retain the same PCI address allocations as other devices are added to or removed from the guest configuration.

This is particularly important for Windows guests in order to prevent warnings or reactivation when device addresses change.

Owner

Current status

  • Targeted release: Fedora 12
  • Last updated: 2009-06-15
  • Percentage of completion: 0%

TODO

  • Settle on one of the proposed solutions upstream
  • Implement the qemu side
  • Implement the libvirt side
  • Testing

Completed

  • None

Detailed Description

QEMU allocates PCI addresses to devices (roughly) in the order devices are supplied on the command line. Built-in PCI devices - like the IDE, USB and VGA controllers - are allocated first.

Windows will warn users when a device's PCI address is changed and may even require the Windows install to be reactivated. In order to prevent this, we should do what we can to ensure that a device does not move between PCI slots as other devices are added to or removed from the guest configuration.

A related problem is that of guest ABI changes between versions of QEMU. That is, updating to a newer version of QEMU may cause devices to change subtly (e.g. PCI class of a device or additional capabilities) which again may require Windows to be reactivated. See also Features/KVM Stable Guest ABI.

A number of solutions are being discussed upstream.

Solution 1 :: pci_addr= Command Line Option

A pci_addr= command line option would be added to QEMU, allowing each device's PCI address to be explicitly specified.

In order to supply these addresses on the QEMU command line, libvirt would either:

  1. initially start the guest with no addresses specified and use an improved info pci output format to determine what address was allocated to each device, or
  2. somehow query qemu for what addresses are available and sequentially allocate addresses to each device

The addresses would then be stored in the guest's XML configuration and be supplied on the command line when the guest is started.

Pros:

  1. Simplest option from QEMU's point of view
  2. The libvirt changes would be relatively straightforward
  3. Possible for this to be in place for Fedora 12

Cons:

  1. info pci needs to be made parseable
  2. Only solves one aspect of the guest ABI problem
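
The second approach (sequentially allocating free addresses while keeping the ones already stored) could look roughly like the sketch below. It is only an illustration: the function name, the device identifiers and the assumption that slots 0 to 2 are taken by built-in devices are all hypothetical and do not come from QEMU or libvirt.

```python
# Hypothetical sketch; none of these names exist in QEMU or libvirt.
FIRST_FREE_SLOT = 3   # assumption: lower slots are used by built-in devices
LAST_SLOT = 31        # a PCI bus has 32 slots, numbered 0-31


def assign_slots(devices, stored):
    """Return {device_id: slot} for the devices in the current configuration.

    devices: ordered list of device ids from the guest configuration
    stored:  {device_id: slot} saved from a previous start (may be empty)
    """
    # Devices that already have a stored address keep it.
    result = {dev: stored[dev] for dev in devices if dev in stored}
    used = set(result.values())
    # Remaining slots are handed out in ascending order to new devices.
    free = (s for s in range(FIRST_FREE_SLOT, LAST_SLOT + 1) if s not in used)
    for dev in devices:
        if dev not in result:
            try:
                result[dev] = next(free)
            except StopIteration:
                raise RuntimeError("no free PCI slot for " + dev) from None
    return result


first = assign_slots(["nic0", "nic1", "disk0"], {})
# Removing nic0 leaves the other devices where they were.
second = assign_slots(["nic1", "disk0"], first)
```

The point of the sketch is that a removed device simply frees its slot, and a device added later takes the lowest free slot without disturbing anything else.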

Solution 2 :: pci_add Monitor Command

Same as the first solution, except using the pci_add monitor command to specify PCI addresses instead of a pci_addr= command line argument.

The idea is that qemu would be started using the -S option and the required devices added using pci_add before issuing the continue command.

The issue of how libvirt would parse the output of info pci would remain.

Pros:

  1. Requires minimal changes upstream - probably just making info pci easier to parse
  2. Possible for this to be in place for Fedora 12

Cons:

  1. Solution only under consideration because other solutions might not be agreed upstream in time
  2. info pci needs to be made parseable
  3. libvirt would have to learn this new method of starting guests
  4. Only solves one aspect of the guest ABI problem

Solution 3 :: Machine Description File

A machine description file containing a device tree could be supplied to QEMU which, amongst other things, would specify the PCI addresses of devices.

libvirt would initially start the guest with a minimal (i.e. without PCI addresses) machine description file and then use a monitor command to obtain the entire tree (i.e. with PCI addresses). The dumped tree would then be retained (probably embedded in the guest XML config) and used when the guest is re-started in future.

In order to use this machine description file, libvirt would need to be able to parse and modify the full tree obtained from the monitor command. For example, if a device was removed from the guest configuration, libvirt would need to remove it from the machine description before re-starting the guest.

What may not be obvious here is that libvirt currently does not need to detect when guest configuration has changed. It merely maps the guest configuration to a qemu command line. If it switches to supplying a machine description file, it would need to iterate over the guest XML and add, remove or modify devices in the device tree.
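
To give a feel for the extra bookkeeping, here is a minimal sketch of such a reconciliation step. The data layout (plain dictionaries keyed by a device id, with "model" and "pci_addr" properties) and the function name are assumptions made for illustration; the real machine description format was still under discussion upstream.

```python
# Hypothetical sketch; the tree layout and property names are assumptions.
def reconcile(tree, config):
    """Bring a previously dumped device tree in line with the guest config.

    tree:   {device_id: {"model": str, "pci_addr": int}} dumped earlier
    config: {device_id: {"model": str}} derived from the guest XML
    """
    updated = {}
    for dev, wanted in config.items():
        if dev in tree:
            # Existing device: keep the stored address, refresh the rest.
            entry = dict(tree[dev])
            entry.update(wanted)
        else:
            # New device: no address yet, so QEMU would allocate one.
            entry = dict(wanted)
            entry["pci_addr"] = None
        updated[dev] = entry
    # Devices present in the tree but not in the config are dropped.
    return updated


old_tree = {
    "nic0": {"model": "e1000", "pci_addr": 3},
    "nic1": {"model": "e1000", "pci_addr": 4},
}
new_config = {"nic1": {"model": "virtio"}, "disk0": {"model": "virtio-blk"}}
new_tree = reconcile(old_tree, new_config)
```

Even in this toy form, libvirt has to handle three cases (added, removed and modified devices) that it does not need to distinguish when it merely generates a command line.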

Pros:

  1. Solves the guest ABI issue more generally
  2. Progress is being made upstream in QEMU towards this solution

Cons:

  1. Significant work still required in QEMU
  2. Very complex solution for libvirt
  3. Highly unlikely to be in place for Fedora 12

Solution 4 :: Compat Hints File

A monitor command (e.g. saveabi or savevm --abi) would be added which would export a device tree containing enough information for QEMU to configure devices in exactly the same way as when the tree was exported. A simple implementation might just contain a version number and PCI address for each device.

The exported file would be opaque from the point of view of qemu users. It would merely serve as compat hints which augment the supplied guest configuration. For example, if a device was removed from the guest configuration but remained in the compat hints file, then QEMU would merely ignore the hints for that device.

libvirt would export a new compat hints file each time it starts a QEMU guest or hotplugs a device. The compat hints would then be stored (probably embedded in the guest XML configuration) and be passed to QEMU via the command line when the guest is re-started.
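
The way hints would augment, rather than replace, the guest configuration can be sketched as follows. This models the behaviour described above on the QEMU side; the hint layout (a version number and PCI address per device) follows the "simple implementation" mentioned earlier, and the function and key names are hypothetical.

```python
# Hypothetical sketch; the hint format and names are assumptions.
def apply_hints(config, hints):
    """Return {device_id: pci_addr or None} for the configured devices.

    config: list of device ids in the guest configuration
    hints:  {device_id: {"version": int, "pci_addr": int}} from a prior export
    """
    placed = {}
    for dev in config:
        hint = hints.get(dev)
        # None means "no hint": the device is allocated an address as usual.
        placed[dev] = hint["pci_addr"] if hint is not None else None
    # Hints for devices that are no longer configured are simply ignored.
    return placed


hints = {
    "nic0": {"version": 1, "pci_addr": 3},
    "nic1": {"version": 1, "pci_addr": 4},
}
placed = apply_hints(["nic1", "nic2"], hints)
```

Because stale hints are ignored and missing hints fall back to normal allocation, libvirt would never need to edit the hints file itself.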

Pros:

  1. Solves the guest ABI issue more generally
  2. Straightforward solution for libvirt

Cons:

  1. Upstream not keen on adding both compat hints and machine description formats
  2. Quite unlikely to be ready for Fedora 12

Benefit to Fedora

This feature would remove a significant issue with Fedora's virtualization support for Windows guests.

Scope

Each of the proposed solutions requires significant work in both QEMU and libvirt. No other Fedora packages would be affected.

How To Test

  1. Start a guest with two PCI NICs
  2. Note the PCI slot numbers of each NIC
  3. Remove the first NIC and re-start the guest
  4. Check that the slot number of the second NIC hasn't changed
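
The final check can be automated once the slot numbers from both runs have been noted. The helper below is a hypothetical sketch that compares two such records; how the slot numbers are collected (for example from inside the guest) is left out.

```python
# Hypothetical helper; slot numbers are assumed to be collected by hand.
def moved_devices(before, after):
    """Return {device_id: (old_slot, new_slot)} for devices that changed slot.

    Devices present in only one of the two records are not reported.
    """
    common = before.keys() & after.keys()
    return {
        dev: (before[dev], after[dev])
        for dev in common
        if before[dev] != after[dev]
    }


before = {"nic0": 3, "nic1": 4}
stable = moved_devices(before, {"nic1": 4})    # second NIC kept its slot
unstable = moved_devices(before, {"nic1": 3})  # second NIC moved
```

An empty result means the test passed; any entry names a device whose slot changed between the two starts.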

User Experience

It should be possible to add or remove devices without causing Windows guests to require reactivation.

Dependencies

The main dependency is agreeing upstream on the QEMU part of the solution.

Contingency Plan

No contingency plan required; if one of the solutions is not implemented, the feature will not be available and nothing else will be affected.

Documentation

Release Notes

KVM guests in Fedora 12 now have stable PCI addresses, reducing the chance that Windows guests will require reactivation as guest configuration is modified.

Comments and Discussion