Summary
Enhance networking support in libvirt
Owner
- Name: Laine Stump
- Email: laine at redhat dot com
Current status
- Targeted release: Fedora 16
- Last updated: 2011-07-08
- Percentage of completion: 90%
Detailed Description
Background
This document discusses two improvements to libvirt's networking support:
1. Logical Abstraction Between Guest Config and Host Config
Currently, the details of how a libvirt-managed virtual guest is connected to the network are all contained in the <interface> element(s) of the guest's domain configuration, which is an XML document. This is very flexible, allowing several different types of connection (virtual network, host bridge, direct macvtap connection to a physical interface, qemu usermode, user-defined via an external script). However, it embeds unnecessary details of the virtualization host's resources in the guest's config, the most obvious being the name of the physical interface or bridge on the host that is used for the connection. If the guest is migrated to a different host, and that host has a different hardware or network config (or the same hardware, but that hardware is currently in use by a different guest), the migration will fail. This makes the requirements for migration very rigid.
Another problem with this system is that a large host may have multiple physical network interfaces (or SRIOV virtual functions) which need to be shared among the guests, and when a guest starts up, the physical interface that it previously used may already be in use by a different guest (some modes of direct/macvtap connection allow only a single guest at a time to use each physical device).
2. Transactional Host Network Configuration Changes
It is often necessary to change the network configuration of virtualization hosts, especially during initial deployment. libvirt offers an API (and a series of shell commands) to enable making these changes remotely via libvirt. However, an incorrect change to the networking may leave the host unreachable, and getting the network back to a usable state may be difficult or even impossible.
Implementation
1. Logical Network Abstraction
All configuration information dealing with details of the host's physical hardware can optionally be moved to an expanded libvirt network definition (created and managed via libvirt's virNetwork API, a.k.a. "virsh net-*"), which can also include a pool of physical interfaces for use by guests connecting via that network. Guest <interface> definitions will reference these networks rather than referencing the physical hardware directly. This way, a guest interface can be defined to connect via "network X", and as the guest moves from one host to another, it will simply look up the specifics in each host's "network X" definition and set up the connection appropriately.
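As a rough illustration of the paragraph above, the following shell sketch writes a network definition containing a pool of two physical interfaces, plus a guest <interface> fragment that refers to the network by name only. The network name test-pool, the device names eth10 and eth11, and the scratch directory are assumptions made up for this example. The script only writes the files and prints the virsh commands that would load them; it does not run virsh.

```shell
#!/bin/sh
# Sketch only: test-pool, eth10 and eth11 are hypothetical names.
net_name=test-pool
workdir=$(mktemp -d)

# Host side: the network definition holds the hardware details,
# here a pool of two physical interfaces used in macvtap "private" mode.
cat > "$workdir/$net_name.xml" <<EOF
<network>
  <name>$net_name</name>
  <forward mode='private'>
    <interface dev='eth10'/>
    <interface dev='eth11'/>
  </forward>
</network>
EOF

# Guest side: the interface names the network, not a host device,
# so the same fragment is valid on any host that defines $net_name.
cat > "$workdir/guest-interface.xml" <<EOF
<interface type='network'>
  <source network='$net_name'/>
</interface>
EOF

# Print, rather than run, the commands that would load the network.
echo "virsh net-define $workdir/$net_name.xml"
echo "virsh net-start $net_name"
```

Each host would carry its own definition of the same network name, listing whichever devices that host actually has.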
2. Transactional Host Network Configuration Changes
Three new API functions will be provided as part of the existing virInterface*() API in libvirt: virInterfaceChangeBegin, virInterfaceChangeCommit, and virInterfaceChangeRollback. These functions are really just a frontend for similar new functions in the netcf library (as are all of the virInterface* functions in libvirt). The parallel functions in netcf likewise call through to a new initscript (installed as part of netcf) that does the following:
1) for "change begin", the current state of all interface configuration files in /etc/sysconfig/network-scripts is copied into a "snapshot" directory;
2) for "change commit", this snapshot directory is deleted;
3) for "change rollback", the newly created config files are removed and replaced with the ones from the snapshot directory. netcf adds the functionality of calling ifup for deleted interfaces that are added back as part of the rollback, ifdown for new interfaces that are removed, and ifdown/ifup for existing interfaces that are changed.
The same initscript is run at boot time: if it sees that interface configuration changes have been made and not yet committed, it assumes that these changes made the host unreachable, and performs an automatic rollback.
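The snapshot logic described above can be sketched in a few lines of shell. This is not the netcf initscript itself: the function names, the scratch directory standing in for /etc/sysconfig/network-scripts, and the refusal to begin a second change while one is pending are all assumptions of this example, and the sketch does not run ifup or ifdown on any interface.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real config and snapshot directories.
work=$(mktemp -d)
config_dir="$work/network-scripts"
snapshot_dir="$work/snapshot"
mkdir "$config_dir"

change_begin() {
    # Assumption of this sketch: only one pending change at a time.
    if [ -d "$snapshot_dir" ]; then
        echo "a change is already in progress" >&2
        return 1
    fi
    mkdir "$snapshot_dir"
    for f in "$config_dir"/ifcfg-*; do
        if [ -e "$f" ]; then cp -p "$f" "$snapshot_dir"/; fi
    done
    return 0
}

change_commit() {
    # Keep the edited files; just discard the snapshot.
    rm -rf "$snapshot_dir"
}

change_rollback() {
    if [ ! -d "$snapshot_dir" ]; then
        echo "no change in progress" >&2
        return 1
    fi
    # Drop everything written since "begin", then restore the snapshot.
    rm -f "$config_dir"/ifcfg-*
    for f in "$snapshot_dir"/ifcfg-*; do
        if [ -e "$f" ]; then cp -p "$f" "$config_dir"/; fi
    done
    rm -rf "$snapshot_dir"
    return 0
}

boot_check() {
    # A snapshot that survives a reboot means the change was never committed.
    if [ -d "$snapshot_dir" ]; then change_rollback; fi
    return 0
}

# Demo: edit one interface, add another, then simulate a reboot.
echo "BOOTPROTO=dhcp" > "$config_dir/ifcfg-eth0"
change_begin
echo "BOOTPROTO=static" > "$config_dir/ifcfg-eth0"
echo "TYPE=Bridge" > "$config_dir/ifcfg-br0"
boot_check
cat "$config_dir/ifcfg-eth0"
```

Calling change_commit in place of boot_check would instead delete the snapshot and leave the edited files in effect.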
Benefit to Fedora
Deployment and provisioning of large virt installations (i.e. with multiple hosts and migrating guests) will be made simpler.
Scope
As described above, changes are required in libvirt, as well as netcf (for item 2 only). Eventually the tools using libvirt should be updated to take advantage of the new features, but that is not immediately necessary, nor within the scope of this work.
TODO
- For item 2 (Transactional Host Network Configuration Changes), when a rollback is done, netcf doesn't yet bounce the interfaces that need it (ifup/ifdown). This doesn't require any API changes, only some additional behind-the-scenes code in netcf.
Completed
- All of item 1 (Logical Abstraction) is code complete, has gone through basic testing, and is waiting on upstream review to be pushed into upstream libvirt. Barring unexpected problems in review or testing, it will be in libvirt-0.9.4, which will be released at the end of July.
- The only piece of item 2 (Transactional Host Network Configuration Changes) that isn't completed is outlined in the TODO section above. The rest is completed, tested, and in an upstream release of both libvirt (0.9.2) and netcf (0.1.8), and both of these are already included in Rawhide.
How To Test
1. Logical Network Abstraction
1) Start with an existing <domain> configuration that connects to the network via a host bridge:
<domain> ... <interface type='bridge'> <source bridge='br0'/> ... </interface> ...
1a) define a new network called 'test-bridge':
<network> <name>test-bridge</name> <forward mode='bridge'/> <bridge name='br0' /> </network>
1b) modify the <domain> configuration to reference this network:
<domain> ... <interface type='network'> <source network='test-bridge'/> ... </interface> ...
1c) net-start test-bridge, then stop and re-start the guest domain, and verify that it still has connectivity
---
2) Start with an existing <domain> configuration that connects to the network directly via a physical interface (i.e. "macvtap" mode):
<domain> ... <interface type='direct'> <source dev='eth0' mode='private'/> ... </interface> ...
2a) define a new network called 'test-direct':
<network> <name>test-direct</name> <forward mode='private'> <interface dev='eth0'/> </forward> ... </network>
2b) modify the <domain> configuration to reference this network:
<domain> ... <interface type='network'> <source network='test-direct'/> ... </interface> ...
2c) net-start test-direct, then stop and re-start the guest domain, and verify that it still has connectivity
2. Transactional Host Network Configuration Changes
1. # virsh iface-begin
2. (remove an interface with virsh iface-undefine, modify one with iface-edit, add a new one with iface-define)
3. # virsh iface-rollback (or reboot the machine)
4. (verify that the original interface configuration is restored)
---
Perform another test similar to the previous, except make changes that won't result in a non-working machine, then run "virsh iface-commit" and verify that the new interface config remains in effect, even after a system reboot of the host.
User Experience
See the previous section.
Dependencies
None, outside of the implementation efforts detailed above (i.e. libvirt and netcf).
Contingency Plan
Administrators can continue configuring their host and guest networking as they did in the past.
Documentation
Logical Network Abstraction:
Transactional Interface Configuration Change:
- patch and description of netcf initscript
- patch and description of netcf API
- libvirt patch series
- Upstream Bugzilla record
Release Notes
This version of libvirt adds the ability to remove details of virt host physical hardware from guest configuration, making it simpler to migrate guests from one host to another, or to operate in a large-scale environment where there are a large number of physical interfaces (or SRIOV virtual functions).
The second part of the functionality allows virt host interface configuration to be changed safely, without fear of an incorrect change leaving the host unusable.