From Fedora Project Wiki

Virtual Machine Lock Manager

Summary

The virtual machine lock manager is a daemon which will ensure that a virtual machine's disk image cannot be written to by two QEMU/KVM processes at the same time. It provides protection against starting the same virtual machine twice, or adding the same disk to two different virtual machines.

Owner

Current status

  • Targeted release: Fedora 16
  • Last updated: 21-09-2011
  • Percentage of completion: 100%

Detailed Description

Virtual machines running via the QEMU/KVM platform do not currently acquire any kind of lock when starting up. This means it is possible for the same virtual machine to be accidentally started more than once, or for the same disk image to be accidentally added to two different virtual machines. The result of such a mistake is likely to be catastrophic destruction of the virtual machine's filesystem.

The virtual machine lock manager is a framework embedded in the libvirtd daemon that allows for pluggable locking mechanisms. The first available plugin, introduced in F16, integrates with the 'sanlock' program. This will protect against adding the same disk to two different virtual machines, and protect against libvirtd bugs where it might "forget" about a previously running virtual machine. If the administrator mounts a suitable shared filesystem (e.g. NFS) at the lease directory, /var/lib/libvirt/sanlock, then the lock manager protection will be extended to all hosts sharing that filesystem.

Later Fedora releases will introduce alternative lock manager implementations.

Benefit to Fedora

Hosts running QEMU/KVM virtual machines will have much stronger protection against administrator host/cluster configuration mistakes. This will reduce the risk that a virtual machine's disk image will become corrupted as a result.

Scope

The changes are confined to the libvirt and sanlock packages:

- The new 'sanlock' RPM is introduced to Fedora
- The new 'libvirt-lock-sanlock' sub-RPM is introduced to the libvirt.spec file
- The /etc/libvirt/qemu.conf file will gain a configuration parameter to set the lock manager implementation
- A new /etc/libvirt/qemu-sanlock.conf file is introduced for sanlock lock manager configuration

How To Test

There are no special hardware requirements for testing this feature, beyond those already required for running QEMU/KVM virtual machines.

General host setup

Install libvirt, KVM, etc. as per normal practice. Additionally, install the 'augeas' (which provides augtool), 'libvirt-lock-sanlock' and 'sanlock' RPMs using yum.

The sanlock plugin requires a directory in which it will store leases. For single host protection, this directory can be a local filesystem, but for cross-host protection it needs to be a network filesystem like NFS, or cluster filesystem like GFS. By convention the directory should be '/var/lib/libvirt/sanlock'.
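Whether a lease directory sits on a local or a network filesystem can be checked before testing. The following is a minimal sketch, assuming GNU coreutils 'stat' is available; the helper name lease_dir_fstype is hypothetical and not part of libvirt or sanlock.

```shell
# Hypothetical helper: print the type of the filesystem backing a lease
# directory. Relies on GNU coreutils 'stat'; the directory defaults to the
# conventional path named above.
lease_dir_fstype() {
    stat -f -c %T "${1:-/var/lib/libvirt/sanlock}"
}
```

Running lease_dir_fstype on each host shows which kind of filesystem the leases will be stored on.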

Each host that shares the same filesystem for leases needs to be allocated a *unique* host ID, between 1 and 512.

With this in mind the basic configuration for sanlock can be done with the following augeas commands:

 $ augtool
 augtool> set /files/etc/libvirt/qemu.conf/lock_manager "sanlock"
 augtool> set /files/etc/libvirt/qemu-sanlock.conf/host_id 1
 augtool> set /files/etc/libvirt/qemu-sanlock.conf/auto_disk_leases 1
 augtool> set /files/etc/libvirt/qemu-sanlock.conf/disk_lease_dir "/var/lib/libvirt/sanlock"
 augtool> save
 Saved 1 file(s)
 augtool> quit

Change the 'host_id' value so that it is unique for each host.
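To confirm that the settings were written, the relevant lines can be read back from the two configuration files. This is a minimal sketch, assuming the files hold plain "key = value" lines; the helper name show_lock_settings is hypothetical.

```shell
# Hypothetical helper: print the active (uncommented) lock manager settings.
# The two arguments default to the configuration paths used above.
show_lock_settings() {
    qemu_conf="${1:-/etc/libvirt/qemu.conf}"
    sanlock_conf="${2:-/etc/libvirt/qemu-sanlock.conf}"
    # Lines starting with '#' are commented out and are skipped.
    grep -E '^[[:space:]]*lock_manager[[:space:]]*=' "$qemu_conf"
    grep -E '^[[:space:]]*(host_id|auto_disk_leases|disk_lease_dir)[[:space:]]*=' "$sanlock_conf"
}
```

Called without arguments, show_lock_settings should print one line for each of the four parameters set with augtool.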

By default sanlock uses a software watchdog to ensure that the host is automatically hard rebooted if something goes wrong. An unexpected reboot is disruptive during testing, so disable the sanlock watchdog and then start the sanlock daemon:

 $ echo 'SANLOCKOPTS="-w 0"' > /etc/sysconfig/sanlock
 $ service sanlock start

Single host testing

- Follow the 'General host setup' instructions
- Restart the libvirtd daemon
- Provision two virtual machines
- Create a third disk image (e.g. dd if=/dev/zero of=/var/lib/libvirt/images/extra.img bs=1M count=100)
- Add the following XML to the configuration of both virtual machines
     <disk type='file' device='disk'>
       <source file='/var/lib/libvirt/images/extra.img'/>
       <target dev='vdb' bus='virtio'/>
     </disk>
- Start the first virtual machine
- Attempt to start the second virtual machine

The last step should fail, with a message that the disk image is already in use.

- Stop the first virtual machine
- Attempt to start the second virtual machine

The second VM should now run successfully.
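The start/stop sequence of the single host test can be scripted with virsh. The following is a minimal sketch: the function name check_disk_lock and the guest names guest1 and guest2 are hypothetical, and both guests are assumed to already have extra.img in their configuration.

```shell
# Sketch of the single host check. Both guest names are arguments and both
# guests must already reference the shared extra.img disk.
check_disk_lock() {
    first="$1"
    second="$2"
    virsh start "$first" || return 1
    # The second start is expected to be refused by the lock manager.
    if virsh start "$second"; then
        echo "FAIL: $second started while $first is using the disk" >&2
        return 1
    fi
    echo "OK: $second was refused while $first is running"
    # 'virsh destroy' stops the guest immediately, without a clean shutdown.
    virsh destroy "$first" || return 1
    virsh start "$second"
}
```

For example, check_disk_lock guest1 guest2 should report OK and leave guest2 running.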


Dual host testing

- Follow the 'General host setup' instructions, on both hosts
- Mount an NFS volume at /var/lib/libvirt/sanlock on both hosts
- Restart the libvirtd daemon on both hosts
- Provision a virtual machine
- Copy the virtual machine configuration to the second host
        virsh dumpxml myguest > myguest.xml
        virsh -c qemu+ssh://otherhost/system define myguest.xml
- Start the virtual machine on the first host
- Attempt to start the virtual machine on the second host

The last step should fail, with a message that the disk image is already in use.

- Stop the virtual machine on the first host
- Attempt to start the virtual machine on the second host

The VM should now successfully run on the second host.

Migration testing

- As per "Dual host testing"
- Attempt to migrate the running VM from the first host to the second host
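A live migration can be requested with virsh. This is a minimal sketch of the commands involved: the function name migrate_guest is hypothetical, and the guest name and destination URI follow the myguest and otherhost examples used in "Dual host testing".

```shell
# Sketch of the migration step. Arguments: guest name and destination URI.
migrate_guest() {
    guest="$1"
    dest_uri="$2"
    # Ask the source libvirtd to live migrate the running guest.
    virsh migrate --live "$guest" "$dest_uri" || return 1
    # Report the state of the guest as seen by the destination host.
    virsh -c "$dest_uri" domstate "$guest"
}
```

For example: migrate_guest myguest qemu+ssh://otherhost/system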

Libvirtd failure testing

- As per "Single host testing"
- Start the first virtual machine
- Stop the libvirtd daemon, without stopping the VM
- Delete the files /var/run/libvirt/qemu/myguest.{pid,xml} (this orphans the VM from libvirtd)
- Start the libvirtd daemon
- Attempt to start the first virtual machine again

The last step should fail, with a message that the disk image is already in use.

- Find the orphaned QEMU process and manually kill it
- Attempt to start the first virtual machine again

The VM should now once again run successfully.
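The orphaning steps can be scripted as follows. This is a minimal sketch: the function name orphan_guest is hypothetical, the guest is assumed to be running already, and the state directory is an argument that defaults to the path given above.

```shell
# Sketch of the libvirtd failure test. The guest must already be running.
# Arguments: guest name, then an optional libvirt state directory.
orphan_guest() {
    guest="$1"
    run_dir="${2:-/var/run/libvirt/qemu}"
    service libvirtd stop || return 1
    # Removing the state files makes libvirtd lose track of the running VM.
    rm -f "$run_dir/$guest.pid" "$run_dir/$guest.xml"
    service libvirtd start || return 1
    # The start attempt is expected to be refused by the lock manager.
    if virsh start "$guest"; then
        echo "FAIL: $guest started although the orphaned process still runs" >&2
        return 1
    fi
    echo "OK: $guest was refused while the orphaned process holds the disk"
}
```

After orphan_guest myguest reports OK, the orphaned QEMU process can be located by looking for the guest name in the process list (for example with ps -ef | grep myguest) and killed before retrying the start.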

User Experience

End users should see no difference in behaviour of QEMU/KVM virtualization during normal operation.

They will be prevented from making certain configuration/operational mistakes which would otherwise result in the same disk image being used by two running virtual machines at once.


Dependencies

The feature is confined to the 'libvirt' and 'sanlock' packages.

Contingency Plan

The use of 'sanlock' is an explicit administrator 'opt in', thus no contingency plan is required. The user can simply run without a lock manager, in which case the behaviour will be identical to previous Fedora releases.

Documentation

The primary upstream documentation is at

Release Notes

  • The QEMU/KVM virtualization driver in libvirt includes an optional lock manager plugin to enforce exclusive access to the virtual machine disk images on a single host. This prevents multiple guests being started with the same disk image, unless the <shareable/> flag is set for the disk.
  • If a shared filesystem (e.g. NFS) is mounted at the lease directory, /var/lib/libvirt/sanlock, the protection extends across multiple hosts in the network.
  • When configuring locking across multiple hosts, it is important to ensure that all disk image paths are globally unique across all hosts sharing the same NFS mount, and that block devices use the stable unique names under /dev/disk/by-path/ and not the unstable /dev/sdNN names.

Comments and Discussion