It is very easy to write a kickstart file that has bugs or that results in a system that does not boot. I'll present a series of questions to ask yourself, with examples, to help diagnose where the problem lies. Once you know that, it should be easier to understand what you can do to inspect further.
There are 4 steps in the process:
- create a guest
- perform an automated installation in the guest
- boot the guest and extract the list of installed RPMs
- upload and archive the disk image of the guest
Is it a problem with guest creation?
I'd like to think this never happens, but this is new code and a human wrote it. There have been unusual cases where libvirt, ImageFactory, or Oz was misconfigured and guests could not be started properly. So far the errors have been clear in the task output; look in either the results string or oz.log. The bad news is that in this case you can really only inform Rel-Eng about the issue and wait for a resolution. The best way is to file a ticket in [1].
Did the installation fail?
The Anaconda installation can fail for many reasons: missing packages, network problems, or syntax errors in %post. Tasks will also fail if Anaconda prompts for input for any reason. If Brew detects a lack of disk activity in the guest for more than 5 minutes, it will fail the build and tear down the guest.
For these sorts of failures, a screenshot called screenshot.ppm is often taken and saved with the task output. Viewing this will usually tell you what Anaconda is complaining about if the installer detected an issue or prompted for input. For example, note the string in the results output that says "No disk activity in 300 seconds, failing." This almost always means Anaconda hit an issue and either gave up or sat waiting for input.
If Anaconda claims it is missing packages, confirm they exist in the repos you are using with --repo, if you are using that option. If you are not, confirm the builds you expect are in the tag inheritance for the target you are running. This is a lot like checking whether an RPM will build against the right libraries, except we're building an image instead.
If you get the rare Anaconda dialog box that says something like "An unexpected error occurred", try using the text command in kickstart, which will have Anaconda boot in text mode. Sometimes the Python traceback (or whatever the error condition is) will be printed there. I have also seen cases where text mode yields a black screen, but booting in graphical mode (the default) does produce a useful dialog box. Issues like this stem from syntax errors in the kickstart file, or from bugs in pykickstart itself. If you think it is a pykickstart bug, then someone in Rel-Eng needs to update pykickstart on the builders.
Did the guest boot?
Koji waits 5 minutes for a guest to boot in this step. Unfortunately it does not give much insight into why a guest may not boot, so these are a tougher class of issues to work through. You can usually answer this question by looking in the results string. If you see "Timed out waiting for guest to boot", then this is your problem. You can also confirm this in oz.log. There are some enhancements on the way that will make diagnosing these issues easier.
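The two result strings quoted so far are enough for a quick first triage. Here is a minimal sketch, assuming you have saved the task's results string (or a copy of oz.log) to a local file. The function name and the labels it prints are made up for illustration and are not part of Koji or Oz.

```shell
# classify_failure FILE: hypothetical helper, not part of Koji or Oz.
# FILE is assumed to hold the task's results string or a copy of oz.log.
classify_failure() {
    log=$1
    if grep -q 'No disk activity in 300 seconds' "$log"; then
        # Anaconda stalled: look at screenshot.ppm for a dialog or prompt.
        echo "install-stalled"
    elif grep -q 'Timed out waiting for guest to boot' "$log"; then
        # Installation finished, but the guest never came up.
        echo "boot-timeout"
    else
        echo "unknown"
    fi
}
```

Running `classify_failure oz.log` prints one label, which tells you which of the questions in this document to start from.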
For now though, the best way to investigate an issue like this is to drive a guest installation locally using something like Gnome's Virtual Machine Manager (VMM). The steps I do are:
- Select a Network Install
- For the Operating System Install URL use the same one you have in koji. It will be something like
- Set the Kickstart URL to where your kickstart file is. You may need to make it available over http.
- Bump the memory to 2048M for good measure
- Launch the guest and let it complete installation
- Open a VNC session and watch what happens when the guest attempts to boot.
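If you prefer the command line, roughly the same steps can be driven with virt-install instead of VMM. This is a sketch under assumptions, not the procedure above: the guest name, the disk size, and the function name are invented for illustration, both URLs are placeholders you supply, and `ks=` is the boot option older Anaconda releases use to find a kickstart (newer releases use `inst.ks=`). The function only prints the command so you can review it before running it.

```shell
# print_virt_install INSTALL_URL KS_URL: hypothetical helper.
# Prints (does not run) a virt-install command line for a local test guest.
print_virt_install() {
    install_url=$1   # the same install tree URL you use in koji
    ks_url=$2        # where your kickstart is served, e.g. over http
    # --ram 2048 mirrors the "bump the memory" step; name and disk size
    # are arbitrary choices for this sketch.
    printf 'virt-install --name ks-debug --ram 2048 --disk size=10 --graphics vnc --location %s --extra-args ks=%s\n' \
        "$install_url" "$ks_url"
}

# Usage (not run here):
#   print_virt_install <install-url> <kickstart-url>
```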
If the console is not providing enough information, we have to get more creative. Anaconda supports starting an SSH daemon while the installation is happening with the sshpw command in kickstart. Set that and comment out the reboot command. This will let the installation complete locally and wait for a keystroke to reboot the guest. At this point you should be able to ssh in and inspect the environment to figure out what is going on. You should also consider making use of the --log option to %post so that output from the script is saved somewhere.
Another option would be to scp logs and other files off of the guest as part of the %post script.
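The kickstart changes for the sshpw approach can be scripted so you don't edit the file you submit to Koji by hand. Here is a minimal sketch, assuming a kickstart whose reboot command sits on its own line. The function name, the root user, and the plaintext password are illustrative choices for a throwaway local guest only.

```shell
# make_debug_ks IN OUT PASSWORD: hypothetical helper, not part of Anaconda.
# Writes a copy of kickstart IN to OUT with an sshpw line added at the top
# and any top-level "reboot" command commented out.
make_debug_ks() {
    in=$1
    out=$2
    pw=$3
    {
        # sshpw enables SSH access to the installer environment.
        printf 'sshpw --username=root %s --plaintext\n' "$pw"
        # Comment out "reboot" (with or without options) so the installer
        # waits at the end instead of restarting the guest.
        sed -e 's/^[[:space:]]*reboot\([[:space:]].*\)*$/# &/' "$in"
    } > "$out"
}
```

For example, `make_debug_ks my.ks my-debug.ks somepassword` leaves my.ks untouched and produces a debug copy to serve to the local guest.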
Other Guest Misconfigurations
If the guest boots but you're having problems accessing it, I'd suggest following the same procedure as when the guest fails to boot. This could be a result of firewall misconfigurations or SSH not being available for some reason. Usually in this case the build is succeeding in Brew, but there's something still fundamentally broken in the image. If the issue is something you can investigate while the guest is online (you can log in), then I'd suggest importing it locally using the libvirt.xml and the disk image provided in Brew's task output.
You can also do investigative work in an offline mode by mounting the image locally or using something like libguestfs to poke around without starting the guest. The fast, dirty way I do it is by mounting it. This can often pollute your guest environment, but since I'm investigating a bug that will cause me to rebuild the image anyway, I don't usually care. Here's how to do it on RHEL 6:
- Download the image from Brew
If the image format is not raw, you have to convert it first with qemu-img. Something like:
$ qemu-img convert -O raw <image-file> <output-file>
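To find out whether the conversion is needed at all, `qemu-img info` reports the image format. Here is a small sketch that pulls out that one field; the function name is made up, and it assumes the usual `file format: <name>` line in the output.

```shell
# image_format: hypothetical helper. Reads `qemu-img info` output on stdin
# and prints the value of its "file format:" line.
image_format() {
    sed -n 's/^file format: //p'
}

# Usage (not run here):
#   fmt=$(qemu-img info <image-file> | image_format)
#   [ "$fmt" = raw ] || qemu-img convert -O raw <image-file> <output-file>
```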
Now, as root, mount it using loopback devices. If your image contains multiple partitions, you may need to use a different mapped loopback device, such as loop0p2. Pick whichever one you think is the root partition or holds the issue you're trying to fix.
# kpartx -av <raw-image>
# mount -o loop /dev/mapper/loop0p1 /mnt/my_directory
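If you're unsure which mapped devices kpartx created, its verbose output lists them. Here is a sketch that turns that output into device paths, assuming the `add map <name> ...` line format that `kpartx -av` prints on the systems I've seen; the function name is invented for illustration.

```shell
# mapped_partitions: hypothetical helper. Reads `kpartx -av` output on stdin
# and prints one /dev/mapper path per partition mapping it reports.
mapped_partitions() {
    awk '$1 == "add" && $2 == "map" { print "/dev/mapper/" $3 }'
}

# Usage (as root, not run here):
#   kpartx -av <raw-image> | mapped_partitions
```

Each path it prints is a candidate for the mount command; the same names are what you pass to dmsetup when tearing down.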
Hopefully at this point you figure out the issue. To tear down the image you'll run commands as root like so:
# umount /mnt/my_directory
# dmsetup remove loop0p1
# losetup -d /dev/loop0
Again, if you used different loopback devices, substitute those into the dmsetup and losetup commands.