Zuul Based CI
What is Zuul
Zuul is the CI and gating system from the OpenStack project. It scales well and supports, out of the box, features such as artifact sharing between jobs and testing across Git repositories. You can see Zuul in action here [1].
Below is a list of features offered by Zuul and its companion Nodepool:
- Event-driven pipelines based on Code-Review or Pull-Request workflow: jobs can be triggered automatically when a PR is submitted, changed, approved, or when the repository is tagged.
- CI-as-code: jobs are defined as YAML + Ansible playbooks, pipeline definitions as YAML files. Zuul reads and loads those definitions directly from Git repositories.
- Support for job inheritance, job dependencies, and job chaining (with artifact sharing).
- Speculative testing of new jobs before merging: new or modified jobs are run as soon as they are proposed, to make sure they behave as expected.
- Cross-repository dependencies: a job's workspace can include unmerged patches from other projects if specified.
- Parallel job runs, capped only by available resources or predefined quotas.
- Automated lifecycle management of job resources: resources like VMs or containers needed by a given job can be defined in-repository, spawned on demand at a job's start, and destroyed when the job is finished, or held for debugging.
- Job resources can be provided by OpenStack, OpenShift, K8S, static nodes, and AWS.
- Well-defined, reproducible job environments to eliminate flakiness
- Speculative testing before merging (gating): if several patches are about to land at the same time, they are tested on the repository's future state.
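The gating behaviour described in the last item can be illustrated with a small sketch. This is not Zuul's implementation: the function name speculative_states and the change identifiers are made up for the example. It only shows the idea that each queued change is tested on top of the branch tip plus every change ahead of it in the queue.

```python
from typing import List, Tuple


def speculative_states(branch_tip: str, queue: List[str]) -> List[Tuple[str, List[str]]]:
    """For each queued change, list what its jobs would be tested against.

    Illustrative only: every change is tested on the branch tip plus all
    the changes ahead of it in the queue, i.e. the repository's future state.
    """
    states = []
    for position, change in enumerate(queue):
        ahead = queue[:position]  # changes expected to merge first
        states.append((change, [branch_tip] + ahead + [change]))
    return states


if __name__ == "__main__":
    # Hypothetical change names, used only for the demonstration.
    for change, tested_on in speculative_states("master", ["PR-1", "PR-2", "PR-3"]):
        print(change, "is tested on", " + ".join(tested_on))
```

If a change ahead in the queue fails, the changes behind it have to be re-tested without it, which the sketch does not model.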
Until now, Zuul was only able to listen to Gerrit or GitHub events; an upcoming driver [2] will allow Zuul to interface with Pagure as well. Pagure, Zuul and Nodepool could therefore combine into a very efficient CI/CD stack.
Pagure PR tests via Zuul
First, we worked on a Zuul driver for Pagure that brings Zuul's features to any recent Pagure instance (Zuul was originally designed for the CI and gating of OpenStack projects). The driver relies only on Pagure's web hooks and API.
We are able to run jobs and report results on PRs opened on Pagure instances like https://src.fedoraproject.org or https://pagure.io.
To do so, we have deployed a Zuul/Nodepool instance (from the Software Factory project, https://www.softwarefactory-project.io/) here: https://fedora.softwarefactory-project.io/zuul/
Some use cases for the PR workflow
- Build package on PR: https://stg.pagure.io/python-mock-distgit/pull-request/1
- Build package on a PR which depends on another PR, and validate that a BuildRequires dependency is handled [3] (artifact sharing): https://stg.pagure.io/python-redis-distgit/pull-request/1
- Build package on PR, then run child jobs to validate the package via the tests it includes (Standard Test Interface): https://stg.pagure.io/attr/pull-request/2. Here is a negative test where the source is changed to trigger a failure [4].
- Build package on PR, then validate the STI functional tests it includes and RPM lint via two child jobs: https://stg.pagure.io/python-redis-distgit/pull-request/2. Note that this RPM build is done on Koji as a scratch build [5].
Some PR workflows for src.fedoraproject.org
- When a PR is proposed or changed, or at the packager's request (by typing a specific PR comment in Pagure):
  - Parent job to scratch build the package on Koji
  - Child job to run the in-package functional tests
  - Child job to run RPM lint
- When the PR is merged, or at the packager's request (by typing a specific PR comment in Pagure):
  - Job to perform the build on Koji
Zuul has a branch matching system that makes a job behave differently according to the branch on which the PR is opened. That means a PR on master could build against the Koji rawhide target and validate on a rawhide node, while a PR on the f30 branch could build against the f30 target and validate on an f30 node.
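As a rough illustration of this branch matching, the sketch below maps a PR's target branch to a Koji target and a node label. In Zuul itself this is expressed in the YAML job definitions, not in Python. The rule table, the names BRANCH_RULES and select_variant, and the node label strings are assumptions made for the example.

```python
import re
from typing import Dict, Optional

# Hypothetical rules: (branch regex, Koji target template, node label template).
BRANCH_RULES = [
    (r"^master$", "rawhide", "rawhide-node"),
    (r"^f(\d+)$", "f{0}", "f{0}-node"),
]


def select_variant(branch: str) -> Optional[Dict[str, str]]:
    """Return the build target and node label for a branch, or None if unmatched."""
    for pattern, target, label in BRANCH_RULES:
        match = re.match(pattern, branch)
        if match:
            groups = match.groups()
            return {
                "koji_target": target.format(*groups),
                "node_label": label.format(*groups),
            }
    return None


if __name__ == "__main__":
    for branch in ("master", "f30", "epel7"):
        print(branch, "->", select_variant(branch))
```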
Advanced scenarios that involve multiple packages can be validated at the PR level. For instance, a PR on rpms/mod_wsgi could depend on a PR on rpms/httpd (assuming both projects have a Zuul-based RPM build job attached). The jobs for the PR on rpms/mod_wsgi could then use the RPM artifacts built for the rpms/httpd PR it depends on, both for the build (BuildRequires) and for validation (Requires). The dependency chain is not limited to a single dependency.
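The idea of a dependency chain can be sketched as a walk over the PRs a given PR depends on, collecting the artifacts built for each of them. This is only an illustration of the concept, not how Zuul stores or resolves dependencies. The function name, the PR identifiers, the rpms/apr project, and the artifact file names are hypothetical.

```python
from typing import Dict, List


def collect_dependency_artifacts(
    pr: str,
    depends_on: Dict[str, List[str]],
    artifacts: Dict[str, List[str]],
) -> List[str]:
    """Return the artifacts of every PR that `pr` depends on, deepest dependency first."""
    ordered: List[str] = []
    seen = {pr}  # guards against cycles and duplicate visits

    def visit(current: str) -> None:
        for dep in depends_on.get(current, []):
            if dep in seen:
                continue
            seen.add(dep)
            visit(dep)  # dependencies of the dependency come first
            ordered.extend(artifacts.get(dep, []))

    visit(pr)
    return ordered


if __name__ == "__main__":
    # Hypothetical PR identifiers and artifact names.
    depends_on = {
        "rpms/mod_wsgi#1": ["rpms/httpd#1"],
        "rpms/httpd#1": ["rpms/apr#1"],
    }
    artifacts = {
        "rpms/httpd#1": ["httpd.rpm"],
        "rpms/apr#1": ["apr.rpm"],
    }
    print(collect_dependency_artifacts("rpms/mod_wsgi#1", depends_on, artifacts))
```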
How a Git repository is attached to Zuul
Zuul runs a web server with a dedicated endpoint to receive the web hook event notifications sent by Pagure. These events are what trigger actions on the Zuul side, such as a job execution. To report back CI status or comments, or even to merge a PR on Pagure (gating), Zuul relies on the Pagure REST API.
Zuul needs a project API token to act on the Pagure REST API and a project web hook token to validate event payloads sent from Pagure to the Zuul endpoint. Both tokens are per project on Pagure. To scale, Zuul therefore needs a user API token with the "Modify an existing project" right, which lets it read the web hook token and create project API tokens. The owner of this user API token must be added as a project admin.
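To illustrate what validating an event payload with the web hook token can look like, here is a minimal sketch. It assumes the signature is a hex-encoded HMAC-SHA1 of the raw request body, keyed with the project's web hook token. The function name and sample values are hypothetical, and the exact header name and digest algorithm should be checked against the Pagure documentation for the version in use.

```python
import hashlib
import hmac


def payload_is_valid(webhook_token: str, body: bytes, received_signature: str) -> bool:
    """Check a payload signature (assumed here to be HMAC-SHA1, hex encoded)."""
    expected = hmac.new(webhook_token.encode("utf-8"), body, hashlib.sha1).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, received_signature)


if __name__ == "__main__":
    token = "example-webhook-token"  # hypothetical value
    body = b'{"topic": "pull-request.new"}'  # hypothetical payload
    signature = hmac.new(token.encode("utf-8"), body, hashlib.sha1).hexdigest()
    print(payload_is_valid(token, body, signature))
```

A payload whose body or signature has been altered fails the check, so Zuul can safely ignore it.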
For instance, on https://stg.pagure.io there is already a bot account for Zuul called zuulbot [6]. To attach a project from this staging Pagure instance to the Zuul at https://fedora.softwarefactory-project.io, the process is:
- Add "zuulbot" as admin in settings/Users & Groups
- In Settings:
  - Notify on pull-request flag
  - Web-hooks: https://fedora.softwarefactory-project.io/zuul/api/connection/stg.pagure.io/payload
  - (For gating, optional): Minimum score to merge pull-request: N
  - (For gating, optional): Always merge
Finally, the Zuul at https://fedora.softwarefactory-project.io must be told to handle the project: open a PR against https://pagure.io/fedora-project-config/blob/master/f/resources/fedora.yaml and have it merged. Feel free to try!
How a job is attached to a Git repository
Let's have a look at the jobs attached to stg.pagure.io/python-redis-distgit [7].
The project's pipeline definition is located in [8]. It uses a template called "basic-check". The template defines which jobs run in the check, gate or post pipelines.
The check, gate and post pipelines are defined here [9]. Basically, they define which Pagure events trigger the jobs attached to a pipeline and what action Zuul should take when a job succeeds or fails.
The jobs of the basic-check template are defined as follows:
- rawhide-rpm-koji-scratch-build in fedora-zuul-jobs [10] is based on a parent job [11]. Notice that the job definition is pure YAML, following a format expected by Zuul [12]. The base job uses post-run and run playbooks from [13]. Those playbooks use roles from the pagure.io/zuul-distro-jobs project (see roles: [{zuul: zuul-distro-jobs}] in the base job definition).
- rawhide-rpm-tests in fedora-zuul-jobs [14]
- artifact-rpm-lint in zuul-distro-jobs [15]
Current architecture
Configuration
Zuul and Nodepool are hosted on https://fedora.softwarefactory-project.io. Here is the list of Pagure repositories that contain the configuration:
- pagure.io/fedora-project-config [16] contains the Software Factory configuration. This is where Nodepool providers and Nodepool images are defined, and where Git projects are attached to Zuul. Each PR proposed on that repository is validated by Zuul and deployed once merged. For instance, see the "config-check" job on https://pagure.io/fedora-project-config/pull-request/18 and the deployment job "config-update" here [17].
- pagure.io/fedora-zuul-jobs-config [18] contains the Zuul jobs configuration. This is a Zuul trusted repository: config changes included in PRs on that repository are not taken into account by Zuul until they are merged. A trusted repository is best suited to host pipelines, project pipeline definitions, and secrets.
- pagure.io/fedora-zuul-jobs [19] also contains Zuul jobs configuration, but as an untrusted repository.
zuul-distro-jobs
pagure.io/zuul-distro-jobs [20] is a generic suite of jobs and roles for Zuul dedicated to building and publishing RPMs (and hopefully containers in the future). We are working on it with the idea of providing a ready-to-use job library for Zuul users who have to handle RPM builds in their CI. Currently there is support for Koji, Copr, DLRN and Mock. Feel free to add more!
Nodepool nodes
For the moment, three node labels are defined in Nodepool [21]:
- Fedora 29 cloud image (VM)
- Fedora 30 container (runc)
- CentOS 7 container (runc)
Buildsys build validation via AMQP and Zuul
We have built a proof of concept that uses Zuul to run jobs based on events received on fedmsg.
The goal was to run an rpmbuild job when a package is built on Fedora's Koji, and later to be able to reuse, at the fedmsg event level, the same Zuul jobs that can be triggered at the PR level.
Unfortunately, Zuul is not designed out of the box to handle AMQP messages, mainly because it is designed to react to events generated by a Git or code review system, where each event belongs to a Git repository.
Nevertheless, it is possible to simulate a Git repository when an event arrives on fedmsg and to send the proper, expected Git events to Zuul.
Architecture
The Zuul Gateway
The Zuul gateway [22] is a proof of concept that generates virtual Git references to trigger Zuul events from non-Git events. It simulates a Pagure instance and interacts with Zuul via Zuul's Pagure driver. The Zuul gateway is controlled via a REST API.
The Fedora messaging Consumer
The Fedora messaging Consumer [23] (see the Consumer class) is a simple fedora-messaging callback that filters events and writes them as JSON files on the file system (new/). Those event files are consumed by another process.
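The filter-and-spool behaviour of such a consumer can be sketched as a plain callable. This is not the code from [23]. The class below only assumes that a message exposes id, topic and body attributes, and the topic suffix used for filtering, the stand-in message objects and the JSON layout are assumptions for the example.

```python
import json
import os
import tempfile
from types import SimpleNamespace


class Consumer:
    """Callable that keeps matching events and stores each one as a JSON file."""

    def __init__(self, spool_dir: str, topic_suffix: str = "buildsys.build.state.change"):
        self.new_dir = os.path.join(spool_dir, "new")
        self.topic_suffix = topic_suffix  # assumed filter, adjust as needed
        os.makedirs(self.new_dir, exist_ok=True)

    def __call__(self, message) -> None:
        if not message.topic.endswith(self.topic_suffix):
            return  # not an event we are interested in
        path = os.path.join(self.new_dir, f"{message.id}.json")
        # Write under a temporary name first so a reader never sees a partial file.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump({"topic": message.topic, "body": message.body}, handle)
        os.replace(tmp_path, path)


if __name__ == "__main__":
    spool = tempfile.mkdtemp()
    consumer = Consumer(spool)
    # Stand-in message object with hypothetical values.
    consumer(SimpleNamespace(id="abc", topic="org.example.buildsys.build.state.change",
                             body={"build_id": 1}))
    print(os.listdir(os.path.join(spool, "new")))
```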
The Fedora messaging Processor
The Fedora messaging Processor [24] (see the main function) performs the following steps in a loop:
- look for an event file on the file system (generated by the consumer)
- call Koji to fetch the build tasks data and fetch the list of built rpms
- call the Zuul gateway to convert each event to a fake in-memory Git repository (and create a fake .zuul.yaml)
- call the Zuul gateway to trigger a fake Pagure style event for Zuul
- Zuul reads the fake Git repository and processes the job
- look for the status of Zuul jobs
- move the event from new/ to done/ when the job has finished
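One pass of the loop above can be sketched as follows. This is not the code from [24]: handle_event is a hypothetical callback standing in for the Koji lookup, the Zuul gateway calls and the job status polling, and the function name process_pending_events is also made up. Only the new/ and done/ directory handling mirrors the steps listed.

```python
import json
import os
import shutil
import tempfile
from typing import Callable


def process_pending_events(spool_dir: str, handle_event: Callable[[dict], bool]) -> int:
    """Handle each event file in new/ and move the finished ones to done/.

    `handle_event` returns True once the Zuul job for the event has finished.
    Returns the number of event files moved.
    """
    new_dir = os.path.join(spool_dir, "new")
    done_dir = os.path.join(spool_dir, "done")
    os.makedirs(new_dir, exist_ok=True)
    os.makedirs(done_dir, exist_ok=True)
    moved = 0
    for name in sorted(os.listdir(new_dir)):
        if not name.endswith(".json"):
            continue  # skip partially written or unrelated files
        path = os.path.join(new_dir, name)
        with open(path) as handle:
            event = json.load(handle)
        if handle_event(event):
            shutil.move(path, os.path.join(done_dir, name))
            moved += 1
    return moved


if __name__ == "__main__":
    spool = tempfile.mkdtemp()
    os.makedirs(os.path.join(spool, "new"))
    with open(os.path.join(spool, "new", "event-1.json"), "w") as handle:
        json.dump({"build_id": 1}, handle)  # hypothetical event content
    print(process_pending_events(spool, lambda event: True))
```

Events whose job is still running stay in new/ and are picked up again on the next pass.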
Access the jobs results
Triggered jobs are processed by Zuul, so job results are available on the builds page of Zuul: https://fedora.softwarefactory-project.io/zuul/t/fedora-staging/builds?project=gateway. Here [25] is the result of the event [26].