From Fedora Project Wiki

Koji Maven Support

Koji Maven Support (hereafter referred to as Koji-Maven) is an attempt to bring the same security, auditability, and reproducibility to Java builds that Koji brings to rpm builds, without having to radically alter the build processes of upstream Java projects.

What is it?

Koji-Maven is a wrapper around the Maven build tool, in the same way Koji is a wrapper around the mock build tool. Koji-Maven manages the repository of jars from which Maven pulls build dependencies, and tracks the build environment, recording which jar files are downloaded into it. Build output is added to the repository for use by subsequent builds, and build logs are stored centrally. A "wrapper" rpm can be generated that contains exactly the same jars generated by the Maven build, and can then be used by subsequent rpm-based builds. Conversely, rpm-based builds that generate jars can add those jars to the Maven repository for use by subsequent Maven builds. For Fedora we will create the wrapper rpms to ship with the distro.

How does it work?

Package Management

Koji-Maven manages Java artifacts (.jar, .war, .ear, etc. files) natively, in much the same way that Koji manages .rpm files. Koji-Maven extends the concept of a Koji build so it can be associated with both rpms and Java artifacts. These builds may then be added to a tag, and repositories can be created from groups of tags. For each tag associated with a build target, Koji-Maven can be configured to create a Maven repository, in addition to the yum repository. This repo will contain all Java artifacts associated with a build that is associated with that tag (or another tag in the group). Each tag can be configured to include only artifacts from the "latest" build of a given package (consistent with the yum repositories), or to include all versions of a package associated with the tag. This is required because a single Maven build may depend on more than one version of the same package. Once the repositories are created they are used by the builders for creating an environment suitable for building other packages.
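The difference between the two tag settings can be illustrated with a minimal Python sketch. The data layout, the function name repo_contents, and the example package records are all hypothetical, and the sketch assumes that "latest" means the most recently tagged build of a package; it is not Koji-Maven's actual implementation.

```python
# Hypothetical records: (package, version, artifact file name), listed in the
# order the builds were tagged (oldest first). These are made-up examples.
TAGGED_BUILDS = [
    ("commons-io", "1.3", "commons-io-1.3.jar"),
    ("commons-io", "1.4", "commons-io-1.4.jar"),
    ("junit", "4.4", "junit-4.4.jar"),
]


def repo_contents(tagged_builds, latest_only=True):
    """Return the artifact file names a repo for this tag would contain."""
    if not latest_only:
        # "All versions" mode: every tagged build contributes its artifacts.
        return [artifact for _pkg, _ver, artifact in tagged_builds]
    latest = {}
    for pkg, _ver, artifact in tagged_builds:
        # Assumption: the most recently tagged build of a package wins.
        latest[pkg] = artifact
    return list(latest.values())
```

With latest_only=True the repo would hold a single commons-io jar; with latest_only=False it would hold both versions, which is what a Maven build that depends on more than one version of the same package needs.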

Building

A Koji-Maven build environment is created the same way as a standard Koji build environment, by using mock to install a group of packages, including "maven2" (which will do the actual building), into a chroot. The source code and any patches are then downloaded from a source control system and the patches applied. A settings file is written into the chroot that points Maven at the Koji-managed Maven repo, and uses that as the override for all project-defined repos (using <mirrorOf>*</mirrorOf>). /usr/bin/mvn is then run against the source tree. The source tree must have a POM file (pom.xml) in its top-level directory for Maven to process it.
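As a rough illustration of the settings file described above, the following Python sketch generates a Maven settings document with a single mirror entry that overrides every repository. The function name build_settings, the mirror id, and the repo URL are hypothetical placeholders; the settings file Koji-Maven actually writes may contain more than this.

```python
import xml.etree.ElementTree as ET


def build_settings(repo_url, mirror_id="koji-maven-repo"):
    """Return settings.xml text that mirrors every repository to repo_url."""
    settings = ET.Element("settings")
    mirrors = ET.SubElement(settings, "mirrors")
    mirror = ET.SubElement(mirrors, "mirror")
    ET.SubElement(mirror, "id").text = mirror_id
    ET.SubElement(mirror, "url").text = repo_url
    # "*" tells Maven to use this mirror in place of all project-defined repos.
    ET.SubElement(mirror, "mirrorOf").text = "*"
    return ET.tostring(settings, encoding="unicode")


# Hypothetical URL for a Koji-managed Maven repo.
SETTINGS_XML = build_settings("http://koji.example.com/maven2/dist-example-build")
```

Because the mirror matches every repository id, any repository declared in a project's POM would be redirected to the Koji-managed repo rather than contacted directly.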

The Maven build is performed in two steps. First, /usr/bin/mvn dependency:resolve-plugins is called. This downloads all necessary plugins from the Koji-managed Maven repo into the local Maven repo. Koji-Maven then scans the local repo to determine which artifacts have been downloaded, and records that information in the database. If all plugins are resolved successfully, /usr/bin/mvn deploy is called to perform the actual build. Once that is done, Koji-Maven scans the local repository again to identify any additional build dependencies that were downloaded, and records those as well. The build output is then uploaded to the hub, where it is processed, recorded in the database, and added to the Koji-managed Maven repo for later distribution or use by other builds.
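The two-step flow can be sketched in Python as follows. This is an illustration under stated assumptions, not the builder's real code: the function names, the repo_dir parameter (the location of the local Maven repo), and the injectable run argument are hypothetical, and the sketch does not separate the build's own output from downloaded dependencies or record anything in a database.

```python
import os
import subprocess


def scan_local_repo(repo_dir):
    """Return the set of file paths (relative to repo_dir) found in the repo."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(repo_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, repo_dir))
    return found


def run_maven_build(source_dir, repo_dir, mvn="/usr/bin/mvn",
                    run=subprocess.check_call):
    """Run the two Maven steps and report what each one added to the repo."""
    # Step 1: resolve plugins only, then note what arrived in the local repo.
    # check_call raises if Maven fails, so step 2 only runs on success.
    run([mvn, "dependency:resolve-plugins"], cwd=source_dir)
    plugin_files = scan_local_repo(repo_dir)
    # Step 2: the real build; files that are new since step 1 are treated
    # here as additional build dependencies.
    run([mvn, "deploy"], cwd=source_dir)
    dependency_files = scan_local_repo(repo_dir) - plugin_files
    return plugin_files, dependency_files
```

Taking the difference between the two scans is what lets the plugin downloads and the later build-dependency downloads be recorded separately.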

The developer can also specify the location in source control of a specfile fragment that can be used to package the jars/wars/ears/etc. generated by the Maven build into an rpm. The specfile fragment can use the Cheetah templating language to perform substitutions, conditionalize parts of the specfile, or execute simple logic. The template is passed a defined set of data, including the name, version, and release of the build, and a list of all output generated by the Maven build. Once the template has been processed, it is placed into a directory with the output of the Maven build and rpmbuild is run to generate the "source rpm". This srpm will actually contain the binary jars generated by the Maven build. Once the srpm is generated, it is built in a pristine mock chroot like any other rpm built in Koji. Once this "wrapper rpm" build is complete, the rpms are associated with the existing Koji build and are available for testing and distribution, like any other rpms in Koji.
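The template step might look roughly like the Python sketch below. To stay self-contained it uses the standard library's string.Template as a stand-in for Cheetah, so the loop over build output is done in Python rather than in the template. The placeholder names, the render_spec function, and the install path are assumptions for illustration, and the fragment omits fields a complete specfile would need.

```python
from string import Template

# Hypothetical specfile fragment. string.Template is a standard-library
# stand-in for Cheetah here; real Cheetah templates also support loops and
# conditionals inside the template itself.
SPEC_TEMPLATE = Template("""\
Name: $name
Version: $version
Release: $release
Summary: Wrapper rpm for $name

%files
$file_list
""")


def render_spec(name, version, release, artifacts, java_dir="/usr/share/java"):
    """Fill the template with build data and one %files line per artifact."""
    file_list = "\n".join(f"{java_dir}/{artifact}" for artifact in artifacts)
    return SPEC_TEMPLATE.substitute(
        name=name, version=version, release=release, file_list=file_list
    )
```

For example, render_spec("example", "1.0", "1", ["example-1.0.jar"]) would yield a fragment whose %files section lists the single jar produced by the Maven build.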

Why should we use it?

Maven has emerged as the de facto standard Java build tool, and more and more projects are using it upstream. However, the Maven build model relies on downloading pre-built binary jars from potentially untrusted repositories on the Internet, with no link back to the source code and no way to verify who built what when. This is incompatible with the objectives and policies of the Fedora Project, and incompatible with a robust, reliable, reproducible, and auditable build and release process. Koji-Maven was designed to address these issues by managing all Maven artifacts locally, and providing a link between the source code, the binary jars, and the build process and environment.

The alternative is to run Maven from within rpmbuild, and Fedora Java packagers have gone through significant effort to make this happen. However, this approach has a number of problems. A fundamental assumption of Maven is that every version of every package goes into a global repository (http://repo1.maven.org/ is the largest, but there are a few others) and that Maven has access to this repository at build time. Maven builds are free to depend on any version of any package in that repository, and builds will often pull down multiple versions of the same package to satisfy plugin and build dependencies. This is at odds with rpmbuild, which assumes that everything is available locally, and discourages network access during the build. Fedora Java packaging works around this by patching Maven to support use of /usr/share/java as a Maven repo, patching the .pom files to support this local repo, and by maintaining a set of XML files that map the names and versions of dependencies (as reflected in the project's .pom file) to jars with different names and versions that are provided by installed rpms. The denormalization of dependency information, generation and maintenance of the patches, and divergence from the upstream build process are all barriers to getting new Java packages building in Fedora. Getting a package building requires significant initial effort, and the complexity and fragility of the build process increase the maintenance burden. Upgrading a single package to a new version can cause a large number of dependent packages to fail rebuilds, because version numbers are hard-coded in the dependency maps. See the jetty specfile for an example of a Fedora Java package being built with Maven.

Koji-Maven aims to alleviate a lot of these problems by decoupling the build process from the packaging format. Rather than forcing Maven to operate from within rpmbuild, something it was never designed for, we run Maven directly, and use rpm simply as a packaging format to bundle the output of the Maven build process. This removes the need to patch .pom files or generate dependency maps. The build process is exactly the same as what the upstream developers are doing every day, so build problems will be identified and fixed quickly. The specfile fragments that are used to package the Maven output are very simple and can be reused between projects in many cases. This allows Java developers to focus on improving the code, rather than getting mired in complex packaging issues, and should encourage the inclusion of many more Java projects into the Fedora distribution.

Note that since the Koji-Maven builds will be producing rpms, they will be available as dependencies to current rpm-based builds, and rpm-based builds will be able to populate the Maven repos with their Java output. There will not be a need to migrate existing Java builds to the Koji-Maven approach; both processes can co-exist and interoperate fully. If a packager chooses to use the Koji-Maven system, they will benefit from a simpler build process and lower maintenance overhead.

A major goal of the Koji-Maven effort would be to get JBoss building using this system. JBoss is a large, complex application with a lot of dependencies, and this has made packaging it as rpms a challenge. Koji-Maven would enable us to build JBoss from source using the upstream build process, and package up the result as rpms with minimal effort. Having JBoss as a well-supported component of Fedora, and enabling users to run yum install jboss and get a working application server, is a compelling reason to use Koji-Maven in the Fedora build system.

Drawbacks

End-user rebuilds

The biggest drawback is probably the inability to rebuild from source using rpmbuild --rebuild. The source rpms will be a container for both the upstream source (a zipfile containing the checkout from source control) and the binary jars built by Maven. rpmbuild --rebuild will repackage the binary jars into binary rpms, but won't rebuild from source. To do that, you'd need to unzip the source zipfile included in the source rpm, configure Maven to point to a suitable Maven repo (like a publicly-available Fedora-Maven repo), and build using Maven. This may not have quite the simplicity of using rpmbuild or mock, but it should enable people who want to rebuild to do so without undue effort.

Bootstrapping

To bootstrap the Koji-Maven system we'd need to import a large number of Java artifacts (binary jars and .pom files), either from other Maven repos or from our own Java packages, similar to the way we initially bootstrapped Koji by importing binary rpms. To bootstrap an environment suitable for building a large Java project (like JBoss), something on the order of 500 new packages would need to be added to Koji-Maven. Handling this flood of new packages, ensuring that they have maintainers, and getting them all rebuilt from source will be a significant effort.