From Fedora Project Wiki
Latest revision as of 18:43, 15 July 2008

Supporting EPEL Builds in Koji

Using Koji to build EPEL packages has been a goal for a long time. However, it has been held up by the desire to build against official RHEL packages while not making those packages public. Right now Koji can only build against packages it has a local copy of, under /mnt/koji/packages, and that directory is served via http to the public.

This is a proposal to enable Koji to populate the minimal buildroot and resolve dependencies using packages from admin-configurable remote yum repositories. A minimum amount of information about packages from the remote repositories would be inserted into the local Koji database to allow the packages to be traced back to their origins. This would enable Koji to pull RHEL packages from a private repo for the purpose of building EPEL packages, without making the RHEL packages public. Note that everything built by Koji would still be available to the public. This approach has the additional benefit of greatly simplifying the bootstrapping process for people running their own Koji instances. They can simply consume packages from an existing yum repo and skip the entire package import process.

Finding the packages

Koji generates a yum config for every buildroot it creates and uses. This config points to a repo that has been created by Koji from packages imported into or built by Koji and associated with the build tag. Adding external repos raises the question of which repo a package should come from if the package sets overlap. There is a strong feeling that if a package exists in the Koji-managed local repo (whose contents the Koji admin has full control over) it should always be preferred over an external repo (whose contents the Koji admin may have little or no control over). As an example, if a package is available in a remote repo, and a custom version of the same package is built in Koji, the Koji version should always be the one available in the buildroots. It should not be overridden when a remote repo updates that package to a newer version, or the customizations (which we assume were made for good reason) would be silently lost. In particular, always preferring the local Koji packages over the remote repo packages allows for reverting a package to a version earlier than what is in the remote repo, which may be necessary to resolve build problems or conflicts.

After discussions with the yum developers, it was decided that the best way to support this preference for the Koji-managed repo would be to merge the local repodata and the remote repodata into a single repo. During this merge process, any packages provided by the local repo would be included in the merged repodata, and the corresponding packages from the remote repo would be elided. This filtering would need to be done at the source rpm level, to avoid subpackages from the remote repo slipping into the repodata. In addition, the tool would need to accept a blacklist of packages (srpm names) that will never be included in the repodata, from any repo, to support blocking packages. Seth Vidal and James Antill have generously offered to spearhead development of this mergerepo tool.
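
The merge rule described above can be sketched in a few lines of Python. This is only an illustration of first-match-wins filtering at the source rpm level, not the mergerepo tool itself; the Pkg record, the merge_repos function, and the sample package names are all hypothetical.

```python
from typing import NamedTuple


class Pkg(NamedTuple):
    """Hypothetical, minimal stand-in for one repodata entry."""
    name: str       # binary rpm name
    srpm_name: str  # name of the source rpm it was built from
    origin: str     # label of the repo it came from


def merge_repos(repos, blocked=frozenset()):
    """Merge repos in priority order (Koji-managed repo first).

    A source rpm name is taken from the first repo that provides it.
    Every binary rpm built from that source in any later repo is
    dropped, so remote subpackages cannot slip in. Blocked srpm names
    are excluded from every repo.
    """
    seen_srpms = set()
    merged = []
    for repo in repos:
        repo_srpms = {p.srpm_name for p in repo}
        # Source rpms this repo is allowed to contribute.
        claimed = repo_srpms - seen_srpms - set(blocked)
        merged.extend(p for p in repo if p.srpm_name in claimed)
        seen_srpms |= repo_srpms
    return merged


local = [Pkg("bash", "bash", "local")]
remote = [
    Pkg("bash", "bash", "remote"),
    Pkg("bash-doc", "bash", "remote"),   # subpackage of a locally built srpm
    Pkg("zlib", "zlib", "remote"),
    Pkg("kernel", "kernel", "remote"),   # blocked below
]
merged = merge_repos([local, remote], blocked={"kernel"})
```

In this example only the local bash and the remote zlib survive: the remote bash-doc is elided because its source rpm is provided locally, and kernel is elided because it is blocked.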

A new field, remote_repo_url, will be added to the tag_config table in the Koji database to hold the URL of an additional repo, configurable on a per-tag basis. The URL may contain placeholders for arch and tag name that will be replaced with the appropriate values for the repo being created. At repo creation time, the tag inheritance tree will be walked and the URLs for all remote repos associated with tags in that tree will be collected. These URLs will be passed, in inheritance order, to the createrepo/mergerepo tool, which will generate the final repodata by filtering out duplicate packages, using either the first-match-wins rule described above (with the Koji-managed repo being the first repo), or possibly some other configurable filtering rule (highest-nvr-wins being another possibility). The resulting repo will then be used for subsequent builds against the tag.
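
As a rough sketch of the collection step, the following Python walks a tag inheritance tree depth-first and expands the placeholders in each URL. The proposal does not fix a placeholder syntax or a data layout, so the $arch and $tag placeholders, the dictionary structure, the tag names, and the example.com URL are assumptions made for illustration. It also assumes the tag name placeholder refers to the tag whose repo is being created.

```python
from string import Template


def collect_remote_urls(tags, build_tag, arch):
    """Return remote repo URLs for build_tag in inheritance order.

    tags maps a tag name to a dict with an optional "remote_repo_url"
    and a "parents" list ordered by inheritance priority (hypothetical
    layout, standing in for tag_config and the inheritance data).
    """
    urls = []
    visited = set()

    def walk(tag):
        if tag in visited:  # guard against tags reachable by two paths
            return
        visited.add(tag)
        cfg = tags[tag]
        url = cfg.get("remote_repo_url")
        if url:
            expanded = Template(url).substitute(arch=arch, tag=build_tag)
            if expanded not in urls:
                urls.append(expanded)
        for parent in cfg.get("parents", []):
            walk(parent)

    walk(build_tag)
    return urls


tags = {
    "epel-build": {"parents": ["epel", "rhel"], "remote_repo_url": None},
    "epel": {"parents": [], "remote_repo_url": None},
    "rhel": {
        "parents": [],
        "remote_repo_url": "http://repo.example.com/rhel/$arch/",
    },
}
urls = collect_remote_urls(tags, "epel-build", "x86_64")
```

The resulting list would then be handed to the merge tool after the Koji-managed repo, so that the local repo stays first in priority.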

Tracking the packages

When a build is complete Koji uploads the list of rpms in the buildroot to the hub to be inserted into the database for tracking purposes. Currently, every rpm in the list must exist in the Koji database or an error will be raised. Under this proposal, if there are no remote repos configured in the tag hierarchy, this behaviour doesn't change, and Koji will work exactly as it does now.

If there are remote repos enabled in the tag hierarchy, Koji will load the repodata used to perform that build. For each rpm found in the buildroot it will query the repodata for the baseurl associated with that rpm. If the baseurl corresponds to the location of the locally-managed Koji packages, then information about that rpm already exists in the Koji database, and it will be handled normally. If the baseurl points to somewhere other than the Koji package store, then that rpm came from a remote repo. For each of these rpms an entry will be created in the rpminfo table that stores the name, version, release, and some additional metadata, pulled from the repodata, that allows the rpm to be identified. An additional field, origin, will be added to the table. For remote rpms it will be populated with the baseurl associated with the rpm; for all locally-managed rpms it will hold a common value, local. The new rpminfo entry will then be associated with the buildroot via the buildroot_listing, just as a locally-managed rpm would. A remote rpminfo entry will not be associated with a build, and a constraint will be added to the table to ensure that only entries whose origin is not local may have a null build_id. The XML-RPC API will be updated to include the origin information in the data structures it returns, and the web UI will be updated to indicate that an rpm came from a remote repository, and provide the URL of that repository.
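
The classification step above might look roughly like the following. This is a sketch under stated assumptions, not Koji code: the function name, the dictionary-based repodata lookup, and the example URLs are hypothetical, and "corresponds to the local package store" is simplified to a comparison of baseurls with trailing slashes ignored.

```python
def classify_buildroot_rpms(buildroot_rpms, repodata, local_baseurl):
    """Split buildroot rpms into locally-managed and remote ones.

    buildroot_rpms: iterable of (name, version, release, arch) tuples.
    repodata: dict mapping each such tuple to a dict with "baseurl" and
    "payloadhash" keys (a stand-in for the merged repodata).
    Returns (local_nvras, remote_entries), where each remote entry is a
    dict shaped like a new rpminfo row with a null build_id.
    """
    local_nvras = []
    remote_entries = []
    local_base = local_baseurl.rstrip("/")
    for nvra in buildroot_rpms:
        meta = repodata[nvra]
        if meta["baseurl"].rstrip("/") == local_base:
            # Already known to the Koji database; handled as before.
            local_nvras.append(nvra)
        else:
            name, version, release, arch = nvra
            remote_entries.append({
                "name": name,
                "version": version,
                "release": release,
                "arch": arch,
                "payloadhash": meta["payloadhash"],
                "origin": meta["baseurl"],  # traces the rpm to its repo
                "build_id": None,           # remote rpms have no build
            })
    return local_nvras, remote_entries


LOCAL = "http://koji.example.com/packages/"
REMOTE = "http://repo.example.com/rhel/x86_64/"
repodata = {
    ("bash", "3.2", "21.el5", "x86_64"): {"baseurl": LOCAL, "payloadhash": "aa11"},
    ("zlib", "1.2.3", "3", "x86_64"): {"baseurl": REMOTE, "payloadhash": "bb22"},
}
local_nvras, remote_entries = classify_buildroot_rpms(repodata.keys(), repodata, LOCAL)
```

Both kinds of rpm would then be linked to the buildroot through buildroot_listing; only the remote entries need new rpminfo rows.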

Adding rpms from remote repositories into the rpminfo table does raise some issues. Right now that table enforces uniqueness of (name, version, release, arch). This is appropriate when all rpms are being managed locally in Koji, and we want to prevent rpms with the same N-V-R.A but different contents from existing in the system. However, remote repos may have rpms that duplicate locally-managed rpms, and it may be appropriate for one tag to pull in an rpm from a remote repo that exists locally in another tag. For this reason, the rpminfo_unique_nvra constraint on the rpminfo table will be expanded to include the origin field as well. Each remote repo will now have its own namespace for N-V-R.A. Locally-managed rpms will still have the same uniqueness constraints because they will all share the local namespace. N-V-R.A uniqueness will also be enforced on remote repos by using the data obtained from the repodata. If a remote rpm is used in a build and its N-V-R.A already exists in the same namespace, but has a different payloadhash, an error should be raised.
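
To make the constraint changes concrete, here is a small sketch using an in-memory SQLite database from Python. It is an approximation only: Koji's real schema has more columns than shown here, the local_has_build constraint name and the record_remote_rpm helper are hypothetical, and SQLite is used purely as a convenient stand-in for the production database.

```python
import sqlite3

# Simplified rpminfo table: uniqueness now includes origin, and only
# non-local rows may have a null build_id.
SCHEMA = """
CREATE TABLE rpminfo (
    id INTEGER PRIMARY KEY,
    build_id INTEGER,
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    "release" TEXT NOT NULL,
    arch TEXT NOT NULL,
    payloadhash TEXT NOT NULL,
    origin TEXT NOT NULL DEFAULT 'local',
    CONSTRAINT rpminfo_unique_nvra
        UNIQUE (name, version, "release", arch, origin),
    CONSTRAINT local_has_build
        CHECK (origin <> 'local' OR build_id IS NOT NULL)
)
"""


def record_remote_rpm(conn, name, version, release, arch, payloadhash, origin):
    """Return the rpminfo id for a remote rpm, inserting it if needed.

    Raises ValueError if the same N-V-R.A already exists for this
    origin with a different payloadhash.
    """
    row = conn.execute(
        'SELECT id, payloadhash FROM rpminfo WHERE name = ? AND version = ? '
        'AND "release" = ? AND arch = ? AND origin = ?',
        (name, version, release, arch, origin),
    ).fetchone()
    if row is not None:
        if row[1] != payloadhash:
            raise ValueError("N-V-R.A exists in this origin with different contents")
        return row[0]
    cur = conn.execute(
        'INSERT INTO rpminfo (build_id, name, version, "release", arch, '
        'payloadhash, origin) VALUES (NULL, ?, ?, ?, ?, ?, ?)',
        (name, version, release, arch, payloadhash, origin),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# A locally built rpm and a remote rpm with the same N-V-R.A can coexist.
conn.execute(
    'INSERT INTO rpminfo (build_id, name, version, "release", arch, payloadhash) '
    "VALUES (1, 'zlib', '1.2.3', '3', 'x86_64', 'aa11')"
)
remote_id = record_remote_rpm(
    conn, "zlib", "1.2.3", "3", "x86_64", "bb22",
    "http://repo.example.com/rhel/x86_64/",
)
```

Calling record_remote_rpm again with the same values returns the existing id, while a different payloadhash for the same origin raises an error, which mirrors the per-origin namespace described above.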