Product Definition Center
Summary
The Product Definition Center (PDC) is a webapp and API designed for storing and querying product metadata. We want to stand up an instance in Fedora Infrastructure and automatically populate it with data from our existing releng tools/processes. It will enable us to develop more sane tooling down the road for future releases.
Owner
- Name: Ralph Bean
- Email: rbean@redhat.com
- Release notes owner:
Current status
- Targeted release: Fedora 24
- Last updated: 2015-11-04
- Tracker bug: <will be assigned by the Wrangler>
Detailed Description
We need something more sophisticated than what we have now to model releng processes. Right now, we have a collection of shell scripts, python bits, and koji tasks that all know "how to do" whatever it is that needs to be done. Whatever artifacts they happen to produce are what we produce.
When we introduced new types of artifacts (server/cloud/workstation, vagrant, docker, atomic, etc.) as requirements for releng in the past few years, we started to strain the existing processes. Those scripts became much more complicated and difficult to debug.
Long term, we would like to move to a more structured architecture for releng workflow, one that uses basic software engineering paradigms, like MVC. To start on that journey, we're looking to deploy something which can serve just as the M there (the Model).
With such a thing, we could rewrite some of our scripts to behave dynamically in response to the state of the model. In the best case scenario (read: utopia), we would simply define a new variant of a deliverable in the model, and our tools would produce it. (Of course, things will involve more work than that.)
Requirements
- We need something which can be queried to find out what types of artifacts releng is supposed to be producing.
- We need something which can be queried to find out what specific artifacts releng produced in the past (yesterday, last week, etc.).
- We need something which can be queried to find out what inputs go into which artifacts.
- We would like to be able to tier the mapping of inputs to artifacts, so that we can model layered builds.
- We need something which can be queried to find the QE status of a compose and the QE status of an artifact.
- That system should be eventually consistent with respect to the rest of our infrastructure.
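To make the "tiered mapping of inputs to artifacts" requirement concrete, here is a minimal sketch of how layered builds could be modeled and queried. It is not PDC's actual data model; the `Artifact` class, the `all_inputs` helper, and the example names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """A deliverable and the names of the things that feed directly into it."""
    name: str
    inputs: list = field(default_factory=list)


def all_inputs(name, artifacts):
    """Return every direct and indirect input of the named artifact."""
    seen = set()
    stack = list(artifacts[name].inputs)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # An input may itself be an artifact with its own inputs (a layer).
        if current in artifacts:
            stack.extend(artifacts[current].inputs)
    return seen


# Hypothetical data: a docker image layered on a base image built from rpms.
artifacts = {
    "base-image": Artifact("base-image", ["glibc", "bash"]),
    "docker-httpd": Artifact("docker-httpd", ["base-image", "httpd"]),
}
```

Asking for `all_inputs("docker-httpd", artifacts)` walks through the base image layer and also returns the rpms underneath it, which is the kind of query the requirements above call for.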
Design
For this central know-it-all system, we're going to deploy PDC. We have a dev instance set up, but without any data in it, it is useless. We need to populate it, both initially and over time.
Ideas for populating it over time:
- Approach 1: We could instrument all of our existing releng tools to feed info to PDC about what they are doing, as they do it.
- Approach 2: Write a pdc-updater project. It will be a single service that listens for general activity from those tools on the fedmsg bus, and updates PDC about what they're doing.
Problems with Approach 1: we have to modify all the tools. If the PDC API changes, we need to modify it in all those places. We have to distribute PDC credentials to all those tools. None of those tools will work if PDC is not present.
We're going to go with Approach 2. Its drawback is that a message could theoretically be dropped, so we'll have to write an audit script which can run daily or weekly as a cron job. It will comb through all our systems and make sure that what PDC thinks is true is actually true.
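The core of such an audit could be as simple as a set comparison. The following is only a sketch under the assumption that both PDC and the system being audited can be reduced to lists of identifiers; the `audit` function and the sample build names are hypothetical.

```python
def audit(pdc_items, actual_items):
    """Compare what PDC believes exists with what actually exists."""
    pdc_items, actual_items = set(pdc_items), set(actual_items)
    return {
        # Things our infrastructure produced that PDC never heard about,
        # for example because a message was dropped.
        "missing_from_pdc": sorted(actual_items - pdc_items),
        # Things PDC believes in that no longer exist upstream.
        "stale_in_pdc": sorted(pdc_items - actual_items),
    }


# Hypothetical data standing in for a PDC query and a koji query.
report = audit(
    pdc_items=["foo-1.0-1.fc24", "bar-2.0-1.fc24"],
    actual_items=["foo-1.0-1.fc24", "baz-3.0-1.fc24"],
)
```

An empty report means PDC and the audited system agree. Anything else is a candidate for repair or for an alert.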
List of pdc-updater interactions
For some background, check out the PDC API first.
This is a base list -- we will likely add new interactions as we go along. Some of these ideas might not actually make sense in practice when we go to implement them, and we'll have to revise.
- When new packages are added to pkgdb, add them to pdc.
- When new packages are added to pkgdb, add them to the pdc bugzilla-components API.
- When new composes are completed by the releng/scripts/, add them to pdc.
- When new images are built in koji, add them to the pdc images/ API.
- When new rpms are built in koji, add them to the pdc rpms/ API.
- When new commits are pushed to dist-git, add them to the pdc changesets/ API.
- When new users are added in FAS, add them to the persons db.
We will then manage the releases/release-types/release-variants/products db tables (with scripts) by hand when we go to branch a new release, or add a new artifact, etc.
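One way pdc-updater could organize the interactions listed above is a table that maps message topics to handlers. The sketch below is illustrative only: the topic suffixes, the message fields, and the `FakePDC` stand-in are assumptions, not the real fedmsg topics or the real PDC client.

```python
class FakePDC:
    """In-memory stand-in for a PDC client; it only records what would be sent."""

    def __init__(self):
        self.calls = []

    def add(self, resource, data):
        self.calls.append((resource, data))


def handle_new_rpm(pdc, msg):
    # Hypothetical message field: "nvr".
    pdc.add("rpms", {"nvr": msg["nvr"]})


def handle_new_commit(pdc, msg):
    # Hypothetical message field: "rev".
    pdc.add("changesets", {"rev": msg["rev"]})


# Hypothetical topic suffixes; real topic names would need to be checked.
HANDLERS = {
    "buildsys.rpm.built": handle_new_rpm,
    "git.receive": handle_new_commit,
}


def dispatch(pdc, topic, msg):
    """Route one message to its handler. Return False if nobody handles it."""
    for suffix, handler in HANDLERS.items():
        if topic.endswith(suffix):
            handler(pdc, msg)
            return True
    return False
```

Adding a new interaction then means writing one handler and adding one entry to the table, and every PDC call stays inside a single service.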
Open Questions
- pkgdb currently has a notion of 'collections' which indicate what branches we have active (F24? F23? EPEL7?). We use the pkgdb API around town in lots of scripts to figure out what kinds of things to render, show, update, etc. It was kind of like a primordial PDC. So, now that we (will) have PDC, do we update PDC from pkgdb when an admin adds a new collection there? Or do we update pkgdb from PDC when an admin adds a new release there? In other words, do we make PDC the canonical source of truth about what releases/etc. we are building and have pkgdb just mirror that, or vice versa? I'm inclined to favor the former (making PDC the canonical source).
- We'll use the component-groups feature to indicate what rings things are in. Should PDC just be the place to get and update that info, or should pkgdb grow that feature and PDC can just mirror pkgdb?
The Hand-Wavy Future
Beyond having a system that knows what inputs go into which releng artifacts (PDC), it would be great to then develop tooling around that data source. For instance:
- it would be cool if, when we're doing the rawhide compose, we could look and see that nothing has changed in XFCE so we don't rebuild that livecd, but we do rebuild other artifacts where things actually changed.
- furthermore, with that kind of knowledge we can rebuild artifacts as their inputs change (fedmsg) instead of doing things on a nightly or semi-annual basis like we do now.
- it would be cool to produce reports on the different editions and their artifacts over time, e.g., show how the size of the workstation image is growing (so we can fix it) or show how the size of the cloud image is shrinking (so we can celebrate).
- it would be cool to automatically impose gating via taskotron for some artifacts, depending on what "rings" (Fedora.NEXT) the inputs are in and what policies we have associated with those rings.
- leverage taskotron QA checks to create side-tags where we automatically rebuild stuff in the event of soname bumps. We could then also auto-gate artifacts and keep them from reaching the next step in the process if (for instance) things fail depcheck. Say, stuff in ring 0 and ring 1 requires tests X, Y, and Z, but ring 2 requires fewer. That way we could make sure that "rawhide is never broken".
- it could be worthwhile to build artifacts immediately (as their inputs change) but to gate publication to the mirrors on some sort of human sign-off from releng.
These are all things that are not a part of this Change, but are ideas that will be easier to implement after this Change is completed.
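As an illustration of the "only rebuild what changed" idea from the list above, here is a minimal sketch. It assumes we can ask the model for the inputs of each artifact; the function name and the package and artifact names are hypothetical.

```python
def artifacts_to_rebuild(changed, inputs_by_artifact):
    """Return the artifacts that have at least one changed input."""
    changed = set(changed)
    return sorted(
        name
        for name, inputs in inputs_by_artifact.items()
        if changed & set(inputs)
    )


# Hypothetical mapping of artifacts to the packages that go into them.
inputs_by_artifact = {
    "xfce-livecd": ["xfce4-panel", "kernel"],
    "workstation-livecd": ["gnome-shell", "kernel"],
}

# Only gnome-shell changed, so the XFCE livecd can be left alone.
to_rebuild = artifacts_to_rebuild(["gnome-shell"], inputs_by_artifact)
```

A change to a shared input such as `kernel` would select both artifacts, which matches the behavior described in the first two bullets.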
If PDC is the system that knows what we build and what goes into what, consider also that pungi/koji knows how to build those things (or, it should). We're then missing a third system that knows when to do those rebuilds. For a time we were thinking of writing it from scratch and calling the system Outhouse. Think of it as a rewrite of the collection of shell scripts in the releng repo into a continuously-running daemon. After discussions at Flock 2015, we started considering using a privileged instance of Taskotron for this instead of writing something from scratch.
We can't necessarily use the QA instance of taskotron as-is, though. We would need a releng trigger system to have rights to do things with admin permissions in koji, and the existing taskotron instance is in the QA network -- the nodes there are of an insufficient security grade.
We could deploy a second instance of the taskotron software on release engineering maintained nodes (call it "relengotron") to deal with this.
Writing relengotron tasks -- Check out the format for taskotron tasks. We would need to write new taskotron "directives" for interfacing with PDC and pungi, but after that, the task of writing releng "rules" would be relatively straightforward, and would be readable -- and maintainable!
Why a Change Proposal?
Josh Boyer asked the question on the rel-eng list. This is entirely self-contained in Rel-Eng and Infrastructure.
- Is there anything FESCo needs to review here?
- Is it even FESCo's responsibility to approve it?
The answer is perhaps "no" to both questions. However, we're submitting a F24 Change mainly to raise visibility of the effort.
Benefit to Fedora
If Fedora is the sausage, then the releng toolchain is "how the sausage gets made". We'll hopefully end up with a sausage-making pipeline that is less gross and more maintainable.
Scope
Note that this change should not affect any other development efforts. It does not require new instrumentation of any of our existing tools and so, should it fail as a project, there is no need for a contingency plan to back things out -- we can just abandon it.
- Proposal owners:
- Set up a devel instance of PDC (already done here).
- Write pdc-updater, the daemon that updates PDC with data from our existing toolchain (via fedmsg).
- Write an audit script that checks that PDC's data is consistent.
- Set up and deploy staging and production instances of PDC and pdc-updater in fedora-infra.
- Run the audit scripts to ensure that PDC's knowledge is consistent with the actual state of our release infra.
- Install the audit script in cron (or something) and attach it to a nagios alert, so we're made aware of inconsistencies.
- Other developers: N/A (not a System Wide Change)
- Release engineering: N/A (not a System Wide Change)
- List of deliverables: N/A (not a System Wide Change)
- Policies and guidelines: N/A (not a System Wide Change)
- Trademark approval: N/A (not needed for this Change)
Upgrade/compatibility impact
N/A (not a System Wide Change)
How To Test
The audit script should let us know if PDC's data is consistent with our release infra's output.
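Since the plan is to hook the audit up to nagios, its result eventually has to be turned into a plugin status. The sketch below uses the conventional nagios plugin exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL); the `nagios_status` function and the threshold of 10 are assumptions chosen purely for illustration.

```python
# Conventional nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def nagios_status(inconsistencies, critical_threshold=10):
    """Turn a list of audit inconsistencies into (exit_code, message)."""
    count = len(inconsistencies)
    if count == 0:
        return OK, "OK - PDC is consistent"
    if count < critical_threshold:
        return WARNING, "WARNING - %d inconsistencies found" % count
    return CRITICAL, "CRITICAL - %d inconsistencies found" % count


# Hypothetical audit result with two problems.
code, message = nagios_status(["missing rpm foo", "stale image bar"])
```

A wrapper run from cron would print `message` and exit with `code` so that nagios can pick up the state.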
User Experience
N/A (not a System Wide Change)
Dependencies
N/A (not a System Wide Change)
Contingency Plan
- Contingency mechanism: N/A (not a System Wide Change)
- Contingency deadline: N/A (not a System Wide Change)
- Blocks release? No (not a System Wide Change)
- Blocks product? N/A (not a System Wide Change)
Documentation
N/A (not a System Wide Change)