Revision as of 16:49, 7 September 2016
Fedora Scale-Out Docker Registry
Summary
This is a proposal for a change to the Fedora Infrastructure and Fedora Release Engineering tooling to provide a scalable Docker Registry solution for Fedora that is integrated with the Fedora Docker Layered Image Build Service.
Owner
- Name: Adam Miller and Randy Barlow
- Email: maxamillion@fedoraproject.org and bowlofeggs@fedoraproject.org
- Release notes owner:
Current status
Detailed Description
[Diagram, layout lost in extraction. Components shown: fedpkg container build, koji, OSBS, the docker/distribution registry (candidate builds), a small Python script, the Mirror Manager master mirror and mirror list, the "Mirror Network" (all our volunteer mirrors), and users running 'docker pull'.]
Background
registry: a collection of docker image repositories
repository: named after an image; a collection of multiple tags of that image
tag: an arbitrary string assigned to a specific docker image (identified by the image's sha256 checksum). NOTE: The "latest" tag is special: it is assumed if no tag is provided, including for a 'docker pull' operation, so an image tagged "latest" will be the default image pulled by users.
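To make the naming rules above concrete, here is a minimal Python sketch that splits an image reference into registry, repository, and tag, falling back to "latest" when no tag is given. It is a simplified version of the docker client's naming heuristic (a first path component containing a dot or a port, or equal to "localhost", is treated as a registry host). Digest references such as name@sha256:... are deliberately not handled, and the function and type names are hypothetical.

```python
from typing import NamedTuple, Optional

DEFAULT_TAG = "latest"  # assumed when no tag is given, e.g. 'docker pull fedora'


class ImageRef(NamedTuple):
    registry: Optional[str]  # None means "use the client's default registry"
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split "[registry/]repository[:tag]" into its parts (digests not handled)."""
    registry = None
    remainder = ref
    first, sep, rest = ref.partition("/")
    # Simplified heuristic: a leading component with a dot, a port, or
    # "localhost" names a registry host rather than part of the repository.
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    # A tag can only appear in the last path component, after a ":".
    head, _, last = remainder.rpartition("/")
    name, sep, tag = last.partition(":")
    if not sep:
        tag = DEFAULT_TAG
    repository = f"{head}/{name}" if head else name
    return ImageRef(registry, repository, tag)


print(parse_image_ref("registry.fedoraproject.org/fedora"))
# ImageRef(registry='registry.fedoraproject.org', repository='fedora', tag='latest')
```

Splitting the tag only from the last path component keeps a registry port such as localhost:5000 from being mistaken for a tag.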
Proposal
Pulp[0] + Crane[1] + MirrorManager[2] + Docker Distribution[3]
- Pulp is a platform for managing repositories of content, such as software packages, and making it available to a large number of consumers. It is also capable of managing docker content.
- Crane is a stand-alone Python Flask WSGI application written by the Pulp team to serve as an API entry point for the docker client; it answers a user's 'docker pull'. However, it does not create content manifests or host docker image content. Instead, it depends on someone creating the manifest metadata (or having Pulp publish it), and it serves HTTP 302 redirects so the docker client can find where the docker images actually live.
- MirrorManager is what Fedora uses to manage the public mirror network and distribute content.
- Docker Distribution is the de facto standard open source implementation of the Docker Registry V2 API spec[4]. It provides many features, but having its back-end storage provided by a "mirror network", like the one Fedora has at its disposal, is not one of them. We need docker-distribution in place because Docker Registry v1 allowed a docker image to be pushed directly to Pulp, but that mechanism no longer exists in v2, so we must instead perform a "sync" operation between the two. (This is a common problem for all known "third party" v2 registry implementations.)
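Because v2 removed the direct push into Pulp, content has to move by Pulp pulling from docker-distribution through the Registry V2 API[4]. In outline, a syncing client lists a repository's tags, fetches each tag's manifest, and then fetches every blob that manifest references. The Python sketch below only builds those request URLs and extracts the blob digests from an image manifest in the V2 schema 2 format; it is an illustration under those assumptions, not Pulp's actual importer, and the helper names are hypothetical.

```python
import json
from typing import List


def tags_url(registry: str, name: str) -> str:
    # GET /v2/<name>/tags/list returns the tags in a repository
    return f"https://{registry}/v2/{name}/tags/list"


def manifest_url(registry: str, name: str, reference: str) -> str:
    # GET /v2/<name>/manifests/<reference>, where reference is a tag or digest
    return f"https://{registry}/v2/{name}/manifests/{reference}"


def blob_url(registry: str, name: str, digest: str) -> str:
    # GET /v2/<name>/blobs/<digest> returns a layer or config blob
    return f"https://{registry}/v2/{name}/blobs/{digest}"


def blobs_for_manifest(manifest_json: str) -> List[str]:
    """Digests a syncing client must fetch for one schema 2 image manifest."""
    manifest = json.loads(manifest_json)
    digests = [manifest["config"]["digest"]]  # image configuration blob
    digests += [layer["digest"] for layer in manifest["layers"]]  # filesystem layers
    return digests
```

For a hypothetical candidate registry host, blob_url("candidate-registry.example", "fedora", digest) gives the URL a sync would request for each digest returned by blobs_for_manifest(). A real client would also send an Accept header for application/vnd.docker.distribution.manifest.v2+json when fetching manifests, and authenticate where required.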
Workflow
- OSBS will perform builds. As these builds complete, they will be pushed to the docker-distribution (v2) registry and will be considered "candidate images". Pulp will sync and publish the candidate repository.
- Testing will occur using the "candidate images" (details of how we want to handle that are outside the scope of this proposal).
- A "candidate image" will be marked stable once its criteria for doing so have been satisfied. (This is vague because the criteria an image must meet before being considered "stable" and promoted as such are still a topic of ongoing discussion and work.)
- Once an image is stable, Pulp will publish that repository's content to a directory. We will split that content: the image layers, along with their metadata, will be synced to the MirrorManager master mirror, and the repository metadata published by Pulp will be synced to a location where Crane can pick it up. (This will likely be triggered by Bodhi via the Pulp REST API.)
- MirrorManager will distribute the image layers and their metadata to the mirrors.
- Crane will pick up the new repository metadata and serve redirects to the new content relative to download.fedoraproject.org, which will perform another redirect (via MirrorManager) to the location where the docker client finds its content during a 'docker pull'.
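The split in the promotion step (layers and manifests to the master mirror, repository metadata to Crane) could be scripted roughly as follows. This Python sketch assumes a hypothetical publish layout in which Crane's metadata lives under app/<repo>.json and the served content lives under web/<repo>/; the real layout Pulp produces should be checked before relying on it, and the actual copying to each destination is left out.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def split_publish_tree(paths: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Partition relative paths from a Pulp publish directory.

    Returns (crane, mirror, unknown). Assumed, hypothetical layout:
      app/<repo>.json          -> repository metadata that Crane reads
      web/<repo>/blobs/...     -> image layers for the MirrorManager master mirror
      web/<repo>/manifests/... -> manifests, also synced to the master mirror
    """
    crane: List[str] = []
    mirror: List[str] = []
    unknown: List[str] = []
    for p in paths:
        parts = PurePosixPath(p).parts
        if len(parts) == 2 and parts[0] == "app" and parts[1].endswith(".json"):
            crane.append(p)
        elif len(parts) > 1 and parts[0] == "web":
            mirror.append(p)
        else:
            unknown.append(p)  # surfaced for review instead of being synced blindly
    return crane, mirror, unknown
```

Keeping an explicit "unknown" bucket means an unexpected change in the publish layout shows up as leftover paths rather than as content silently pushed to public mirrors.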
Technical Details
Some more in-depth technical details of this solution that I think the Fedora Infrastructure Team will be interested in:
Pulp Requirements
- An AMQP message queue: currently Qpid and RabbitMQ are supported upstream. However, the requirement appears to stem from the use of Celery[5], and Celery upstream supports Redis[6] as a broker backend, so I have requested that Redis be made available as a supported option in Pulp[7]. This will take some amount of development time, but we can plan for that if adding a message queue to Fedora Infra is a show stopper.
- MongoDB: this is currently a hard requirement, but PostgreSQL is planned to replace MongoDB in the future[8] (probably on a timeline of about a year). The question is whether, from a Fedora Project standpoint, we can wait that long for the new feature before having a solution in place. I imagine some of this will need to be planned and scoped as time goes on and we learn more, but it's worth keeping in mind.
- Storage: I've been told Pulp needs a lot of storage. I don't have hard numbers for what we'd need, since we're in uncharted territory, but I've heard that a few hundred GB is not uncommon in Pulp deployments when combining MongoDB's storage needs with all the artifacts in the repos.
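Celery selects its broker from a URL whose scheme names the transport, so on the Celery side switching to Redis is largely a matter of configuration; whether Pulp itself accepts it is the open question in [7]. The Python sketch below classifies a broker URL against the options described above. The mapping reflects this proposal's description of upstream support, not an authoritative list, and the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Per this proposal: transports Pulp supports upstream, plus the one requested in [7].
SUPPORTED = {"qpid": "Qpid", "amqp": "RabbitMQ (AMQP)"}
REQUESTED = {"redis": "Redis"}


def broker_status(broker_url: str) -> str:
    """Report whether a Celery-style broker URL uses a transport Pulp supports."""
    scheme = urlsplit(broker_url).scheme
    if scheme in SUPPORTED:
        return f"supported: {SUPPORTED[scheme]}"
    if scheme in REQUESTED:
        return f"requested upstream: {REQUESTED[scheme]}"
    return f"unsupported: {scheme or 'missing scheme'}"


print(broker_status("redis://localhost:6379/0"))  # requested upstream: Redis
```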
Crane Requirements
- Crane is just a small Python WSGI app written in Flask.
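To illustrate Crane's role as a redirector rather than a content host, here is a minimal WSGI application in plain Python (no Flask) that answers Registry V2 blob requests with an HTTP 302 pointing at wherever the content lives. The repository-to-URL mapping, including the path under download.fedoraproject.org, is hypothetical; the real Crane reads this information from the metadata Pulp publishes and also handles manifests, tags, and error formats that this sketch omits.

```python
import re
from typing import Callable, Dict, Iterable

# Hypothetical mapping; the real Crane loads this from Pulp's published metadata.
REPO_BASE_URLS: Dict[str, str] = {
    "fedora": "https://download.fedoraproject.org/hypothetical/docker/fedora",
}

# Matches GET /v2/<name>/blobs/sha256:<64 hex chars>
BLOB_PATH = re.compile(r"^/v2/(?P<name>.+)/blobs/(?P<digest>sha256:[0-9a-f]{64})$")


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app that 302-redirects known blob requests and hosts no content."""
    path = environ.get("PATH_INFO", "")
    match = BLOB_PATH.match(path)
    if match and match["name"] in REPO_BASE_URLS:
        location = f"{REPO_BASE_URLS[match['name']]}/blobs/{match['digest']}"
        start_response("302 Found", [("Location", location)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]
```

For local experimentation it can be served with the standard library, for example make_server("localhost", 8000, app).serve_forever() using make_server from wsgiref.simple_server.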
Potential Problems
After some discussion in #fedora-releng, we realized that we will need to take some action to get the docker client to work with Fedora's extensive mirror network. One concern was that MirrorManager cannot simply send an HTTP 302 redirect to a mirror, because mirrors often lag behind the current state of the repositories. Thus, MirrorManager will need to return the current metadata to the client so that the client knows which blob checksums are current, and it will need to redirect requests for those blobs to mirrors. The yum client receives its redirects through Metalink responses that list multiple options for retrieving the packages.
Testing with the docker client showed that it does not currently understand metalink responses when retrieving blob files from a docker registry. In order to leverage Fedora's mirror network to distribute docker blobs, we will need to contribute a change to the docker client so that it can interpret MirrorManager's metalink responses and retry with a different mirror when an out-of-date mirror returns a 404.
The testing was performed with a modified version of Crane. A patch made Crane answer every blob request with an HTTP 200 metalink response containing a link to the corresponding path served by Pulp via httpd. We watched the httpd logs for activity, but the docker client did not attempt to retrieve any blobs during a docker pull, and it complained that the blob did not match the expected checksum. The checksum complaint is likely due to docker hashing the metalink XML itself and comparing that to the expected blob checksum, since the docker client clearly never contacted Pulp to download the blob.
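For comparison with a plain 302, the sketch below shows what a metalink-aware client has to extract: the expected checksum and the candidate mirror URLs, ordered by preference. It assumes the Metalink 3.0 layout used for yum repository metadata (namespace http://www.metalinker.org/, hash elements with a type attribute, url elements with a preference attribute); the function name is hypothetical, and it only handles a document describing a single file.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = "{http://www.metalinker.org/}"  # assumed Metalink 3.0 namespace


def parse_metalink(xml_text: str) -> Tuple[Optional[str], List[str]]:
    """Return (sha256 hex digest, mirror URLs best-first) for a one-file metalink."""
    root = ET.fromstring(xml_text)
    sha256 = None
    for h in root.iter(f"{NS}hash"):
        if h.get("type") == "sha256":
            sha256 = (h.text or "").strip()
    # Higher preference values are tried first; missing preference sorts last.
    urls = sorted(
        root.iter(f"{NS}url"),
        key=lambda u: int(u.get("preference", "0")),
        reverse=True,
    )
    return sha256, [(u.text or "").strip() for u in urls]
```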
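The client-side behavior proposed above (try the next mirror on a 404, and accept a blob only if its sha256 digest matches what the registry metadata promised) can be sketched in Python as follows. The fetch callable and the NotFound exception are hypothetical stand-ins for real HTTP code, so the logic can be exercised without a network; this is not the docker client's code, which is written in Go.

```python
import hashlib
from typing import Callable, Iterable


class NotFound(Exception):
    """Hypothetical: raised by a fetch function when a mirror answers 404."""


def fetch_blob(urls: Iterable[str], expected_digest: str,
               fetch: Callable[[str], bytes]) -> bytes:
    """Try mirrors in order, skipping 404s and stale content; return verified bytes."""
    algo, _, expected_hex = expected_digest.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo}")
    for url in urls:
        try:
            data = fetch(url)
        except NotFound:
            continue  # mirror has not synced this blob yet; try the next one
        if hashlib.sha256(data).hexdigest() == expected_hex:
            return data
        # Content does not match the digest: treat it like an out-of-date mirror.
    raise LookupError(f"no mirror served {expected_digest}")
```

The digest check also fits the failure seen in the test described next: a client that hashes the metalink XML body instead of following its links can never match the blob's digest.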
General Notes
A couple of things to note about maintenance and uptime considerations:
- The intermediate docker-distribution registry is needed for builds in koji+OSBS.
- Pulp will be required for "promotion" of builds from candidate to testing or stable.
- Crane must be reachable by end users around the world so they can actually pull Docker images from us.
The only service here that needs to be public and end-user facing (i.e. open to the internet rather than restricted to a FAS group) is Crane. All other components should be able to be locked down in the same way as the "Fedora internal" components, such as koji (builders, etc.) and bodhi (signing, etc.).
- [0] http://www.pulpproject.org/
- [1] https://github.com/pulp/crane
- [2] https://github.com/fedora-infra/mirrormanager2/
- [3] https://github.com/docker/distribution/
- [4] https://docs.docker.com/registry/spec/api/
- [5] http://www.celeryproject.org/
- [6] http://redis.io/
- [7] https://pulp.plan.io/issues/1900
- [8] https://pulp.plan.io/issues/1803
Benefit to Fedora
This will allow Fedora to provide packages, software, and other content in the form of a Docker image as an officially released artifact of the Fedora Project, released and hosted in much the same way RPMs are today. These images can then be included in the distribution in various ways. This could potentially be used by the Modularization effort or by any other part of the Fedora.next initiative that may arise.
Scope
Proposal owners
Proposal owners shall have to:
- Implement the proposed design of a Scaled-Out Docker Registry
- Deploy Pulp
- Deploy Crane
- Deploy Docker-Distribution Registry
- Integrate with MirrorManager for content distribution
- Document the system
Task matrix
This is a RACI matrix for tasks required to implement the Scale-Out Docker Registry. Work is tracked in Taiga: http://taiga.cloud.fedoraproject.org/project/acarter-fedora-docker-atomic-tooling/wiki/home
Is this current?
It is, as of 2016-09-07
Definitions
Here, we're using what Wikipedia calls "RACI (alternative scheme)":
- Responsible
- The person responsible for the performance of the task. There should be exactly one person with this assignment for each task.
- Assists
- Those who assist completion of the task.
- Consulted
- Those whose opinions are sought; and with whom there is two-way communication.
- Informed
- Those who are kept up-to-date on progress; and with whom there is one-way communication.
Task Table
Task | Subtask | Responsible | Assists | Consulted | Informed | Current Status |
---|---|---|---|---|---|---|
Implement the proposed design of a Scaled-Out Docker Registry | | Adam Miller | | | | 0% |
Deploy solution, including ansible playbooks added to the Fedora Infrastructure Ansible repo | | Adam Miller | | | | 0% |
Deploy Pulp | | Adam Miller | | | | 0% |
Deploy Crane | | Adam Miller | | | | 0% |
Deploy docker-distribution registry | | Adam Miller | | | | 0% |
Integrate with MirrorManager for content distribution | | Adam Miller | | | | 0% |
Document the system | | Adam Miller | | | | 0% |
Glossary of Nicknames
- maxamillion Adam Miller
- bowlofeggs (rbarlow) Randy Barlow
Various Task Notes
Functional Requirements
The following features are functional requirements:
- Users must be able to perform a 'docker pull registry.fedoraproject.org/fedora' and have the actual image layer data come from a local mirror via MirrorManager.
Other developers
- (anything here)?
Upgrade/compatibility impact
N/A (not a System Wide Change)
How To Test
Once the service is deployed, users can test it by running the following on their systems:
$ sudo dnf -y install docker
$ sudo systemctl start docker
$ sudo docker pull registry.fedoraproject.org/fedora
N/A (not a System Wide Change)
User Experience
N/A (not a System Wide Change)
Dependencies
N/A (not a System Wide Change)
Contingency Plan
- Contingency mechanism: N/A (not a System Wide Change)
- Contingency deadline: N/A (not a System Wide Change)
- Blocks release? No (not a System Wide Change)
- Blocks product? N/A
Documentation
FIXME