What
This is a project to migrate our current package source control from CVS to git. See Dist_Git_Proposal.
Who
- Jesse Keating
- Toshio Kuratomi
Road Map
There may be multiple proofs of concept as we tweak various components and layouts, but generally they will follow these phases:
Phase 1
This phase is all about the import. This is where we convert the CVS content into git, and then provide access to the content via anonymous git:// clones. Content won't be modified beyond the import itself and the creation of branches for the various releases.
Phase 2
This phase is all about providing write access to the repos via ssh. It is during this phase that we will figure out how to generate an ACL set and apply these ACLs to the repos, including branch level ACLs. Once this phase completes we will be able to offer write access to the git repos for maintainers.
Phase 3
This is the tooling phase, where we develop fedpkg, the utility to replace the Make system. Once this phase is complete we will have a package that maintainers can download and use to interact with their repos in various ways.
Phase 4
This is the building phase where we allow packages to be built in a test koji instance from our dist-git repos.
Implementation Plans
Currently we are evaluating a number of tools to perform the tasks needed.
CVS import
parsecvs is the tool we're currently using to convert CVS history into git format. It has been used to convert a number of projects, including xorg and the Gnome projects, so it has a good track record. It is quite fast, is able to translate CVS committer names into full git-style names and email addresses, and seems to handle our packages well. A script has been written that processes the CVS ,v files via parsecvs and creates the proper branches for release subdirs. The last full run took roughly 900 minutes to complete.
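As a rough illustration of the "branches for release subdirs" step, here is a minimal Python sketch that groups a module's ,v files by the branch they would land on. It is not the actual import script and it does not call parsecvs. The function name group_rcs_files and the rule that the devel subdir maps to master are assumptions made for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: the CVS "devel" subdir becomes git "master";
# every other release subdir keeps its own name as the branch name.
BRANCH_FOR_SUBDIR = {"devel": "master"}


def group_rcs_files(rcs_paths):
    """Group ,v paths (relative to the module directory) by target branch.

    Hypothetical helper: files that are not ,v files, or that sit directly
    in the module directory rather than in a release subdir, are skipped.
    """
    branches = defaultdict(list)
    for raw in rcs_paths:
        parts = PurePosixPath(raw).parts
        if len(parts) < 2 or not parts[-1].endswith(",v"):
            continue
        subdir = parts[0]
        branches[BRANCH_FOR_SUBDIR.get(subdir, subdir)].append(raw)
    return dict(branches)
```

Each resulting group could then be handed to the converter as the history of one branch.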
A trial import is available to the public via a test system. Modules are exported via the git:// or ssh:// protocol, and the URL format is
git://pkgs.stg.fedoraproject.org/<module>
or
ssh://[fedoraccount@]pkgs.stg.fedoraproject.org/<module>
For example, in order to clone the yum module anonymously, one would enter:
git clone git://pkgs.stg.fedoraproject.org/yum
To clone the yum module with write access, one would enter:
git clone ssh://pkgs.stg.fedoraproject.org/yum
after which
git push
should just work, provided you have pkgdb rights to commit to the yum module.
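The two URL forms above can be captured in a small helper, for instance for use inside tooling. This is only a sketch: clone_url and its parameters are hypothetical names, and the host is the staging test system named in this section.

```python
from typing import Optional

# Host of the public test system mentioned in the text.
PKGS_HOST = "pkgs.stg.fedoraproject.org"


def clone_url(module: str, fas_user: Optional[str] = None, write: bool = False) -> str:
    """Return the clone URL for a package module (hypothetical helper).

    Anonymous access uses git://, write access uses ssh:// with an
    optional Fedora account name in front of the host.
    """
    if not write:
        return f"git://{PKGS_HOST}/{module}"
    prefix = f"{fas_user}@" if fas_user else ""
    return f"ssh://{prefix}{PKGS_HOST}/{module}"


print(clone_url("yum"))              # git://pkgs.stg.fedoraproject.org/yum
print(clone_url("yum", write=True))  # ssh://pkgs.stg.fedoraproject.org/yum
```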
Git ACLs
Unlike CVS, where we used subdirectories for "branches" and could therefore apply filesystem ACLs to the subdirs, git does not provide an easy way to do filesystem ACLs at a branch level. Therefore we will need to use an extra layer in order to accomplish our needs. This is not unlike our current use of CVS, where we rely on filesystem group ID to provide write access, and then use the cvs Avail system to restrict that down.
Currently we are evaluating gitolite to provide the ACLs. It has the ability to provide users write access only to specific branches.
gitolite does have a few problems for our use that we are working out with upstream.
- It defines user groups internally rather than using getent
- It is designed around every user logging in through a single system user via ssh keys
- The config file system does not quite scale to our size
gitolite upstream has created a branch of the code for multi-user setups with very large config files, namely the Fedora case, and is committed to making it work for us.
A preliminary script has been written that takes data from pkgdb and getent in order to draft a gitolite config file which can then be "compiled" into what gitolite uses internally to check ACLs.
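To give an idea of what such a drafting step might look like, here is a minimal Python sketch. It is not the preliminary script itself. The input shapes are assumptions: groups stands in for group membership read from getent, and acls stands in for per-branch committer lists read from pkgdb. The output uses gitolite's plain config syntax (group definitions, repo stanzas, and RW rules with a branch refex).

```python
def draft_gitolite_config(groups, acls):
    """Draft gitolite config text (hypothetical helper).

    groups: {"groupname": ["user", ...]}            # assumed to come from getent
    acls:   {"repo": {"branch": ["user", ...]}}     # assumed to come from pkgdb
    """
    lines = []
    for group in sorted(groups):
        members = " ".join(sorted(groups[group]))
        lines.append(f"@{group} = {members}")
    for repo in sorted(acls):
        lines.append("")
        lines.append(f"repo {repo}")
        for branch in sorted(acls[repo]):
            writers = " ".join(acls[repo][branch])
            # "$" anchors the refex so "master" does not also match "master-foo".
            lines.append(f"    RW {branch}$ = {writers}")
    return "\n".join(lines) + "\n"


print(draft_gitolite_config(
    {"provenpackager": ["carol"]},
    {"yum": {"master": ["alice", "@provenpackager"], "F-12": ["alice"]}},
))
```

The drafted file would still need to be "compiled" by gitolite before it takes effect.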
gitolite works by running gl-auth-command when an ssh connection is initiated. This is controlled by .ssh/authorized_keys, much in the way that our current CVS server setup is done. gl-auth-command then checks ACLs against a pre-compiled hash and, if access is allowed, passes the rest of your ssh command on to the local git command. The update hook in each repo also checks whether you have rights to do whatever it is you are doing on whichever ref you are trying to do it on (master, F-12, a tag, etc.).
This does bring up the wrinkle of admins with shell access to the git server, on whom we can't force gl-auth-command via authorized_keys. These few people will bypass the first auth check, and we'll have to rely upon filesystem permissions plus the repo update hook to deny them access to things they shouldn't access. A symlink on the filesystem will need to be provided so that people interacting directly with git use the same paths as people who interact via gl-auth-command (which defines its own git root path).
Another problem is that when the update hook runs without gl-auth-command (e.g. for people who have full shell access), it will not allow writing because data is missing from the environment. There are a couple of different ways this could be fixed:
- An early check in the update hook that exits 0 if gl-auth-command wasn't used, bypassing branch level ACLs
- Running a secondary ssh server for admin shells, forcing all git traffic through gl-auth-command
- Forcing gl-auth-command for every ssh user, with a check in gl-auth-command that detects non-git actions and drops to the shell if the user is an admin user
All have their pros and cons, but the last option seems the best currently: we don't have to run more daemons, every git action goes through gl-auth-command and subsequently the update hook, and we can define a git base path to keep the URLs short. This solution has one small wrinkle. Currently we have 'no-pty' as one of the options when using a forced command in authorized_keys. This means that admins would not be able to get a shell due to lack of a pty, so for certain users we'd have to not use that option. This may require some changes within FAS itself to make ssh auth commands and options more generic and defined by each group. This is the route we're currently taking to provide ssh write access to the repos.
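The decision the forced command would have to make under the last option can be sketched as follows. This is an illustration in Python, not gitolite's own code (gl-auth-command is a separate program with its own logic). The function name, the return values and the admins set are assumptions. The command string is what sshd exposes to a forced command in the SSH_ORIGINAL_COMMAND environment variable.

```python
import shlex

# Commands a git client runs on the server side over ssh.
GIT_COMMANDS = {"git-upload-pack", "git-receive-pack", "git-upload-archive"}


def classify_ssh_command(original_command, user, admins):
    """Decide what to do with an incoming ssh session (hypothetical helper).

    Returns "git" for git traffic (continue with ACL checks), "shell" for a
    non-git action by an admin, and "deny" for everything else.
    """
    if not original_command:
        # No command at all means an interactive login was requested.
        return "shell" if user in admins else "deny"
    try:
        argv = shlex.split(original_command)
    except ValueError:
        # Unbalanced quotes or similar: refuse rather than guess.
        return "deny"
    if argv and argv[0] in GIT_COMMANDS:
        return "git"
    return "shell" if user in admins else "deny"
```

With this shape, admins still go through the same ACL path as everyone else for git actions, which is the property that makes the third option attractive.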
Branch ACLs
There is ongoing discussion as to how to manage user-created branches in our repos. The ACL system can limit these branches to a particular namespace (or a few), which is almost necessary for the ACL system to work. There are some open questions though:
- What namespace to use (private- is what CVS used)
- Who should have rights to create/commit to such branches (currently anybody with filesystem write access is allowed)
- Should builds be allowed to happen from branches (I'd prefer all official builds come from origin/master or official release branches)
fedpkg
See Dist_Git_Proposal#fedpkg for a feature list. Ideally fedpkg would operate somewhat like the koji client or the python-bugzilla client, or even git itself, where fedpkg takes a series of global options, and then each command takes further options. This helps in isolating development on the tool and makes adding features to specific commands quite easy (look into python-argparse for this).
An initial code drop with some functioning targets has been made. You can find the code in the fedora-packager package. An anonymous checkout can be done with:
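The "global options, then per-command options" layout maps directly onto argparse subparsers. The following is a minimal sketch of that structure only. The command names and options shown (clone, build, --anonymous, --scratch) are placeholders for illustration, not the actual fedpkg feature list.

```python
import argparse


def build_parser():
    """Build a parser with global options and per-command options."""
    parser = argparse.ArgumentParser(prog="fedpkg")
    # Global options come before the command name.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra information")

    subparsers = parser.add_subparsers(dest="command", help="available commands")

    # Each command gets its own sub-parser and its own options.
    clone = subparsers.add_parser("clone", help="clone a package module")
    clone.add_argument("module", help="name of the module to clone")
    clone.add_argument("--anonymous", action="store_true",
                       help="clone without write access")

    build = subparsers.add_parser("build", help="request a build")
    build.add_argument("--scratch", action="store_true",
                       help="perform a scratch build")
    return parser


args = build_parser().parse_args(["-v", "clone", "--anonymous", "yum"])
print(args.command, args.module, args.verbose, args.anonymous)
```

Adding a new command then only means adding one more sub-parser, without touching the options of existing commands.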
git clone git://git.fedorahosted.org/git/fedora-packager
There are currently two files being worked on: fedpkg.py, which is the command line tool to gather options/arguments, and fedpkg/__init__.py, which is a python module housing the code to actually do stuff.
koji
Koji upstream already has some code to deal with git repos, however our proposed layout will be different enough to require modification. No modification has been done yet.
Recently we attempted to hook dist-git into koji and discovered some areas in koji that need modification. Namely, the use of "make" needs to be optional and configurable, as we don't use make, we use fedpkg. Other than that, the existing code for dealing with git repos works, as long as fedpkg crafts the URL sent to koji correctly. An instance of koji is running in the stg environment and it will be modified so that we can build things from dist-git with it.
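As a sketch of what "crafting the URL" could involve, the helper below builds a source URL in the general form koji accepts for SCM builds, that is repository URL, then "?" plus an optional subdirectory, then "#" plus a revision. The function name, the default host (the staging system) and the choice to leave the subdirectory empty are assumptions. The final layout fedpkg uses may differ.

```python
def koji_scm_url(module, commit, host="pkgs.stg.fedoraproject.org", subdir=""):
    """Build a git SCM URL for a koji build request (hypothetical helper).

    module: package module name, e.g. "yum"
    commit: the revision to build; required so the build is reproducible
    subdir: optional path inside the repo (assumed empty for dist-git)
    """
    if not commit:
        raise ValueError("a commit is required to build from git")
    return f"git://{host}/{module}?{subdir}#{commit}"


print(koji_scm_url("yum", "abc123"))
```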
When
The current target for conversion is shortly after the Fedora 13 release.
How can I help?
Find Jesse Keating (Oxf13) on freenode IRC