m (Formatting: header1 -> header2)
Question (S3 Buckets): Does the hostname in the region-specific mirror URI given in that section resolve to an IP address internal to EC2 when an EC2 instance attempts to resolve it?
Idea (Updating S3 Mirrors): Can Amazon's VPC service (http://aws.amazon.com/vpc/) connect to Fedora's OpenVPN gateway so Release Engineering people can access them normally?
Revision as of 00:36, 1 October 2010
Problem Space
The user experience on Fedora VMs running on Amazon EC2 would benefit from yum mirrors hosted within Amazon's cloud network. In particular...
- Such mirrors will be considerably faster.
- Data transfer charges will be reduced.
- Intra-region S3-to-EC2 traffic is free.
- Intra-zone data transfer between EC2 instances is free.
- Users with hundreds of EC2 instances will not place additional load on existing public mirrors.
Solution Overview
The most economical solution is to place yum mirrors in region-specific S3 buckets and direct clients toward the mirrors inside their respective regions. Fedora will need to create these buckets and keep them up to date with a script because S3 does not reliably support direct filesystem access. This is the solution Amazon uses for their newly released Amazon Linux repositories.
AWS Credentials
Fedora needs an AWS account to use for managing these buckets. For a script to be able to push things to an S3 bucket it needs a set of REST keys that give it access. People most commonly use the keys for the account that pays for and manages the S3 bucket. To minimize damage in case of compromise, however, each region will use a separate set of credentials. We can do this with either per-region sub-accounts and consolidated billing or the new IAM service and per-region, task-specific keys.
S3 Buckets
The S3 buckets themselves will contain mirrors of Fedora's i686 and x86_64 repositories for every release we publish on EC2. Clients inside EC2 can then access yum repositories via region-specific URIs such as http://fedora-mirror-us-west-1.s3.amazonaws.com/fedora/linux/releases/13/Everything/x86_64/os/.
Since Amazon charges for data transfer from S3 buckets to the rest of the Internet, these buckets will be accessible only by clients inside EC2. S3's REST API allows one to create ACLs based on host IP addresses, so we will prevent outside access to these mirrors by allowing access only to EC2-internal IP addresses (the 10.x.x.x range).
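One way the IP restriction described above could be expressed is an S3 bucket policy with an aws:SourceIp condition. The sketch below only builds the policy document; it does not upload it. The bucket name follows the example URI in this section, the statement ID is made up, and the 10.0.0.0/8 range is taken from the text as an assumption. Whether requests from EC2 instances actually reach S3 from that range is not established here.

```python
import json


def bucket_policy(bucket, allowed_cidr="10.0.0.0/8"):
    """Build a policy that allows anonymous reads only from allowed_cidr."""
    return {
        "Version": "2008-10-17",
        "Statement": [{
            "Sid": "AllowReadFromEC2",  # hypothetical statement ID
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::%s/*" % bucket,
            # Assumption from the text: EC2-internal clients use 10.x.x.x.
            "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidr}},
        }],
    }


if __name__ == "__main__":
    print(json.dumps(bucket_policy("fedora-mirror-us-west-1"), indent=2))
```

One policy per regional bucket keeps the restriction next to the data it protects, which fits the per-region credentials described under AWS Credentials.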
Client Access
Yum needs to know which region a given client resides in so it can use the correct region's mirror. We cannot do this via MirrorManager's normal IP block-based mechanism because EC2 instances' IP addresses are too volatile.
While the VM images Fedora will provide are restricted to specific regions, encoding regions directly into these images presents two main difficulties:
- We have to spin images once for each region instead of using the same image globally.
- Users can re-bundle their own versions of Fedora's stock images and start them in different regions, not only negating the benefits of this system for users, but also causing those who fund the mirrors to have to pay for data transfer.
A running instance can query EC2's internal REST API to learn which region it is located in. We can use this information either at boot time or whenever yum is called to ensure yum has up-to-date information as to where it resides.
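A minimal sketch of that query, assuming the instance metadata service is used: it reports the availability zone (for example us-west-1a), and the region is the zone name without its trailing letter. The function names are illustrative, and the zone-to-region rule is an assumption that holds for zone names of that shape.

```python
import urllib.request

# EC2 instance metadata endpoint that returns the availability zone.
ZONE_URL = "http://169.254.169.254/latest/meta-data/placement/availability-zone"


def region_from_zone(zone):
    """Strip the trailing zone letter: 'us-west-1a' becomes 'us-west-1'."""
    zone = zone.strip()
    if len(zone) < 2 or not zone[-1].isalpha() or not zone[-2].isdigit():
        raise ValueError("unexpected availability zone: %r" % zone)
    return zone[:-1]


def current_region(timeout=2.0):
    """Ask the metadata service for this instance's zone and derive the region."""
    with urllib.request.urlopen(ZONE_URL, timeout=timeout) as resp:
        return region_from_zone(resp.read().decode("ascii"))
```

Only current_region() touches the network, so it fails quickly on machines outside EC2, and an init script can treat that failure as "not on EC2".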
At the Cloud SIG meeting on 30 Sep 2010 [1] we discussed several possible solutions of this nature:
1. Make yum configuration files point directly to region-specific URIs
An init script can add or edit baseurl directives in /etc/yum.repos.d/fedora*.repo that point yum to the correct mirror. Yum tries mirrors specified this way before it tries mirrors given by a mirrorlist, allowing for graceful fallback.
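The edit such an init script would make can be sketched as follows. The URL layout is hypothetical, modelled on the example bucket URI under S3 Buckets, and set_baseurl is an illustrative name. The sketch works on the file's text, so the caller decides which files to read and write.

```python
import configparser
import io

# Hypothetical URL layout, modelled on the example bucket URI in this proposal.
BASEURL = ("http://fedora-mirror-{region}.s3.amazonaws.com"
           "/fedora/linux/releases/$releasever/Everything/$basearch/os/")


def set_baseurl(repo_text, section, region):
    """Return repo_text with a region-specific baseurl set in one section."""
    # interpolation=None leaves yum variables such as $releasever untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    if not parser.has_section(section):
        raise KeyError(section)
    parser.set(section, "baseurl", BASEURL.format(region=region))
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()
```

Rewriting the file through configparser drops its comments, including any commented-out baseurl lines, so a real script might prefer a line-based edit.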
Advantages
- Simplicity. This is easy to write and maintain. Users can easily see where their packages come from.
- No software changes required. This uses only existing yum and MirrorManager features.
Disadvantages
- Munges configuration files. rpm -V fedora-release will fail.
- Dependent on users leaving configuration files alone. Users who replace their yum configuration files entirely, say via puppet, will see this init script edit the VM's yum configuration every time the machine boots.
- RPM throws warnings when performing distro upgrades. Fedora's repository configuration files are marked config(noreplace).
2. Add a yum variable that sends region information to MirrorManager
Recent versions of yum replace variables like $varname in its configuration files with the contents of /etc/yum/vars/varname, if such a file exists. An init script can learn what region a VM resides in at boot time and write it to a file in that directory. Then yum can pass this as an extra parameter to MirrorManager, which looks up the value and prepends the relevant mirror(s), if any, to the mirrorlist it returns. I mentioned calling the variable prepend to help keep this facility generic.
To do this we would need to add something akin to &prepend=$mirrorlist_prepend to Fedora's stock repo configuration files. An init script would then write /etc/yum/vars/mirrorlist_prepend at boot time on EC2 instances, while bare metal machines would lack that file and send that portion of the URI, verbatim, to MirrorManager. MirrorManager then needs to be able to interpret this value and act accordingly.
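The client side of this scheme can be sketched as below. write_yum_var is what an init script might do. expand is not yum's code; it only imitates the substitution rule described above so the effect on the mirrorlist URI is visible. Unlike yum, it has no built-in variables such as $releasever. The value ec2-us-west-1 used in the example is a made-up format.

```python
import os
import re


def write_yum_var(name, value, vars_dir="/etc/yum/vars"):
    """Write one yum variable file, e.g. /etc/yum/vars/mirrorlist_prepend."""
    path = os.path.join(vars_dir, name)
    with open(path, "w") as handle:
        handle.write(value + "\n")
    return path


def expand(url, vars_dir="/etc/yum/vars"):
    """Replace $name when a file called name exists in vars_dir;
    otherwise leave that part of the URL verbatim."""
    def lookup(match):
        path = os.path.join(vars_dir, match.group(1))
        if not os.path.isfile(path):
            return match.group(0)
        with open(path) as handle:
            return handle.readline().strip()
    return re.sub(r"\$(\w+)", lookup, url)
```

On a machine without the file, expand returns the URI unchanged, ending in &prepend=$mirrorlist_prepend. After write_yum_var("mirrorlist_prepend", "ec2-us-west-1") it ends in &prepend=ec2-us-west-1.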
MirrorManager ignores parameters it does not recognize, so sending such a URI to a server that does not support the parameter in question still results in a useful mirrorlist.
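The server-side behaviour could look roughly like the sketch below. It is not MirrorManager's code. The PREFERRED table, the ec2-us-west-1 key, and the function name are all hypothetical, and a real implementation would read the mapping from MirrorManager's database.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from prepend values to preferred mirrors.
PREFERRED = {
    "ec2-us-west-1": [
        "http://fedora-mirror-us-west-1.s3.amazonaws.com/fedora/linux/",
    ],
}


def build_mirrorlist(request_url, normal_mirrors, preferred=PREFERRED):
    """Return normal_mirrors with preferred mirrors first when the request's
    prepend value is known; unknown or unexpanded values change nothing."""
    query = parse_qs(urlsplit(request_url).query)
    key = query.get("prepend", [""])[0]
    extra = preferred.get(key, [])
    return extra + [m for m in normal_mirrors if m not in extra]
```

A bare metal client that sends the literal text $mirrorlist_prepend matches no key and receives the ordinary mirrorlist, which is the graceful fallback the proposal relies on.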
Advantages
- Leaves configuration files untouched.
- Extensible. This gives MirrorManager a facility for specifying preferred sets of yum mirrors without relying on clients' IP addresses.
- Easy statistic gathering. Fedora's EC2 instances will clearly identify themselves as such in their HTTP requests.
- Easy to opt-out. Disabling this functionality is as easy as editing one's yum configuration.
Disadvantages
- Stock mirrorlist URIs are longer.
- Requires changes to the fedora-release RPM. Stock repo configuration files will need to include the extra parameter.
- Requires changes to MirrorManager. While older versions of MirrorManager will still work correctly, this will require a few UI and database schema changes to work.
3. Have a yum plugin send region information to MirrorManager
We can have a yum configuration plugin query what EC2 region a VM resides in and append the result to the URI with which yum queries MirrorManager, in the same manner as the solution above. Fedora's stock EC2 VM image would have this yum plugin installed by default, while everyone else would not.
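The URI manipulation at the heart of such a plugin might look like the sketch below. The yum hook that would call it is left out, and both the function name and the prepend parameter name (borrowed from the previous option) are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_region(mirrorlist_url, region, param="prepend"):
    """Return mirrorlist_url with param=region in its query string,
    replacing any existing value for param."""
    parts = urlsplit(mirrorlist_url)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != param]
    query.append((param, region))
    # safe="$" keeps any unexpanded yum variables readable in the result.
    return urlunsplit(parts._replace(query=urlencode(query, safe="$")))
```

Replacing an existing value rather than appending a second one keeps the request unambiguous if the stock repo files ever gain the same parameter.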
Advantages
- Leaves configuration files untouched.
- Extensible. (same as above)
- Easy statistic gathering. (same as above)
- Easy to opt-out. Users can disable this functionality by uninstalling the yum plugin.
Disadvantages
- Requires us to maintain another yum plugin.
- Requires changes to MirrorManager. (same as above)
Updating S3 Mirrors
S3 buckets are accessible via a REST API, which makes normal filesystem access difficult and very slow at best. Instead we will use a script that fetches updated packages and metadata files and pushes them to each region's S3 bucket. This script will either run on Fedora's regular infrastructure or on one EC2 instance per region, each of which uses separate credentials.
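The planning step of such a script can be sketched without any S3 calls. Both manifests map object keys to MD5 hex digests. Building the remote manifest from a bucket listing is left out, and it assumes the bucket reports plain MD5 digests for its objects. Pushing repodata last is a design choice of this sketch, not something the proposal specifies.

```python
import hashlib
import os


def local_manifest(root):
    """Map each file's path relative to root onto its MD5 hex digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as handle:
                digest = hashlib.md5(handle.read()).hexdigest()
            key = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[key] = digest
    return manifest


def plan_sync(local, remote):
    """Compare manifests; return (keys to upload, keys to delete)."""
    uploads = sorted(k for k, v in local.items() if remote.get(k) != v)
    deletes = sorted(k for k in remote if k not in local)
    # Stable sort: packages first, repodata last, so metadata never
    # references a package that has not been uploaded yet.
    uploads.sort(key=lambda k: "/repodata/" in "/" + k)
    return uploads, deletes
```

The same plan can be run once per region against each bucket's listing, using that region's credentials for the actual uploads and deletions.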
Action Items
Finalize this Proposal
- Decide whether to use IAM or AWS sub-accounts.
- Decide who will manage "official" Fedora AWS credentials.
- Decide whether to run the S3 bucket population script on Fedora servers or EC2 instances.
- Decide how these scripts and possibly EC2 instances will be managed. (Involve Infrastructure in the discussion.)
Implement and Document
- Ask Amazon officials what support/subsidies they can provide for our finalized proposal.
- Reserve appropriately-named S3 buckets for Fedora's yum mirrors in each AWS region.
- Add appropriate ACLs to these S3 buckets.
- Write yum-plugin-aws.
- Add AWS region flag support to MirrorManager.
- Document and script repository population and updating.
- Document when and how to retire S3-based yum mirrors of old releases.