From Fedora Project Wiki

Revision as of 19:31, 30 September 2010

Problem Space

The user experience on Fedora VMs running on Amazon EC2 would benefit from yum mirrors hosted within Amazon's cloud network. In particular...

  • Such mirrors will be considerably faster.
  • Data transfer charges will be reduced.
    • Intra-region S3-to-EC2 traffic is free.
    • Intra-zone data transfer between EC2 instances is free.
  • Users with hundreds of EC2 instances do not place additional load on existing public mirrors.

Solution Overview

The most economical solution is to place yum mirrors in region-specific S3 buckets and direct clients toward the mirrors inside their respective regions. Fedora will need to create these buckets and keep them up to date with a script, because S3 does not reliably support direct filesystem access. This is the solution Amazon uses for its newly released Amazon Linux repositories.

AWS Credentials

Fedora needs an AWS account to use for managing these buckets. To push content to an S3 bucket, a script needs a set of REST keys that grant it access. People most commonly use the keys for the account that pays for and manages the S3 bucket. To minimize damage in case of compromise, however, each region will use a separate set of credentials. We can do this with either per-region sub-accounts and consolidated billing, or the new IAM service and per-region, task-specific keys.
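
If the IAM option is chosen, the per-region, task-specific keys could be limited by a policy along the following lines. This is a minimal sketch, not a settled design: the function name and the choice of permitted actions are assumptions, and the bucket name follows the example naming used elsewhere in this proposal.

```python
import json


def uploader_policy(bucket):
    """Return a policy document that lets a key pair maintain one bucket only.

    Assumption: the sync script needs to list the bucket and to write and
    delete objects in it, and nothing else.
    """
    bucket_arn = "arn:aws:s3:::{0}".format(bucket)
    return {
        "Version": "2008-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Writing and deleting apply to the objects inside it.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:DeleteObject"],
                "Resource": bucket_arn + "/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(uploader_policy("fedora-mirror-us-west"), indent=2))
```

A key restricted this way cannot touch another region's bucket if it is compromised, which is the property the separate credentials are meant to provide.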

S3 Buckets

The S3 buckets themselves will contain mirrors of Fedora's i686 and x86_64 repositories for every release we publish on EC2. Clients inside EC2 can then access yum repositories via region-specific URIs such as http://s3.amazonaws.com/fedora-mirror-us-west/fedora/linux/releases/13/Everything/x86_64/os/.
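
The URI layout above can be generated mechanically for each region, release, and architecture. The helper below is only an illustration: the function name and the "fedora-mirror-" bucket prefix parameter are assumptions derived from the example URI, not an agreed naming scheme.

```python
def mirror_base_url(region, release, arch, bucket_prefix="fedora-mirror-"):
    """Build the URL of a release's "Everything" repository in one region's bucket."""
    return (
        "http://s3.amazonaws.com/{prefix}{region}/fedora/linux/releases/"
        "{release}/Everything/{arch}/os/"
    ).format(prefix=bucket_prefix, region=region, release=release, arch=arch)


if __name__ == "__main__":
    for arch in ("i686", "x86_64"):
        print(mirror_base_url("us-west", 13, arch))
```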

Since Amazon charges for data transfer from S3 buckets to the rest of the Internet, these buckets will be accessible only by clients inside EC2. S3's REST API allows one to create ACLs based on host IP addresses, so we will prevent outside access to these mirrors by allowing access only to EC2-internal IP addresses (the 10.x.x.x range).
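
One way S3 can express an address-based restriction is a bucket policy with an aws:SourceIp condition. The sketch below assumes that mechanism is the one used and that the allowed range is passed in by the caller; whether S3 actually sees requests from EC2 instances as coming from 10.x.x.x addresses is an open question for this proposal. The function name is hypothetical.

```python
import json


def read_only_policy(bucket, allowed_cidrs):
    """Return a bucket policy allowing anonymous GETs only from the listed CIDR blocks."""
    return {
        "Version": "2008-10-17",
        "Statement": [
            {
                "Sid": "AllowReadFromListedAddresses",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::{0}/*".format(bucket),
                # Requests from any other source address match no Allow
                # statement and are therefore denied by default.
                "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidrs}},
            }
        ],
    }


if __name__ == "__main__":
    # 10.0.0.0/8 is the range named in the text; it is an assumption that
    # S3 observes this range for traffic from EC2 instances.
    print(json.dumps(read_only_policy("fedora-mirror-us-west", ["10.0.0.0/8"]), indent=2))
```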

Question
Does the hostname in the above URI resolve to an IP address internal to EC2 when an EC2 instance attempts to resolve it?

Client Access

Yum needs to know which region a given client resides in so it can use the correct region's mirror. We cannot do this via MirrorManager's normal IP block-based mechanism because EC2 instances' IP addresses are too volatile. A running instance can, however, query EC2's internal REST API to determine which region it is in.
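
A rough illustration of that query follows. It assumes the instance metadata service answers plain, unauthenticated GET requests at its usual link-local address, and it derives the region by dropping the trailing letter from the availability zone name. The function names are hypothetical.

```python
import urllib.request

# Instance metadata endpoint that reports the availability zone, e.g. "us-east-1a".
METADATA_URL = "http://169.254.169.254/latest/meta-data/placement/availability-zone"


def region_from_zone(zone):
    """Strip the trailing zone letter: 'us-east-1a' -> 'us-east-1'."""
    zone = zone.strip()
    if not zone or not zone[-1].isalpha():
        raise ValueError("unexpected availability zone: %r" % zone)
    return zone[:-1]


def current_region(timeout=2.0):
    """Ask the metadata service for this instance's zone and return its region.

    Only works from inside EC2; elsewhere the request fails or times out.
    """
    with urllib.request.urlopen(METADATA_URL, timeout=timeout) as response:
        return region_from_zone(response.read().decode("ascii"))
```

A short timeout matters here: outside EC2 the address is unreachable, and yum should fall back to the ordinary mirror list rather than hang.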

Fedora's Infrastructure team recommends a two-part approach that avoids munging instances' yum configuration files:

  • A yum plugin performs this query and informs MirrorManager of an instance's region via an additional mirrorlist flag.
  • MirrorManager interprets this flag and prepends the URI of the appropriate mirror for the region, if any, to the list of mirrors that it returns.
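
The two halves of this approach can be sketched as a pair of plain functions. This is not the yum plugin API or MirrorManager's code: the flag name aws_region, the function names, and the region-to-mirror mapping are all assumptions made for illustration, and the wiring into yum's plugin hooks is left out.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_region_flag(mirrorlist_url, region, flag="aws_region"):
    """Client side: append (or replace) the region flag in a mirrorlist URL."""
    parts = urlparse(mirrorlist_url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != flag]
    query.append((flag, region))
    return urlunparse(parts._replace(query=urlencode(query)))


def prepend_region_mirror(mirrors, region, region_mirrors):
    """Server side: put the region's S3 mirror first, if one exists."""
    preferred = region_mirrors.get(region)
    if preferred is None:
        return list(mirrors)
    # Avoid listing the preferred mirror twice.
    return [preferred] + [m for m in mirrors if m != preferred]


if __name__ == "__main__":
    url = "https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-13&arch=x86_64"
    print(add_region_flag(url, "us-west"))
```

Because the regular mirrors stay in the list after the S3 mirror, a client still has fallbacks if its region's bucket is unavailable or has no mirror at all.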

Updating S3 Mirrors

S3 buckets are accessible via a REST API, which makes normal filesystem access difficult and very slow at best. Instead we will use a script that fetches updated packages and metadata files and pushes them to each region's S3 bucket. This script will run either on Fedora's regular infrastructure or on one EC2 instance per region, each of which uses separate credentials.
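
The core decision such a script has to make is which files to push and which stale objects to remove. The sketch below covers only that planning step, under the assumption that the bucket listing yields an MD5 hex digest per key (S3 reports one as the ETag for simple, single-part uploads). The function names are hypothetical, and the actual listing, upload, and delete calls are deliberately left out.

```python
import hashlib
import os


def local_md5s(root):
    """Map each file under root (relative path, '/' separators) to its MD5 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            md5 = hashlib.md5()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large RPMs are not loaded whole.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    md5.update(chunk)
            key = os.path.relpath(path, root).replace(os.sep, "/")
            digests[key] = md5.hexdigest()
    return digests


def plan_sync(local, remote):
    """Compare {key: md5} maps and return (keys to upload, keys to delete)."""
    uploads = sorted(k for k, digest in local.items() if remote.get(k) != digest)
    deletes = sorted(k for k in remote if k not in local)
    return uploads, deletes
```

Separating the plan from the transfer keeps the comparison testable without AWS credentials and lets the same logic run per region with that region's keys.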

Idea
Can Amazon's VPC service connect to Fedora's OpenVPN gateway so Release Engineering people can access them normally?

Action Items

Finalize this Proposal

  • Decide whether to use IAM or AWS sub-accounts.
  • Decide who will manage "official" Fedora AWS credentials.
  • Decide whether to run the S3 bucket population script on Fedora servers or EC2 instances.
  • Decide how these scripts and possibly EC2 instances will be managed. (Involve Infrastructure in the discussion.)

Implement and Document

  • Ask Amazon officials what support/subsidies they can provide for our finalized proposal.
  • Reserve appropriately-named S3 buckets for Fedora's yum mirrors in each AWS region well in advance of several releases.
  • Add appropriate ACLs to these S3 buckets.
  • Write yum-plugin-aws.
  • Add AWS region flag support to MirrorManager.
  • Document and script repository population and updating.
  • Document when and how to retire S3-based yum mirrors of old releases.