CloudFS
Summary
A "cloud-ready" version of GlusterFS, adding stronger authentication/authorization, encryption, and multi-tenancy.
Owner
- Name: Jeff Darcy
- Email: jdarcy@redhat.com
Current status
- Targeted release: Fedora 15
- Last updated: 2010-11-12
- Percentage of completion: 20%
Detailed Description
CloudFS is intended to be a version of an existing distributed/parallel filesystem that's suitable for deployment by a provider as a permanent, shared service. It could also be used as infrastructure for hosting virtual-machine images, and in fact the underlying technology (GlusterFS) is often used in this role currently. Users can already deploy this class of filesystem privately in the cloud, within their own virtual machines, but they pay both a performance and an administrative cost for doing so. Running servers natively and doing the configuration/administration only once could be a compelling option for anyone building a Fedora-based cloud, but requires some additional features. Specifically, CloudFS provides:
- Stronger authentication and authorization.
- Encryption, both on the wire and at rest.
- Multi-tenancy (isolating tenants' namespaces from one another).
- Quota and accounting support.
All of these features can be implemented in a modular way, so that users can deploy only those they deem necessary or appropriate for their situation.
The long-term plan for CloudFS includes multi-site replication, but that is not part of the current proposal.
Benefit to Fedora
Best-in-class cloud storage with a full and familiar POSIX API, high performance, and strong security. This functionality can be used either as infrastructure for the cloud itself, or as a service providing additional functionality directly to users.
Scope
The scope of work for this feature is mostly limited to the existing glusterfs package, though it might also require creation of a new package with that as a dependency. Each of the features described above can be implemented as a "translator" (separately loadable module/plugin). There might need to be some enhancement of the GlusterFS CLI etc. to support configuration of currently-unrecognized translators using standard commands. Lastly, some features will also require commands/libraries to manage tenants and tenant-associated artifacts (e.g. encryption keys), to set quotas, to retrieve usage information, etc.
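The idea of stacking independently loadable translators can be illustrated with a small conceptual sketch. This is not GlusterFS code: real translators are loadable modules inside GlusterFS, and every class and method name below (Store, TenantNamespace, Quota, QuotaExceeded) is hypothetical. It only shows how each layer wraps the one beneath it, so that a deployment can include or omit a feature by changing the stack.

```python
# Conceptual sketch only. These classes are hypothetical stand-ins and do not
# correspond to any GlusterFS API.

class QuotaExceeded(Exception):
    """Raised by the Quota layer when a write would exceed the limit."""


class Store:
    """Bottom layer: keeps file contents in a dict keyed by path."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

    def read(self, path):
        return self.files[path]


class TenantNamespace:
    """Maps every path into a tenant-specific area before passing it down."""

    def __init__(self, child, tenant):
        self.child = child
        self.tenant = tenant

    def _map(self, path):
        return "/" + self.tenant + "/" + path.lstrip("/")

    def write(self, path, data):
        self.child.write(self._map(path), data)

    def read(self, path):
        return self.child.read(self._map(path))


class Quota:
    """Rejects writes that would push the total stored bytes past the limit."""

    def __init__(self, child, limit_bytes):
        self.child = child
        self.limit = limit_bytes
        self.sizes = {}  # path -> size of the data last written there

    def write(self, path, data):
        new_total = sum(self.sizes.values()) - self.sizes.get(path, 0) + len(data)
        if new_total > self.limit:
            raise QuotaExceeded(path)
        self.child.write(path, data)
        self.sizes[path] = len(data)

    def read(self, path):
        return self.child.read(path)


# Build a stack for one tenant: quota on top, namespace mapping below it.
store = Store()
alice = Quota(TenantNamespace(store, "alice"), limit_bytes=10)
alice.write("/f", b"hello")
```

Dropping the Quota layer from the stack leaves the namespace mapping working unchanged, which is the modularity property described above.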
How To Test
Because CloudFS is a distributed filesystem, testing requires at least two and ideally four or more server nodes. Since the specific goal of CloudFS is to provide various kinds of protection between tenants, it's best to have at least two client nodes mounting as different tenants. All of these nodes must have the glusterfs package installed, and can be virtual for testing purposes.
Configuration is mostly as described at [1], with additional TBD options on the "gluster volume create" line to load CloudFS-specific translators.
Testing is largely feature-dependent. Referring to the above feature list:
- Auth*: verify that different users can mount and use their own portions of the shared filesystem, and not mount/use each other's.
- Encryption: verify that data are being encrypted in transit (using tcpdump or similar), that they are encrypted on disk, and that a user with the appropriate keys and configuration can successfully "unwind" both stages of encryption to arrive back at plaintext.
- Multi-tenancy: verify that tenant identities are being determined correctly, that files are being placed into appropriate tenant-specific areas, and that tenants cannot see or modify each other's files.
- Quota/accounting: verify that a user cannot exceed quota, and that the reported usage matches actual usage.
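Two of the checks above lend themselves to small helper scripts. The following is a minimal sketch, not part of CloudFS: the function names are hypothetical, and it assumes the tester can read the server-side files directly and obtains the reported usage figure by some other (still TBD) means. One helper looks for a known plaintext marker in an on-disk file, which should not be found if encryption at rest is working; the other compares a reported usage figure against the sum of file sizes under a directory.

```python
import os


def file_contains(path, marker, chunk_size=1 << 20):
    """Return True if the byte string `marker` occurs anywhere in the file.

    The file is read in chunks; the last len(marker) - 1 bytes of each window
    are carried over so a marker spanning a chunk boundary is still found.
    """
    if not marker:
        raise ValueError("marker must be non-empty")
    overlap = len(marker) - 1
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            tail = window[-overlap:] if overlap > 0 else b""


def tree_usage_bytes(root):
    """Sum the apparent sizes (st_size) of all files under `root`.

    Note: this is apparent size, not allocated disk blocks; an accounting
    implementation may count differently.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def usage_matches(reported_bytes, root, tolerance_bytes=0):
    """Return True if the reported usage is within tolerance of the tree's size."""
    return abs(reported_bytes - tree_usage_bytes(root)) <= tolerance_bytes
```

A typical test would write a file containing a distinctive marker through a tenant's mount, then assert that file_contains returns False for the corresponding file on the server's disk and True when the file is read back through the mount with the right keys.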
User Experience
Filesystems are notoriously "invisible" to users when they work. The real "user" experience for CloudFS is actually the experience of the cloud provider or cloud tenant (account holder) as they configure their respective parts of the system. This experience includes the following aspects.
- How hard is it for the provider to add/remove tenants, either manually or as part of their general cloud-provisioning UI (including self-service)?
- How does the provider manage quotas, retrieve usage information for billing, etc.?
- How hard is it for the tenant, once access to a resource (directory on the shared filesystem) has been configured, to construct the necessary mount command lines and actually access the resource?
- Is the observed performance, reliability, etc. of storage in CloudFS consistent with users' other expectations of performance in the cloud?
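To make the mount question concrete, here is a minimal sketch of a helper that builds a mount command line for a tenant. It assumes the standard GlusterFS native-client form, "mount -t glusterfs SERVER:/VOLUME MOUNTPOINT"; any CloudFS-specific options (tenant identity, keys) are still TBD and are deliberately left out. The function names and the example server, volume, and mount point are hypothetical.

```python
import shlex


def build_mount_command(server, volume, mountpoint):
    """Return the argv list for mounting a GlusterFS volume.

    Assumption: the plain native-client form is sufficient. CloudFS-specific
    options are not defined yet and are not included here.
    """
    return ["mount", "-t", "glusterfs", server + ":/" + volume, mountpoint]


def as_shell(argv):
    """Render an argv list as a single shell-quoted command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Hypothetical example values.
command = build_mount_command("server1", "tenantvol", "/mnt/cloudfs")
print(as_shell(command))
```

Keeping the command as an argv list until the last moment avoids quoting mistakes if a provider's provisioning UI generates these lines automatically.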
Dependencies
The only major dependency is CLI support from the glusterfs upstream, as noted above. Even that's not a hard dependency; if worst comes to worst, all of the additional CloudFS functionality can be managed using separate programs/libraries.
Contingency Plan
None necessary. The existing glusterfs functionality will remain intact.
Documentation
TBD. It's not that it doesn't exist; it's just not very organized yet.
Release Notes
None needed.