GFS2
Summary
A cluster filesystem allowing simultaneous access to shared storage from multiple nodes, designed for SAN environments. It is also possible to use GFS2 as a single node (local) filesystem by selecting the "lock_nolock" locking protocol.
Owner
- Name: Steven Whitehouse
- email: <swhiteho@redhat.com>
- mailing list: <cluster-devel@redhat.com>
Current status
- Targeted release: Fedora 11
- Last updated: 30 Mar 2009
- Percentage of completion: 100%
The previously pending patches have now gone upstream for 2.6.30. We thus have all the most important components of this feature in place.
Detailed Description
GFS2 is part of the upstream kernel, but is still listed as experimental. The plan is for it to become stable before the release of F-11. The gfs2-utils package is also already part of Fedora, and we likewise hope to declare it stable before F-11.
Benefit to Fedora
The main benefit is a stable cluster filesystem which works seamlessly with the Red Hat cluster infrastructure.
Scope
Most of the remaining work now is testing and bug fixing.
How To Test
Read the docs, create a filesystem, run an application on it, and check whether there are any problems/bugs; if so, report them via the usual Bugzilla process.
We will also be running the Red Hat QE tests, some performance tests, and basically anything else that we can get our hands on, in order to cover as many cases as possible. Any filesystem test suite would be a good thing to test with, whether for performance or correctness. We also want to see lots of testing with real applications: Apache, Samba, NFS (over GFS2), exim, sendmail, yourfavouriteapplicationhere, etc. Basically anything that uses the filesystem.
You don't need any special hardware to do single node tests - you can create a filesystem in a single file and mount it loopback. For multiple node tests you will need some shared storage (iSCSI, FC, or some other kind of SAN) plus a method of fencing failed nodes (this can be done manually if you don't have any fencing hardware, but power switches and/or remote access controllers are recommended).
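As a rough example of the single node case, the following Python sketch creates a file-backed GFS2 filesystem and mounts it loopback. The image path, size and mount point are arbitrary choices for the sketch; it assumes gfs2-utils is installed and must be run as root:

    import subprocess

    IMAGE = "/var/tmp/gfs2-test.img"  # hypothetical backing file
    MOUNTPOINT = "/mnt/gfs2-test"     # hypothetical mount point

    # Create a sparse 1 GiB file to hold the filesystem.
    with open(IMAGE, "wb") as f:
        f.truncate(1 << 30)

    # Make a GFS2 filesystem with a single journal, using the local
    # "lock_nolock" protocol so that no cluster infrastructure is needed.
    subprocess.run(["mkfs.gfs2", "-O", "-p", "lock_nolock", "-j", "1", IMAGE],
                   check=True)

    # Mount it via a loop device.
    subprocess.run(["mkdir", "-p", MOUNTPOINT], check=True)
    subprocess.run(["mount", "-t", "gfs2", "-o", "loop", IMAGE, MOUNTPOINT],
                   check=True)

Once mounted, the filesystem can be exercised with any normal workload; unmounting it detaches the loop device again.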
If everything is working correctly, the results should be exactly the same as you'd expect running the application on a local filesystem. One point to watch though is that many applications are not written to run in a clustered environment, so if you are expecting multiple copies of an application to share the same set of data files, then please check that the application does support this mode of operation first. Usually it will require some method for inter-node communication at the application level.
User Experience
The GFS2 filesystem allows sharing of a filesystem across multiple nodes in an HA environment.
Dependencies
This feature depends on the cman package, the corosync package and the dlm kernel module, which are already part of Fedora.
Contingency Plan
If this is not ready in time, we can just push out the date at which we consider GFS2 stable. There are no other packages at the moment which depend on this feature. Bearing in mind that this is almost complete, it is fairly unlikely that we will have to do this.
Documentation
- Cluster Suite on Wikipedia
- GFS/GFS2 on Wikipedia
- Cluster Wiki page
- GFS2 kernel documentation (a very basic introduction)
- GFS2 Glock documentation (if things go wrong, this explains what you need to know in order to find out why)
Release Notes
There are a few local file system operations that are not supported, or that are slightly different on GFS2. Here are the main things to watch out for:
- The flock() system call is not interruptible (Bug #421321); this may be fixed before release.
- The fcntl() F_GETLK operation returns a pid which may or may not be on the current node (there is no way to indicate the node on which the process exists with the current interface). Beware if you have an application that uses this interface to obtain a pid to send signals to.
- Leases are not supported with lock_dlm, but they are supported with lock_nolock.
- Locking is based upon a single lock per inode. Applications which either write to a single file from multiple nodes, or which insert/remove lots of files in a single directory, will be slow. This is the single most frequently asked question regarding GFS/GFS2 performance, and it often arises in relation to email/IMAP spool directories. The answer in each case is to break up the single large spool into separate directories and to try to keep each set of files "local" to one node, as far as possible (see the first sketch after this list). Likewise, don't try to mmap() a file and use it as distributed shared memory: it will work, but it will be so slow that it makes no sense to do so.
- If you've used previous releases of GFS/GFS2 you might be wondering where the "lock modules" have got to. The answer is that they have been merged into the main GFS2 module, so you no longer need to load them separately. The mount options have remained the same though. (N.B. The final part of this is still in the -nmw git tree, but it will be merged in the next kernel.org merge window).
- fallocate is not supported, but is on the TODO list (Bug #455572).
- XIP is not supported, but is also on the TODO list (Bug #455570).
- FIEMAP is supported, but currently only for regular files and not for xattrs (again the xattr extension is on the TODO list).
- The internal glock state of GFS2 is accessible via debugfs (see the second sketch after this list).
- dnotify will work on a "same node" basis, but its use with GFS2 is not recommended.
- inotify will work on a "same node" basis, but we don't currently recommend its use.
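To illustrate the "single lock per inode" advice above, here is one possible (and entirely hypothetical) way of partitioning a shared mail spool into per-node directories, so that in the common case only one node creates and removes files under any given directory inode. The layout, node names and hashing scheme are assumptions for the sketch, not anything mandated by GFS2:

    import os
    import zlib

    SPOOL_ROOT = "/mnt/gfs2/spool"       # hypothetical shared spool
    NODES = ["node1", "node2", "node3"]  # hypothetical cluster members

    def spool_dir(mailbox):
        """Map a mailbox to a per-node subdirectory, so that each
        directory inode is (mostly) only modified from one node."""
        node = NODES[zlib.crc32(mailbox.encode()) % len(NODES)]
        return os.path.join(SPOOL_ROOT, node, mailbox)

    os.makedirs(spool_dir("alice"), exist_ok=True)

For this to help, mail delivery and retrieval for a given mailbox must also be routed to the node that "owns" its directory; the hashing only decides the layout.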
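Similarly, the glock state mentioned above can be dumped with something like the following sketch; the debugfs mount point and the per-filesystem directory naming are assumptions here (see the GFS2 glock documentation for what the output means):

    import os

    DEBUGFS = "/sys/kernel/debug/gfs2"  # assumes debugfs is mounted here

    # Each mounted GFS2 filesystem should have a directory (named after
    # its lock table) containing a "glocks" file with the current state.
    for name in os.listdir(DEBUGFS):
        with open(os.path.join(DEBUGFS, name, "glocks")) as f:
            print("=== %s ===" % name)
            print(f.read())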
Comments and Discussion