Enable btrfs transparent zstd compression by default
Summary
On variants using btrfs as the default filesystem, enable transparent compression using zstd. Compression saves space and can significantly increase the lifespan of flash-based media by reducing write amplification. It can also increase read and write performance.
Owners
- Name: Michel Salim, Davide Cavalca, Josef Bacik
- Email: michel@michel-slm.name, dcavalca@fb.com, josef@toxicpanda.com
Current status
- Targeted release: Fedora 34
- Last updated: 2021-04-24
- FESCo issue: #2538
- Tracker bug: #1916918
- Release notes tracker: #635
Detailed description
Transparent compression is a btrfs feature that allows a btrfs filesystem to apply compression on a per-file basis. Of the three supported algorithms, zstd is the one with the best compression speed and ratio. Enabling compression saves space, but it also reduces write amplification, which is important for SSDs. Depending on the workload and the hardware, compression can also result in an increase in read and write performance.
See https://pagure.io/fedora-btrfs/project/issue/5 for details. This was originally scoped as an optimization for https://fedoraproject.org/wiki/Changes/BtrfsByDefault during Fedora 33.
Feedback
Q: How do I disable this feature?
A: Edit '/etc/fstab' and remove the 'compress=zstd:1' mount option, then save and reboot. Instead of rebooting, you can run 'mount -o remount,compress=none /'.
Q: I want compression only on '/' and not on '/home', can I just modify fstab?
A: No. The compress(-force) mount option is file system wide: it applies to the entire file system, not per subvolume or mount point.
Q: Is there a way to do it per subvolume?
A: Yes, but there are some caveats. The 'btrfs property' command can be used to set compression per subvolume, directory, or file. Unsetting it is...tricky. The 'none' option doesn't unset compression, it prevents the compress mount option from working.
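As a rough sketch of the 'btrfs property' usage described above: the helper names (run, set_zstd, show_compression) and the DRY_RUN switch are hypothetical and exist only for illustration. By default the helpers print the command instead of running it, so nothing on the file system changes until DRY_RUN=0 is set.

```shell
#!/bin/bash
# Hypothetical wrapper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Request zstd compression for one subvolume, directory or file.
set_zstd() {
    run btrfs property set "$1" compression zstd
}

# Show the compression property currently set on a path.
show_compression() {
    run btrfs property get "$1" compression
}
```

For example, 'set_zstd /home' prints the command it would run; 'DRY_RUN=0 set_zstd /home' as root actually applies it.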
Q: What about 'chattr +c' ?
A: This is the legacy way of setting compression from before 'btrfs property' existed, and it still works today. However, you can't specify an algorithm as you can with 'btrfs property', so it uses the current default, zlib. zlib also applies if you use the compress(-force) mount option without specifying an algorithm. Tip: COW stands for copy-on-write, so 'chattr +C' (capital C) means nodatacow, whereas '+c' (small c) stands for compression.
Q: How are 'chattr +c' and 'btrfs property' the same?
A: Internal to Btrfs they both set a compression flag on the inode.
Q: If I use 'btrfs property' to set compression, what level is used for zstd?
A: Currently the default is always used, level 3. Even if you get 'btrfs property' to set it with 'zstd:1' it will use 3.
Q: Is it safe to mix and match compression algorithms?
A: Yes.
Q: Does compression cause more fragmentation? The 'filefrag' tool shows a lot more extents on compressed files.
A: No. This is a bug or missing feature. The 'filefrag' command uses FIEMAP, and this reports the Btrfs logical address in what appears to be separate, non-contiguous, 128KiB extents. This 128KiB extent is actually the maximum compression block size. The actual physical extent on disk may be contiguous, but FIEMAP currently doesn't have a way of knowing this. So that does mean the filefrag tool is not a reliable indicator of file fragmentation.
Q: Do I have to reboot to change compression options?
A: No, you can just remount any mount point for that file system, specifying the new options, e.g. 'mount -o remount,compress=zstd:5'.
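To confirm which compression setting a remount left in effect, the mount options can be inspected. The following is a minimal sketch; the function name compress_setting is hypothetical, and it only parses an option string that you pass in.

```shell
#!/bin/bash
# Hypothetical helper: print the value of the compress or compress-force
# option found in a comma-separated mount option string. Prints nothing
# if neither option is present.
compress_setting() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^compress\(-force\)\{0,1\}=//p'
}
```

A typical call would be 'compress_setting "$(findmnt -no OPTIONS /)"', which feeds it the options of the mounted root file system.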
A: Known compression related bugs are cosmetic, i.e. they may be annoying but they're not risking your data. [1] [2] [3] [4]
Q: How is used and free space reporting affected by compression?
A: It depends. 'df' shows the raw physical blocks that are used and free, which means it's (unknowingly) accounting for compression in used space, and assuming no compression for free space. 'ls', 'du' and 'btrfs filesystem du' look at the logical blocks that files use, which are uncompressed. Unrelated to compression, Btrfs will inline small extents for files less than 2048 bytes, and these files are always counted as 4KiB each. So 'du' can over-report, possibly by quite a lot if there are many such small files.
Q: What's the best tool to see how effective compression is?
A: The included 'compsize' tool pointed at a file or directory will report this.
Q: Why use zstd:1 specifically?
A: [5] has an analysis of the various compression levels and their impact, both in terms of CPU usage and disk space savings.
Q: Will /boot be compressed?
A: By default we still put /boot on ext4 (though /boot on btrfs is possible with advanced partitioning). GRUB has supported zstd-compressed Btrfs partitions since 2018, so this should work just fine.
We plan a forthcoming change once Changes/UnifyGrubConfig lands to make sure that the GRUB environment is stored in a partition that it can write to.
Currently the hidden GRUB menu in Fedora Workstation (Changes/HiddenGrubMenu, Changes/CleanupGnomeHiddenBootMenuIntegration) will not work on BIOS-based systems; after UnifyGrubConfig, this might not work even on EFI systems, so we are not recommending /boot on btrfs for now because of this regression.
References:
man 5 btrfs
man btrfs property
man compsize
Benefit to Fedora
Better disk space usage, reduction of write amplification, which in turn helps increase lifespan and performance on SSDs and other flash-based media. It can also increase read and write performance.
Scope
- Proposal owners:
- Update anaconda to perform the installation using mount -o compress=zstd:1
- Set the proper option in fstab (alternatively: set the XATTR)
- Update disk image build tools to enable compression:
- lorax
- appliance-tools
- osbuild
- imagefactory
- [optional] Add support for setting compression level when defragmenting
- [optional] Add support for setting compression level using btrfs property
- Other developers:
- anaconda: review PRs as needed
- Release engineering: https://pagure.io/releng/issue/9920
- Policies and guidelines: N/A
- Trademark approval: N/A
Upgrade/compatibility impact
This Change only applies to newly installed systems. See How to test for converting an existing system.
How to test
Update /etc/fstab to add the compress=zstd:1 mount option to the / and /home mount points. Then remount (mount -o remount /). It's only necessary to remount one of the two mount points. All writes will now be compressed.
Optional: Already written data can be compressed via the defragment option, on a per-directory or per-file basis. It takes the form btrfs filesystem defragment -czstd -rv /path/to/dir/or/file/. It's completely OK to skip this step and just allow a file system to become compressed over time via attrition. If you're using snapshots, it's advised you don't run defragment on all snapshots because defragment will unshare the shared extents between them, leading to a potentially significant increase in space consumption.
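The fstab edit described above could be scripted along these lines. This is only a sketch: the function name add_compress is hypothetical, it assumes standard whitespace-separated fstab fields, and it works as a filter so that /etc/fstab itself is never modified in place.

```shell
#!/bin/bash
# Hypothetical filter: read fstab-formatted text on stdin and append
# compress=zstd:1 to the options of every btrfs entry that has no
# compress or compress-force option yet. Other lines pass through.
add_compress() {
    awk '
        # Keep comments and blank lines exactly as they are.
        /^[[:space:]]*(#|$)/ { print; next }
        $3 == "btrfs" && $4 !~ /(^|,)compress(-force)?(=|,|$)/ {
            # Assigning a field rebuilds the line with single spaces.
            $4 = $4 ",compress=zstd:1"
        }
        { print }
    '
}
```

One way to use it is 'add_compress < /etc/fstab > fstab.new', then review fstab.new by hand before copying it over /etc/fstab and remounting.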
User experience
Compression will result in file sizes (e.g. as reported by du) not matching the actual space occupied on disk. The compsize utility can be used to examine the compression type, effective compression ratio and actual size.
Dependencies
Anaconda will need to be updated to perform the installation using mount -o compress=zstd:1
Contingency plan
- Contingency mechanism: will not include PR patches if not merged upstream and will not enable
- Contingency deadline: Final freeze
- Blocks release? No
- Blocks product? No
Documentation
https://btrfs.wiki.kernel.org/index.php/Compression
Simple Analysis of btrfs zstd compression level
Workstation root fs disk space savings
tl;dr;
For an installed root directory of Fedora 32, zstd:1 yields a 40% storage savings as compared to uncompressed ext4. zstd:9 yields a 43% storage savings.
Test Steps
1. Obtain an “installed.dir” by
   1. Prepping the image
      1. truncate -s 64G installed.raw
   2. Booting the live CD (in this case F32) in qemu-kvm
      1. qemu-kvm -drive file=installed.raw -cdrom $ISO -m 1G
   3. Installing it in the graphical installer
      1. I used regular partitions, and did not create a new one for /home
   4. Shutting down the qemu instance
   5. Mounting it to the directory
      1. losetup -Pf installed.raw
      2. mkdir -p installed.dir
      3. mount /dev/loop1p2 installed.dir
2. Run the script below
    #!/bin/bash
    # Copy the installed root into fresh btrfs images mounted at zstd levels 1, 3 and 9.
    # $EUID is a bash variable, hence the bash shebang.
    set -e
    SIZE=16G
    if [ "$EUID" -ne 0 ] ; then
        echo "Needs to be root" >&2
        exit 2
    fi
    for compress_level in 1 3 9 ; do
        raw=zstd${compress_level}.raw
        truncate -s $SIZE $raw
        mkfs.btrfs -f $raw
        dir=zstd${compress_level}.dir
        mkdir -p $dir
        mount -o compress=zstd:${compress_level} $raw $dir
        perf record -g -o perf${compress_level}.data bash -c "cp -r installed.dir/* $dir ; sync $dir"
    done

Results
zstd:1 - 40% savings
zstd:3 - 41% savings
zstd:9 - 43% savings
This is the result from df. Ignore Use% because the installed directory is a different size.

    Filesystem    1K-blocks    Used Available Use% Mounted on
    /dev/loop1p3   61665068 5458980  53043976  10% /scratch/installed.dir
    /dev/loop0     16777216 3257192  13356936  20% /scratch/zstd1.dir
    /dev/loop2     16777216 3197668  13415468  20% /scratch/zstd3.dir
    /dev/loop3     16777216 3105440  13502704  19% /scratch/zstd9.dir
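The savings figures can be rederived from the Used column of the df output above. A small sketch of the arithmetic follows; the function name savings_pct is hypothetical.

```shell
#!/bin/bash
# Space savings in percent, rounded to the nearest integer:
#   savings = (1 - compressed_used / baseline_used) * 100
# Arguments: baseline Used blocks, compressed Used blocks.
savings_pct() {
    awk -v base="$1" -v used="$2" 'BEGIN { printf "%.0f\n", (1 - used / base) * 100 }'
}

savings_pct 5458980 3257192   # zstd:1, prints 40
savings_pct 5458980 3197668   # zstd:3, prints 41
savings_pct 5458980 3105440   # zstd:9, prints 43
```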
/dev/urandom copy cpu cost
Note: this test is a lot less scientific than the other one, because it’ll be CPU dependent and I had other stuff going on (Chrome was open, etc.). It’s likely still a reasonable proxy for CPU impact, though.
tl;dr;
zstd:1 added 9% and zstd:9 added 15% of system time to the baseline.
Test Steps
1. Get a 2GB blah file that will compress very poorly
1. dd if=/dev/urandom of=blah bs=256K count=8192
2. Run the script
    #!/bin/bash
    # Time copying the test file to btrfs images at zstd levels 0 (none), 1, 3 and 9.
    # $EUID is a bash variable, hence the bash shebang.
    set -e
    SIZE=16G
    if [ "$EUID" -ne 0 ] ; then
        echo "Needs to be root" >&2
        exit 2
    fi
    for compress_level in 0 1 3 9 ; do
        raw=zstd${compress_level}.raw
        truncate -s $SIZE $raw
        #perf record -g -o perf${compress_level}.data bash -c
        mkfs.btrfs -f $raw 2>/dev/null >/dev/null
        dir=zstd${compress_level}.dir
        mkdir -p $dir
        # Level 0 is the uncompressed baseline: mount without the compress option.
        if [ $compress_level -eq 0 ] ; then
            mount $raw $dir
        else
            mount -o compress=zstd:${compress_level} $raw $dir
        fi
        echo Compression Level ${compress_level}
        time bash -c \
            "dd if=blah of=$dir/blah ; sync $dir"
    done
Results
    Compression Level 0
    4194304+0 records in
    4194304+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 13.3657 s, 161 MB/s
    real 0m13.386s
    user 0m2.891s
    sys 0m9.486s

    Compression Level 1
    4194304+0 records in
    4194304+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 14.8094 s, 145 MB/s
    real 0m14.913s
    user 0m3.313s
    sys 0m10.380s

    Compression Level 3
    4194304+0 records in
    4194304+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 15.1291 s, 142 MB/s
    real 0m15.259s
    user 0m3.261s
    sys 0m10.720s

    Compression Level 9
    4194304+0 records in
    4194304+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 15.4792 s, 139 MB/s
    real 0m15.499s
    user 0m3.442s
    sys 0m10.913s
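The system-time overhead quoted in the tl;dr can be recomputed from the sys lines in the results above. The sketch below assumes the 0mS.SSSs format printed by bash's time; the function names to_seconds and overhead_pct are hypothetical.

```shell
#!/bin/bash
# Convert a bash `time` value such as 0m10.380s into seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Extra time relative to the baseline, in percent, rounded:
#   overhead = (measured / baseline - 1) * 100
overhead_pct() {
    awk -v base="$(to_seconds "$1")" -v t="$(to_seconds "$2")" \
        'BEGIN { printf "%.0f\n", (t / base - 1) * 100 }'
}

overhead_pct 0m9.486s 0m10.380s   # zstd:1 vs. no compression, prints 9
overhead_pct 0m9.486s 0m10.913s   # zstd:9 vs. no compression, prints 15
```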
/dev/zero cpu copy test
tl;dr;
Easily compressed data costs more CPU than incompressible data. Measured by the sys times in the results below, zstd:1 added 16% and zstd:9 added 23% over the baseline.
Test Steps
Same as above, but swap /dev/urandom for /dev/zero
Results
    Compression Level 0
    4194304+0 records in
    4194304+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 14.1791 s, 151 MB/s
    real 0m14.196s
    user 0m2.971s
    sys 0m9.909s

    Compression Level 1
    4194304+0 records in
    4194304+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 16.4391 s, 131 MB/s
    real 0m16.536s
    user 0m3.403s
    sys 0m11.511s

    Compression Level 3
    4194304+0 records in
    4194304+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 14.9807 s, 143 MB/s
    real 0m15.094s
    user 0m3.398s
    sys 0m10.833s

    Compression Level 9
    4194304+0 records in
    4194304+0 records out
    2147483648 bytes (2.1 GB, 2.0 GiB) copied, 18.0451 s, 119 MB/s
    real 0m18.066s
    user 0m3.758s
    sys 0m12.173s
Release Notes
Transparent compression of the filesystem using zstd is now enabled by default. Use the compsize utility to find out the actual size on disk of a given file.