From Fedora Project Wiki
Latest revision as of 16:49, 24 April 2021

Enable btrfs transparent zstd compression by default

Summary

On variants using btrfs as the default filesystem, enable transparent compression using zstd. Compression saves space and can significantly increase the lifespan of flash-based media by reducing write amplification. It can also increase read and write performance.

Owners

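Current status

  • Targeted release: Fedora 34
  • FESCo issue: #2538 (https://pagure.io/fesco/issue/2538)
  • Tracker bug: #1916918 (https://bugzilla.redhat.com/show_bug.cgi?id=1916918)
  • Release notes tracker: #635 (https://pagure.io/fedora-docs/release-notes/issue/635)
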
Detailed description

Transparent compression is a btrfs feature that allows a btrfs filesystem to apply compression on a per-file basis. Of the three supported algorithms, zstd is the one with the best compression speed and ratio. Enabling compression saves space, but it also reduces write amplification, which is important for SSDs. Depending on the workload and the hardware, compression can also result in an increase in read and write performance.

See https://pagure.io/fedora-btrfs/project/issue/5 for details. This was originally scoped as an optimization for https://fedoraproject.org/wiki/Changes/BtrfsByDefault during Fedora 33.

Feedback

Q: How do I disable this feature?

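A: Edit '/etc/fstab', remove the 'compress=zstd:1' mount option, save, and reboot. To stop compressing new writes without rebooting, run mount -o remount,compress=none /. The fstab edit is still needed to make the change persistent. Data that was already written stays compressed.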
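If you would rather script the fstab change, you can filter the options field with standard tools. The following is a minimal POSIX shell sketch. The function name strip_compress_opts is hypothetical. It only rewrites a comma-separated options string that you pass in and does not edit /etc/fstab itself.

```shell
#!/bin/sh
# strip_compress_opts (hypothetical helper): remove compress= and
# compress-force= entries from a comma-separated mount option list.
strip_compress_opts() {
    # Split on commas, drop the compress options, rejoin with commas.
    printf '%s\n' "$1" |
        tr ',' '\n' |
        grep -v -E '^compress(-force)?(=|$)' |
        paste -s -d ',' -
}

# Example: the options field of a typical Fedora root entry.
strip_compress_opts 'subvol=root,compress=zstd:1'   # prints: subvol=root
```

If removing the option would leave the options field empty, use defaults instead, because the fstab options field cannot be blank.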

Q: I want compression only on '/' and not on '/home', can I just modify fstab?

A: No. The compress and compress-force mount options are file-system wide: they apply to the entire file system, not per subvolume or mount point.

Q: Is there a way to do it per subvolume?

A: Yes, but there are some caveats. The 'btrfs property' command can set compression per subvolume, directory, or file. Unsetting it is tricky: setting the property to 'none' doesn't unset compression. Instead, it prevents the compress mount option from applying.
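The per-object property is set with btrfs property set and read with btrfs property get. The following is a minimal dry-run sketch. The helper names set_compression and show_compression, the RUN variable and the example path are all hypothetical. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch: RUN defaults to "echo", so commands are only printed.
# Set RUN= (empty) and run as root on a btrfs file system to execute them.
RUN=${RUN-echo}

# set_compression (hypothetical): $1 = subvolume, directory or file,
# $2 = algorithm such as zstd, lzo or zlib.
set_compression() {
    $RUN btrfs property set "$1" compression "$2"
}

# show_compression (hypothetical): print the current compression property.
show_compression() {
    $RUN btrfs property get "$1" compression
}

set_compression /home/user/projects zstd
show_compression /home/user/projects
```

As we understand it, setting the property on a directory affects files created in it afterwards. Existing data is not recompressed until it is written again or defragmented.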

Q: What about 'chattr +c' ?

A: This is the legacy way of setting compression from before 'btrfs property' existed, and it still works today. However, unlike 'btrfs property', it can't specify an algorithm, so it uses the current default, zlib. zlib is also used when the compress or compress-force mount option is given without an algorithm. Tip: 'chattr +C' (capital C) sets nodatacow, i.e. it disables COW (copy-on-write), whereas 'chattr +c' (lowercase c) enables compression.
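To check which of the two flags a file carries, look at the flags field printed by lsattr. The sketch below assumes lsattr's usual output format: a flags field, a space, then the file name. The helper name describe_flags is hypothetical. Pattern matching in case is case-sensitive, so it separates c from C exactly as described above.

```shell
#!/bin/sh
# describe_flags (hypothetical): interpret an lsattr flags field such as
# "--------c---------------------".
# Lowercase c = compression; capital C = No_COW (nodatacow).
describe_flags() {
    flags=$1
    case $flags in
        *c*) echo "compress" ;;
    esac
    case $flags in
        *C*) echo "nodatacow" ;;
    esac
}
```

For example, after chattr +c somefile, run describe_flags "$(lsattr somefile | cut -d ' ' -f 1)".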

Q: How are 'chattr +c' and 'btrfs property' the same?

A: Internally, Btrfs treats them the same way: both set a compression flag on the inode.

Q: If I use 'btrfs property' to set compression, what level is used for zstd?

A: Currently the default level, 3, is always used. Even if you get 'btrfs property' to accept 'zstd:1', level 3 is used.

Q: Is it safe to mix and match compression algorithms?

A: Yes.

Q: Does compression cause more fragmentation? The 'filefrag' tool shows a lot more extents on compressed files.

A: No. The higher extent count comes from a bug, or missing feature, in the reporting. The 'filefrag' command uses FIEMAP, which reports Btrfs logical addresses as what appear to be separate, non-contiguous 128 KiB extents. 128 KiB is actually the maximum compression block size. The physical extent on disk may be contiguous, but FIEMAP currently has no way of conveying this. As a result, filefrag is not a reliable indicator of fragmentation for compressed files.
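Here is a back-of-the-envelope way to see why the count looks inflated. If every compressed extent covers at most 128 KiB of file data, FIEMAP lists at least ceil(size / 128 KiB) extents for a fully compressed file, even when the data is contiguous on disk. This is an estimate under that assumption, not a reimplementation of filefrag. The helper name is hypothetical.

```shell
#!/bin/sh
# min_reported_extents (hypothetical): lower bound on extents reported for
# a fully compressed file, assuming at most 128 KiB of data per extent.
min_reported_extents() {
    size=$1                                  # file size in bytes
    chunk=$((128 * 1024))                    # max uncompressed bytes per extent
    echo $(( (size + chunk - 1) / chunk ))   # ceiling division
}

min_reported_extents 1048576     # 1 MiB: prints 8
min_reported_extents 104857600   # 100 MiB: prints 800
```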

Q: Do I have to reboot to change compression options?

A: No. You can just remount any mount point of that file system with the new options, e.g. mount -o remount,compress=zstd:5 /.
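When you change the level this way, it can help to validate it before building the option string. The following is a minimal sketch. The helper name zstd_remount_opts is hypothetical. The accepted range of 1 to 15 is our assumption about btrfs's zstd levels, so check man 5 btrfs for your kernel. The script only prints the command rather than running it.

```shell
#!/bin/sh
# zstd_remount_opts (hypothetical): build "remount,compress=zstd:N".
# Assumption: btrfs accepts zstd levels 1-15.
zstd_remount_opts() {
    level=$1
    case $level in
        ''|*[!0-9]*) echo "level must be a number" >&2; return 1 ;;
    esac
    if [ "$level" -lt 1 ] || [ "$level" -gt 15 ]; then
        echo "level out of range 1-15" >&2
        return 1
    fi
    echo "remount,compress=zstd:$level"
}

# Print (not run) the full command for level 5 on the root file system.
echo mount -o "$(zstd_remount_opts 5)" /   # prints: mount -o remount,compress=zstd:5 /
```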

Q: Is there a list of compression related bugs?

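A: Known compression-related bugs are cosmetic: they may be annoying, but they don't put your data at risk.

  • https://github.com/kdave/btrfs-progs/issues/308
  • https://github.com/kdave/btrfs-progs/issues/309
  • https://github.com/kdave/btrfs-progs/issues/328
  • https://github.com/kdave/btrfs-progs/issues/329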

Q: How is used and free space reporting affected by compression?

A: It depends on the tool. 'df' shows the raw physical blocks used and free. Its used figure therefore (unknowingly) reflects compression, while its free figure assumes no compression. 'ls', 'du' and 'btrfs filesystem du' look at the logical, uncompressed size of files. Unrelated to compression, Btrfs inlines small extents for files smaller than 2048 bytes, and each such file is counted as 4 KiB. So 'du' can over-report, possibly by a lot if there are many such small files.
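Here is a worked example in shell arithmetic that shows the size of the over-report. It takes the figures above at face value: each file under 2048 bytes is counted as 4 KiB. The helper name du_overreport and the input numbers are illustrative, not measurements.

```shell
#!/bin/sh
# du_overreport (hypothetical): compare du's figure with the real data size
# for many small inlined files, each counted as 4 KiB.
du_overreport() {
    count=$1       # number of small files
    avg_bytes=$2   # average real size per file, in bytes (< 2048)
    reported=$((count * 4096))
    actual=$((count * avg_bytes))
    echo "reported=$reported actual=$actual extra=$((reported - actual))"
}

du_overreport 100000 500
```

For 100,000 files averaging 500 bytes, du would show about 409.6 MB against roughly 50 MB of actual data.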

Q: What's the best tool to see how effective compression is?

A: The included 'compsize' tool pointed at a file or directory will report this.

Q: Why use zstd:1 specifically?

A: https://hackmd.io/kIMJv7yHSiKoAq1MPcCMdw has an analysis of the various compression levels and their impact on both CPU usage and disk space savings.

Q: Will /boot be compressed?

A: By default we still put /boot on ext4 (though /boot on btrfs is possible with advanced partitioning). GRUB has supported zstd-compressed Btrfs partitions since 2018 (https://git.savannah.gnu.org/cgit/grub.git/commit/?id=386128648606a3aa6ae7108d1c9af52258202279), so this should work just fine.

We plan a forthcoming change once Changes/UnifyGrubConfig lands to make sure that the GRUB environment is stored in a partition that it can write to.

With /boot on btrfs, the hidden GRUB menu in Fedora Workstation (Changes/HiddenGrubMenu, Changes/CleanupGnomeHiddenBootMenuIntegration) currently does not work on BIOS-based systems. After UnifyGrubConfig, it might not work even on EFI systems. Because of this regression, we are not recommending /boot on btrfs for now.

References:

man 5 btrfs
man btrfs-property
man compsize

Benefit to Fedora

Better disk space usage, reduction of write amplification, which in turn helps increase lifespan and performance on SSDs and other flash-based media. It can also increase read and write performance.

Scope

Upgrade/compatibility impact

This Change only applies to newly installed systems. See "How to test" for converting an existing system.

How to test

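Update /etc/fstab to add the compress=zstd:1 mount option to the / and /home mount points, then remount (mount -o remount /). On the default layout, / and /home are subvolumes of the same file system, and the option applies file-system wide, so remounting one of the two is enough. From then on, all new writes are compressed.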
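The fstab edit can be scripted. The sketch below is a hypothetical helper, add_zstd_to_fstab. It assumes the standard six-field fstab layout and Fedora's default layout, where / and /home are btrfs entries. It prints the modified file instead of overwriting it.

```shell
#!/bin/sh
# add_zstd_to_fstab (hypothetical): print the fstab given as $1 with
# ",compress=zstd:1" appended to the options (4th field) of btrfs entries
# mounted at / or /home. Entries already mentioning compress are left alone.
# Note: awk rejoins any modified line with single spaces between fields.
add_zstd_to_fstab() {
    awk '
        $1 !~ /^#/ && $3 == "btrfs" && ($2 == "/" || $2 == "/home") && $4 !~ /compress/ {
            $4 = $4 ",compress=zstd:1"
        }
        { print }
    ' "$1"
}
```

For example, run add_zstd_to_fstab /etc/fstab > fstab.new and check the result with diff -u /etc/fstab fstab.new. Then, as root, copy it into place and run mount -o remount /.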

Optional: Data that is already written can be compressed by defragmenting it, per directory or file, with btrfs filesystem defragment -czstd -rv /path/to/dir/or/file/. It's completely OK to skip this step and let the file system become compressed over time through attrition, as files are rewritten. If you're using snapshots, don't run defragment on all of them: defragment unshares the extents shared between snapshots, which can significantly increase space consumption.
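For several directories, you can wrap the defragment step in a small loop. This is a dry-run sketch. The function name recompress, the RUN variable and the example paths are hypothetical. By default the commands are only printed. The function takes explicit paths, so snapshots are only touched if you list them.

```shell
#!/bin/sh
# Dry-run sketch: RUN defaults to "echo" (print only). Set RUN= (empty) and
# run as root to execute. Do not list snapshot paths here.
RUN=${RUN-echo}

# recompress (hypothetical): defragment and recompress each given path.
recompress() {
    for path in "$@"; do
        $RUN btrfs filesystem defragment -czstd -rv "$path"
    done
}

recompress /home/user/Documents /home/user/Music
```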


User experience

Compression will result in file sizes (e.g. as reported by du) not matching the actual space occupied on disk. The compsize utility can be used to examine the compression type, effective compression ratio and actual size.

Dependencies

Anaconda will need to be updated to perform the installation using mount -o compress=zstd:1

Contingency plan

  • Contingency mechanism: will not include PR patches if not merged upstream and will not enable
  • Contingency deadline: Final freeze
  • Blocks release? No
  • Blocks product? No

Documentation

https://btrfs.wiki.kernel.org/index.php/Compression


Simple Analysis of btrfs zstd compression level

Workstation root fs disk space savings

tl;dr;

For an installed root directory of Fedora 32, zstd:1 yields a 40% storage savings as compared to uncompressed ext4. zstd:9 yields a 43% storage savings.

Test Steps

1. Obtain an “installed.dir”:

   1. Prep the image
       1. truncate -s 64G installed.raw
   2. Boot the live CD (in this case F32) in qemu-kvm
       1. qemu-kvm -drive file=installed.raw -cdrom $ISO -m 1G
   3. Install it with the graphical installer
       1. I used regular partitions, and did not create a separate one for /home
   4. Shut down the qemu instance
   5. Mount it on the directory
       1. losetup -Pf installed.raw
       2. mkdir -p installed.dir
       3. mount /dev/loop1p2 installed.dir

2. Run the script below

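#!/bin/bash
# Copy installed.dir onto fresh btrfs images mounted with zstd levels 1, 3
# and 9, recording a perf profile of each copy. Must run as root.
set -e

SIZE=16G

if [ "$EUID" -ne 0 ] ; then
    echo "Needs to be root" >&2
    exit 2
fi

for compress_level in 1 3 9 ; do
    raw=zstd${compress_level}.raw
    truncate -s "$SIZE" "$raw"
    mkfs.btrfs -f "$raw"
    dir=zstd${compress_level}.dir
    mkdir -p "$dir"
    # mount attaches a loop device automatically for a regular image file
    mount -o compress=zstd:${compress_level} "$raw" "$dir"
    perf record -g -o perf${compress_level}.data bash -c "cp -r installed.dir/* $dir ; sync $dir"
done

Results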

zstd:1 - 40% savings
zstd:3 - 41% savings
zstd:9 - 43% savings

This is the result from df. Ignore the Use% column: the installed system's image (64G) is a different size from the 16G test images.

Filesystem                              1K-blocks     Used Available Use% Mounted on
/dev/loop1p3                             61665068  5458980  53043976  10% /scratch/installed.dir
/dev/loop0                               16777216  3257192  13356936  20% /scratch/zstd1.dir
/dev/loop2                               16777216  3197668  13415468  20% /scratch/zstd3.dir
/dev/loop3                               16777216  3105440  13502704  19% /scratch/zstd9.dir
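The savings percentages follow from the Used column: savings = 1 - used / baseline, with installed.dir as the uncompressed baseline. The sketch below reproduces the rounded figures. The baseline is an ext4 file system, and the btrfs numbers include btrfs's own metadata, so treat this as an approximate comparison. The helper name savings_pct is hypothetical.

```shell
#!/bin/sh
# savings_pct (hypothetical): percent saved relative to a baseline, from
# df "Used" values in 1K-blocks, rounded to a whole number.
savings_pct() {
    awk -v base="$1" -v used="$2" 'BEGIN { printf "%.0f\n", (1 - used / base) * 100 }'
}

baseline=5458980   # installed.dir, uncompressed
for pair in zstd:1=3257192 zstd:3=3197668 zstd:9=3105440; do
    echo "${pair%%=*}: $(savings_pct "$baseline" "${pair#*=}")% savings"
done
```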

/dev/urandom copy cpu cost

Note: this test is a lot less scientific than the other one, because it's CPU dependent and I had other things running at the same time (Chrome was open, etc.). It's likely still a reasonable proxy for CPU impact, though.

tl;dr;

zstd:1 added 9% and zstd:9 added 15% of system time to the baseline.

Test Steps

1. Create a 2 GiB file named blah, filled with random data so that it compresses very poorly

   1. dd if=/dev/urandom of=blah bs=256K count=8192

2. Run the script

#!/bin/bash
# Time writing the 2 GiB file blah to fresh btrfs images: level 0 is the
# uncompressed baseline, then zstd levels 1, 3 and 9. Must run as root.
set -e

SIZE=16G

if [ "$EUID" -ne 0 ] ; then
    echo "Needs to be root" >&2
    exit 2
fi

for compress_level in 0 1 3 9 ; do
    raw=zstd${compress_level}.raw
    truncate -s "$SIZE" "$raw"
    mkfs.btrfs -f "$raw" 2>/dev/null >/dev/null
    dir=zstd${compress_level}.dir
    mkdir -p "$dir"
    if [ "$compress_level" -eq 0 ] ; then
        mount "$raw" "$dir"
    else
        mount -o compress=zstd:${compress_level} "$raw" "$dir"
    fi
    echo Compression Level ${compress_level}
    # bash's time keyword prints the real/user/sys lines in the results.
    # With a path argument, GNU sync only syncs that path; sync -f "$dir"
    # would flush the whole file system.
    time bash -c \
        "dd if=blah of=$dir/blah ; sync $dir"
done
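The record counts in the results below follow from the block sizes involved. Here is a short worked example in shell arithmetic, assuming dd's default block size of 512 bytes, since the timed copy passes no bs=.

```shell
#!/bin/sh
# The test file was created with bs=256K count=8192:
file_bytes=$((256 * 1024 * 8192))
echo "blah: $file_bytes bytes"   # 2147483648 bytes = 2 GiB
# The timed dd has no bs=, so it uses 512-byte blocks, which gives
# the "4194304+0 records" lines.
records=$((file_bytes / 512))
echo "records: $records"         # 4194304
```

Because the copy issues millions of 512-byte reads and writes, some of the baseline system time is probably syscall overhead rather than btrfs work. A larger bs= would likely reduce it, but this was not measured here.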

Results

Compression Level 0
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 13.3657 s, 161 MB/s

real    0m13.386s
user    0m2.891s
sys     0m9.486s
Compression Level 1
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 14.8094 s, 145 MB/s

real    0m14.913s
user    0m3.313s
sys     0m10.380s
Compression Level 3
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 15.1291 s, 142 MB/s

real    0m15.259s
user    0m3.261s
sys     0m10.720s
Compression Level 9
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 15.4792 s, 139 MB/s

real    0m15.499s
user    0m3.442s
sys     0m10.913s
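The tl;dr percentages are the increase in sys time over the uncompressed run, (t / baseline - 1) x 100. The sketch below recomputes them from the results above. The helper name sys_increase_pct is hypothetical.

```shell
#!/bin/sh
# sys_increase_pct (hypothetical): percent increase of time $2 over
# baseline $1, rounded to a whole number.
sys_increase_pct() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.0f\n", (t / base - 1) * 100 }'
}

base=9.486   # "sys" for Compression Level 0
for pair in zstd:1=10.380 zstd:3=10.720 zstd:9=10.913; do
    echo "${pair%%=*}: +$(sys_increase_pct "$base" "${pair#*=}")% sys time"
done
```

Applied to the /dev/zero results below (baseline 9.909 s), the same calculation gives 16% for zstd:1 and 23% for zstd:9.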

/dev/zero cpu copy test

tl;dr;

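Easily compressed data costs more CPU time than incompressible data. Going by the sys times in the results below, zstd:1 added 16% and zstd:9 added 23% of system time to the baseline (11.511 s and 12.173 s against 9.909 s). zstd:3 measured lower than zstd:1 here, so treat single runs as noisy.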

Test Steps

Same as above, but swap /dev/urandom for /dev/zero

Results

Compression Level 0
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 14.1791 s, 151 MB/s

real    0m14.196s
user    0m2.971s
sys     0m9.909s
Compression Level 1
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 16.4391 s, 131 MB/s

real    0m16.536s
user    0m3.403s
sys     0m11.511s
Compression Level 3
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 14.9807 s, 143 MB/s

real    0m15.094s
user    0m3.398s
sys     0m10.833s
Compression Level 9
4194304+0 records in
4194304+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 18.0451 s, 119 MB/s

real    0m18.066s
user    0m3.758s
sys     0m12.173s

Release Notes

Transparent compression of the filesystem using zstd is now enabled by default. Use the compsize utility to find out the actual size on disk of a given file.