swap on ZRAM
Summary
Swap is useful, except when it is slow. [1] ZRAM is a RAM disk with always-on compression. [2] Its size is assigned at creation time, but RAM is allocated and deallocated dynamically, on demand. The ZRAM block device behaves like any other: it can be formatted with a file system or with mkswap, and the latter is the intent of this change proposal.
There is more than one change indicated in this proposal. Each is opt-in (owner assumes editions/spins are to be excluded unless they ask to be included):
1. Include systemd rust-zram-generator[3]. This does not enable swap-on-ZRAM. It only makes the generator available.
2. Include a default zram-generator configuration. If present, it enables swap-on-ZRAM during startup.
3. Do not create swap partition/LV for default installations.
[1]
There is a tl;dr section at the top. Highly recommend reading the whole article.
In defence of swap: common misconceptions
https://chrisdown.name/2018/01/02/in-defence-of-swap.html
[2]
https://www.kernel.org/doc/Documentation/blockdev/zram.txt
[3]
https://github.com/systemd/zram-generator
Owner
- Name: Chris Murphy
- Email: chrismurphy@fedoraproject.org
Current status
- Targeted release: Fedora 33
- Last updated: 2020-05-30
- FESCo issue: <will be assigned by the Wrangler>
- Tracker bug: <will be assigned by the Wrangler>
- Release notes tracker: <will be assigned by the Wrangler>
Detailed Description
Basic function:
The system uses RAM normally until it is full, then starts paging out to the swap-on-ZRAM device, just as if it were real swap. But there is no free lunch. Because of compression, the ZRAM driver allocates memory at roughly half the rate of page-outs. This means swap is less effective at page eviction: the net rate is ~50% instead of 100%. But it is orders of magnitude faster than disk-based swap.
ZRAM has about 0.1% overhead, or ~1MiB/1GiB. If the workload never touches swap, living entirely inside RAM, the overhead is the sole cost; there is no preallocation of RAM.
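The ~50% eviction rate can be illustrated with a little arithmetic. This is a minimal sketch, assuming a uniform integer compression ratio and ignoring the ~0.1% overhead; net_freed_mib is a hypothetical helper written for this illustration, not part of any ZRAM tooling.

```shell
#!/bin/sh
# Sketch only: net RAM freed when pages are swapped out to ZRAM.
# Assumes a uniform compression ratio (e.g. 2 means 2:1) and ignores
# ZRAM's own bookkeeping overhead. net_freed_mib is a hypothetical name.
net_freed_mib() {
    paged_out_mib=$1
    ratio=$2
    # The compressed copy still occupies RAM inside the ZRAM device.
    zram_used_mib=$((paged_out_mib / ratio))
    echo $((paged_out_mib - zram_used_mib))
}

# Paging out 1024 MiB at 2:1 frees 512 MiB net, i.e. ~50%.
echo "2:1 -> $(net_freed_mib 1024 2) MiB freed"
# Data that does not compress at all (1:1) frees nothing.
echo "1:1 -> $(net_freed_mib 1024 1) MiB freed"
```

The 1:1 case shows why poorly compressible workloads gain little from swap-on-ZRAM.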
Default configuration:
Always create a ZRAM device regardless of RAM size, using a ZRAM to RAM ratio of 1:2, capped at 4GiB [4], with a higher than typical swap priority [5].
[4]
RFE: should be able to set a cap on zram device size
https://github.com/systemd/zram-generator/issues/10
[5]
RFE: should set priority #8
https://github.com/systemd/zram-generator/issues/8
These values seem reasonable, based on prior work. Anaconda has two examples in which it sets swap size to 50% of RAM: the no-hibernation case, common outside x86; and its own current swap-on-ZRAM implementation. Fedora IoT's implementation also uses a ZRAM to RAM ratio of 1:2, or 50%.
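As a rough illustration of the proposed default (50% of RAM, capped at 4GiB), the sizing rule can be sketched as below. zram_size_mib is a hypothetical helper for this example only; it is not how zram-generator itself computes or exposes the value.

```shell
#!/bin/sh
# Sketch only: proposed default sizing, ZRAM device = RAM / 2, capped at 4 GiB.
# zram_size_mib is a hypothetical helper, not part of zram-generator.
zram_size_mib() {
    ram_mib=$1
    cap_mib=4096                 # 4 GiB cap from the proposal
    size_mib=$((ram_mib / 2))    # 1:2 ZRAM to RAM ratio
    if [ "$size_mib" -gt "$cap_mib" ]; then
        size_mib=$cap_mib
    fi
    echo "$size_mib"
}

for ram in 2048 8192 16384; do
    echo "RAM ${ram} MiB -> ZRAM $(zram_size_mib "$ram") MiB"
done
```

With these inputs, the cap takes effect for any system with more than 8GiB of RAM.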
In the summary, three changes are listed. What does the opt in look like for these? What's the benefit of this apparent complexity?
(1) only = generator present; users can enable it by creating a configuration file. No other changes. Ideally FESCo approves this Fedora-wide so that the generator is available everywhere without exception. That makes it easier to converge on one implementation and reduce user confusion.
(1) + (2) = swap-on-ZRAM is enabled, and with a higher priority than default for swap-on-drive. Both co-exist, but swap-on-ZRAM is favored first. Hibernation is still possible if the swap-on-drive partition is big enough and all other requirements are met.
(1) + (2) + (3) = swap-on-ZRAM is enabled, no disk-based swap present. Fedora Workstation edition plans on doing this (pending test day results and feedback). [6]
[6]
Pending test day results, Fedora Workstation Edition anticipates proceeding with full opt-in: swap-on-ZRAM is enabled. New default installs will not have swap-on-disk present. If the user opts to create swap via Custom partitioning at installation time, the installation will have two swaps: swap-on-ZRAM and swap-on-disk. The swap-on-ZRAM will have higher priority, and so is favored over disk-based swap. The kernel is smart enough to know it can't hibernate to a ZRAM device, and will instead use disk-based swap. This works reliably in testing so far, and we'll want to thoroughly beat it up during the test day.
Test Day:
Change owner recommends all editions/spins opt in and participate in the soon-to-be-scheduled test day. The defaults will be conservative, e.g. 50% of RAM used for the ZRAM device with a cap of 4GiB. The test day will help fine-tune this.
Feedback
Why not zswap?
Zswap is a similar idea, with a similar "z" affectation, but a totally different implementation. It is swap-specific, uses a RAM cache, and requires an existing conventional swap partition. It might be true that certain workloads are better suited to zswap. But swap-on-ZRAM depends only on volatile storage. This is simpler and more secure, whereas zswap "spilling over" into the real swap on disk can leak user data if that swap device isn't encrypted. Zswap support is certainly a valid future feature for a new generator, or possibly zram-generator could be extended to support zswap via the configuration file.
https://www.kernel.org/doc/Documentation/vm/zswap.txt
You're enabling it on upgrades?
That's the current plan. There are some difficulties with this right now in Fedora, needing to use Supplements to cause new things to be dragged in on upgrades. As a technical matter, feature owner is confident it will improve the experience of users who have configured their systems with no swap. As a non-technical matter, it's recognized that (a) hey pal, you're messing with my customizations, not cool! and (b) swap always sucks I don't care if it has a 'Z' in the name! The dilemma is that not applying it to upgrades fragments the Fedora user base, and the overall experience people are having, and giving feedback on to make Fedora better. So all of this has to be balanced out.
Why systemd zram-generator?
It's the most upstream implementation to date, and leverages existing systemd infrastructure to set up the ZRAM block device, format it as swap, and swapon, all during early boot. There are also systemd-fstab-generator
Also the idea is to converge on one particular implementation to avoid user confusion. And while the alternatives are nice and work fine, using a systemd generator rather than a systemd service unit is ideal.
https://www.freedesktop.org/software/systemd/man/systemd.generator.html
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/TCY534JPIMZ3OXM5Q5E2ZH5PSAKQNGP7/
Why not a bigger ZRAM device?
It's possible some workloads will have data that doesn't compress well. Hence not going with a 1:1 ZRAM to RAM ratio. Even a 2:1 ratio is not unreasonable *if* the compression ratio is 2:1. However, it's possible a system can actually get "wedged in" to a kind of swap thrashing very similar to conventional swap-on-disk, except it becomes CPU and memory bound, rather than IO bound. Feature owner thinks it's better to just oom than get overly aggressive with the ZRAM device size. And this feature is already well tested with earlyoom used by default on Fedora Workstation since Fedora 32.
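To make the trade-off concrete, the worst-case RAM consumed by a completely full ZRAM device can be estimated as device size divided by compression ratio. This is a minimal sketch, assuming a uniform integer compression ratio and ignoring overhead; zram_ram_cost_pct is a hypothetical helper for this illustration.

```shell
#!/bin/sh
# Sketch only: percent of RAM consumed when the ZRAM device is completely full.
# device_pct = ZRAM device size as a percent of RAM (50 means a 1:2 ratio).
# ratio      = assumed uniform compression ratio (2 means 2:1).
# zram_ram_cost_pct is a hypothetical name used only for this illustration.
zram_ram_cost_pct() {
    device_pct=$1
    ratio=$2
    echo $((device_pct / ratio))
}

echo "50% device,  2:1 data -> $(zram_ram_cost_pct 50 2)% of RAM"
echo "50% device,  1:1 data -> $(zram_ram_cost_pct 50 1)% of RAM"
echo "200% device, 2:1 data -> $(zram_ram_cost_pct 200 2)% of RAM"
```

Under these assumptions, the proposed 1:2 device holding incompressible data tops out at half of RAM, while a 2:1 device at a 2:1 compression ratio could in principle claim all of it.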
Benefit to Fedora
- significantly improves system responsiveness, especially when swap is under pressure
- complements on-going resource control work
- further reduces the time to out-of-memory kill, when workloads exceed limits
- improves both the "no swap" and "existing swap" setups
Scope
- Proposal owners:
- add zram-generator package to comps for the editions/spins opting in
- means of per edition/spin configurations, if needed
- coordinate a test day
- Other developers:
- RFE's for zram-generator: users are not worse off if they don't happen
https://github.com/systemd/zram-generator/issues/10
https://github.com/systemd/zram-generator/issues/8
- Anaconda is agreeable to deprecating their built-in implementation in favor of swap-on-ZRAM
Upgrade/compatibility impact
If all editions/spins opt in, add Supplements:fedora-release to zram-generator to pull it in on upgrades.
Existing systems with no swap will have swap-on-ZRAM enabled. Existing systems with conventional swap-on-disk will also have swap-on-ZRAM enabled (two swap devices), with higher priority for the ZRAM device. Existing swap-on-disk will not be removed.
How To Test
Any hardware. Any version of Fedora.
1. dnf install zram-generator
2. cp /usr/share/doc/zram-generator/zram-generator.conf.example /etc/systemd/zram-generator.conf
3. Edit the configuration
4. Reboot
5. Check that swap is on a ZRAM device:
   zramctl
   swapon
6. Detailed check:
   journalctl -b -o short-monotonic | grep 'swap\|zram'
7. Check that priority is higher than existing swap if two or more are listed.
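The final priority check can also be scripted. The sketch below reads the kernel's /proc/swaps table (columns: Filename, Type, Size, Used, Priority) and succeeds only if the highest-priority entry is a /dev/zram device. zram_has_top_priority is a hypothetical helper written for this example; it is not shipped with zram-generator.

```shell
#!/bin/sh
# Sketch only: succeed if the highest-priority active swap is a ZRAM device.
# Reads a table in /proc/swaps format; the file argument is optional and
# exists mainly so the function can be tried against a saved sample.
# zram_has_top_priority is a hypothetical name, not part of any package.
zram_has_top_priority() {
    swaps_file=${1:-/proc/swaps}
    awk 'NR > 1 {
             prio = $5 + 0                      # force numeric comparison
             if (!seen || prio > max) { max = prio; top = $1; seen = 1 }
         }
         END { exit !(seen && top ~ /^\/dev\/zram/) }' "$swaps_file"
}

# Example use after rebooting with swap-on-ZRAM configured:
#   zram_has_top_priority && echo "ZRAM swap is preferred"
```

The function returns non-zero when no swap is active at all, or when a disk-based swap outranks the ZRAM device.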
User Experience
The user won't notice anything. If their usual workload causes them to dread swap thrashing, they'll be surprised when it doesn't happen. The user might get curious if they don't find a swap entry in /etc/fstab. Or they might get curious if they run 'swapon' and see swap pointing to /dev/zram0 instead of a disk partition or LV.
Dependencies
N/A
Contingency Plan
- Contingency mechanism: Don't ship the generator = big hammer but easy. Also possible to ship the generator, but only selectively ship configuration files = scalpel, pretty easy.
- Contingency deadline: Beta freeze
- Blocks release? No.
- Blocks product? No.
Documentation
Fedora could consider adding a hint in an /etc/fstab comment? There is no man page for this, and the documentation is also minimal besides what's in this feature proposal. So it's an open question how the user should get more information on how to configure and tweak it. But then, they don't have that for swap today either. They just have institutional knowledge.
Hence, a strong test day with a lot of people and press coverage of the feature might help spread the word for institutional knowledge changes coming.
Ideas welcome.
Release Notes
Pending feedback and test day.