From Fedora Project Wiki

Revision as of 21:39, 4 November 2011 by Kay (talk | contribs)

Move all to /usr

Summary

Provide a simple way to mount almost the entire installed operating system read-only, snapshot it atomically, or share it between multiple hosts to save maintenance effort and space. Instead of spreading RPM package content all over the filesystem and artificially separating /bin from /usr/bin and /lib from /usr/lib, move all content to /usr and provide only symlinks in the root filesystem.

/usr on its own filesystem provides a lot of valuable options in custom setups. For historical reasons, more and more tools were split off from /usr and put in /. But today's systems, with their advanced features, can no longer really boot with an empty /usr. More and more things fail in subtle ways in such setups.

Instead of moving more tools to /, we already require today that /usr be mounted from inside the initramfs, so that it is available before the real 'init' starts. The split between the root filesystem and /usr serves no purpose in Linux anymore and only complicates or prevents simpler and more flexible setups.

Owner

Current status

  • Targeted release: Fedora 17
  • Last updated: 2011-11-04
  • Percentage of completion: 10%

Detailed Description

There is no way to reliably bring up a modern system with an empty /usr. There are two ways to fix this: copy /usr back to the rootfs, or use an initramfs that hides the split from the system.

Historically, /bin, /sbin and /lib contained the utilities needed to mount /usr. This role can now be taken by the initramfs. Because the initramfs knows where to find the root partition (which includes /etc), it can parse /etc/fstab and other configuration files and mount /usr before it finally switches to the root partition and executes /usr/bin/init. From this point on, init mounts the remaining partitions in /etc/fstab and the system starts as usual.
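The fstab lookup described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not dracut's actual code: the function name find_usr_mount is hypothetical, the fstab is assumed to use the standard whitespace-separated field layout, and the demo runs against a sample file in a temporary directory instead of a real root partition.

```shell
#!/bin/sh
# Hypothetical helper: print "device fstype options" for the /usr entry
# of the given fstab file; return non-zero if there is no such entry.
find_usr_mount() {
    awk '
        /^[[:space:]]*#/ { next }   # skip comment lines
        NF < 4           { next }   # skip blank or incomplete lines
        $2 == "/usr"     { print $1, $3, $4; found = 1; exit }
        END              { exit !found }
    ' "$1"
}

# Demo on a sample fstab in a scratch directory (the real system is not touched).
demo_dir=$(mktemp -d)
cat > "$demo_dir/fstab" <<'EOF'
# <device>   <mountpoint> <type> <options> <dump> <pass>
/dev/sda2    /            ext4   defaults  1 1
/dev/sda3    /usr         ext4   ro        1 2
EOF

if entry=$(find_usr_mount "$demo_dir/fstab"); then
    set -- $entry
    # An initramfs would now mount device $1 below the new root
    # before switching to it; here we only report what was found.
    echo "would mount $1 (type $2, options $3) on /usr"
fi
```

If the fstab has no /usr line, the function returns non-zero and nothing needs to be mounted before switching root.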

The long-term plan is to clean up the mess and confusion the current split of / vs. /usr has created. All tools will move back to /usr where they belong, and the rootfs will only contain compat symlinks into /usr. Almost the entire system installed by packages will reside in /usr. This moves all data that is not host-specific into /usr. /usr can then be seen as the Unix System Resources partition (/System), which defines the base operating system (e.g. F18 or RHEL-7).

This new /usr could be mounted read-only by default, while the rootfs is read-write and contains only empty mount points, compat symlinks to /usr and the host-specific data like /etc, /root, /srv. Compared to today's setups, the rootfs will be very small. The new /usr could also easily be shared read-only across several systems, and it would contain almost the entire system. Such setups are more efficient, can optionally provide a lot more security, are more flexible, offer saner options for custom setups, and are much simpler to set up and maintain.

This leaves us with the following well-defined directories, which compose the base of the system:

  • /usr - installed system; shareable; possibly read-only
  • /etc - config data; non-shareable
  • /var - persistent data; non-shareable
  • /run - volatile data; non-shareable; mandatory tmpfs filesystem

While /bin and /sbin are being moved to /usr/bin, /usr/sbin can be moved to /usr/bin as well. The resulting layout is:

/
|-- etc
|-- usr
|   |-- bin
|   |-- sbin -> bin
|   |-- lib
|   `-- lib64
|-- run
|-- var
|-- bin -> usr/bin
|-- sbin -> usr/bin
|-- lib -> usr/lib
`-- lib64 -> usr/lib64
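The layout above can be reproduced in a scratch directory to see how the compat symlinks behave. The following is a minimal sketch, not part of the feature's implementation; the tool name foo is hypothetical and nothing outside the temporary directory is modified.

```shell
#!/bin/sh
# Build the proposed layout in a scratch directory (not on the real root).
root=$(mktemp -d)
mkdir -p "$root/etc" "$root/run" "$root/var" \
         "$root/usr/bin" "$root/usr/lib" "$root/usr/lib64"

# Relative targets keep the links valid wherever the tree is mounted.
ln -s bin       "$root/usr/sbin"
ln -s usr/bin   "$root/bin"
ln -s usr/bin   "$root/sbin"
ln -s usr/lib   "$root/lib"
ln -s usr/lib64 "$root/lib64"

# "foo" is a hypothetical tool that is installed only once, in /usr/bin.
printf '#!/bin/sh\necho foo\n' > "$root/usr/bin/foo"

# Every traditional path reaches the same file through the symlinks.
for p in bin sbin usr/bin usr/sbin; do
    [ -f "$root/$p/foo" ] && echo "/$p/foo is reachable"
done
```

Because the file exists only once, sharing or snapshotting the usr subtree covers all four paths at the same time.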

Benefit to Fedora

  • Simpler and cleaner overall file system layout, with full compatibility.
  • Clear separation of operating system and host specific resources.
  • Best possible compatibility: no confusion about tool install locations, no $PATH fiddling, and every traditional path to a binary keeps working.

Scope

  • The ability to share /usr is especially useful for clusters and virtual machines.
  • The ability to mount /usr read-only (e.g. on read-only media) can add to the security of the machine.
  • The entire /usr can safely be snapshotted during upgrades.

How To Test

  • update a Fedora package with files in /bin, /sbin, /lib or /lib64 via yum

-> see symbolic links in /bin, /sbin, /lib or /lib64 pointing to the corresponding files in /usr/bin, /usr/lib or /usr/lib64

# rpm -qf <symbolic link>

should report the package that owns that compat symlink

or

  • install a fresh F17

-> see symbolic toplevel links:

 /lib -> usr/lib
 /lib64 -> usr/lib64
 /sbin -> usr/bin
 /bin -> usr/bin
 /usr/sbin -> bin
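The check on a fresh installation can be automated. The sketch below compares each toplevel link with the target listed above; the function name check_usrmove_links is hypothetical, and the optional argument (a root directory other than /) is an assumption added so that the check can be tried on a scratch tree. On an architecture without /lib64 that line would be reported as a failure and can be removed from the list.

```shell
#!/bin/sh
# Hypothetical checker: verify that the toplevel compat symlinks below the
# given root (default: /) have the targets listed on this page.
check_usrmove_links() {
    root=${1:-/}
    rc=0
    while read -r link target; do
        # readlink fails for anything that is not a symlink
        actual=$(readlink "$root/$link" 2>/dev/null) || actual=
        if [ "$actual" = "$target" ]; then
            echo "ok:   /$link -> $target"
        else
            echo "FAIL: /$link -> ${actual:-<not a symlink>} (expected $target)"
            rc=1
        fi
    done <<EOF
lib usr/lib
lib64 usr/lib64
sbin usr/bin
bin usr/bin
usr/sbin bin
EOF
    return $rc
}
```

Calling check_usrmove_links without arguments inspects the running system; the exit status is non-zero if any link is missing or points elsewhere.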

User Experience

  • fewer toplevel directories

Dependencies

  • initramfs (dracut)
  • changes in selinux policies
  • repackaging of packages with content in /bin, /sbin, /lib*
  • move consolehelper real binaries from /usr/sbin/* to /usr/lib/<pkgname>/<tool> or /usr/libexec/<tool> and change consolehelper to look in these places.
  • alternatives symlinks?
  • filesystem rpm, toplevel symlinks

Roadmap

  • Begin changing rpm packages with files in /bin, /sbin, /usr/sbin, /lib, /lib64.
  • Create backward-compat symlinks in %post and list them as %ghost in %files:
%post
# Create compat symlinks for tools as long as the root directories have not
# been converted to symlinks. -f keeps the scriptlet from failing when the
# link already exists, e.g. on a package update.
if ! test -L /bin; then
    ln -sf ../usr/bin/foo /bin/foo
    ln -sf ../usr/bin/bar /bin/bar
fi
if ! test -L /sbin; then
    ln -sf ../usr/bin/buz /sbin/buz
fi

%files
%ghost %attr(777, root, root) /bin/foo
%ghost %attr(777, root, root) /bin/bar
%ghost %attr(777, root, root) /sbin/buz
  • RPM: 257 packages that install files in the root filesystem.
  • Change SELinux policies.
  • On new installations: create the symlinks /bin -> usr/bin, /sbin -> usr/bin, /lib -> usr/lib, /lib64 -> usr/lib64, /usr/sbin -> bin. These links ensure that installed packages do not create compat symlinks in %post.
  • Make sure dracut is able to mount the needed filesystems specified in /etc/fstab before starting systemd.

Contingency Plan

  • We do not support booting with an empty /usr today, so moving things to /usr and keeping compat links in the rootfs should be low risk.
  • If things turn out to be difficult, we can delay the creation of the final /bin, /sbin, /lib and /lib64 compat links to a later release. The symbolic links created in %post and added to the file list with %ghost provide the compatibility until then.

Documentation

Release Notes

  • With this release, packages no longer install files in the following directories: /bin, /sbin, /lib, /lib64 and /usr/sbin.
  • Fresh installations of this release will have the following symbolic links in the toplevel directory:
 /bin -> usr/bin
 /sbin -> usr/bin
 /lib -> usr/lib

and for 64bit architectures

 /lib64 -> usr/lib64

additionally there is

 /usr/sbin -> bin
  • If you update from a prior release, packages will install symbolic links at their previous file locations in /bin, /sbin, /lib and /lib64, pointing to the counterparts in /usr. As soon as one of these toplevel directories contains only symbolic links, it can be removed and replaced by a single symbolic link pointing to the corresponding directory in /usr.
  • To simplify the previous step, you can add one of the following boot options to the kernel command line; each invokes a transition script in dracut:
    • "rdupdateusrmove": converts those directories that contain only symbolic links into symbolic links
    • "rdforceupdateusrmove": renames those directories to <dir>.bak and creates the symbolic links
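The non-forced conversion described above boils down to one test per directory: does it contain anything other than symbolic links? The following sketch illustrates that logic on a scratch tree. It is not the dracut transition script; the function names only_symlinks and convert_dir and the file names in the demo are hypothetical, and find is assumed to support -mindepth (GNU and BSD find do).

```shell
#!/bin/sh
# Succeeds if directory $1 contains nothing but symbolic links.
only_symlinks() {
    [ -z "$(find "$1" -mindepth 1 ! -type l -print | head -n 1)" ]
}

# Replace $root/$dir by a symlink to usr/$target if it is convertible.
convert_dir() {
    root=$1 dir=$2 target=$3
    [ -L "$root/$dir" ] && return 0          # already converted
    [ -d "$root/$dir" ] || return 1
    only_symlinks "$root/$dir" || return 1   # real files remain: leave it alone
    # Remove only the symlinks; rmdir then fails if anything else is left.
    find "$root/$dir" -mindepth 1 -type l -exec rm -f {} +
    rmdir "$root/$dir" || return 1
    ln -s "usr/$target" "$root/$dir"
}

# Demo in a scratch directory (the real root is not touched).
demo=$(mktemp -d)
mkdir -p "$demo/usr/bin" "$demo/bin" "$demo/sbin"
touch "$demo/usr/bin/foo" "$demo/usr/bin/buz"
ln -s ../usr/bin/foo "$demo/bin/foo"     # /bin holds only a compat link
ln -s ../usr/bin/buz "$demo/sbin/buz"
touch "$demo/sbin/legacy-tool"           # /sbin still holds a real file

for d in bin sbin; do
    if convert_dir "$demo" "$d" bin; then
        echo "/$d converted"
    else
        echo "/$d left unchanged"
    fi
done
```

In the demo, /bin is converted while /sbin is left alone because it still contains a real file. The forced variant would instead rename such a directory to <dir>.bak before creating the link.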

Comments and Discussion

FAQ

What problem are you trying to solve?

We want to make /usr shareable in a sane way.

Additional benefits of this feature are:

  • less clutter across the filesystem
  • if you snapshot /usr before updating, you have snapshotted the whole OS in one step.

What is currently broken with having /usr as a separate partition?

http://www.freedesktop.org/wiki/Software/systemd/separate-usr-is-broken

I don’t have /usr as a separate partition. What changes for me?

Nothing changes in functionality. All the old paths are reachable, because there are compat symlinks in place, which will not go away (at least not in the near future). All your scripts and binaries should work like they did before. For the upgrade process to work, you will find /sbin, /bin, /lib and /lib64 mostly containing symbolic links. As soon as one of these directories contains only symbolic links, the whole directory is replaced by a single symbolic link. These three or four toplevel symbolic links will stay there as long as the Linux ELF loader ABI is defined with “/lib/ld-linux.so.2” or its architecture-specific counterpart like “/lib64/ld-linux-x86-64.so.2”, and as long as scripts use “#!/bin/sh”.

I have /usr as a separate partition. What changes for me?

Not sure how you managed to do that. In general, having /usr as a separate partition does not really work right now. See http://www.freedesktop.org/wiki/Software/systemd/separate-usr-is-broken. But with this feature implemented, having a /usr mount point becomes a sane and supported setup again.

Why don’t you fix the /usr situation by putting all the relevant binaries in /bin /sbin /lib and /lib64?

and

So, why don’t you just mount /usr from the initramfs and leave the files where they are?

Ok, so imagine you have a /usr mounted from a network location and you want to update a package. So maybe you mount the master copy of /usr on your master machine and update /usr with your package manager. Then you provide the new copy of the master /usr to the other machines when they reboot. They all have the updated /usr now. But what about /sbin, /bin, /lib and /lib64? They still have the old binaries. No glibc security update for them. So every machine has to update these directories via rsync or similar (rpm will not work with a read-only /usr). This doubles the maintenance needed to keep both parts of the system in sync.

You are doing it wrong! /bin and /sbin are there to rescue a broken /usr!

The most critical filesystem is /boot, because the kernel lives there. So using /bin and /sbin to repair /usr relied on _two_ working filesystems (/ and /boot). If either of them was broken, you were not able to rescue /usr. The role of the rescue system can easily be fulfilled by a rescue initramfs. A rescue initramfs in /boot, which contains the fsck utilities, is in the same danger of becoming corrupted as the kernel, and no more. Now you only have to pull out your rescue CD if /boot is corrupted, and not if / is corrupted.

Then, let’s share /bin /sbin /lib /lib64 and /usr and mount them all from the initramfs!

Now you get the feeling that moving everything to /usr might make things easier...

Why don’t you move all /usr contents to / and forget about /usr?

Because this introduces a lot of new toplevel directories, all of which would then have to be mount points in order to be shared across hosts.

Ok, but what about a root filesystem on the network and mounting local filesystems only?

Then you would share the toplevel directory hierarchy among all hosts. Hosts would need to mount host-specific versions of /etc and /var. In particular, /etc/fstab is not accessible without adding information to the initramfs on how to mount it. For every additional host-specific toplevel directory like /opt and /srv, you would need a mount point.

Can’t we just move things to /usr and merge */bin with */sbin maybe later?

We can do that, but then we would do the work twice for some packages, and their %post sections would generate even more compat symlinks for the transition process.

Why do you want to merge /sbin with /bin?

Because we felt that if we merge directories anyway, we can merge even more and get rid of things that have no clear use anymore. /sbin no longer contains static binaries. /sbin should not contain daemons that are started manually by root. What is left are mostly tools that do not work for normal users, because those users have no write access to the devices on which the tools operate. But most of these tools have read access, and some of them print valuable information even for a non-root user.

You might also want to read this excellent mail from Lennart Poettering

Why should root not start daemons from the terminal?

If you are root, then you can easily generate a service file and run your daemon in a clean managed way.

When services are spawned, it is essential that this happens from a clean execution context that is fully detached from the context of the user who asks for it. Otherwise the daemon inherits context settings of that user that it should not inherit. Examples are resource limits, environment variables and similar; more problematic are audit trails, SELinux contexts and scheduling settings. Some daemons are written in a style that resets many of these process settings on initialization. However, this is never complete, simply because new process settings are added to the kernel's struct task more quickly than any userspace developer could make sure to reset them. On top of that, some of the settings in struct task cannot be reset at all. The effective result is that services are executed with inherited settings that should not be inherited, which makes the behaviour of the daemon highly dependent on the user context starting it and skews audit trails and suchlike. The solution is to fork off all services from a defined, clean execution context, as systemd does from PID 1, so that all process settings are in a defined state and the same on every invocation.

If you just want to test a daemon, you can always start it by its full path, even when it is not in your $PATH. And if you want to run it regularly, then you really should make it a service, which can be started on demand by the service manager (systemd).
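A service file of the kind mentioned above can be very small. The sketch below writes one for a hypothetical daemon food installed as /usr/bin/food; both names are assumptions made for illustration, and the daemon is assumed to stay in the foreground. A scratch directory stands in for /etc/systemd/system so that the example does not touch the running system.

```shell
#!/bin/sh
# Write a minimal unit file for the hypothetical daemon "food".
# A scratch directory stands in for /etc/systemd/system here.
unit_dir=$(mktemp -d)
cat > "$unit_dir/food.service" <<'EOF'
[Unit]
Description=Hypothetical foo daemon

[Service]
ExecStart=/usr/bin/food

[Install]
WantedBy=multi-user.target
EOF

# Show the result.
cat "$unit_dir/food.service"
```

Once such a file is placed in /etc/systemd/system, running systemctl daemon-reload followed by systemctl start food.service lets systemd start the daemon from its own clean context instead of inheriting the settings of root's terminal session.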