<!-- The actual name of your proposed change page should look something like: Changes/Your_Change_Proposal_Name. This keeps all change proposals in the same namespace -->
= Package information on ELF objects =
== Summary ==
All binaries (executables and shared libraries) are annotated with an ELF note that identifies the rpm for which this file was built. This allows binaries to be identified when they are distributed without any of the rpm metadata. `systemd-coredump` uses this to log package versions when reporting crashes.
== Owner ==
This should link to your home wiki page so we know who you are.
-->
* Name: [[User:Zbyszek|Zbigniew Jędrzejewski-Szmek]], Lennart Poettering
* Email: zbyszek@in.waw.pl, mzsrqben@0pointer.net
<!--- UNCOMMENT only for Changes with assigned Shepherd (by FESCo)
* FESCo shepherd: [[User:FASAccountName| Shepherd name]] <email address>
== Current status ==
[[Category:ChangeAcceptedF36]]
[[Category:SystemWideChange]]
* Targeted release: [[Releases/36 | Fedora 36 ]]
* Last updated: <!-- this is an automatic macro — you don't need to change this line --> {{REVISIONYEAR}}-{{REVISIONMONTH}}-{{REVISIONDAY2}}
<!-- After the change proposal is accepted by FESCo, tracking bug is created in Bugzilla and linked to this page
ON_QA -> change is fully code complete
-->
* FESCo issue: [https://pagure.io/fesco/issue/2687 #2687]
* Tracker bug: [https://bugzilla.redhat.com/show_bug.cgi?id=1956946 #1956946]
* Release notes tracker: [https://pagure.io/fedora-docs/release-notes/issue/769 #769]
== Detailed Description ==
People mix binaries (programs and libraries) from different distributions (for example using Fedora containers on Debian or vice versa), distribute binaries without packaging metadata (for example by stripping everything except the binary from a container image, also removing `/usr/lib/.build-id/*`), compile their own rpm packages (for internal distribution and installation), and compile and distribute their own binaries. Sometimes we need to introspect a binary and figure out its provenance, for example when a program crashes and we are looking at a core dump, but also when we have a binary without the packaging metadata. We have some very good mechanisms to show the provenance: when a file is installed through the package manager we can directly list the providing package, and even without this we can use build-ids embedded in the binary to uniquely identify the originating build. But those mechanisms work best when we're in the realm of a single distribution. In particular, build-ids can be easily tied to a source rpm, but only when the source rpm is part of the distribution and the build-id was registered in the appropriate database which maps build-ids to real package names. When we move outside of the realm of a single distribution, it can be hard to figure out where a given binary originates from. If we know that a binary is from a given distribution, we may be able to use some distro-specific mechanism to figure out this information. But those mechanisms will be different for different distributions and will often require network access. With this change we aim to provide a mechanism that is very simple, provides human-readable origin information without further processing, is portable across distros, and works without network access.
The directly motivating '''use case is display of core dumps'''. Right now we have build-ids, but those are just opaque hexadecimal numbers that are not meaningful to users. We would like to immediately list versions of packages involved in the crash (including both the program and any libraries it links to). It is not enough to query the rpm database to do the equivalent of `rpm -qf …`: very often programs crash after some packages have been upgraded, so the binaries loaded into memory are not the binaries that are currently present on disk; or, through some mishap, the binaries on disk do not match the installed rpms. A mechanism that works without rpm database lookup or network access allows this information to be shown immediately in `coredumpctl` listings and journal entries about the crash. This includes crashes that happen in the initrd and in sandboxed systems.
A second motivating '''use case is when users distribute their own binaries''' and would like to collect crash information. Build-ids are technically a possible solution, but easy to get wrong in practice: users would need to record the build-id immediately after the build and store the mapping to program names, versions, and build numbers in some database. It's much easier to record this information in the build product itself, during the build.
A third motivating '''use case is the mixing of Fedora binaries with programs and libraries from other distributions''', both with our binaries being used as the base for foreign binaries, and the other way around. Whilst most distributions provide some mechanism to figure out the source build information, those mechanisms vary by distribution and may not be easy to access from a "foreign" system. Such mixing is expected with containers, flatpaks, snaps, Python binary wheels, anaconda packages, and quite often when somebody compiles a binary and puts it up on the web for other people to download.
'''We propose a new mechanism which is designed to be very simple but extensible: a small JSON document is embedded in a section in the ELF binary'''. This document '''can be easily read by a human''' if necessary, but it is also well-defined and '''can be processed programmatically'''. For example, `systemd-coredump` will immediately make use of this to display package ''nevra'' information for crashes. The format is also '''easy to generate''', so it can be added to '''any build system''', either using the helpers that we provide or by reimplementing it from scratch.
For the case where we mix binaries from different distros (the third motivating use case above), this approach is most useful when it is used by all distros and even by non-distro builds: the more widely it is used, the more useful it becomes. The specification was developed in collaboration with Debian developers, and we hope that Fedora and Debian will lead the way for this to become as widely used as build-ids. But even if the information is only available from some distros, it is still useful, although fallback mechanisms are needed for binaries that lack it.
=== Existing system: `.note.gnu.build-id` ===
We already have build-ids: every ELF object has a `.note.gnu.build-id` note, and given a core file, we can read the build-id and look it up in the rpm database (`dnf repoquery --whatprovides debuginfo(build-id) = …`) to map it to a package name.
Build-ids are unique, compact, and very generic, and they work as expected in general. But they have some downsides:
* build-ids are not very informative for users. Before the build-id is converted back to the appropriate package, it's completely opaque.
* build-ids require a working rpm database or an internet connection to map to the package name.
Three important cases:
* minimal containers: the rpm database is not installed in the containers. The information about build-ids needs to be stored externally, so package name information is not available immediately, but only after offline processing. The new note doesn't depend on the rpm db in any way.
* handling of a core from a container, where the container and host have different distros.
* self-built and external packages: unless a lot of care is taken to keep access to the debuginfo packages, this information may be lost. The new note is available even if the repository metadata gets lost. Users can easily provide equivalent information in a format that makes sense in their own environment. It should work even when rpms and debs and other formats are mixed, e.g. during container image creation.
=== New system: `.note.package` ===
The new note is created and propagated similarly to `.note.gnu.build-id`. The difference is that we inject the information about package ''nevra'' from the build system.
The implementation is very simple: `%{build_ldflags}` is extended so that the linker inserts a custom note as a separate section in each ELF object. See [https://github.com/systemd/package-notes/blob/main/hello.spec hello.spec] for an example. This is done in the default macros, so all packages that use the prescribed link flags will be affected.
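To illustrate what such a link-time step can feed to the linker, here is a minimal sketch that turns the metadata into a GNU ld linker-script fragment which emits the note byte by byte and places it after `.note.gnu.build-id`. It is written in the spirit of `generate-package-notes.py` but is not a copy of it: the function names `note_bytes` and `linker_script`, the exact script layout, and the use of `INSERT AFTER` are assumptions. The note header (owner `FDO`, type `0xcafe1a7e`, little-endian fields) matches the `objdump` dump in the Examples section below.

```python
import json
import struct

# Owner and type as they appear in the objdump example (owner "FDO", type 0xcafe1a7e).
NOTE_OWNER = b"FDO\0"
NOTE_TYPE = 0xCAFE1A7E


def pad4(data: bytes) -> bytes:
    """Pad to a multiple of 4 bytes, as the ELF note fields require."""
    return data + b"\0" * (-len(data) % 4)


def note_bytes(metadata: dict) -> bytes:
    """Encode metadata as one note: header, owner, NUL-terminated compact JSON.

    Assumes a little-endian target such as x86_64.
    """
    desc = json.dumps(metadata, separators=(",", ":")).encode() + b"\0"
    header = struct.pack("<III", len(NOTE_OWNER), len(desc), NOTE_TYPE)
    return header + pad4(NOTE_OWNER) + pad4(desc)


def linker_script(metadata: dict) -> str:
    """Emit a hypothetical ld script fragment that writes the note with BYTE() directives."""
    data = note_bytes(metadata)
    rows = []
    for i in range(0, len(data), 4):
        rows.append("        " + " ".join(f"BYTE(0x{b:02x})" for b in data[i:i + 4]))
    body = "\n".join(rows)
    return (
        "SECTIONS\n{\n"
        "    .note.package : ALIGN(4) {\n"
        f"{body}\n"
        "    }\n}\n"
        "INSERT AFTER .note.gnu.build-id;\n"
    )


if __name__ == "__main__":
    print(linker_script({
        "type": "rpm",
        "name": "hello",
        "version": "0-1.fc35.x86_64",
        "osCpe": "cpe:/o:fedoraproject:fedora:33",
    }))
```

Generating a script rather than patching objects after the link keeps the note inside the normal link, so it ends up covered by `.note.gnu.build-id` like any other section.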
The note is a compact JSON string. This makes the format trivially extensible (new fields can be added at will) and easy to process (JSON is extremely popular and parsers are widely available). Using a single note rather than a set of separate notes is also more space-efficient: with multiple notes, the per-note headers and the padding and alignment requirements cause unnecessary overhead.
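To make the space argument concrete, the sketch below compares the example metadata encoded as one JSON note against a hypothetical layout with one note per field, where each note carries only the NUL-terminated value and the field would be identified by the note type. Both use the standard ELF note layout: a 12-byte header (namesz, descsz, type) followed by the owner name and the descriptor, each padded to 4 bytes. The per-field layout is an assumption made only for this comparison.

```python
import json


def padded(n: int) -> int:
    """Round n up to the 4-byte alignment used by ELF note fields."""
    return (n + 3) // 4 * 4


def note_size(owner: bytes, desc: bytes) -> int:
    """Size of one note: 12-byte header plus padded owner name and padded descriptor."""
    return 12 + padded(len(owner)) + padded(len(desc))


OWNER = b"FDO\0"
metadata = {
    "type": "rpm",
    "name": "hello",
    "version": "0-1.fc35.x86_64",
    "osCpe": "cpe:/o:fedoraproject:fedora:33",
}

# One note whose descriptor is the compact, NUL-terminated JSON document.
single = note_size(OWNER, json.dumps(metadata, separators=(",", ":")).encode() + b"\0")

# Hypothetical alternative: one note per field, carrying only the NUL-terminated value.
multiple = sum(note_size(OWNER, value.encode() + b"\0") for value in metadata.values())

print(single, multiple)  # prints: 116 124
```

For these four fields the single note is 116 bytes, the same size as the note in the `objdump` dump in the Examples section, against 124 bytes for four separate notes.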
The system was designed in cross-distro collaboration and is flexible enough to identify binaries from different packaging formats and build systems (rpms, debs, custom binaries).
See https://systemd.io/COREDUMP_PACKAGE_METADATA/ for a detailed description of the format.
One of the advantages of using an ELF note, as opposed to, say, a series of extended attributes on the binary itself, is that the ELF note is automatically captured and copied into a core file by the kernel. Extended attributes would have to be copied manually, which might not even be possible because the binary on disk may have been removed by the time the crash is analyzed.
The overhead is about 200 bytes for each ELF object.
We have overall about 33200 files in `/usr/s?bin/` and about 36600 `.so` files (F35, single architecture;
results from `dnf repoquery -l 2>/dev/null | rg '^/usr/s?bin/' | sort -u | wc -l` and
`dnf repoquery -l 2>/dev/null | rg '^/usr/lib64/.*\.so$' | sort -u | wc -l`).
If we do this for the whole distro, we get (33200 + 36600) × 200 B = 69800 × 200 B ≈ 14 MB (13.3 MiB).
For a typical installation, we can expect about 300–400 kB.
Thus the additional space used is negligible (also see the Feedback section for more discussion).
Precise measurements are TBD once this is turned on and we have real
measurements for a larger number of builds.
=== Examples ===
<pre>
$ objdump -s -j .note.package build/libhello.so
build/libhello.so: file format elf64-x86-64
Contents of section .note.package:
02ec 04000000 63000000 7e1afeca 46444f00 ....c...~...FDO.
02fc 7b227479 7065223a 2272706d 222c226e {"type":"rpm","n
030c 616d6522 3a226865 6c6c6f22 2c227665 ame":"hello","ve
031c 7273696f 6e223a22 302d312e 66633335 rsion":"0-1.fc35
032c 2e783836 5f363422 2c226f73 43706522 .x86_64","osCpe"
033c 3a226370 653a2f6f 3a666564 6f726170 :"cpe:/o:fedorap
034c 726f6a65 63743a66 65646f72 613a3333 roject:fedora:33
035c 227d0000 "}..
</pre>
<pre>
$ readelf --notes build/hello | grep "description data" | sed -e "s/\s*description data: //g" -e "s/ //g" | xxd -p -r | jq
readelf: build/hello: Warning: Gap in build notes detected from 0x1091 to 0x10de
readelf: build/hello: Warning: Gap in build notes detected from 0x1091 to 0x10af
readelf: build/hello: Warning: Gap in build notes detected from 0x1091 to 0x119f
{
  "type": "rpm",
  "name": "hello",
  "version": "0-1.fc35.x86_64",
  "osCpe": "cpe:/o:fedoraproject:fedora:33"
}
</pre>
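The shell pipeline above (readelf, grep, sed, xxd, jq) can also be written as a short Python function, which may be handy where `xxd` or `jq` are not installed. This is a sketch that assumes, as the pipeline does, that `readelf --notes` prints each note's payload on a single `description data:` line of space-separated hex bytes; the function name `package_note_from_readelf` is hypothetical.

```python
import json
import subprocess
import sys


def package_note_from_readelf(readelf_output: str):
    """Return the first "description data" payload that decodes as a JSON object, else None."""
    marker = "description data:"
    for line in readelf_output.splitlines():
        if marker not in line:
            continue
        hex_bytes = line.split(marker, 1)[1].split()
        try:
            raw = bytes.fromhex("".join(hex_bytes))
            # The descriptor is NUL-terminated (and may be padded): drop trailing NULs.
            doc = json.loads(raw.rstrip(b"\0").decode("utf-8"))
        except ValueError:
            # Other notes (e.g. build attribute notes) are not JSON; skip them.
            continue
        if isinstance(doc, dict):
            return doc
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    out = subprocess.run(["readelf", "--notes", sys.argv[1]],
                         capture_output=True, text=True, check=True).stdout
    print(json.dumps(package_note_from_readelf(out), indent=2))
```

Unlike the pipeline, this skips payloads that are not JSON instead of concatenating every `description data` line, so other notes in the same file do not corrupt the result.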
<pre>
$ coredumpctl info
PID: 44522 (fsverity)
...
Package: fsverity-utils/1.3-1
build-id: ac89bf7175b04d7eec7f6544a923f45be111f0be
Message: Process 44522 (fsverity) of user 1000 dumped core.
Found module /home/bluca/git/fsverity-utils/libfsverity.so.0 with build-id: fa40fdfb79aea84167c98ca8a89add9ac4f51069
Metadata for module /home/bluca/git/fsverity-utils/libfsverity.so.0 owned by FDO found: {
"packageType" : "deb",
"package" : "fsverity-utils",
"packageVersion" : "1.3-1"
}
Found module linux-vdso.so.1 with build-id: aba08e06103f725e26f1d7c178fb6b76a564a35d
Found module libpthread.so.0 with build-id: e91114987a0147bd050addbd591eb8994b29f4b3
Found module libdl.so.2 with build-id: d3583c742dd47aaa860c5ae0c0c5bdbcd2d54f61
Found module ld-linux-x86-64.so.2 with build-id: f25dfd7b95be4ba386fd71080accae8c0732b711
Found module libcrypto.so.1.1 with build-id: 749142d5ee728a76e7cdc61fd79d2311a77405a2
Found module libc.so.6 with build-id: 18b9a9a8c523e5cfe5b5d946d605d09242f09798
Found module fsverity with build-id: ac89bf7175b04d7eec7f6544a923f45be111f0be
Metadata for module fsverity owned by FDO found: {
"packageType" : "deb",
"package" : "fsverity-utils",
"packageVersion" : "1.3-1"
}
Stack trace of thread 44522:
#0 0x00007fe7c8af26f4 __GI___nanosleep (libc.so.6 + 0xc66f4)
#1 0x00007fe7c8af262a __sleep (libc.so.6 + 0xc662a)
#2 0x00005608481407dd main (fsverity + 0x27dd)
#3 0x00007fe7c8a5009b __libc_start_main (libc.so.6 + 0x2409b)
#4 0x000056084814094a _start (fsverity + 0x294a)
</pre>
== Feedback ==
<!-- Summarize the feedback from the community and address why you chose not to accept proposed alternatives. This section is optional for all change proposals but is strongly suggested. Incorporating feedback here as it is raised gives FESCo a clearer view of your proposal and leaves a good record for the future. If you get no feedback, that is useful to note in this section as well. For innovative or possibly controversial ideas, consider collecting feedback before you file the change proposal. -->
See [https://github.com/systemd/systemd/issues/18433 systemd issue #18433] for upstream discussion and implementation proposals.
=== Concerns about additional changes to files ===
Concerns regarding the effect on RPMCoW were raised [https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/4Z652ASFD4ZYF3HUWO2KPLVSZDGUDFJH/ on the mailing list].
Also on IRC the following concerns were raised about changes to ELF files:
<pre>
17:32:30 <Eighth_Doctor> I think zbyszek underestimates how much of a problem it is to stamp every ELF binary with ''nevra'' data
17:32:44 <mhroncok> zbyszek: so, assuming python has ~100 ELF .so files and I change one text file
17:33:22 <mhroncok> (ignore for the time being that the .so files often changed because of toolchain updates and assume they are stable)
</pre>
'''ELF files in Fedora already vary between different package versions'''. The official version of the package is embedded in the `.gnu_debuglink` section since the [https://fedoraproject.org/wiki/Changes/ParallelInstallableDebuginfo Parallel Installable Debuginfo proposal] was adopted. And since that varies between rebuilds, `.note.gnu.build-id`, which is calculated over all sections, ''also'' varies. Thus every file already has two sections that vary between builds, and we're adding a third.
<small>
<pre>
$ readelf -W -p .gnu_debuglink /usr/bin/true
String dump of section '.gnu_debuglink':
[ 0] true-8.32-31.fc35.x86_64.debug
[ 21] ?��
</pre>
I tested this with python3.10. So far there are 13 builds of that package in F35:
`python3.10-3.10.0-1.fc35`,
`python3.10-3.10.0~a6-1.fc35`,
`python3.10-3.10.0~a6-2.fc35`,
`python3.10-3.10.0~a7-1.fc35`,
`python3.10-3.10.0~b1-1.fc35`,
`python3.10-3.10.0~b2-2.fc35`,
`python3.10-3.10.0~b2-3.fc35`,
`python3.10-3.10.0~b3-1.fc35`,
`python3.10-3.10.0~b4-1.fc35`,
`python3.10-3.10.0~b4-2.fc35`,
`python3.10-3.10.0~b4-3.fc35`,
`python3.10-3.10.0~rc1-1.fc35`,
`python3.10-3.10.0~rc2-1.fc35`.
I extracted the builds (for `.x86_64`), made a list of all `.so` files (1368 files), and calculated sha256 hashes for them. No hash repeats: there are 1368 distinct hashes. So the files are indeed different between builds.
Note that this range of Python versions covers both periods when the package is under development and undergoes significant changes (alpha versions) and periods when it only undergoes small changes (rc versions).
The fact that we get different files in each build is not surprising, but even if we ignore debuglink/build-id, binaries differ between builds ''anyway''. Even sizes tend to vary between builds: there are 636 distinct `.so` file sizes, i.e. on average any given size repeats only about twice (presumably most often for the same file). Running `diffoscope` on `.so` files from different builds shows small changes in the assembly, which I did not analyze further.
If people have specific questions, for example about overhead in some scenario, I'd be happy to answer them. So far, the concerns raised have been too vague to answer.
</small>
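The hash comparison described above (sha256 of every `.so` file across the extracted builds, plus a count of distinct sizes) can be reproduced with a short script. This is a sketch, not the script that was actually used: it assumes each build has already been unpacked (for example with `rpm2cpio PACKAGE.rpm | cpio -idm`) into its own subdirectory of one top-level directory, and the names `is_shared_object` and `so_stats` are hypothetical.

```python
import hashlib
import sys
from pathlib import Path


def is_shared_object(path: Path) -> bool:
    """Match names like foo.so and libfoo.so.1.0 (our assumption of what counts as a .so file)."""
    return path.name.endswith(".so") or ".so." in path.name


def so_stats(root: Path):
    """Return (number of .so files, distinct sha256 hashes, distinct sizes) under root."""
    hashes = set()
    sizes = set()
    count = 0
    for path in sorted(root.rglob("*")):
        # Count regular files only; symlinks like libfoo.so -> libfoo.so.1 would double-count.
        if path.is_symlink() or not path.is_file() or not is_shared_object(path):
            continue
        data = path.read_bytes()
        hashes.add(hashlib.sha256(data).hexdigest())
        sizes.add(len(data))
        count += 1
    return count, len(hashes), len(sizes)


if __name__ == "__main__" and len(sys.argv) > 1:
    files, distinct_hashes, distinct_sizes = so_stats(Path(sys.argv[1]))
    print(f"{files} files, {distinct_hashes} distinct hashes, {distinct_sizes} distinct sizes")
```

If the number of distinct hashes equals the number of files, as in the python3.10 measurement, no `.so` file is byte-identical between any two builds.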
==== Won't this affect the Reproducible Builds effort? ====
No!
[https://reproducible-builds.org/docs/definition/ Reproducible Builds are defined] as being able to get the same output binary given the same inputs. If the metadata described here changes, that can only be because the inputs (sources) have changed too, so the binary is allowed to change as well. Reproducible builds are not about minimizing changes between builds of different versions, but about having identical build results for the same version in the same build environment. '''Given the same source version, the metadata will be the same, and hence the ELF note will be the same too.'''
<small>
Furthermore, for each package only the version field should change; the rest is fixed, at least with the proposed set of metadata. And many packages already embed not only the upstream version but also the distro revision in their binaries, so an unknown number of them would change anyway with a new revision. For example:
<pre>
$ gcc --version | head -n1
gcc (Debian 11.2.0-10) 11.2.0
$ clang --version | head -n1
Debian clang version 11.1.0-4
$ qemu-system-x86_64 --version | head -n1
QEMU emulator version 6.1.0 (Debian 1:6.1+dfsg-8)
</pre>
</small>
=== Why not just use the rpm database? ===
<pre>
17:34:33 <dcantrell> The main reason for this appears to be that we need the RPM db locally to resolve build-ids to package names. But since containers wipe /var/lib/rpm, we can't do that. So the solution is to put the ''nevra'' in ELF metadata?
17:34:39 <dcantrell> That feels like the wrong approach.
</pre>
First, there are legitimate reasons to strip packaging metadata from images. For example, for an initrd image built from rpms, I get 117 MB of files (without compression), of which `/var/lib/rpm` is 5.9 MB and `/var/lib/dnf` is 4.2 MB. This is an overhead of about 9% ((5.9 + 4.2) / 117 ≈ 8.6%). This is ''not much'', but still too much to keep in the image unless necessary. Similar ratios can be expected for containers of similar size, and reducing image size by one tenth is important. There is no `rpm` or `dnf` in the image, so the package database is not even usable without external tools. As discussed on IRC (https://meetbot.fedoraproject.org/teams/fesco/fesco.2021-05-11-17.01.log.html), the containers ''we'' build don't wipe this metadata, but custom Dockerfiles do. There are various cases where the '''rpm database is not accessible'''.
Second, as described in the Description section above, '''not everybody and everything uses rpm'''. The Fedora motto is "we make an operating system and we make it easy for you to do useful stuff with it" (and yes, this is an actual quote from the official docs), and this stuff involves reusing our binaries in containers and custom installations and whatnot, not just straightforward installations with `dnf`. And in the other direction, '''people will build their own binaries that are not distributed as packages'''. But it is still important to be able to figure out the exact version of a binary, especially after it crashes.
=== Why not just use debuginfod? ===
Access to the network is not a given, and from the systemd-coredump point of view, network access is undesirable from the sandbox that it runs in. There are also privacy issues to consider, as querying debuginfod servers may expose information (about what is running, in what versions, what is crashing, etc.). Thus such queries need to be opt-in and under user control.
However, the two features are not mutually exclusive; where network access is allowed, they complement each other. While there has been an effort to build federated debuginfod servers, there is no guarantee that the build-id under scrutiny is known to one of those. Thus, adding the URL of the appropriate debuginfod server to the package note would help ensure that for each binary (and build-id), it is always known which debuginfod server to contact to fetch the relevant symbols.
=== Why do this in Fedora? ===
<pre>
17:36:49 <mhroncok> I don't understand how non-rpm distros and custom built binaries are affected by our rpm-build environment :/
</pre>
The idea is that we inject this into our build system, and Debian injects this into their build system, and so on… As mentioned, this is '''a cross-distro effort'''. Also, people can use it in their custom build systems if they build and distribute binaries internally. The scheme would obviously be most useful if used comprehensively, but it's still useful when available partially. '''We hope that Fedora can lead the way.'''
(This is similar to build-ids: when initially adopted, they were used only by some distros, but were useful even then. Nowadays, with comprehensive adoption, they are even more useful.)
<small>https://hpc.guix.info/blog/2021/09/whats-in-a-package/ contains a nice description of a pathological case of packaging hacks and binary redistribution. When trying to unravel something like this, information embedded directly in the binaries would be quite useful.</small>
For the case of non-native programs on a distro, there is no expectation that the distro will provide support. In this scenario, the proposal targets end users and/or third party software providers, who will benefit from identifying program versions more easily.
But a special case occurs when you have e.g. a Fedora container running on a Debian host, and the user wants to report a crash from the process inside the container. When the core dump is analyzed on the host, the software figures out that it's some Fedora package. That bug could be manually reported to Fedora and one would expect it to be handled, because the container environment is mostly abstracted from the host and the container interacts just with the kernel.
=== Further adoption ===
mizdebsk: "I will be proposing a '''separate system-wide F36 change for embedding package NVR inside Java JAR files''' iff the ELF change is accepted" [https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/FMJM7PEZL6WI424GJEETGEUO5E5XHRHF/ link].
flussence: "This looks like it'd be '''really useful in Gentoo'''. Right now all of this info is stored in the package database (a tree of flat files with no index) and it takes a linear scan of that to find out who owns a given library - which happens a lot due to the safety nets they've added over the years. Often the package manager overhead dwarfs the time spent actually compiling things." [https://lwn.net/Articles/875138/ link]
== Benefit to Fedora ==
A simple and reliable way to gather information about package versions of programs is added.
It enhances, instead of replacing, the existing mechanisms.
It is particularly useful when reporting crash dumps, but can also be used for image introspection and forensics, license checks and version scans on containers, etc.
If we adopt this in Fedora, Fedora leads the way on implementing the standard. Fedora binaries used in any context can be easily recognized. Fedora binaries provide a better basis to build things.
If other distros adopt this, we can introspect and report on those binaries easily within the Fedora context. For example, when somebody is using a container with some programs that originate in the Debian ecosystem, we would be able to identify those programs without tools like `apt` or `dpkg-query`. Core dump analysis executed on the Fedora host can easily provide useful information about programs from foreign builds.
== Implementation in Other Distributions ==
=== Microsoft CBL-Mariner ===
[https://en.wikipedia.org/wiki/CBL-Mariner CBL-Mariner] is an [https://github.com/microsoft/CBL-Mariner open source] Linux distribution created by Microsoft, targeted at first-party and container workloads on Azure. It is used both as a container runner host and a base container image.
Mariner adopted the ELF stamping packaging metadata spec in [https://github.com/microsoft/CBL-Mariner/blob/1.0/SPECS/mariner-rpm-macros/gen-ld-script.sh version 1.0], initially to add OS metadata, and [https://github.com/microsoft/CBL-Mariner/commit/3c22062735a1a660963c5bd200d138721e2e10ab package-level metadata will be added in a following release].
=== Debian ===
A package-level proof-of-concept is included in the [https://github.com/systemd/package-notes/blob/main/dh_package_notes package-notes] repository.
A [https://salsa.debian.org/bluca/debhelper/-/tree/notes_metadata system-level proof-of-concept] that enables ELF stamping by default in all builds implicitly will be proposed for adoption in the future.
== Scope ==
* Proposal owners:
** create a specification (First version DONE: [https://systemd.io/COREDUMP_PACKAGE_METADATA COREDUMP_PACKAGE_METADATA]. We might need to make some adjustments based on the deployment in Fedora, but no big changes are expected.)
** write a script to generate the package note (First version DONE: [https://github.com/systemd/package-notes/blob/main/generate-package-notes.py generate-package-notes.py])
** provide a patch for `redhat-rpm-config` to insert appropriate compilation options [https://src.fedoraproject.org/rpms/redhat-rpm-config/pull-request/167 redhat-rpm-config PR #167], [https://pagure.io/fedora-comps/pull-request/704 comps PR #704]
** extend systemd's coredumpctl to extract and display this information (DONE: [https://github.com/systemd/systemd/pull/19135 PR #19135], available in systemd-249)
** submit pull request to Packaging Guidelines
** possibly add support in abrt?
* Release engineering: There should be no impact.
* Policies and guidelines:
== How To Test ==
<pre>
$ bash -c 'kill -SEGV $$'
$ coredumpctl
TIME PID UID GID SIG COREFILE EXE SIZE PACKAGE
Mon 2021-03-01 14:37:22 CET 855151 1000 1000 SIGSEGV present /usr/bin/bash 51.7K bash-5.1.0-2.fc34.x86_64
</pre>
== User Experience ==
`coredumpctl` should display information about package versions.
`readelf --notes` or similar tools can be used on `.so` files and compiled programs
to extract the JSON blurb that describes the originating package.
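As one example of such a "similar tool", the sketch below reads an ELF image directly, finds the `.note.package` section by name, and decodes the JSON payload without calling `readelf`. It assumes a 64-bit little-endian ELF file (the common case on x86_64 and aarch64) and the note layout from the Examples section (owner `FDO`, type `0xcafe1a7e`); the function name `read_package_note` is hypothetical, and 32-bit or big-endian files would need different `struct` formats.

```python
import json
import struct
import sys

NOTE_TYPE_FDO_PACKAGE = 0xCAFE1A7E  # note type seen in the objdump example


def read_package_note(data: bytes):
    """Return the .note.package JSON of a 64-bit little-endian ELF image, or None."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # ELF64 header fields after e_ident; only the section header table fields are needed.
    (_, _, _, _, _, e_shoff, _, _, _, _, e_shentsize, e_shnum,
     e_shstrndx) = struct.unpack_from("<HHIQQQIHHHHHH", data, 16)

    def section(i):
        # Elf64_Shdr: name, type, flags, addr, offset, size, link, info, addralign, entsize
        sh = struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + i * e_shentsize)
        return sh[0], sh[4], sh[5]  # sh_name, sh_offset, sh_size

    _, str_off, _ = section(e_shstrndx)
    for i in range(e_shnum):
        name_off, off, size = section(i)
        name_end = data.index(b"\0", str_off + name_off)
        if data[str_off + name_off:name_end] != b".note.package":
            continue
        # Walk the notes in the section: 12-byte header, then 4-byte-aligned owner and desc.
        pos = off
        while pos + 12 <= off + size:
            namesz, descsz, ntype = struct.unpack_from("<III", data, pos)
            name_start = pos + 12
            desc_start = name_start + (namesz + 3) // 4 * 4
            owner = data[name_start:name_start + namesz]
            desc = data[desc_start:desc_start + descsz]
            if owner == b"FDO\0" and ntype == NOTE_TYPE_FDO_PACKAGE:
                return json.loads(desc.rstrip(b"\0").decode("utf-8"))
            pos = desc_start + (descsz + 3) // 4 * 4
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(json.dumps(read_package_note(f.read()), indent=2))
```

Reading the section table keeps the lookup independent of binutils; for core files the same note-walking loop would apply to the dumped segments instead of named sections, which this sketch does not handle.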
== Dependencies ==
== Documentation ==
* https://systemd.io/COREDUMP_PACKAGE_METADATA/
* https://github.com/systemd/package-notes
See also [[Changes/DebuginfodByDefault]].
== Release Notes ==
Latest revision as of 21:18, 13 January 2022
Package information on ELF objects
Summary
All binaries (executables and shared libraries) are annotated with an ELF note that identifies the rpm for which this file was built. This allows binaries to be identified when they are distributed without any of the rpm metadata. `systemd-coredump` uses this to log package versions when reporting crashes.
Owner
- Name: Zbigniew Jędrzejewski-Szmek, Lennart Poettering
- Email: zbyszek@in.waw.pl, mzsrqben@0pointer.net
Current status
- Targeted release: Fedora 36
- Last updated: 2022-01-13
- FESCo issue: #2687
- Tracker bug: #1956946
- Release notes tracker: #769
Detailed Description
People mix binaries (programs and libraries) from different distributions (for example using Fedora containers on Debian or vice versa), distribute binaries without packaging metadata (for example by stripping everything except the binary from a container image, including `/usr/lib/.build-id/*`), compile their own rpm packages (for internal distribution and installation), and compile and distribute their own binaries. Sometimes we need to introspect a binary and figure out its provenance, for example when a program crashes and we are looking at a core dump, but also when we have a binary without the packaging metadata. We have some very good mechanisms to show provenance: when a file is installed through the package manager we can directly list the providing package, and even without that we can use the build-ids embedded in the binary to uniquely identify the originating build. But those mechanisms work best within a single distribution. In particular, build-ids can be easily tied to a source rpm, but only when the source rpm is part of the distribution and the build-id was registered in the appropriate database which maps build-ids to real package names. Outside of a single distribution, it can be hard to figure out where a given binary originates from. If we know that a binary is from a given distribution, we may be able to use some distro-specific mechanism to figure this out, but those mechanisms differ between distributions and often require network access. With this change we aim to provide a mechanism that is very simple, provides human-readable origin information without further processing, is portable across distros, and works without network access.
The directly motivating use case is the display of core dumps. Right now we have build-ids, but those are just opaque hexadecimal numbers that are not meaningful to users. We would like to immediately list the versions of packages involved in the crash (including both the program and any libraries it links to). It is not enough to query the rpm database with the equivalent of `rpm -qf …`: very often programs crash after some packages have been upgraded, so the binaries loaded into memory are not the binaries currently present on disk, or, through some mishap, the binaries on disk do not match the installed rpms. A mechanism that works without rpm database lookups or network access allows this information to be shown immediately in `coredumpctl` listings and in journal entries about the crash. This includes crashes that happen in the initrd and in sandboxed systems.
A second motivating use case is when users distribute their own binaries and would like to collect crash information. Build-ids are a solution that is technically possible, but easy to get wrong in practice: users would need to immediately record the build-id after the build and store the mapping to program names, versions, and build number in some database. It's much easier to be able to record something during the build in the build product itself.
A third motivating use case is the mixing of Fedora binaries with programs and libraries from other distributions, both with our binaries being used as the base for foreign binaries, and the other way around. Whilst most distributions provide some mechanism to figure out the source build information, those mechanisms vary by distribution and may not be easy to access from a "foreign" system. Such mixing is expected with containers, flatpaks, snaps, Python binary wheels, anaconda packages, and quite often when somebody compiles a binary and puts it up on the web for other people to download.
We propose a new mechanism which is designed to be very simple but extensible: a small JSON document is embedded in a section in the ELF binary. This document can be easily read by a human if necessary, but it is also well-defined and can be processed programmatically. For example, `systemd-coredump` will immediately make use of this to display package nevra information for crashes. The format is also easy to generate, so it can be added to any build system, either using the helpers that we provide or even reimplemented from scratch.
For the case where we mix binaries from different distros (the third motivating use case above), this approach is the most useful when this system is used by all distros and even non-distro builds. The more widely it is used, the more useful it becomes. The specification was developed in collaboration with Debian developers, and we hope that Fedora and Debian will lead the way for this to become as widely used as build-ids. But even if the information is only available from some distros, it is still useful, except that fallback mechanisms need to be implemented.
Existing system: .note.gnu.build-id
We already have build-ids: every ELF object has a `.note.gnu.build-id` note, and given a core file, we can read the build-id and look it up in the rpm database (`dnf repoquery --whatprovides debuginfo(build-id) = …`) to map it to a package name.
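To make the contrast with the new note concrete, here is a minimal Python sketch of the existing path: pull the build-id out of a GNU note and turn it into the `debuginfo(build-id) = …` query mentioned above. It assumes little-endian byte order and the standard GNU note layout (owner `GNU`, type 3, i.e. `NT_GNU_BUILD_ID`); the function name `build_id_query` is hypothetical and the sample build-id is synthetic.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used in .note.gnu.build-id


def _align4(n: int) -> int:
    """Round n up to the 4-byte boundary used by ELF note fields."""
    return (n + 3) & ~3


def build_id_query(note_section: bytes, byteorder: str = "<") -> str:
    """Return the dnf provides-query for the first GNU build-id note found."""
    offset = 0
    while offset + 12 <= len(note_section):
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", note_section, offset)
        offset += 12
        name = note_section[offset:offset + namesz]
        offset += _align4(namesz)
        desc = note_section[offset:offset + descsz]
        offset += _align4(descsz)
        if name == b"GNU\0" and ntype == NT_GNU_BUILD_ID:
            # The build-id is just opaque bytes; hex is all we can show to users.
            return f"debuginfo(build-id) = {desc.hex()}"
    raise ValueError("no GNU build-id note found")


if __name__ == "__main__":
    # Synthetic 20-byte (SHA-1 sized) build-id, for illustration only.
    fake = struct.pack("<III", 4, 20, NT_GNU_BUILD_ID) + b"GNU\0" + bytes(range(20))
    print(build_id_query(fake))
    # prints: debuginfo(build-id) = 000102030405060708090a0b0c0d0e0f10111213
```

The result is only a query string: without repository metadata or network access it still cannot be turned into a package name.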
Build-ids are unique, compact, and very generic, and they generally work as expected. But they have some downsides:
- build-ids are not very informative for users. Before the build-id is converted back to the appropriate package, it's completely opaque.
- build-ids require a working rpm database or an internet connection to map to the package name.
Three important cases:
- minimal containers: the rpm database is not installed in the containers. The information about build-ids needs to be stored externally, so package name information is not available immediately, but only after offline processing. The new note doesn't depend on the rpm db in any way.
- handling of a core from a container, where the container and host have different distros
- self-built and external packages: unless a lot of care is taken to keep access to the debuginfo packages, this information may be lost. The new note is available even if the repository metadata gets lost. Users can easily provide equivalent information in a format that makes sense in their own environment. It should work even when rpms and debs and other formats are mixed, e.g. during container image creation.
New system: .note.package
The new note is created and propagated similarly to `.note.gnu.build-id`. The difference is that we inject the information about the package nevra from the build system.
The implementation is very simple: `%{build_ldflags}` are extended with a command to insert a custom note as a separate section in an ELF object. See hello.spec for an example. This is done in the default macros, so all packages that use the prescribed link flags will be affected.
The note is a compact JSON string. This makes the format trivially extensible (new fields can be added at will) and easy to process (JSON is extremely popular and parsers are widely available). Using a single note rather than a set of separate notes is more space-efficient: with multiple notes, the padding and alignment requirements cause unnecessary overhead.
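A minimal Python sketch of the note layout, assuming the owner `FDO` and the note type `0xcafe1a7e` that appear in the `objdump` dump in the Examples section (the specification linked below is authoritative). `build_package_note` is a hypothetical helper, not the code in `generate-package-notes.py`. It shows why a single JSON note is compact: the 12-byte header and the 4-byte owner are paid once, and 4-byte padding is applied to one descriptor only.

```python
import json
import struct

NOTE_OWNER = b"FDO\0"    # owner name, NUL-terminated, as seen in the dump
NOTE_TYPE = 0xCAFE1A7E   # n_type; stored little-endian as bytes 7e 1a fe ca


def _pad4(data: bytes) -> bytes:
    """Pad to a multiple of 4 bytes with NULs, as ELF note fields require."""
    return data + b"\0" * (-len(data) % 4)


def build_package_note(metadata: dict, byteorder: str = "<") -> bytes:
    """Serialize metadata as compact JSON and wrap it in a single ELF note."""
    # descsz counts the JSON text plus its terminating NUL, but not the padding.
    desc = json.dumps(metadata, separators=(",", ":")).encode("utf-8") + b"\0"
    header = struct.pack(byteorder + "III", len(NOTE_OWNER), len(desc), NOTE_TYPE)
    return header + _pad4(NOTE_OWNER) + _pad4(desc)


if __name__ == "__main__":
    note = build_package_note({
        "type": "rpm",
        "name": "hello",
        "version": "0-1.fc35.x86_64",
        "osCpe": "cpe:/o:fedoraproject:fedora:33",
    })
    print(len(note))        # 116
    print(note[:16].hex())  # 04000000630000007e1afeca46444f00
```

For this metadata the result is byte-for-byte the section contents shown by `objdump` for `libhello.so`: a 98-character JSON string, its NUL terminator (descsz 0x63 = 99), and one padding byte.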
The system was designed in cross-distro collaboration and is flexible enough to identify binaries from different packaging formats and build systems (rpms, debs, custom binaries).
See https://systemd.io/COREDUMP_PACKAGE_METADATA/ for detailed description of the format.
One of the advantages of using an ELF note, as opposed to say a series of extended attributes on the binary itself, is that the ELF note gets automatically captured and copied into a core file by the kernel. Extended attributes would have to be copied manually, which might not even be possible because the binary on disk may have been removed by the time the crash is analyzed.
The overhead is about 200 bytes for each ELF object. Overall, we have about 33200 files in `/usr/s?bin/` and about 36600 `.so` files (F35, single architecture; results from `dnf repoquery -l 2>/dev/null | rg '^/usr/s?bin/' | sort -u | wc -l` and `dnf repoquery -l 2>/dev/null | rg '^/usr/lib64/.*\.so$' | sort -u | wc -l`). If we do this for the whole distro, we get 69800 × 200 B ≈ 14 MB (about 13 MiB). For a typical installation, we can expect about 300–400 kB. Thus the additional space used is negligible (see also the Feedback section for more discussion). Precise measurements are TBD once this is turned on and we have data for a larger number of builds.
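The arithmetic behind these estimates, as a quick Python check that uses only the figures quoted above. The final loop inverts the 300–400 kB estimate to show how many ELF objects it corresponds to at 200 bytes each; that reading is our interpretation, not a measured number.

```python
# Figures quoted above (F35, single architecture).
BYTES_PER_NOTE = 200        # approximate per-object overhead
EXECUTABLES = 33_200        # files in /usr/s?bin/
SHARED_LIBRARIES = 36_600   # .so files in /usr/lib64/

total_objects = EXECUTABLES + SHARED_LIBRARIES
total_bytes = total_objects * BYTES_PER_NOTE

print(total_objects)                     # 69800
print(f"{total_bytes / 1e6:.1f} MB")     # 14.0 MB
print(f"{total_bytes / 2**20:.1f} MiB")  # 13.3 MiB

# Number of ELF objects implied by the "typical installation" estimate.
for budget_kb in (300, 400):
    print(budget_kb, "kB ->", budget_kb * 1000 // BYTES_PER_NOTE, "objects")
# 300 kB -> 1500 objects
# 400 kB -> 2000 objects
```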
Examples
```
$ objdump -s -j .note.package build/libhello.so

build/libhello.so:     file format elf64-x86-64

Contents of section .note.package:
 02ec 04000000 63000000 7e1afeca 46444f00  ....c...~...FDO.
 02fc 7b227479 7065223a 2272706d 222c226e  {"type":"rpm","n
 030c 616d6522 3a226865 6c6c6f22 2c227665  ame":"hello","ve
 031c 7273696f 6e223a22 302d312e 66633335  rsion":"0-1.fc35
 032c 2e783836 5f363422 2c226f73 43706522  .x86_64","osCpe"
 033c 3a226370 653a2f6f 3a666564 6f726170  :"cpe:/o:fedorap
 034c 726f6a65 63743a66 65646f72 613a3333  roject:fedora:33
 035c 227d0000                             "}..
```
```
$ readelf --notes build/hello | grep "description data" | sed -e "s/\s*description data: //g" -e "s/ //g" | xxd -p -r | jq
readelf: build/hello: Warning: Gap in build notes detected from 0x1091 to 0x10de
readelf: build/hello: Warning: Gap in build notes detected from 0x1091 to 0x10af
readelf: build/hello: Warning: Gap in build notes detected from 0x1091 to 0x119f
{
  "type": "rpm",
  "name": "hello",
  "version": "0-1.fc35.x86_64",
  "osCpe": "cpe:/o:fedoraproject:fedora:33"
}
```
```
$ coredumpctl info
           PID: 44522 (fsverity)
...
       Package: fsverity-utils/1.3-1
      build-id: ac89bf7175b04d7eec7f6544a923f45be111f0be
       Message: Process 44522 (fsverity) of user 1000 dumped core.

                Found module /home/bluca/git/fsverity-utils/libfsverity.so.0 with build-id: fa40fdfb79aea84167c98ca8a89add9ac4f51069
                Metadata for module /home/bluca/git/fsverity-utils/libfsverity.so.0 owned by FDO found: {
                        "packageType" : "deb",
                        "package" : "fsverity-utils",
                        "packageVersion" : "1.3-1"
                }

                Found module linux-vdso.so.1 with build-id: aba08e06103f725e26f1d7c178fb6b76a564a35d
                Found module libpthread.so.0 with build-id: e91114987a0147bd050addbd591eb8994b29f4b3
                Found module libdl.so.2 with build-id: d3583c742dd47aaa860c5ae0c0c5bdbcd2d54f61
                Found module ld-linux-x86-64.so.2 with build-id: f25dfd7b95be4ba386fd71080accae8c0732b711
                Found module libcrypto.so.1.1 with build-id: 749142d5ee728a76e7cdc61fd79d2311a77405a2
                Found module libc.so.6 with build-id: 18b9a9a8c523e5cfe5b5d946d605d09242f09798
                Found module fsverity with build-id: ac89bf7175b04d7eec7f6544a923f45be111f0be
                Metadata for module fsverity owned by FDO found: {
                        "packageType" : "deb",
                        "package" : "fsverity-utils",
                        "packageVersion" : "1.3-1"
                }

                Stack trace of thread 44522:
                #0  0x00007fe7c8af26f4 __GI___nanosleep (libc.so.6 + 0xc66f4)
                #1  0x00007fe7c8af262a __sleep (libc.so.6 + 0xc662a)
                #2  0x00005608481407dd main (fsverity + 0x27dd)
                #3  0x00007fe7c8a5009b __libc_start_main (libc.so.6 + 0x2409b)
                #4  0x000056084814094a _start (fsverity + 0x294a)
```
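For tools that cannot shell out to `readelf`, `xxd`, and `jq`, the same extraction can be done on the raw bytes of the `.note.package` section. The Python sketch below is an illustration, not code from systemd or package-notes. It assumes little-endian data and the `FDO` owner and `0xcafe1a7e` type seen in the `objdump` dump above, and it takes the section contents as input rather than locating the section inside an ELF file (`readelf` or a library such as pyelftools would be needed for that step).

```python
import json
import struct
from typing import Iterator, Optional, Tuple

PACKAGE_NOTE_OWNER = b"FDO\0"
PACKAGE_NOTE_TYPE = 0xCAFE1A7E

# .note.package contents copied from the objdump output above.
EXAMPLE_SECTION = (
    "04000000630000007e1afeca46444f00"
    "7b2274797065223a2272706d222c226e"
    "616d65223a2268656c6c6f222c227665"
    "7273696f6e223a22302d312e66633335"
    "2e7838365f3634222c226f7343706522"
    "3a226370653a2f6f3a6665646f726170"
    "726f6a6563743a6665646f72613a3333"
    "227d0000"
)


def _align4(n: int) -> int:
    """Round n up to the 4-byte boundary used by ELF note fields."""
    return (n + 3) & ~3


def iter_notes(data: bytes, byteorder: str = "<") -> Iterator[Tuple[bytes, int, bytes]]:
    """Yield (owner, type, descriptor) for each note in a raw note section."""
    offset = 0
    while offset + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", data, offset)
        offset += 12
        owner = data[offset:offset + namesz]
        offset += _align4(namesz)
        desc = data[offset:offset + descsz]
        offset += _align4(descsz)
        yield owner, ntype, desc


def package_metadata(data: bytes, byteorder: str = "<") -> Optional[dict]:
    """Return the decoded package metadata, or None if there is no such note."""
    for owner, ntype, desc in iter_notes(data, byteorder):
        if owner == PACKAGE_NOTE_OWNER and ntype == PACKAGE_NOTE_TYPE:
            # The descriptor is a NUL-terminated UTF-8 JSON string.
            return json.loads(desc.rstrip(b"\0").decode("utf-8"))
    return None


if __name__ == "__main__":
    print(package_metadata(bytes.fromhex(EXAMPLE_SECTION)))
```

Because the JSON is self-describing, the parser does not need to know which fields are present; new fields added to the specification are passed through unchanged.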
Feedback
See systemd issue #18433 for upstream discussion and implementation proposals.
Concerns about additional changes to files
Concerns regarding the effect on RPMCoW were raised on the mailing list.
Also on IRC the following concerns were raised about changes to ELF files:
```
17:32:30 <Eighth_Doctor> I think zbyszek underestimates how much of a problem it is to stamp every ELF binary with ''nevra'' data
17:32:44 <mhroncok> zbyszek: so, assuming python has ~100 ELF .so files and I change one text file
17:33:22 <mhroncok> (ignore for the time being that the .so files often changed because of toolchain updates and assume they are stable)
```
ELF files in Fedora already vary between different package versions. The full version of the package has been embedded in the `.gnu_debuglink` section since the Parallel Installable Debuginfo proposal was adopted. Since that varies between rebuilds, the `.note.gnu.build-id` note, which is calculated over all sections, also varies. Thus every file already has two sections that vary between builds, and we're adding a third.
```
$ readelf -W -p .gnu_debuglink /usr/bin/true

String dump of section '.gnu_debuglink':
  [     0]  true-8.32-31.fc35.x86_64.debug
  [    21]  ?��
```
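A minimal Python sketch of the `.gnu_debuglink` layout, to show where the nevra lives: a NUL-terminated debug file name, zero padding to a 4-byte boundary, then a 4-byte CRC32 of the debug file. It assumes little-endian byte order; `parse_debuglink` is a hypothetical helper, and the CRC in the usage example is computed over stand-in bytes, not over a real debug file.

```python
import struct
import zlib


def parse_debuglink(section: bytes, byteorder: str = "<") -> tuple:
    """Split .gnu_debuglink contents into (debug file name, CRC32)."""
    end = section.index(b"\0")              # the file name is NUL-terminated
    name = section[:end].decode("utf-8")
    crc_offset = (end + 1 + 3) & ~3         # CRC starts at the next 4-byte boundary
    (crc,) = struct.unpack_from(byteorder + "I", section, crc_offset)
    return name, crc


if __name__ == "__main__":
    name = b"true-8.32-31.fc35.x86_64.debug"
    crc = zlib.crc32(b"stand-in for the debug file contents")
    padding = b"\0" * (-(len(name) + 1) % 4)
    section = name + b"\0" + padding + struct.pack("<I", crc)
    print(parse_debuglink(section))
```

Since the file name carries the full nevra and the CRC covers the separate debug file, both parts of this section change with every rebuild.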
I tested this with python3.10. So far there are 13 builds of that package in F35:
- `python3.10-3.10.0-1.fc35`
- `python3.10-3.10.0~a6-1.fc35`
- `python3.10-3.10.0~a6-2.fc35`
- `python3.10-3.10.0~a7-1.fc35`
- `python3.10-3.10.0~b1-1.fc35`
- `python3.10-3.10.0~b2-2.fc35`
- `python3.10-3.10.0~b2-3.fc35`
- `python3.10-3.10.0~b3-1.fc35`
- `python3.10-3.10.0~b4-1.fc35`
- `python3.10-3.10.0~b4-2.fc35`
- `python3.10-3.10.0~b4-3.fc35`
- `python3.10-3.10.0~rc1-1.fc35`
- `python3.10-3.10.0~rc2-1.fc35`
I extracted the builds (for `.x86_64`), made a list of all `.so` files (1368 files), and calculated sha256 hashes for them. No two files repeat: there are 1368 distinct hashes. So the files are indeed different between builds.
Note that this range of Python versions encompasses periods when the package is under development and undergoes significant changes (alpha versions), and when it's only undergoing small changes (rc versions).
The fact that we get different files in each build is not surprising, and even if we ignore debuglink/build-id, binaries differ between builds anyway. Even sizes tend to vary between builds: there are 636 distinct `.so` file sizes, i.e. on average any given size repeats only about twice (presumably most often for the same file). Running `diffoscope` on `.so` files from different builds shows small changes in the assembly, which I did not analyze further.
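The hash and size comparison described above can be reproduced with a short script. This Python sketch is our reconstruction, not the script that was actually used. It assumes each build has been unpacked into its own subdirectory of a common root (for example with `rpm2cpio` and `cpio`), matches regular files whose names end in `.so` (which covers Python extension modules), and skips symlinks so that one library is not counted twice. `compare_shared_objects` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def compare_shared_objects(root: Path) -> tuple:
    """Return (files, distinct sha256 hashes, distinct sizes) for *.so files under root."""
    hashes = set()
    sizes = set()
    count = 0
    for path in root.rglob("*.so"):
        # Skip symlinks (e.g. libfoo.so -> libfoo.so.1) and anything that is not a file.
        if path.is_symlink() or not path.is_file():
            continue
        data = path.read_bytes()
        hashes.add(hashlib.sha256(data).hexdigest())
        sizes.add(len(data))
        count += 1
    return count, len(hashes), len(sizes)
```

Run over the unpacked python3.10 builds, the three returned numbers correspond to the file count, distinct-hash count, and distinct-size count reported above.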
If people have specific questions, for example about overhead in some scenario, I'd be happy to answer them. Until now, the issues that were raised were very vague, so it's impossible to answer them.
Won't this affect the Reproducible Builds effort?
No!
Reproducible Builds are defined as being able to get the same output binary given the same inputs. If the metadata described here changes, it can only happen because the input (sources) have changed too, so the binary is allowed to change as well. Reproducible builds are not about minimizing changes between builds of different versions, but about having identical build results for the same version in the same build environment. Given the same source version, the metadata will be the same, and hence the ELF note will be the same too.
Furthermore, for each package, only the version field should change; the rest is fixed, at least with the proposed set of metadata. But many packages already embed not only the upstream version but also the distro revision in their binaries, so an unknown number of them would change anyway with a new revision. For example:
```
$ gcc --version | head -n1
gcc (Debian 11.2.0-10) 11.2.0
$ clang --version | head -n1
Debian clang version 11.1.0-4
$ qemu-system-x86_64 --version | head -n1
QEMU emulator version 6.1.0 (Debian 1:6.1+dfsg-8)
```
Why not just use the rpm database?
```
17:34:33 <dcantrell> The main reason for this appears to be that we need the RPM db locally to resolve build-ids to package names. But since containers wipe /var/lib/rpm, we can't do that. So the solution is to put the ''nevra'' in ELF metadata?
17:34:39 <dcantrell> That feels like the wrong approach.
```
First, there are legitimate reasons to strip packaging metadata from images. For example, for an initrd image built from rpms, I get 117 MB of files (without compression), and out of this `/var/lib/rpm` is 5.9 MB and `/var/lib/dnf` is 4.2 MB. This is an overhead of 9%. This is not much, but still too much to keep in the image unless necessary. Similar ratios will occur for containers of similar size, and reducing image size by one tenth is important. There is no `rpm` or `dnf` in the image, so the package database is not even usable without external tools. As discussed on IRC (https://meetbot.fedoraproject.org/teams/fesco/fesco.2021-05-11-17.01.log.html), the containers we build don't wipe this metadata, but custom Dockerfiles do. There are various cases where the rpm database is not accessible.
Second, as described in the Detailed Description section above, not everybody and everything uses rpm. The Fedora motto is "we make an operating system and we make it easy for you to do useful stuff with it" (and yes, this is an actual quote from the official docs), and this stuff involves reusing our binaries in containers and custom installations and whatnot, not just straightforward installations with `dnf`. And in the other direction, people will build their own binaries that are not distributed as packages. But it is still important to be able to figure out the exact version of a binary, especially after it crashes.
Why not just use debuginfod?
Access to the network is not a given, and from the systemd-coredump point of view, it is undesirable from the sandbox that it runs in. There are also privacy issues to consider, as querying debuginfod servers may expose information (about what is running, in what versions, what is crashing, etc). Thus such queries need to be opt-in and under user control.
However, the two features are not mutually exclusive; in practice they complement each other in cases where network access is allowed. While there has been an effort to build federated debuginfod servers, there is no guarantee that the build-id under scrutiny is known to one of them. Adding the debuginfod URL to the package note would therefore help ensure that, for each binary (and build-id), it is always known which debuginfod server to contact to fetch the relevant symbols.
Why do this in Fedora?
17:36:49 <mhroncok> I don't understand how non-rpm distros and custom built binaries are affected by our rpm-build environment :/
The idea is that we inject this into our build system, and Debian injects this into their build system, and so on… As mentioned, this is a cross-distro effort. Also, people can use it in their custom build systems if they build and distribute binaries internally. The scheme would obviously be most useful if used comprehensively, but it's still useful when available partially. We hope that Fedora can lead the way. (This is similar to build-ids: when initially adopted, they were used only by some distros, but were useful even then. Nowadays, with comprehensive adoption, they are even more useful.)
https://hpc.guix.info/blog/2021/09/whats-in-a-package/ contains a nice description of a pathological case of packaging hacks and binary redistribution. When trying to unravel something like this, information embedded directly in the binaries would be quite useful.
For the case of non-native programs on a distro, there is no expectation that the distro will provide support. In this scenario, the proposal targets end users and/or third party software providers, who will benefit from identifying program versions more easily. But a special case occurs when you have e.g. a Fedora container running on a Debian host, and the user wants to report a crash from the process inside the container. When the core dump is analyzed on the host, the software figures out that it's some Fedora package. That bug could be manually reported to Fedora and one would expect it to be handled, because the container environment is mostly abstracted from the host and the container interacts just with the kernel.
Further adoption
mizdebsk: "I will be proposing a separate system-wide F36 change for embedding package NVR inside Java JAR files iff the ELF change is accepted" link.
flussence: "This looks like it'd be really useful in Gentoo. Right now all of this info is stored in the package database (a tree of flat files with no index) and it takes a linear scan of that to find out who owns a given library - which happens a lot due to the safety nets they've added over the years. Often the package manager overhead dwarfs the time spent actually compiling things." link
Benefit to Fedora
A simple and reliable way to gather information about package versions of programs is added. It enhances, instead of replacing, the existing mechanisms. It is particularly useful when reporting crash dumps, but can also be used for image introspection and forensics, license checks and version scans on containers, etc.
If we adopt this in Fedora, Fedora leads the way on implementing the standard. Fedora binaries used in any context can be easily recognized. Fedora binaries provide a better basis to build things.
If other distros adopt this, we can introspect and report on those binaries easily within the Fedora context. For example, when somebody is using a container with some programs that originate in the Debian ecosystem, we would be able to identify those programs without tools like `apt` or `dpkg-query`. Core dump analysis executed on the Fedora host can easily provide useful information about programs from foreign builds.
Implementation in Other Distributions
Microsoft CBL-Mariner
CBL-Mariner is an open source Linux distribution created by Microsoft, targeted at first-party and container workloads on Azure. It is used both as a container runner host and a base container image. Mariner adopted the ELF stamping packaging metadata spec in version 1.0, initially to add OS metadata, and package-level metadata will be added in a following release.
Debian
A package-level proof-of-concept is included in the package-notes repository. A system-level proof-of-concept that enables ELF stamping by default in all builds implicitly will be proposed for adoption in the future.
Scope
- Proposal owners:
  - create a specification (first version DONE: COREDUMP_PACKAGE_METADATA; we might need to make some adjustments based on the deployment in Fedora, but no big changes are expected)
  - write a script to generate the package note (first version DONE: generate-package-notes.py)
  - provide a patch for `redhat-rpm-config` to insert the appropriate compilation options (redhat-rpm-config PR #167, comps PR #704)
  - extend systemd's `coredumpctl` to extract and display this information (DONE: PR #19135, available in systemd-249)
  - submit a pull request to the Packaging Guidelines
- Other developers:
  - possibly add support in abrt?
- Release engineering: There should be no impact.
- Policies and guidelines: The new flags should be mentioned in the Packaging Guidelines.
- Trademark approval: N/A (not needed for this Change)
- Alignment with Objectives: It might be relevant for Minimization. Even though it increases the image size a tiny bit, it makes minimized images work a bit better.
Upgrade/compatibility impact
No impact.
How To Test
```
$ bash -c 'kill -SEGV $$'
$ coredumpctl
TIME                           PID  UID  GID SIG     COREFILE EXE           SIZE  PACKAGE
Mon 2021-03-01 14:37:22 CET 855151 1000 1000 SIGSEGV present  /usr/bin/bash 51.7K bash-5.1.0-2.fc34.x86_64
```
User Experience
`coredumpctl` should display information about package versions. `readelf --notes` or similar tools can be used on `.so` files and compiled programs to extract the JSON blurb that describes the originating package.
Dependencies
None.
Contingency Plan
- Contingency mechanism: Remove the new compilation flags. Rebuild any packages that were built with the new flags.
- Contingency deadline: Beta freeze.
- Blocks release? No.
Documentation
- https://systemd.io/COREDUMP_PACKAGE_METADATA/
- https://github.com/systemd/package-notes

See also Changes/DebuginfodByDefault.