Parallel Installable Debuginfo
Summary
Debuginfo packages can be installed in parallel to make it easier to observe what programs are doing or to debug them when they have crashed. That way debugging, tracing or profiling programs can be done regardless of whether they are 32-bit or 64-bit, a slightly newer or older version than the one currently installed, or even built for a different architecture.
Owner
- Name: Mark Wielaard
- Email: mjw@redhat.com
- Release notes owner:
Current status
- Targeted release: Fedora 25
- Last updated: 2016-03-03
- Tracker bug: <will be assigned by the Wrangler>
Detailed Description
Currently only one version of a debuginfo package can be installed for a given package. Even on a multi-lib system you cannot install the 64-bit and 32-bit versions of a debuginfo package in parallel. (Technically you sometimes can: because of RPM file coloring, the 64-bit version of the .debug files wins over the 32-bit version, which causes lots of confusion.) But there are various situations where having multiple versions of a debuginfo package installed helps with tracing, profiling, debugging and/or crash analysis (see the Benefit to Fedora section below). A debuginfo package provides several things that might conflict and so prevent parallel installation of different versions:
- build-id file /usr/lib/debug/.build-id/xx/yyyy...yyy which is a symlink to the main ELF file.
- build-id.debug file /usr/lib/debug/.build-id/xx/yyyy...yyy.debug which is a symlink to the .debug ELF file.
- The .debug files under /usr/lib/debug/ with file path names mirroring the main ELF file paths under / with .debug added.
- The source files under /usr/src/debug/<name>-<version>/
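To make the layout above concrete, here is a minimal sketch of how the current paths are derived from a build-id and from an installed ELF path. The function names are made up for illustration; only the path patterns come from the list above.

```python
from pathlib import PurePosixPath

DEBUG_ROOT = PurePosixPath("/usr/lib/debug")


def build_id_links(build_id_hex: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Return (main ELF symlink, .debug symlink) for a hex build-id.

    Hypothetical helper: it only mirrors the xx/yyyy...yyy pattern
    described in the text, it does not read any ELF file.
    """
    build_id_hex = build_id_hex.lower()
    if len(build_id_hex) < 4 or any(c not in "0123456789abcdef" for c in build_id_hex):
        raise ValueError(f"not a usable hex build-id: {build_id_hex!r}")
    # The first two hex digits name the directory, the rest names the link.
    link = DEBUG_ROOT / ".build-id" / build_id_hex[:2] / build_id_hex[2:]
    return link, link.with_name(link.name + ".debug")


def debug_file_for(elf_path: str) -> PurePosixPath:
    """Mirror an installed ELF path under /usr/lib/debug and append .debug."""
    mirrored = DEBUG_ROOT / elf_path.lstrip("/")
    return mirrored.with_name(mirrored.name + ".debug")
```

Two debuginfo packages that ship the same ELF path always produce the same debug_file_for() result, which is exactly the conflict described here.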
They can be made non-conflicting in the following ways:
- The main build-id symlink should not be in the debuginfo package, but in the main package (this was always a problem, since the installed package and debuginfo package might not match). If we want to make /usr/lib/debug/ a network resource then we will need to move the symlink to another location (maybe /usr/lib/.build-id). Unfortunately this means that debuginfo consumers that depend on the old location will need to change. As a workaround we could keep the old symlink and point it at the new location. I will audit the consumers to see which depend on it and discuss whether we can have a new standard location.
- Build-ids are globally unique identifiers. They will differ across arches, but they might match between minor releases if the exact same ELF image is produced. The linker will get an option to hash in the full nvr (name-version-release) so that build-ids are always unique.
- The .debug file names will be changed to <main ELF file name>-<version>-<release>.debug. This name will also be set in the .gnu_debuglink section of the main file by changing the options given to eu-strip in the rpm find-debuginfo.sh script.
- The source files will be moved under /usr/src/debug/<name>-<version>-<release>.<arch>/. This needs changes to the rpm debugedit program which rewrites the DWARF source file information.
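The proposed naming can be sketched as follows. This is an illustration under stated assumptions, not the actual rpm or linker implementation: the PackageId class, the function names and the way the nvr is mixed into the hash (SHA-1 over nvr plus file contents) are all hypothetical, and the package versions in the usage note are invented.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageId:
    """Hypothetical holder for the rpm name, version, release and arch."""
    name: str
    version: str
    release: str
    arch: str

    @property
    def vr(self) -> str:
        return f"{self.version}-{self.release}"

    @property
    def nvr(self) -> str:
        return f"{self.name}-{self.vr}"


def versioned_debug_path(elf_path: str, pkg: PackageId) -> str:
    """Proposed .debug name: the mirrored ELF path plus -<version>-<release>.debug."""
    return f"/usr/lib/debug/{elf_path.lstrip('/')}-{pkg.vr}.debug"


def versioned_source_dir(pkg: PackageId) -> str:
    """Proposed source directory, including release and arch."""
    return f"/usr/src/debug/{pkg.name}-{pkg.version}-{pkg.release}.{pkg.arch}/"


def seeded_build_id(elf_bytes: bytes, pkg: PackageId) -> str:
    """Assumption: model 'hashing in the nvr' as SHA-1 over nvr + contents.

    The real linker option may seed the build-id differently.
    """
    digest = hashlib.sha1()
    digest.update(pkg.nvr.encode("utf-8"))
    digest.update(b"\0")  # separator so nvr and contents cannot run together
    digest.update(elf_bytes)
    return digest.hexdigest()
```

With two invented builds such as PackageId("bash", "4.3.42", "3.fc24", "x86_64") and the same package at release "4.fc24", every derived path differs, and so does the seeded build-id even when the ELF bytes are identical.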
These changes will make all files in any debuginfo package unique so they don't conflict when installed in parallel. There should be no changes necessary to programs (gdb, perf, valgrind, systemtap, systemd-coredump, eu-stack, abrt-hook-ccpp, etc.) that use build-ids or .gnu_debuglink to look up DWARF debug information and source references for tracing, profiling and debugging.
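As an illustration of why consumers should keep working, the sketch below lists the places a debugger would typically try, roughly following the search order documented for GDB's separate debug files. Other tools may search differently. The function name and the example values in the usage note are assumptions.

```python
import posixpath


def debug_candidates(elf_path, debuglink, build_id_hex=None,
                     debug_root="/usr/lib/debug"):
    """Return candidate debug file paths, most specific first.

    elf_path     -- absolute path of the main ELF file
    debuglink    -- basename stored in the .gnu_debuglink section
    build_id_hex -- hex build-id of the ELF file, if it has one
    """
    exe_dir = posixpath.dirname(elf_path)
    candidates = []
    if build_id_hex:
        # Build-id lookup: independent of the .debug file's own name.
        candidates.append(posixpath.join(
            debug_root, ".build-id", build_id_hex[:2],
            build_id_hex[2:] + ".debug"))
    # .gnu_debuglink lookups: only the basename is stored in the ELF file,
    # so the consumer combines it with directories it already knows about.
    candidates.append(posixpath.join(exe_dir, debuglink))
    candidates.append(posixpath.join(exe_dir, ".debug", debuglink))
    candidates.append(posixpath.join(debug_root, exe_dir.lstrip("/"), debuglink))
    return candidates
```

Because the versioned name (for example a made-up ls-8.25-5.fc24.debug) is still just a basename in .gnu_debuglink, the same search logic finds it without modification.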
It would be good to tweak dnf debuginfo-install to know about parallel installable debuginfo packages and maybe have an easy option to install the debuginfo for a core file or for the packages running in a container.
Alternative solutions currently rejected:
- Move the main ELF image build-id file under /usr/lib/.build-id/xx/yyyy...yyy when moving it into the main packages. Rejected because existing programs probably depend on the link being under /usr/lib/debug/.
- Since the ELF file is identical whenever the build-id is identical, we could mark all build-id.debug files as replaceable in the rpm. It isn't clear that this works for symlinks, though (but we could reverse the symlink direction so that it points from the debug file to the build-id file). Also, currently you can identify the exact package nvr installed given just one build-id. That would become impossible if multiple packages could contain the same build-id/ELF image file.
- Do away with the old .gnu_debuglink way of accessing files under /usr/lib/debug, not install .debug files, and only support build-id based debug lookups. Rejected because it isn't clear that build-ids are 100% available and that all programs work with build-id lookups instead of .gnu_debuglink names.
- Move the .debug files under a subdir like the sources: /usr/lib/debug/<name>-<version>-<release>.<arch>/. Rejected because this cannot easily be expressed in .gnu_debuglink, which officially only allows a basename.
Benefit to Fedora
By making it possible to install different versions of debuginfo packages on a Fedora system, it will be easier for users and developers to trace, profile or debug issues in programs not installed on the main Fedora system. Examples are observing a 32-bit program running on a 64-bit architecture, introspecting packages running in containers that might have different versions installed than the host system, and analysing coredumps of older or newer programs than are installed on the main system.
It also makes it possible to have the debuginfo package files under /usr/lib/debug and /usr/src/debug become a network resource shared to clients that don't necessarily have exactly the same versions of software installed.
Scope
- Proposal owners: Patches need to be developed against the linker (binutils ld and gold) to accept a hash value to seed the build-id calculation, against rpm debugedit to rewrite source paths (currently rewritten source paths can only become shorter, while this change might create longer paths), and against the rpm find-debuginfo.sh script to change the paths, symlinks and .gnu_debuglink names as outlined in the Detailed Description. The dnf debuginfo-install plugin might also be patched to provide subcommands for pulling in debuginfo packages found by build-id in core files and/or programs running in containers.
- Other developers: Upstream binutils, rpm and dnf maintainers have to review the proposed patches. If accepted, the package maintainers will have to decide whether those patches can be backported for the next Fedora release. Once all changes are in, the debuginfo for a package needs to be regenerated before it becomes parallel installable.
- Release engineering: Needs to be discussed. In theory no changes apart from those listed above are needed. But if we want to support installing cross-architecture (not just multi-lib) debuginfo then some way needs to be found to get those packages into the right repodata.
- List of deliverables: N/A (Still Unknown)
- Policies and guidelines: No changes, the debuginfo related rpm macros won't change. They will just start producing parallel installable debuginfo packages once all changes are in place.
- Trademark approval: N/A (not needed for this Change)
Upgrade/compatibility impact
This feature doesn't need a mass rebuild. But before parallel installable debuginfo packages can be installed, old debuginfo packages need to be upgraded to a new parallel installable version. The dnf debuginfo-install plugin should help with that. It would be good to have a way to tell old debuginfo packages from new ones so dnf can provide clear warnings and feedback. An open question is whether dnf upgrade should handle debuginfo packages at all, and if so whether it should upgrade or remove old versions.
How To Test
Unfinished. Once the last dnf debuginfo-install changes are done it should be possible to do the following things in a few simple steps:
- point stap at a 32-bit process on a 64-bit architecture and use DWARF for glibc to do low-level probes.
- run perf against the processes running in a container with the debuginfo installed on the host.
- pull in all debuginfo needed for inspecting a core dump with gdb.
User Experience
dnf debuginfo-install will be able to install multiple versions of a debuginfo package, making it easier to trace, profile and debug various programs. No other user-visible changes should be observed.
Dependencies
N/A (not a System Wide Change)
Contingency Plan
- Contingency mechanism: N/A (not a System Wide Change)
- Contingency deadline: N/A (not a System Wide Change)
- Blocks release? N/A (not a System Wide Change)
- Blocks product? product
Documentation
Not yet written.