From Fedora Project Wiki
* Name: [[User:Salimma| Michel Lind]]
* Name: José Relvas
* Emails: zbyszek@in.waw.pl, salimma@fedoraproject.org

== Current status ==
[[Category:ChangeReadyForFesco]]
* [https://lists.fedoraproject.org/archives/list/devel-announce@lists.fedoraproject.org/thread/H3FZE62EMABEDHTUCWWFCKIVSZTAVOY6/ Announced]
* [https://discussion.fedoraproject.org/t/f42-change-proposal-optimized-binaries-for-the-amd64-x86-64-architecture-v2-self-contained/142032 Discussion thread]
* FESCo issue: [https://pagure.io/fesco/issue/3342 #3342]
* Tracker bug: <will be assigned by the Wrangler>
* Release notes tracker: <will be assigned by the Wrangler>
Latest revision as of 22:46, 19 January 2025


Optimized Binaries for the AMD64 / x86_64 Architecture (v2)

This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.

Summary

Individual packages can already provide optimized libraries via the glibc-hwcaps mechanism. This approach will be extended to executables. A package provides optimized variants of a binary in a separate directory hierarchy, and the binary in /usr/bin is replaced by a symlink to a small helper program. At runtime, the helper finds the most appropriate variant and executes it.

The decision about which packages provide optimized code, and at which level, will be made by individual package maintainers based on benchmark results. A few programs/packages will be updated by the Change Owners to show how the mechanism works.

Owner

Current status

Detailed Description

This is an updated version of Changes/Optimized_Binaries_for_the_AMD64_Architecture.

Fedora binaries for the AMD64 / x86_64 architecture are compiled with code-generation flags that support almost all CPU variants. But newer generations of processors gained additional instructions that may be used to generate faster code. A vendor-independent x86-64 psABI supplement defines four "microarchitecture levels" [6]: x86-64-v1 (the baseline, our code targets this), x86-64-v2 (+SSE3, CentOS targets this), x86-64-v3 (+AVX), x86-64-v4 (+AVX512) [1]. When code is compiled for a higher microarchitecture level it will crash (with SIGILL, "illegal instruction") on CPUs which do not support it. Benchmark results show small differences in performance: usually in the range from -5% to 10%, with no discernible difference for most code, but some applications benefit, with gains of 120% in some benchmarks [e.g. 2, 4].
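The levels are cumulative: a CPU is at level N only if it also satisfies every lower level. As a rough illustration (not part of this Change), the following sketch classifies a CPU from a set of feature flags. The flag names are the ones Linux prints in /proc/cpuinfo (for example `pni` for SSE3 and `abm` for LZCNT), and `xsave` is used here as a stand-in for OSXSAVE, which /proc/cpuinfo does not list. The names `LEVEL_FLAGS`, `microarch_level` and `read_cpu_flags` are made up for this example.

```python
# Feature flags added by each level, using /proc/cpuinfo spellings.
# Assumption: "xsave" stands in for OSXSAVE, which cpuinfo does not expose.
LEVEL_FLAGS = {
    2: {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    4: {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}


def microarch_level(cpu_flags):
    """Return the highest level (1-4) for which all required flags are present.

    Levels are cumulative, so checking stops at the first level that fails.
    """
    flags = set(cpu_flags)
    level = 1
    for candidate in (2, 3, 4):
        if not LEVEL_FLAGS[candidate] <= flags:
            break
        level = candidate
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag set of the first CPU listed in a cpuinfo-style file."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux machine, `microarch_level(read_cpu_flags())` gives an estimate of the level the local CPU supports.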

Over the years, various people have expressed interest in raising the required microarchitecture levels. But we have been very conservative in making changes, because support is missing in many older CPUs that are still in use, and in fact, even in some CPUs produced and sold today. By raising the required level we would make Fedora completely unusable on many machines. It also seems that recompiling all packages with the changed options would largely be a waste of resources, because for most code it makes no difference. But for some of the numerical or cryptographic code there are noticeable gains and it seems to be worth the effort to provide optimized code. This also makes Fedora more attractive to people interested in optimization.

The dynamic linker already has the glibc-hwcaps mechanism to load optimized implementations of shared objects [3]. This means that packages can provide optimized libraries and the linker will automatically load them from separate directories if appropriate. (For AMD64, this is /usr/lib64/glibc-hwcaps/x86-64-v{2,3,4}/.)
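To make the lookup order concrete, here is a simplified sketch of the directories that would be consulted for a library on a CPU at a given level, most specific first. It only models the glibc-hwcaps subdirectories under one library directory and ignores everything else the dynamic linker does (LD_LIBRARY_PATH, the ld.so cache, and so on). The function name `hwcaps_search_dirs` is made up for this example.

```python
def hwcaps_search_dirs(cpu_level, libdir="/usr/lib64"):
    """List candidate directories for a shared object, best match first.

    Subdirectories for levels above `cpu_level` are skipped because code in
    them could use instructions the CPU does not have.
    """
    dirs = [
        f"{libdir}/glibc-hwcaps/x86-64-v{level}"
        for level in range(cpu_level, 1, -1)
    ]
    dirs.append(libdir)  # the baseline build lives directly in libdir
    return dirs
```

For a level 3 CPU this yields the x86-64-v3 directory, then x86-64-v2, then /usr/lib64 itself.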

This Change is about extending the glibc-hwcaps mechanism to executables. A small helper binary is provided. A program in /usr/bin (or another path) is symlinked to this helper. When executed, the helper checks the capabilities of the CPU and searches for the most appropriate variant of the target program in a separate directory hierarchy. It then launches one of the optimized binaries or the "generic" one compiled for the baseline.
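The selection logic described above can be sketched as follows. This is an illustration only, not the actual helper: the directory layout `<root>/x86-64-v<N>/<name>` (with v1 holding the baseline build) and the names `select_variant` and `dispatch` are assumptions made for this example.

```python
import os


def select_variant(name, cpu_level, root):
    """Return the best available variant of `name` for a CPU at `cpu_level`.

    Hypothetical layout: <root>/x86-64-v<N>/<name>, where v1 is the baseline.
    Variants above `cpu_level` are never considered.
    """
    for level in range(cpu_level, 0, -1):
        candidate = os.path.join(root, f"x86-64-v{level}", name)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(f"no usable variant of {name!r} under {root!r}")


def dispatch(argv, cpu_level, root):
    """Replace the current process with the selected variant.

    The name the helper was invoked as (the symlink name) identifies the
    target program; arguments are passed through unchanged.
    """
    name = os.path.basename(argv[0])
    target = select_variant(name, cpu_level, root)
    os.execv(target, list(argv))
```

Because the helper replaces itself with the chosen binary via exec, no extra process stays around after the program starts; the only added cost is the lookup at startup.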

This means that individual packages "opt in", by moving their binary to the alternative directory hierarchy and replacing it by a symlink, and also providing one or more optimized variants.

Note: the ELF format provides the IFUNC mechanism to dynamically select a variant of a function (symbol) when an executable is loaded [5]. This is in particular used to load code using specific CPU instructions when those are supported. This mechanism is more general (because it allows arbitrary selection criteria), more fine-grained (because there can be other variants than just a few fixed microarchitecture levels), and more efficient (because only the parts of the code that benefit from this need to be provided in multiple variants). In particular, glibc already makes extensive use of this to provide optimized code, which is then widely used by other libraries and programs. This means that even though we compile code in a way where the lowest baseline is supported, modern CPU instructions are already widely used. This is one of the reasons why compiling for a higher baseline often doesn't make any difference in benchmarks. The IFUNC mechanism or an equivalent mechanism should generally be preferred. Nevertheless, that needs to be implemented in the program or library itself, which is not trivial. The mechanism in this Proposal is intended for code which does not use IFUNCs or a similar mechanism.

[1] https://hackweek.opensuse.org/all/projects/support-glibc-hwcaps-and-micro-architecture-package-generation
[2] https://gitlab.archlinux.org/archlinux/rfcs/-/blob/master/rfcs/0002-march.rst
[3] https://sourceware.org/pipermail/libc-alpha/2021-February/122207.html
[4] https://blog.centos.org/2023/08/centos-isa-sig-performance-investigation/
[5] https://jasoncc.github.io/gnu_gcc_glibc/gnu-ifunc.html
[6] https://gitlab.com/x86-psABIs/x86-64-ABI

Glibc-hwcaps together with the new helper provide a generic mechanism. It will be up to individual packages to actually provide code which makes use of it. Individual package maintainers are encouraged to benchmark their packages after recompilation, and provide the optimized variants if useful. (I.e. the code in question is measurably faster and the program is run often enough for this to make a difference.)

The Change Owners will implement the packaging changes for a few packages while developing the general mechanism and will submit those as pull requests. Other maintainers are asked to do the same for their packages if desired.

Optimized variants of programs and libraries MAY be packaged in a separate subpackage. The general packaging rules should be applied, i.e. a separate package or packages SHOULD be created if the files are large enough.

Available benchmark results [2,4] are narrow and not very convincing. We should plan an evaluation of results after one release. If it turns out that the real gains are too small, we can scrap the effort. On the other hand, we should also consider other architectures. For example, microarchitecture levels z{14,15} for s390x or power{9,10} for ppc64le. Other architectures are not included in this Change Proposal to reduce its scope.

Feedback

Benefit to Fedora

The developers who are interested in this kind of optimization work can perform it within Fedora, without having to build separate repositories. The users who have the appropriate hardware will gain performance benefits. Faster code is also more energy-efficient. The change will be automatic and transparent to users.

Note that other distributions use higher microarchitecture levels. For example RHEL 9 uses x86-64-v2 as the baseline, RHEL 10 uses x86-64-v3, and other distros provide optimized variants (OpenSUSE, Arch Linux, Ubuntu). We implement the same change in Fedora in a way that is scoped more narrowly, and thus vastly cheaper in the sense of development effort, code compilation time, storage and distribution overhead, but should provide the same performance and energy benefits.

Scope

  • Proposal owners:
    • Package hwcaps-loader.
    • Find some example packages to convert (the code must do "number crunching" or string processing, and must not already use IFUNCs or glibc-hwcaps or some other mechanism).
    • Convert a few packages and submit the changes as pull requests.
    • Submit a draft change to Packaging Guidelines
    • Do benchmarks.
  • Other developers:
    • Consider converting some additional packages.
    • Review and merge the Packaging Guidelines change
  • Policies and guidelines: N/A (not needed for this Change)
  • Trademark approval: N/A (not needed for this Change)
  • Alignment with the Fedora Strategy:

Upgrade/compatibility impact

Early Testing (Optional)

Do you require 'QA Blueprint' support? N

How To Test

  • Install one of the converted packages
  • Run the program. If the hardware supports the optimized variant, verify that it was run. If the hardware does not support any of the optimized variants, verify that the baseline version was executed.
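One possible way to check which variant was actually executed is to look at the /proc/<pid>/exe link of the running program, which on Linux points at the executable image in use after the helper has handed over control. The sketch below assumes the variants live in directories named after the level (x86-64-v2, x86-64-v3, ...); that naming and the function names are assumptions for this example, not something the Change prescribes.

```python
import os
import re


def running_binary(pid):
    """Return the path of the executable image that process `pid` is running."""
    return os.readlink(f"/proc/{pid}/exe")


def variant_level(path):
    """Extract N from a path component named x86-64-vN.

    Paths without such a component are treated as the baseline (level 1).
    """
    match = re.search(r"(?:^|/)x86-64-v([1-4])(?:/|$)", path)
    return int(match.group(1)) if match else 1
```

With the program running as process `pid`, `variant_level(running_binary(pid))` can be compared against the level the CPU supports.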


User Experience

The change should be invisible to users, except that some programs may execute more quickly.


Dependencies

Contingency Plan

  • Contingency mechanism: Revert changes in individual packages. This can be either by the maintainers of those packages or by the Change Owners using provenpackager privileges.
  • Contingency deadline: any time really. The changes are independent between packages, so we can trivially convert and unconvert individual programs even after release.
  • Blocks release? No

Documentation

N/A (not a System Wide Change)

Release Notes