Revision as of 13:30, 7 September 2020
Python: Optional Bytecode Cache
Summary
The Python standard library bytecode cache files (e.g. /usr/lib64/python3.9/.../__pycache__/*.pyc) will be moved from the python3-libs package to three new optional subpackages (split by optimization level). The non-optimized bytecode cache will be recommended by python3-libs and installed by default, but removable. The bytecode caches for optimization levels 1 and 2 will not be recommended (and hence will not be installed by default) but will be installable. The default SELinux policy will be adapted not to audit AVC denials when Python attempts to create the bytecode cache at runtime. This will save 8.89 MiB of disk space on default installations or 17.12 MiB on minimal installations (by opting out of the recommended subpackage with the non-optimized bytecode cache). When all three new packages are installed, the size will increase slightly over the status quo (by 4.5 MiB).
Owner
- Name: Miro Hrončok
- Name: Lumír Balhar
- Name: Tomáš Orsava
- Name: Lukáš Vrabec (selinux-policy maintainer)
- Email: python-maint@redhat.com
Current status
- Targeted release: Fedora 34
- Last updated: 2020-09-07
- FESCo issue: <will be assigned by the Wrangler>
- Tracker bug: <will be assigned by the Wrangler>
- Release notes tracker: <will be assigned by the Wrangler>
Detailed Description
What is the Python bytecode cache
When Python code is interpreted, it is compiled to Python bytecode. When a pure Python module is imported for the first time, the compiled bytecode is serialized and cached to a .pyc file located in the __pycache__ directory next to the .py source. Subsequent imports use the cache directly until it is invalidated (for example, when the .py source is edited and its mtime stamp is bumped); at that point, the cache is updated. This behavior is explained in detail in PEP 3147. The invalidation is described in PEP 552.
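The timestamp-based invalidation described above can be observed by reading the header of a cache file. The following is a minimal sketch, not part of the proposal: it assumes the 16-byte header layout from PEP 552 (magic number, flags, source mtime, source size), and the helper names and the throwaway module `example.py` are hypothetical.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_header(pyc_path):
    """Return (magic, flags, mtime, source_size) from the 16-byte .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]
    # Three little-endian unsigned 32-bit fields follow the magic number.
    flags, mtime, source_size = struct.unpack("<III", header[4:])
    return magic, flags, mtime, source_size


def compile_and_inspect(source_text="answer = 42\n"):
    """Compile a throwaway module and compare its cache header with the source."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "example.py")  # hypothetical module name
        with open(source, "w") as f:
            f.write(source_text)
        # Ask for timestamp-based invalidation explicitly, so the result does
        # not depend on environment settings such as SOURCE_DATE_EPOCH.
        pyc = py_compile.compile(
            source, invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP
        )
        magic, flags, mtime, source_size = read_pyc_header(pyc)
        st = os.stat(source)
        return {
            "magic_matches": magic == importlib.util.MAGIC_NUMBER,
            "flags": flags,  # 0 means "validate by mtime and size"
            "mtime_matches": mtime == (int(st.st_mtime) & 0xFFFFFFFF),
            "size_matches": source_size == (st.st_size & 0xFFFFFFFF),
        }
```

When the recorded mtime or size no longer matches the source file, Python treats the cache as stale and recompiles the module.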
Python can operate at 3 different optimization levels: 0, 1 and 2. By default, the optimization level is 0. When Python is invoked with the -O command line option, the optimization level is set to 1; similarly, with -OO it is set to 2. Bytecode cache for different optimization levels is saved with different filenames, as described in PEP 488.
As an example, a Python module located at /path/to/basename.py will have bytecode cache files for CPython 3.9 stored as:
- /path/to/__pycache__/basename.cpython-39.pyc for the non-optimized bytecode
- /path/to/__pycache__/basename.cpython-39.opt-1.pyc for optimization level 1
- /path/to/__pycache__/basename.cpython-39.opt-2.pyc for optimization level 2
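The same file names can be derived with the standard library. This is a small illustrative sketch; the helper name `cache_paths` is hypothetical, and the exact tag in the result (for example `cpython-39`) depends on the interpreter that runs it.

```python
import importlib.util
import sys


def cache_paths(source_path):
    """Map each optimization level to the cache path Python would use for it."""
    return {
        # An empty string selects the non-optimized cache (no "opt-" tag).
        0: importlib.util.cache_from_source(source_path, optimization=""),
        1: importlib.util.cache_from_source(source_path, optimization=1),
        2: importlib.util.cache_from_source(source_path, optimization=2),
    }


# The level the running interpreter was started with (0, 1 for -O, 2 for -OO).
current_level = sys.flags.optimize

for level, path in cache_paths("/path/to/basename.py").items():
    marker = " (current)" if level == current_level else ""
    print(f"level {level}: {path}{marker}")
```

On CPython 3.9 with default settings, this should list the three paths shown above.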
Python bytecode cache in RPM packages (status quo)
Pure Python modules shipped in RPM packages (namely the ones shipped through the python3-libs package) are located at paths not writable by regular users, under /usr/lib(64)/python3.9/, hence the bytecode cache is also located there. To work around this problem, the bytecode cache is pre-compiled when RPM packages are built, and python3-libs ships and owns the sources as well as the bytecode cache:
    $ rpm -ql python3-libs
    ...
    /usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-1.pyc
    /usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-2.pyc
    /usr/lib64/python3.9/__pycache__/ast.cpython-39.pyc
    ...
    /usr/lib64/python3.9/ast.py
    ...
As a result, the package is quite big, essentially shipping all pure Python modules 4 times.
Depending on the module content, its bytecode cache files might be identical across optimization levels. In such cases, the files are hardlinked to reduce the bloat:
    $ ls -1i /usr/lib64/python3.9/collections/__pycache__/abc.*pyc
    8634 /usr/lib64/python3.9/collections/__pycache__/abc.cpython-39.opt-1.pyc
    8634 /usr/lib64/python3.9/collections/__pycache__/abc.cpython-39.opt-2.pyc
    8634 /usr/lib64/python3.9/collections/__pycache__/abc.cpython-39.pyc
This is, however, not possible for all the modules from python3-libs:
    $ ls -1i /usr/lib64/python3.9/__pycache__/ast.*pyc
    8438 /usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-1.pyc
    8440 /usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-2.pyc
    8441 /usr/lib64/python3.9/__pycache__/ast.cpython-39.pyc
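The inode comparison done with `ls -1i` above can also be scripted. The sketch below is only an illustration (the helper name `group_by_inode` is hypothetical): it groups paths that share a device and inode number, which is how hardlinked cache files can be recognized.

```python
import os
from collections import defaultdict


def group_by_inode(paths):
    """Group paths that are hardlinks to the same file (same device and inode)."""
    groups = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        groups[(st.st_dev, st.st_ino)].append(path)
    return list(groups.values())


def count_unique_files(paths):
    """Number of distinct files on disk behind the given paths."""
    return len(group_by_inode(paths))
```

For the `abc` cache files listed above, `count_unique_files` would be expected to return 1, and for the `ast` cache files 3, assuming the same installation as in the listings.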
What if the bytecode cache was not packaged
When the bytecode cache is not packaged, several things happen:
1. When non-root users run Python, the imported modules are never cached. As a result, the startup time of Python apps might be slightly longer than necessary until root runs them.
2. When root runs Python, the imported modules are cached. As a result, untracked .pyc files start to pop up in /usr/lib(64)/python3.9/. When the system is updated to a newer Python version, the untracked files remain on the filesystem until manually cleaned up.
3. When root runs Python in an SELinux restricted context, Python attempts to cache the imported modules, but SELinux does not allow that. The result is the same as (1), with a lot of noise from SELinux.
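Whether a given user falls into the first or the second case above depends on whether the cache directory is writable. The following rough sketch (the helper name `can_write_cache` is hypothetical) makes a best-effort guess based on what `os.access` reports; it is an assumption that this is a useful approximation, and it may not reflect every restriction that applies when Python actually writes the file.

```python
import importlib.util
import os


def can_write_cache(source_path):
    """Best-effort guess whether the current user could create the .pyc file."""
    cache_file = importlib.util.cache_from_source(source_path)
    cache_dir = os.path.dirname(cache_file)
    # If __pycache__ does not exist yet, Python would have to create it,
    # so the parent directory is the one that needs to be writable.
    target = cache_dir if os.path.isdir(cache_dir) else os.path.dirname(cache_dir)
    return os.access(target, os.W_OK | os.X_OK)
```

For a module under /usr/lib(64)/python3.9/, this would be expected to return False for a regular user and True for root.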
Packaging the bytecode cache into optional subpackages
SELinux policy changes
Size impact
Sizes calculated in mock on x86_64.
Situation | Size of /usr/lib(64)/python3.9 in MiB | Difference in MiB |
---|---|---|
Status quo (before this change) | 31.84 | |
Default (non-optimized cache only) | 22.96 | -8.89 |
No cache | 14.72 | -17.12 |
All optimization levels (like before) | 36.35 | +4.50 |
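The differences can be cross-checked from the sizes in the table. This is a small arithmetic sketch using only the rounded values shown above; the dictionary keys are shortened labels chosen for illustration.

```python
from decimal import Decimal

# Sizes in MiB, copied from the table above (already rounded to two decimals).
sizes = {
    "status quo": Decimal("31.84"),
    "default": Decimal("22.96"),
    "no cache": Decimal("14.72"),
    "all levels": Decimal("36.35"),
}


def differences(sizes, baseline="status quo"):
    """Difference of each situation against the baseline, in MiB."""
    base = sizes[baseline]
    return {name: size - base for name, size in sizes.items() if name != baseline}


for name, diff in differences(sizes).items():
    print(f"{name}: {diff:+}")
```

Recomputing from the rounded sizes gives -8.88, -17.12 and +4.51, so two values differ from the table by 0.01. Presumably the table's differences were computed from unrounded sizes, but that is an assumption.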
Speed impact
Rejected ideas
In this section, we briefly describe ideas that were presented by others or considered by the change owners, but rejected.
Stop shipping mandatory .py sources, ship only .pyc cache
Make Python not attempt to write bytecode cache into /usr/lib(64)/python3.9
Not realized ideas
In this section, we briefly describe ideas that were presented by others or considered by the change owners, but were not realized (e.g. for capacity reasons). Such ideas may be realized later.
Store bytecode cache in /var/cache and/or ~/.cache
Apply this change to all Python RPM packages
Feedback
Benefit to Fedora
Scope
- Proposal owners:
- Other developers: N/A (not a System Wide Change)
- Release engineering: #Releng issue number (a check of an impact with Release Engineering is needed)
- Policies and guidelines: N/A (not a System Wide Change)
- Trademark approval: N/A (not needed for this Change)
- Alignment with Objectives:
Upgrade/compatibility impact
N/A (not a System Wide Change)
How To Test
N/A (not a System Wide Change)
User Experience
Dependencies
N/A (not a System Wide Change)
Contingency Plan
- Contingency mechanism: (What to do? Who will do it?) N/A (not a System Wide Change)
- Contingency deadline: N/A (not a System Wide Change)
- Blocks release? N/A (not a System Wide Change), Yes/No
- Blocks product? product
Documentation
N/A (not a System Wide Change)