From Fedora Project Wiki
Revision as of 19:22, 23 June 2020


Automatic RPM dependencies on Python Extras

Summary

The Python RPM dependency generator (that generates python3.Xdist(foo) requirements) will be adapted to also generate requirements on Python extras (e.g. python3.Xdist(foo[bar])) whenever upstream metadata indicates such a dependency. An easy opt-out mechanism will exist. A supported way of adding metapackages that provide such Python extras (e.g. python3.Xdist(foo[bar])) will be introduced. Change owners will add the missing metapackages that would otherwise cause broken dependencies (in non-modular packages).

Owner

Current status

  • Targeted release: Fedora 33
  • Last updated: 2020-06-23
  • FESCo issue: <will be assigned by the Wrangler>
  • Tracker bug: <will be assigned by the Wrangler>
  • Release notes tracker: <will be assigned by the Wrangler>

Detailed Description

The problem

Python extras are a way for a Python package (called "distribution" or "distribution package" upstream) to declare that extra dependencies are required for additional functionality.

For example, the Python package requests has several standard dependencies (e.g. urllib3). It also declares an extra named security, written requests[security], which lists the additional dependencies (e.g. pyOpenSSL) needed for this additional functionality. The Python package code handles the missing optional dependency gracefully, e.g. it won't crash but might instruct the user to install requests[security] with a warning or an actionable error message.

Python packages included in Fedora as RPMs automatically create a special Provides in the format python3.Xdist(foo) (and python3dist(foo)) where foo is the upstream Python package distribution name and X is the Python minor version. That way you can require any Python package without knowing under which name it was packaged in Fedora. And these tags are also automatically used by the Python dependency generator, which reads upstream Python metadata and creates dependencies on these Provides.

However, Python extras are not yet handled by the Provides tags which leads to imperfections and problems in declared dependencies.

Status quo

Currently in Fedora (before this change), no package provides python3.Xdist(foo[bar]) for the foo[bar] Python extra. As a direct result of this, no package can require it. The automatic RPM Python dist dependency generator only generates an incomplete requirement on the base package (python3.Xdist(foo)) in such cases.

The transitive extra dependencies often had to be hardcoded manually. That is, when foo requires bar[baz], package bar does not require the additional dependencies for the bar[baz] extra, so foo needs to hardcode those dependencies itself. For example: [1]. This leads to possibly missing, broken and/or outdated superfluous dependencies.

Extras metapackages

In this change proposal, we propose to solve the problem using metapackages. The following metapackage represents the setuptools_scm[toml] extra for the python3-setuptools_scm RPM package (python-setuptools_scm source package):

%package -n python3-setuptools_scm+toml
Summary: Metapackage for python3-setuptools_scm: toml extra
Requires: python3-setuptools_scm = %{?epoch:%{epoch}:}%{version}-%{release}

%description -n python3-setuptools_scm+toml
This is a metapackage bringing in toml extra requires for python3-setuptools_scm.
It contains no code, just makes sure the dependencies are installed.

%files -n python3-setuptools_scm+toml
%ghost %{python3_sitelib}/*.egg-info

Notice several things:

  • The package has a hard dependency on python3-setuptools_scm = %{?epoch:%{epoch}:}%{version}-%{release}. While this could in theory be generated by the dependency generator, the change owners have decided not to do that, to allow certain leeway for experimentation. However, the dependency will be created by the macro helper below. Technically, %{?_isa} should also be used for arched packages, but in practice we believe it can be omitted.
  • The package contains no files except the %ghost metadata. This is needed for the dependency generator to have access to the upstream metadata of this package.

The updated RPM Python dist dependency generator parses the extras name from the subpackage name by splitting it on the + sign. This naming scheme is not new; it is copied from Rust packaging. Five Python packages in Fedora already use the same scheme for similar metapackages representing Python extras. Normalized Python distribution package names (and extras names) don't naturally contain the + sign. (Neither do existing Fedora packages prefixed with python3-, except the 5 components already mentioned.)
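
The splitting described above can be sketched in a few lines of Python. This is only an illustration, not the code of the actual generator; the function name split_extras_subpackage is hypothetical, and the sketch assumes that a + sign never appears anywhere else in the package name.

```python
def split_extras_subpackage(name):
    """Split an RPM subpackage name such as 'python3-foo+bar' into (base, extra).

    Hypothetical helper for illustration only. Returns (name, None) when the
    name contains no '+' separator, i.e. it is not an extras metapackage.
    """
    base, sep, extra = name.rpartition("+")
    if not sep:
        # No '+' in the name: a regular (base) package.
        return name, None
    if not base or not extra:
        raise ValueError(f"malformed extras subpackage name: {name!r}")
    return base, extra


print(split_extras_subpackage("python3-setuptools_scm+toml"))
print(split_extras_subpackage("python3-setuptools_scm"))
```

The first call yields the base package name and the extra name toml; the second yields the unchanged name and None.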

The metapackage can have additional features if desired. For example:

  • It can obsolete/provide other names (e.g. obsoleted extras packages)
  • It can have manual strong or weak dependencies on other (possibly non-Python) packages
  • It can contain files excluded from the "base" package (if such files only make sense with the extra and the base package does not fail without them)

The "base" package (in this case python3-setuptools_scm) can optionally Require/Recommend/Suggest a Python extras metapackage if the packager deems it useful.

The change for the RPM Python dist dependency generator is prepared in:

Macro helper

For the most common case, the change owners have prepared a macro helper in https://src.fedoraproject.org/rpms/python-rpm-macros/pull-request/59

To generate the example above, it should be used like this:

%{?python_extras_subpkg:%python_extras_subpkg -n python3-setuptools_scm -i %{python3_sitelib}/*.egg-info toml}
  • The %{?python_extras_subpkg:...} way of using this macro ensures the spec file remains valid for older Fedora/EL releases, where this code will do nothing.
  • The -n option specifies the name of the "base" package.
  • The -i option specifies the %files %ghost path (glob) to the metadata directory (the .dist-info or .egg-info directory).
  • The one or more positional arguments specify the extra(s) name(s) — multiple metapackages are generated when multiple names are provided.

Other possible arguments:

  • The -f option (conflicts with -i and -F) can specify the relative path to the filelist for this metapackage (which should contain the %files %ghost path (glob) to the metadata directory). This API is prepared for integration with pyproject-rpm-macros.
  • The -F flag (conflicts with -i and -f) can be used to skip the %files section entirely (if the packager wants to construct it manually).

Note that this macro generates all the subpackage definition sections (%package including the Summary and Requires on the base package, %description and %files), and hence it cannot be extended with custom Provides/Obsoletes/Requires/etc. This macro is designed to fit the most common uses. It doesn't currently cover all use cases. Packagers can, however, construct the subpackage manually if they need custom features not covered by %python_extras_subpkg. In the future, the API of the macro can be extended if there is demand.

See the linked pull request for example outputs.

Due to technical limitations, the macro helper never generates a requirement on the arched BASE_PACKAGE%{?_isa} = %{?epoch:%{epoch}:}%{version}-%{release}. It only adds Requires: BASE_PACKAGE = %{?epoch:%{epoch}:}%{version}-%{release}, because a macro cannot reliably detect whether the subpackage is arched or not. The change owners believe the resolver will do the right thing by default. If there are problems with this approach, an additional flag (such as -a) can be introduced to indicate an arched base package.
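
To make the shape of the generated subpackage concrete, here is a small Python sketch that renders the same sections as the manual setuptools_scm example earlier on this page. It is not the macro's implementation (the macro is written in RPM macro language, see the linked pull request); the function name render_extras_metapackage and its parameters are hypothetical, and only the -i style (%ghost path) variant is modelled.

```python
# The exact-NEVR requirement on the base package, kept as a plain string so
# the RPM macro braces do not need escaping.
EXACT_NEVR = "%{?epoch:%{epoch}:}%{version}-%{release}"


def render_extras_metapackage(base, extra, ghost_path):
    """Return spec text for one extras metapackage (illustrative only)."""
    name = f"{base}+{extra}"
    lines = [
        f"%package -n {name}",
        f"Summary: Metapackage for {base}: {extra} extra",
        # Non-arched requirement only, as described above (no %{?_isa}).
        f"Requires: {base} = {EXACT_NEVR}",
        "",
        f"%description -n {name}",
        f"This is a metapackage bringing in {extra} extra requires for {base}.",
        "It contains no code, just makes sure the dependencies are installed.",
        "",
        f"%files -n {name}",
        f"%ghost {ghost_path}",
    ]
    return "\n".join(lines)


print(render_extras_metapackage(
    "python3-setuptools_scm", "toml", "%{python3_sitelib}/*.egg-info"))
```

Calling the function once per extra name mirrors how the macro generates multiple metapackages when multiple positional arguments are given.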

Why is there no automatic extras discovery?

RPM is not capable of creating dynamic subpackages based on the content in %{buildroot} or on the unpacked sources (%{_builddir}) yet.

Hence, we require the packager to manually list which Python extras (if any) should be packaged as metapackages. Not all extras are useful for us anyway, as there are often extras representing the build/dev/doc/test dependencies of the project.

In the future (once/if RPM supports this), the generators can be extended with auto-discovery of Python extras (with filtering).

Automatic provides generator

To continue with our example, the python3-setuptools_scm+toml subpackage will Provide python3.Xdist(setuptools_scm[toml]) (and also python3dist(setuptools_scm[toml])).

An attempt to package a nonexistent extra (e.g. python3-setuptools_scm+nopenopenope) will result in a build failure with a human-readable error message.
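
The behaviour of the provides generator can be approximated as follows. This is a simplified sketch, not the real generator: the function name extras_provides is hypothetical, the set of declared extras is passed in by hand instead of being read from the upstream metadata, the Python version defaults to 3.9 purely as an example, and name normalization and versioned Provides are left out.

```python
def extras_provides(dist_name, extra, declared_extras, python_version="3.9"):
    """Return the Provides for an extras metapackage (illustrative only).

    declared_extras stands in for the extras found in the upstream metadata.
    Raises ValueError for an extra that upstream does not declare, which
    models the build failure described above.
    """
    if extra not in declared_extras:
        raise ValueError(
            f"{dist_name} does not declare the extra {extra!r}; "
            f"declared extras: {sorted(declared_extras)}"
        )
    major = python_version.split(".")[0]
    return [
        f"python{python_version}dist({dist_name}[{extra}])",
        f"python{major}dist({dist_name}[{extra}])",
    ]


print(extras_provides("setuptools_scm", "toml", {"toml"}))
```

With the arguments shown, the sketch returns python3.9dist(setuptools_scm[toml]) and python3dist(setuptools_scm[toml]); asking for nopenopenope raises the error instead.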

Automatic requires generator

If a Python package requires setuptools_scm[toml], the Fedora RPM package will require python3.Xdist(setuptools_scm[toml]) and also python3.Xdist(setuptools_scm). In theory, the second requirement is redundant, but in practice, it makes it easier (and less error prone) to query package dependencies in Fedora (e.g. using dnf repoquery).

The packaged extras will also Require the additional dependencies listed in their Python metadata. In the case of python3-setuptools_scm+toml, it will require python3.Xdist(toml) (because on the Python level, setuptools_scm[toml] requires toml).

Packagers can opt out from automatically generated dependencies on Python extras by defining the %_python_no_extras_requires macro to any value (usually 1) in the spec file. This should be only a temporary measure until the missing extra is packaged. If the upstream dependency information is not accurate, please work with upstream to fix it.
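
As a rough model of the requires generator and its opt-out, consider the sketch below. It is an assumption-laden illustration rather than the real implementation: the function name dist_requires and the no_extras_requires flag (standing in for %_python_no_extras_requires) are hypothetical, version specifiers and environment markers are ignored, names are not normalized, and the Python version defaults to 3.9 only as an example.

```python
import re

# Matches a distribution name optionally followed by [extra1,extra2,...].
# Anything after that (version specifiers, markers) is ignored in this sketch.
_REQUIREMENT = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(?:\[([^\]]*)\])?")


def dist_requires(requirement, python_version="3.9", no_extras_requires=False):
    """Return RPM requirements for one upstream requirement (illustrative)."""
    match = _REQUIREMENT.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name, extras = match.group(1), match.group(2)
    # The base requirement is always generated, even though it is redundant
    # in theory when an extra is also required.
    requires = [f"python{python_version}dist({name})"]
    if extras and not no_extras_requires:
        for extra in extras.split(","):
            extra = extra.strip()
            if extra:
                requires.append(f"python{python_version}dist({name}[{extra}])")
    return requires


print(dist_requires("setuptools_scm[toml]"))
print(dist_requires("setuptools_scm[toml]", no_extras_requires=True))
```

The first call produces both the base and the extras requirement; with the opt-out flag set, only the base requirement remains, which corresponds to the behaviour before this change.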

Coordinated effort to avoid breakage

The change owners have collected data about non-modular packages in Copr. Note that ~270 packages failed to build for unrelated reasons and hence we are missing data for them. However, ~3300 packages built successfully.

The following extras metapackages will be added to avoid broken dependencies:

autobahn[twisted]
cachecontrol[filecache]
cairocffi[xcb]
cli-helpers[styles]
docker[ssh]
fonttools[ufo]
fonttools[unicode]
ipython[notebook]
lunr[languages]
oauthlib[signedtoken]
pyjwt[crypto]
raven[flask]
requests[security]
requests[socks]
tabulate[widechars]
twisted[tls]
vistir[spinner]

The following components will be modified:

python-autobahn
python-CacheControl
python-cairocffi
python-cli-helpers
python-docker
fonttools
ipython
python-lunr
python-oauthlib
python-jwt
python-raven
python-requests
python-tabulate
python-twisted
python-vistir

When we added the metapackages for these extras in our testing Copr, no new broken requires on Python extras were generated. In other words, these new extras subpackages don't require adding any more extras subpackages. No extras are required by the remaining Python 2 packages in Fedora.

Once the change in the dependency generator is deployed in rawhide, the change owners will monitor all newly added requires on missing extras and will add new metapackages as needed.

5 source packages in Fedora already have Python extras meta-subpackages with the proposed naming pattern, but they don't have any listed %files. They will be non-intrusively adapted via pull requests, by adding the %ghost file entry to the metapackage(s). Maintainers can then decide whether to opt for a simpler Rawhide-only spec file with %python_extras_subpkg or to maintain the current compatibility. This concerns the following 18 subpackages:

python3-dask+{array,bag,dataframe,delayed}
python3-django-storages+{azure,boto,boto3,dropbox,libcloud,sftp}
python3-dns-lexicon+{easyname,gratisdns,henet,hetzner,plesk,route53}
python3-drf-yasg+validation
python3-prometheus_client+twisted

Modular packages

The change owners are only capable of monitoring and adapting non-modular packages. Due to long standing issues, we are unable to inspect, query (or do a targeted rebuild of) modular content:

If there are people available to help with this problem, the change owners will gladly accept their help. We are not excluding modular content because we want to, but because we don't know how to work with it at scale.

How to add Python extras subpackage to my package?

In this section, we'll describe a step-by-step guide of adding the Python extras subpackage to your package. Imagine you maintain python-requests and a maintainer of a dependent package contacts you: "I would like you to add a subpackage for requests[security], because my package requires it."

  1. Locate the %files section for python3-requests package in python-requests.spec.
  2. Find the entry for .egg-info or .dist-info metadata directory. If the entry is generalized with globs like %{python3_sitelib}/*, please make the %files section more explicit while at it. Copy the line with the metadata directory. In this guide we assume it is %{python3_sitelib}/*.egg-info.
  3. Locate the %description of the python3-requests package.
  4. After the description, add: %{?python_extras_subpkg:%python_extras_subpkg -n python3-requests -i %{python3_sitelib}/*.egg-info security} on a separate line.
  5. Build the package (e.g. in local mock).
  6. Verify the python3-requests+security package is built and provides python3dist(requests[security]).
  7. Check whether the new extras package depends on packages missing from Fedora (extras or "basic") and proceed with adding those if needed.
  8. Ship the change in Fedora 33+. It should do nothing in Fedora 31/32 or current EPELs.

Packaging guidelines

The change owners will describe this concept in the Python packaging guidelines and will propose the following rules for the Fedora Packaging Committee to approve:

  • Packagers MAY add Python extras metapackages as needed.
  • The Python extras metapackages MUST require the base package (exact NEVR).
  • Packagers MAY add strong or weak dependencies on the extras metapackages from the base package as they see fit.
  • Packagers SHOULD NOT add Python extras metapackages with dependencies only useful for maintaining the package (usually extras called dev/test/doc/build/...).
    • Optional: Packagers MAY package tests separately into the [test] or [testing] extras subpackage.
  • If a Fedora package requires a Python extra of a different package, the extras metapackage MUST be added to that package to avoid broken dependencies.
  • Packagers MAY temporarily disable the automatic requires on extras subpackages (by defining %_python_no_extras_requires) until the missing metapackage is introduced, but they SHOULD notify the maintainer of the package they depend on about the situation.
  • If upstream drops an extra, even though it is discouraged by upstream documentation (see final paragraph), the metapackage SHOULD be Obsoleted from the base package or, if there is continuity, from another extras metapackage.
  • If the upstream Python package name contains +, it MUST be replaced with - in package names (in accordance with the upstream Python package names normalization).

Feedback

This has been briefly discussed in general terms upstream. People tend to agree that some solution is needed. The concrete proposal contained in this Fedora Change is based on the discussion, but has received no feedback yet.

More feedback will be documented here once the change proposal is announced and discussed in Fedora.

Benefit to Fedora

  • Packages will have more accurate automatic dependencies, and the hard-to-maintain and error prone manual transitive (and other) dependencies can be dropped.
  • There will be fewer missing and redundant dependencies.
  • Python packagers will have fewer manual dependencies to worry about and fewer problems to work around.
  • The handling of Python extras will be standardized.
  • Overall, the Python ecosystem in Fedora will be closer to upstream.

Scope

  • Proposal owners:
    • Polish and merge the code changes for python-rpm-generators and python-rpm-macros linked above.
    • Add the 17 missing extras metapackages listed in this change to avoid broken dependencies (using pull requests or provenpackager powers if need be).
    • Adapt the 5 existing Python extras subpackages listed in this change to work with the dependency generator (using pull requests, or provenpackager powers if need be).
    • Monitor new dependencies on Python extras subpackages, add extras subpackages where needed (using pull requests, or provenpackager powers if need be).
    • Propose the updated Python packaging guidelines to FPC for approval.
    • Provide help and guidance for packagers.
    • Optional: Prepare pyproject-rpm-macros integration of this change.
  • Other developers:
    • No immediate action necessary.
    • They can opt in for more metapackages with extras.
    • They can review and merge pull requests.
    • They should follow the updated Python packaging guidelines if the changes are approved by FPC.
  • Release engineering: No releng impact anticipated. The new dependencies will be primarily generated by the mass rebuild, but if the mass rebuild is missed, the package maintainers or change owners can rebuild the packages that will gain the new automatic Requires on Python extras.
  • Policies and guidelines: Yes, see detailed description.
  • Trademark approval: Not needed for this Change.

Upgrade/compatibility impact

No impact anticipated.

How To Test

Check that there are packages that Require python3.9dist(basename[extrasname]). You can use the following repoquery:

dnf repoquery --repo=rawhide --whatrequires 'python3.9dist(*\[*\])'

Check that there are Python extras metapackages with the correct Provides, for example by installing the packages returned by the above query, or manually via queries like:

dnf repoquery --repo=rawhide --whatprovides 'python3.9dist(requests\[security\])'

To query all existing Python extras metapackages, you can use:

dnf repoquery --repo=rawhide --provides -a | grep -E 'python(3\.9|2\.7)dist\(\S+\[\S+\]\)'

And lastly, to query all required Python extras metapackages:

dnf repoquery --repo=rawhide --requires -a | grep -E 'python(3\.9|2\.7)dist\(\S+\[\S+\]\)'
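
If the repoquery output needs further processing, the same filter as the grep -E calls above can be applied in Python. This is a minimal sketch: the function name filter_extras_deps is hypothetical, and the input is assumed to be the lines printed by one of the dnf repoquery commands, obtained by whatever means is convenient.

```python
import re

# Same pattern as used with grep -E above.
EXTRAS_DEP = re.compile(r"python(3\.9|2\.7)dist\(\S+\[\S+\]\)")


def filter_extras_deps(lines):
    """Keep only the lines that mention a Python extras dependency."""
    return [line.strip() for line in lines if EXTRAS_DEP.search(line)]


sample = [
    "python3.9dist(requests[security])",
    "python3.9dist(requests)",
    "python3dist(requests[socks])",
]
print(filter_extras_deps(sample))
```

Note that, like the grep pattern, this only matches the versioned python3.9dist(...) and python2.7dist(...) forms, so the unversioned python3dist(...) line in the sample is dropped.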

User Experience

When installing Python RPM packages, the dependencies are more likely to fulfill user expectations, as they will more closely adhere to the behavior of pip (the Python package installer).

Dependencies

Nothing.

Contingency Plan

  • Contingency mechanism: (What to do? Who will do it?)
    • Soft: The change owners will disable the requirements generator by default and rebuild (or untag if FTBFS) packages with broken dependencies caused by the change.
    • Hard: The change owners will revert everything and rebuild (or untag if FTBFS) packages with new requirements/provides caused by the change.
  • Contingency deadline: Beta freeze
  • Blocks release? No
  • Blocks product? No

Documentation

The packaging guidelines will be the documentation if approved. If not, this Fedora Change shall serve as the documentation.

Release Notes