From Fedora Project Wiki
Line 41: Line 41:
to provide a more cohesive developer experience for the Fedora/RHEL/CentOS
to provide a more cohesive developer experience for the Fedora/RHEL/CentOS
ecosystem.
ecosystem.
== Audiences to be considered ==
There are two main audiences to be considered when looking at user level package management in a software development context. The first group is software developers (the same target audience as the Workstation WG). For these users, the priority is to have access to the standard development environment management tools for their ecosystem. For example, Maven for Java developers, pip for Python developers, gems for Ruby developers, npm for Node.js developers. Many of these users will be comfortable with consuming software directly from the upstream community, while others would prefer to be able to delegate the initial level of review to the wider Fedora community.
The second group can be usefully categorised as "data analysts" (even though the actual category is much broader than that, encompassing mathematicians, engineers, scientists, and more). These are folks that are programming not because they want to publish applications or services for others to use, but because they want to *automate their own work*. Compared to professional developers, these folks are far more likely to be making use of components written in a variety of different languages, including Python, Julia, R, FORTRAN, MATLAB/Octave, Scala, etc.


== Directions to be Explored ==
== Directions to be Explored ==

Revision as of 04:41, 28 November 2014

User Level Package Management

Problem Description

Package management in Fedora is currently focused on system level packaging. This is a very "operations" oriented view of the world, as it emphasises the ability to reproduce systems exactly, while still being able to delegate the task of monitoring for and responding to CVE's to the platform provider (rather than having to track dependencies for security updates directly).

However, for many application developers and data analysts, being able to bring in new dependencies quickly and easily to solve problems is an essential requirement. It's also essential to be able to install a dependency into a local testing or analysis environment *without* affecting the operation of the system itself in any way.

Unfortunately, the relative lack of support for this model at the platform level in Fedora and other Linux distributions has resulted in these communities routing around platform level package management systems, placing significant barriers in the way of effective collaboration between developers, analysts and system administrators.

Solution: endorse layered architectures

The model that has emerged to effectively manage these conflicting requirements is to separate user & developer level dependency management tools from the system level dependency management tools used to create an integrated platform.

The base platform, its dependency management system, and the packages it contains then become primarily the responsibility of system administrators. Security, stability, reliability, & compatibility are the focus at this layer. In a Fedora context, this perspective is best represented by the Base, Server and Cloud WGs (together with the CentOS and EPEL communities).

The Environments & Stacks WG then primarily represents the perspective of application developers and data analysts looking to work on top of that stable foundation. This also includes taking into account that many developers and analysts are writing deliberately cross platform software, and even developers and analysts exclusively targeting Fedora/RHEL/CentOS production systems may prefer (or be required) to use other operating systems on their local systems (whether that's other Linux distributions, Mac OS X, or even Windows).

The Workstation WG takes this work in the other WGs and brings it all together to provide a more cohesive developer experience for the Fedora/RHEL/CentOS ecosystem.

Audiences to be considered

There are two main audiences to be considered when looking at user level package management in a software development context. The first group is software developers (the same target audience as the Workstation WG). For these users, the priority is to have access to the standard development environment management tools for their ecosystem. For example, Maven for Java developers, pip for Python developers, gems for Ruby developers, npm for Node.js developers. Many of these users will be comfortable with consuming software directly from the upstream community, while others would prefer to be able to delegate the initial level of review to the wider Fedora community.

The second group can be usefully categorised as "data analysts" (even though the actual category is much broader than that, encompassing mathematicians, engineers, scientists, and more). These are folks that are programming not because they want to publish applications or services for others to use, but because they want to *automate their own work*. Compared to professional developers, these folks are far more likely to be making use of components written in a variety of different languages, including Python, Julia, R, FORTRAN, MATLAB/Octave, Scala, etc.

Directions to be Explored

Two primary areas of exploration have been identified for user level package management:

  • Embracing the existing practice of using language ecosystem specific tooling to deploy applications to Linux systems
  • Potentially recommending particular language independent tooling for more complex dependency management scenarios

Embracing ecosystem specific tooling (near term)

Fully embracing language specific tooling is likely to be the best way to reach the professional developer audience. ~bkabrda has started setting up a [devpi] instance as a proof of concept for running a filtered mirror of an upstream package repository that only includes packages that have been reviewed and determined to at least meet Fedora's licensing guidelines and to not be obviously malicious.

However, given the wide range of ecosystem specific packaging tools out there, it seems unlikely that this approach will scale in a sustainable way. To provide a more consistent management interface, it may be better to instead adopt a plugin-based solution like [Pulp]. Adding a new language to the "endorsed" list would then be a matter of writing a suitable Pulp plugin and integrating it with the build system, rather than setting up yet another a completely new repo management service within Fedora's infrastructure.

Many language ecosystems don't support the notion of redistribution at all. In these cases, republishing under the same name from a Fedora specific repo would be problematic. The likely near term approach to these communities will be to just place them in the "too hard" basket, and focus on ecosystems where the tools are in place to handle curated redistribution without conflict.

For more details, refer to Env_and_Stacks/Projects/LanguageSpecificRepositories.

Recommending language independent tooling (longer term)

Even if we manage to integrate language specific tooling into the review and build toolchains, it's unlikely it will be feasible to integrate every such toolset into the system management utilities. The language specific tools also don't do a good job of managing arbitrary external dependencies with associated ABI compatibility requirements.

As such, it's worth considering the possibility of recommending a particular user level dependency management toolchain, that allows software to be installed on a per-user basis, rather than as part of the integrated OS platform.

Like repackaging as RPMs for system level dependencies, the use of language independent tooling also makes it possible to handle ecosystems that don't include native support for redistribution by a system integrator.

There are two main possibilities that come to mind on that front:

Nix has many attractive properties (including support for custom build environments and per-user package installations without root access), but has the limitation of being POSIX specific. This makes it significantly less interesting to upstream communities that also encompass Windows based developers and data analysts.

By contrast, conda was created by Continuum Analytics specifically to tackle the problem of allowing scientists and data analysts to easily install the full Python analytical stack, including external C, C++ and FORTRAN dependencies.

While there's no current exploration happening in this area, it's still a possibility well worth considering for possible future investigation.

Other Considerations

Software Collections

Software collections represent a hybrid model for layered architectures, where platform components are selectively upgraded within the context of the existing system level package management system.

This is a useful model, as it automatically integrates with all existing system level auditing tools. However, it is aimed primarily at folks that are already using system level packaging tools, and would like to selectively upgrade particular components. It is less interesting in the context of being able to provide consistent cross-platform dependency management instructions.

From the perspective of user level package management, a software collection becomes just another environment to target, just like targeting the default system environments directly.

Linux Containers and Docker

In a Docker context, the work of the Environments & Stacks WG largely applies to the way container images are built. From a development perspective, containers don't actually change all that much relative to any other system for dependency management - it simply shifts the complexity from deployment time to image build time.

Where Docker helps dramatically is in managing the division of responsibility between folks with an operations focus, who can specialise in provide base images and the infrastructure to run containers, and those with a development focus, who can focus on the creation and deployment of full application containers.