Revision as of 12:44, 7 July 2011

Depcheck process

The Goal

The goal is not to catch _all_ problems, but to make sure, that _average user_ does not suffer from dependency problems.

This is quite important statement to consider. The thing is, that we are unable to check absolutely everything - i.e. whether update XYZ does not potentialy break some installation.

That would require us to test against every single possible combination of packages (currently around 12k).
There always will be some conflicts - try "yum install '*'" - it's not even possible to install all the packages at once (conflicts, obsoletes, ...).

What we do want to check is

Whether there is _some_ way to install the package
Whether it can be installed on _average system_

These two are quite interesting points to discuss.

The first one describes the fact, that if a package should be pushed from 'pending' to 'updates', than all the dependencies must be present. We don't care too much, whether a half of already installed packages needs to be removed in order to install the package, because we can't evaluate the idea behind the update (take initv x systemd for example, if we would check for conflicts/obsoletes with the 'core' stuff, depcheck would deny systemd, because it removes initv). But we need to make sure, that there is a way to install the package. Whether the system is bootable, or if the 'removals' caused by the update/installation are reasonabele is not a subject of depchecking.

The second point means, that we do not want to check all possible combinations, and look for the one particular situation which does not allow the package to install. As in the previous point, we need to understand, that the very nature of packaging/repositories is, that not all packages can be installed at once (because if this would not apply, then no depchecking would be required). From the first point, or we can call it 'first stage', we know that the package can be installed. Now we want to check, whether two packages 'commonly installed next to each other' do not suddenly collide. To achieve this, we'll have a 'standard installation', and we'll take it as a base for the upcoming 'installations'.

This is very different from the way Depcheck functions right now. At the moment, Depcheck assumes that 'all packages are installed' (correct me if i'm wrong), and then installs the whole 'pending' at once. Not caring about 'conflicts', 'obsoletes', etc. The original idea was: "It does not matter, that we install all at once, yum will all sort it out for us". We found out, that it's not as simple :)

Of course, we might want to do some specific tests. Like when we detect 'obsoletes' in the dependencies, we can add the obsoleted package to our 'average installation', and check that the 'upgrade' is seamless.

There is also a problem with 'Foo requires Bar == 1.3' in stable, and having a update for Bar in pending. We could list all packages which have Bar as a requirement, install them, and try to update Bar from pending. But some of the packages requiring Bar can conflict each other, and then we'd need to do more 'runs' so we can check all the packages currently in stable... And this goes back to the initial "we can't check it all". For this particular problem, we could do some 'best effort' and check the requirements manualy - i.e. make a list of packages which can function with the new Bar, and those which can not by traversing their dependency information and comparing it to the new Bar version. But we still won't be able to see the 'hidden dependencies' and other stuff. But _this is OK_.

There are plenty of other potential problems, but we need to step aside from those (at least for the beginning), and set a reasonable goal for depcheck. Then if we have quite a lot of false positives tied to some particular situation (like the exact version requirement), we can make filters to catch the potentialy problematic packages, and do some additional testing on them. But most of it always boils down to the 'what is the beginning state of our system'.

I (jskladan) belive that it would be for the best to develop the 'Simple Depcheck' (as kparal and I call the alternative implementation) incrementaly. Make a simple base, which just checks whether a package can be 'somehow' installed, not caring about how reasonable the removals/requirements are. Do not check intensions, check functionality.

Also, we'd like to cooperate with yum developers, and convince them into creating a stable, documented API suitable for dependecy checking. I.e. something depcheck uses the undocumented yum magic for. We'd like to have a method to set up virtual package sack, give it a list of packages to 'install', make 'snapshots' of the packagesacks to be able to unroll after 'installing' broken package, and of course a structured output which could be used to detect the root of problem with broken package.

What do we want from depcheck?

Subset of packages, which can be installed.
Set of 'uninstallable' packages, ideally with logs ;-)
We want to be able to do all the checks 'virtually' just using pkgdb & yum api (i.e. without performing an actual package installation)

Simple process

Data

All_packages = dictionary of all packages in pending. Contains information about 'installability' (i.e. whether the package is at least in one installable subset)
Runs = List of all packages 'installable' in that run

Data structures than can be imagined like this:

All_packages = {package_name: {OK: None, was_first = False}, ...}
Where OK is boolean indicating, whether the package
None - package was not yet a part of any run

True - was installed in some installable subset

False - package had problems with installation
Also for the OK, the update-path is None -> False -> True (i.e. None can change to both True or False, False can change to True, True can not be changed)
was_first is a helper, to track whether the package was ever "at the start" of the installable subset creation. This is important for deciding when to end the depcheck process.
Runs = [[package_name, ...], ...]
Stores the installable subset of the respective run

Algorithm

Filter packages from all_packages (let's call the output from this filter undecided)
- Take packages which have OK set to None or False
- Remove packages which have the was_first mutex set to True
If undecided is empty:
- End Depcheck
Pick random package from undecided
Set the was_first mutex to True
For each package in undecided (starting with the randomly selected package):
- Try to install the package
- If installation passed:
  update the all_packages[package]['OK'] to True
- If installation failed:
  update the all_packages[package]['OK'] to False (if previous state was True, than it is not updated)

Example

Let's have 'pending' packages A..F with these dependencies:

A requires B
A requires C
A conflicts with D
D requires C
E has no dependencies at all
F requires XYZ which is not present in any repo

@@ Line 1: / Line 1: @@
 = Depcheck process =
+== The Goal ==
+The goal is not to catch _all_ problems, but to make sure, that _average user_ does not suffer from dependency problems.
+This is quite important statement to consider. The thing is, that we are unable to check absolutely everything - i.e. whether update XYZ does not potentialy break some installation.
+# That would require us to test against every single possible combination of packages (currently around 12k).
+# There always will be some conflicts - try "yum install '*'" - it's not even possible to install all the packages at once (conflicts, obsoletes, ...).
+What we do want to check is
+# Whether there is _some_ way to install the package
+# Whether it can be installed on _average system_
+These two are quite interesting points to discuss.
+The first one describes the fact, that if a package should be pushed from 'pending' to 'updates', than all the dependencies must be present. We don't care too much, whether a half of already installed packages needs to be removed in order to install the package, because we can't evaluate the idea behind the update (take initv x systemd for example, if we would check for conflicts/obsoletes with the 'core' stuff, depcheck would deny systemd, because it removes initv). But we need to make sure, that there is a way to install the package. Whether the system is bootable, or if the 'removals' caused by the update/installation are reasonabele is not a subject of depchecking.
+The second point means, that we do not want to check all possible combinations, and look for the one particular situation which does not allow the package to install. As in the previous point, we need to understand, that the very nature of packaging/repositories is, that not all packages can be installed at once (because if this would not apply, then no depchecking would be required). From the first point, or we can call it 'first stage', we know that the package can be installed. Now we want to check, whether two packages 'commonly installed next to each other' do not suddenly collide. To achieve this, we'll have a 'standard installation', and we'll take it as a base for the upcoming 'installations'.
+This is very different from the way Depcheck functions right now. At the moment, Depcheck assumes that 'all packages are installed' (correct me if i'm wrong), and then installs the whole 'pending' at once. Not caring about 'conflicts', 'obsoletes', etc. The original idea was: "It does not matter, that we install all at once, yum will all sort it out for us". We found out, that it's not as simple :)
+Of course, we might want to do some specific tests. Like when we detect 'obsoletes' in the dependencies, we can add the obsoleted package to our 'average installation', and check that the 'upgrade' is seamless.
+There is also a problem with 'Foo requires Bar == 1.3' in stable, and having a update for Bar in pending. We could list all packages which have Bar as a requirement, install them, and try to update Bar from pending. But some of the packages requiring Bar can conflict each other, and then we'd need to do more 'runs' so we can check all the packages currently in stable... And this goes back to the initial "we can't check it all". For this particular problem, we could do some 'best effort' and check the requirements manualy - i.e. make a list of packages which can function with the new Bar, and those which can not by traversing their dependency information and comparing it to the new Bar version. But we still won't be able to see the 'hidden dependencies' and other stuff. But _this is OK_.
+There are plenty of other potential problems, but we need to step aside from those (at least for the beginning), and set a reasonable goal for depcheck. Then if we have quite a lot of false positives tied to some particular situation (like the exact version requirement), we can make filters to catch the potentialy problematic packages, and do some additional testing on them. But most of it always boils down to the 'what is the beginning state of our system'.
+<hr>
+I (jskladan) belive that it would be for the best to develop the 'Simple Depcheck' (as kparal and I call the alternative implementation) incrementaly. Make a simple base, which just checks whether a package can be 'somehow' installed, not caring about how reasonable the removals/requirements are. Do not check intensions, check functionality.
+Also, we'd like to cooperate with yum developers, and convince them into creating a stable, documented API suitable for dependecy checking. I.e. something depcheck uses the undocumented yum magic for. We'd like to have a method to set up virtual package sack, give it a list of packages to 'install', make 'snapshots' of the packagesacks to be able to unroll after 'installing' broken package, and of course a structured output which could be used to detect the root of problem with broken package.
 == What do we want from depcheck? ==
-# Subsets of packages, which can be installed.
+# Subset of packages, which can be installed.
-# Sets of 'uninstallable' packages, ideally with logs ;-)
+# Set of 'uninstallable' packages, ideally with logs ;-)
-# Ideally, we want to be able to do all the checks 'virtually' just using pkgdb & yum api (i.e. without performing an actual package installation)
+# We want to be able to do all the checks 'virtually' just using pkgdb & yum api (i.e. without performing an actual package installation)
 == Simple process ==
 '''Data'''
-# All_packages = dictionary of all packages in pending. Contains information about 'installability' (i.e. whether the package is at least in one installable subset)
+* All_packages = dictionary of all packages in pending. Contains information about 'installability' (i.e. whether the package is at least in one installable subset)
-# Runs = List of all packages 'installable' in that run
+* Runs = List of all packages 'installable' in that run
 Data structures than can be imagined like this:

Search

User:Jskladan/Sandbox:Depcheck moved: Difference between revisions