Bioconductor
NOTE: the information in this article is very outdated (targets Fedora 12).
Summary
Make Bioconductor available for Fedora
"Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data."
Owner
- Name: PierreYvesChibon
Current status
- Targeted release: Fedora 12
- Last updated: 21 March 2009
- Percentage of completion: 10%
Detailed Description
Bioconductor is a large repository of R libraries, widely used in bioinformatics for the statistical analysis of genomic data.
Benefit to Fedora
Because Bioconductor is widely used, packaging it would be a good way to promote Fedora as a desktop platform for bioinformaticians.
It could also be promoted for RHEL, since some servers running RHEL are used to run the analyses that Bioconductor offers.
Scope
Bioconductor contains around 300 packages; not all of them will be packaged in Fedora (at least not at first). For this feature, I think the base packages of Bioconductor should be packaged.
Bioconductor has its own installation script in R, which installs the base libraries of Bioconductor.
These libraries are:
Bioconductor name | Fedora package | Review request |
affy | | #515081 |
affydata | | #591447 |
affyPLM | | |
(depends on KEGG.db, which is non-free) | | |
annotate | | |
Biobase | R-Biobase | #240500 |
Biostrings | R-Biostrings | #490721 |
DynDoc | R-DynDoc | #241079 |
gcrma | | |
genefilter | | |
geneplotter | | |
hgu95av2.db | | |
limma | | |
marray | | |
matchprobes | | |
multtest | R-multtest | #240497 |
ROC | | #591737 |
vsn | | |
xtable | | #591032 |
affyQCReport | | |
The problem is that these libraries can have a large number of dependencies.
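The dependencies of an R library are declared in the "Depends" and "Suggests" fields of its DESCRIPTION file. As a minimal sketch of how those fields could be read, the Python snippet below parses a DESCRIPTION-style text and extracts the bare package names. The sample content, the package name `examplepkg` and the helper names `parse_fields` and `package_names` are made up for illustration; they are not taken from a real Bioconductor package or from the scripts listed on this page.

```python
import re

# Hypothetical DESCRIPTION content, made up for illustration only.
SAMPLE_DESCRIPTION = """\
Package: examplepkg
Version: 1.0.0
Depends: R (>= 2.8.0), Biobase (>= 2.0.0),
    methods
Suggests: limma
"""


def parse_fields(text):
    """Parse 'Key: value' fields; indented lines continue the previous field."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            # Continuation line: append to the field being read.
            fields[key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def package_names(field_value):
    """Split a dependency field into package names, without version
    constraints and without the entry for R itself."""
    names = []
    for item in field_value.split(","):
        name = re.sub(r"\(.*?\)", "", item).strip()
        if name and name != "R":
            names.append(name)
    return names


fields = parse_fields(SAMPLE_DESCRIPTION)
depends = package_names(fields.get("Depends", ""))
suggests = package_names(fields.get("Suggests", ""))
print("Depends:", depends)
print("Suggests:", suggests)
```

Running the same extraction over every library in the base list would give the direct dependencies that each Fedora package has to require.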
Test Plan
Install the RPMs and test them
User Experience
Users should be able to download the libraries and start working with them without problems.
In addition, it might be useful to create a Bioconductor package group that installs these packages and their dependencies all at once, i.e. a "metapackage" such as R-Bioconductor:
yum install R-Bioconductor
Dependencies
The dependencies are themselves R libraries: most of them are in Bioconductor, and some are in the CRAN repository.
The metadata and experiment data packages also need to be taken into account. These are large packages that do not change much between releases. Two of them are already in Fedora, and they raised the question of inheritance between versions.
There is a "small" graph showing the relations (only "Depends") between the Bioconductor packages.
There are lists showing, for each package in the base list, its dependencies and sub-dependencies:
- List of dependencies for only the "Depends" libraries
- List of dependencies for the "Depends" and "Suggests" libraries
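Lists like these amount to a transitive closure over the dependency graph. The sketch below shows one way to compute it in Python, and how to derive the set of packages that still need to be packaged. The dependency map uses placeholder names (`pkgA` to `pkgE`) and is not real Bioconductor metadata; the function names are likewise hypothetical, and this is not the showDep.py script mentioned on this page.

```python
from collections import deque

# Hypothetical map: package -> its direct "Depends" entries.
# Placeholder names only, NOT real Bioconductor metadata.
DEPENDS = {
    "pkgA": ["pkgB", "pkgC"],
    "pkgB": ["pkgD"],
    "pkgC": ["pkgD"],
    "pkgD": [],
    "pkgE": ["pkgA"],
}


def all_dependencies(package, graph):
    """Return the sorted direct and indirect dependencies of a package."""
    seen = set()
    queue = deque(graph.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited; this also guards against cycles
        seen.add(dep)
        queue.extend(graph.get(dep, []))
    return sorted(seen)


def still_to_package(base, graph, already_packaged):
    """Return the base packages plus all their dependencies, minus the
    packages that already exist in the distribution."""
    needed = set(base)
    for package in base:
        needed.update(all_dependencies(package, graph))
    return sorted(needed - set(already_packaged))


print(all_dependencies("pkgE", DEPENDS))
print(still_to_package(["pkgA"], DEPENDS, already_packaged=["pkgD"]))
```

Feeding the graph with "Depends" only, or with "Depends" and "Suggests" together, would yield the two kinds of list described above.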
Contingency Plan
None -- what should that be?
Planning
- Find interested contributors
- Get a clear view of the number of packages that have to be done
- Package and review them
Documentation
- http://bioconductor.org
- http://r-project.org
- http://cran.r-project.org/mirrors.html
- R packaging guidelines
---
- R2spec - Small script to easily create the spec file of an R library
- updateCVS.py - Small script to easily update an R library in Fedora CVS
- bio.pl and showDep.py - Scripts to retrieve the list of dependencies of the Bioconductor libraries and parse it for the libraries of interest
Release Notes
None -- what should that be?