Javi Roman
Fedora Big Data Package Ecosystem
Package | Packaged Version | Upstream Version | Sources |
---|---|---|---|
Apache Hadoop | 2.4.1 | 2.7.0 | http://pkgs.fedoraproject.org/cgit/hadoop.git/ |
Apache HBase | 0.98.3 | 1.0.1 | http://pkgs.fedoraproject.org/cgit/hbase.git/ |
Apache Hive | 0.12.2 | 1.1.0 | http://pkgs.fedoraproject.org/cgit/hive.git/ |
Apache Pig | 0.13.10 | 0.14.0 | http://pkgs.fedoraproject.org/cgit/pig.git/ |
Apache Zookeeper | 3.4.6 | 3.4.6 | http://pkgs.fedoraproject.org/cgit/zookeeper.git/ |
Apache Oozie | 4.0.1 | 4.1.0 | http://pkgs.fedoraproject.org/cgit/oozie.git/ |
Apache Ambari | 1.5.1 | 2.0.0 | http://pkgs.fedoraproject.org/cgit/ambari.git/ |
Apache Accumulo | 1.6.1 | 1.6.2 | http://pkgs.fedoraproject.org/cgit/accumulo.git/ |
Apache Mesos | 0.22.1 | 0.22.1 | http://pkgs.fedoraproject.org/cgit/mesos.git/ |
Apache Solr | 4.10.4 | 5.1.0 | http://pkgs.fedoraproject.org/cgit/solr.git/ |
Apache Spark | 0.9.1 | 1.3.1 | http://pkgs.fedoraproject.org/cgit/spark.git/ |
AMPLab Tachyon | 0.99 | 0.6.4 | http://pkgs.fedoraproject.org/cgit/tachyon.git |
Package | Packaged Version | Upstream Version | Status | Sources |
---|---|---|---|---|
Apache Flume | 1.5.0 | 1.5.0 | Partially supported | https://github.com/fedora-bigdata-rpms/flume-rpm |
Cloudera Kite SDK | 1.0.0 | 1.0.0 | | https://gil.fedorapeople.org/kite.spec |
Apache Crunch | 0.11.0 | 0.11.0 | | https://github.com/fedora-bigdata-rpms/crunch-rpm |
Apache Tez | 0.5.3 | 0.6.0 | | https://github.com/fedora-bigdata-rpms/tez-rpm |
Apache Kafka | 0.8.0 | 0.8.2.1 | | https://github.com/fedora-bigdata-rpms/kafka-rpm |
Apache Tajo | 0.10.0 | 0.10.0 | | https://gil.fedorapeople.org/tajo.spec |
Apache Jena | 2.13.0 | 2.13.0 | | https://gil.fedorapeople.org/jena.spec |
Cascading | 2.6.3 | 2.6.3 | | https://gil.fedorapeople.org/cascading.spec |
Apache Flume package status
Package status
The package builds under the following assumptions (we are working on these issues):
- The code is not ready for Thrift v0.9.1, the version available in Fedora 21; however, Flume can be built using the legacy Thrift code bundled in the upstream Flume TGZ.
- Disable ElasticSearch Sink
- Disable Morphline Solr Sink
- Disable Twitter Source
- Disable Kite Dataset Sink
Testing the package
    git clone https://github.com/fedora-bigdata-rpms/flume-rpm.git
    cd flume-rpm
    spectool -g flume.spec
    rpmbuild -bs --nodeps --define "_sourcedir ." --define "_srcrpmdir ." flume.spec
    sudo mock flume-1.5.2-1.fc21.src.rpm
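The mock step above names the source RPM explicitly, so the command has to be edited whenever the version or release in the spec changes. As a minimal sketch (the `latest_srpm` helper is a hypothetical name, not part of the flume-rpm repository), the newest `*.src.rpm` written by `rpmbuild -bs` can be picked up by modification time instead:

```shell
# Hypothetical helper: print the most recently modified source RPM
# found in the given directory (defaults to the current directory).
# Assumes file names without embedded newlines.
latest_srpm() {
  dir=${1:-.}
  # ls -t lists newest first; errors (no match) are silenced,
  # in which case nothing is printed.
  ls -t "$dir"/*.src.rpm 2>/dev/null | head -n 1
}

# Usage, after the rpmbuild -bs step:
#   sudo mock "$(latest_srpm .)"
```

This relies only on `_srcrpmdir` being set to the current directory, as in the `rpmbuild` invocation above.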
Dependency packages
- Building Flume with all features requires the following dependency packages; their status is:
Package | Bugzilla | Status |
---|---|---|
irclib | | Package is available in Rawhide and in Fedora 21 as an update |
mapdb | | Package is available in Rawhide and in Fedora 21 as an update |
asynchbase | No BZ ticket | Not submitted for review in Bugzilla |
suasync | No BZ ticket | asynchbase dependency. Not submitted for review in Bugzilla |
kite | RHBZ #1179355 | Patched to support the Fedora Guava version (partial support) |
parquet | | kite package dependency. Package is available in Rawhide and was submitted to Fedora 22 and 21 as an update |
parquet-format | | parquet package dependency. Package is available in Rawhide and was submitted to Fedora 22 and 21 as an update |
maxmind-db-java | | kite package dependency. Package is available in Rawhide and in Fedora 21 as an update |
ua-parser-java | | kite package dependency. Package is available in Rawhide and in Fedora 21 as an update |
elasticsearch | | Package is available in Rawhide and in Fedora 22 as an update |
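To see which of the dependencies in the table are already installed on a build host, a loop over `rpm -q` is enough. The sketch below is only an illustration: `rpm_installed` and `check_deps` are hypothetical helper names, and the package names are copied from the table as written, so the actual Fedora package names may differ.

```shell
# Hypothetical helper: succeeds if the named package is installed
# (rpm -q exits non-zero for a package that is not installed).
rpm_installed() {
  rpm -q "$1" >/dev/null 2>&1
}

# check_deps CHECKER PKG...
# Runs CHECKER on each package name, prints one status line per
# package, and returns 1 if at least one package is missing.
check_deps() {
  checker=$1
  shift
  missing=0
  for pkg in "$@"; do
    if "$checker" "$pkg"; then
      echo "$pkg: installed"
    else
      echo "$pkg: missing"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (package names assumed from the table above):
#   check_deps rpm_installed irclib mapdb asynchbase suasync kite \
#     parquet parquet-format maxmind-db-java ua-parser-java elasticsearch
```

Passing the checker as an argument keeps the loop independent of `rpm`, so the same function could be pointed at a repository query instead of the installed-package database.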
Apache Storm package status
Apache Kafka package status
Apache Kafka is a distributed publish-subscribe messaging system built around persistent storage. It uses O(1) disk structures that provide constant-time performance even with many terabytes of stored messages.
Apache Kafka is written in Scala. Scala projects are typically built with sbt (Simple Build Tool), the de facto build tool of the Scala community. sbt is similar to Apache Ant and uses Apache Ivy (a sub-project of the Apache Ant project) to resolve project dependencies.
There are two methods for building RPMs of Scala-based projects:
- Building packages with sbt and the climbing-nemesis script (a tool to make a temporary Ivy repository from installed Fedora packages)
  See SIGs/bigdata/packaging/Sbt (sbt is in Fedora 20).
- Building packages with sbt and xmvn’s Ivy resolution support
  See SIGs/bigdata/packaging/Scala, Changes/ImprovedScalaEcosystem, and Changes/ImprovedIvyPackaging.