Revision as of 16:15, 24 March 2014
Spark packaging
Background
See Scala packaging for details on Fedora support for the Scala toolchain; briefly, we have version 2.10.3 of Scala in F19, F20, and Rawhide. sbt 0.13.1 is available in F20 and Rawhide.
Spark 0.9.0 works with Scala 2.10.3, and the upstream source repository currently has support for SBT 0.13.x.
Current Status
Apache Spark is available in f21. Since the Spark package currently depends on xmvn functionality only available in f21 and later versions, it is unlikely that we will produce a Spark package for f20 or earlier versions. The Fedora Spark package includes Spark Core, MLlib, GraphX, Bagel, and Spark Streaming, but currently (as of spark-0.9.0-0.2) has some differences from upstream Spark:
- it builds against Akka 2.3.0-RC2 instead of Akka 2.2.3 (this entails some trivial API changes)
- it doesn't support Kryo serialization yet (because Chill isn't yet available in Fedora), which limits some Spark Streaming functionality
- it doesn't support functionality dependent on Clearspring stream-lib, since stream-lib depends on bundled code and can't yet be packaged for Fedora; most notably, the countApproxDistinctByKey methods on RDDs aren't supported.
- it doesn't support Mesos. However, the most recent builds of the Fedora Mesos package include Java support; stay tuned.
- Spark is not built as a monolithic assembly jar (since this isn't compatible with Fedora guidelines)
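Until stream-lib is packaged, code that relies on countApproxDistinctByKey needs an exact replacement. The idea is simply to collect the distinct values seen for each key and count them. The following is a minimal sketch in plain Python with no Spark dependency; the function name exact_distinct_by_key and the sample data are hypothetical and are not part of any Spark API.

```python
from collections import defaultdict


def exact_distinct_by_key(pairs):
    """Return {key: number of distinct values seen for that key}.

    Hypothetical helper for illustration only; it computes exactly
    what an approximate per-key distinct count would estimate.
    """
    seen = defaultdict(set)
    for key, value in pairs:
        # A set keeps one copy of each value, so duplicates are ignored.
        seen[key].add(value)
    return {key: len(values) for key, values in seen.items()}


if __name__ == "__main__":
    sample = [("a", 1), ("a", 1), ("a", 2), ("b", 7)]
    print(exact_distinct_by_key(sample))
```

The trade-off is memory: an exact count has to remember every distinct value per key, which is the cost that an approximate, sketch-based count is meant to avoid on large datasets.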
We're working on addressing the limitations and bringing other Spark-related projects to Fedora.
Most of the Scala packages build with tests disabled (due to unavailable test dependencies) or with varying degrees of modification to the upstream build process (due to varying dependency versions or unavailable sbt plugins). If you're looking for an easy way to get involved, packaging some of these missing dependencies would be a great place to start. See the list under Scala packaging or find willb on IRC for more information.
Dependencies
Spark requires Scala and SBT in order to build. (Note that there is a Maven build option as well, but it requires artifacts that need to be built with SBT.)
The easiest and most up-to-date place to see the dependency list is in the Spark repository itself, but here we will call out some notable dependencies that aren't already in Fedora:
Project | State | Review BZ | Packager | Notes |
---|---|---|---|---|
sbt | available in F20 and rawhide | sbt-package BZ | willb | |
Not necessary any more with this patch (either carried or integrated into upstream) | willb | |||
json4s | available in f20 and rawhide | json4s-package BZ | willb | |
akka | available to review | 1069871 | willb (with gil) | |
Squeryl | awaiting a reviewer | 1057770 | willb | dependency of lift; no longer strictly necessary but still nice to have for the Scala ecosystem |
scalaz | available in f20 and rawhide | 1055809 | willb | dependency of lift-json; no longer strictly necessary but still nice to have for the Scala ecosystem |
metrics | available in F20 | 861502 | gil | Coda Hale's metrics (Java/Maven build). |
stream-lib | | | | see here; required for approximate set cardinality functionality; not packaged yet because it depends on a Bloom filter implementation bundled and forked from Cassandra |
chill | | | | see here; required for Kryo-based serializers and some streaming functionality |