From Fedora Project Wiki

Latest revision as of 11:57, 5 April 2014

Spark packaging

Background

See Scala packaging for details on Fedora support for the Scala toolchain. Briefly, Scala 2.10.3 is available in F19, F20, and Rawhide, and sbt 0.13.1 is available in F20 and Rawhide.

Spark 0.9.0 works with Scala 2.10.3, and the upstream source repository currently supports sbt 0.13.x.
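As a rough illustration of these version constraints, the Python sketch below checks whether a given Scala or sbt version falls in the series Spark 0.9.0 builds against. The names `REQUIRED`, `series`, and `compatible` are hypothetical and not part of any Fedora or Spark tooling. The sketch assumes that any Scala 2.10.x will do, since Scala patch releases within a series are binary compatible, and that any sbt 0.13.x will do, since that is the series upstream supports.

```python
# Hypothetical helper (not part of Fedora or Spark tooling) that checks whether
# a toolchain version falls in the series Spark 0.9.0 is built against.

# Assumption: any Scala 2.10.x works, because Scala patch releases within a
# series are binary compatible; upstream sbt support is for 0.13.x.
REQUIRED = {"scala": (2, 10), "sbt": (0, 13)}


def series(version: str, parts: int = 2) -> tuple:
    """Return the leading numeric components, e.g. "2.10.3" -> (2, 10)."""
    core = version.split("-", 1)[0]  # drop qualifiers such as "-RC2"
    return tuple(int(p) for p in core.split(".")[:parts])


def compatible(tool: str, version: str) -> bool:
    """True if `version` of `tool` is in the series listed in REQUIRED."""
    return series(version) == REQUIRED[tool]


if __name__ == "__main__":
    print(compatible("scala", "2.10.3"))  # True
    print(compatible("scala", "2.11.0"))  # False
    print(compatible("sbt", "0.13.1"))    # True
```

Comparing only the major.minor pair keeps the check in line with how Scala artifacts are versioned for binary compatibility, rather than demanding an exact patch release.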

Current Status

Apache Spark is available in F21 (https://bugzilla.redhat.com/show_bug.cgi?id=1071495). Since the Spark package depends on xmvn functionality that is only available in F21 and later, we are unlikely to produce a Spark package for F20 or earlier releases. The Fedora Spark package includes Spark Core, MLlib, GraphX, Bagel, and Spark Streaming, but (as of spark-0.9.1-0.3) it differs from upstream Spark in several ways:

  • it builds against Akka 2.3.0-RC2 instead of Akka 2.2.3 (this entails some trivial API changes)
  • it doesn't support Kryo serialization yet (because Chill isn't yet available in Fedora), which limits some Spark Streaming functionality
  • it doesn't support Mesos yet, although the most recent builds of the Fedora Mesos package include Java support; stay tuned.
  • Spark is not built as a monolithic assembly jar (since this isn't compatible with Fedora guidelines)
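These differences matter mostly when reusing an existing Spark configuration. The Python sketch below flags the two settings the list above rules out: a `spark.serializer` of `org.apache.spark.serializer.KryoSerializer` (Spark's configuration key and class name for Kryo) and a `mesos://` master URL. The function name `fedora_spark_issues` is hypothetical, and the sketch only inspects a plain dictionary of properties plus the master URL; it does not talk to Spark or to the Fedora package.

```python
# Hypothetical pre-flight check (not part of Spark or the Fedora package) for
# settings the Fedora Spark build (as of spark-0.9.1-0.3) cannot honor.

KRYO_SERIALIZER = "org.apache.spark.serializer.KryoSerializer"


def fedora_spark_issues(conf: dict, master: str) -> list:
    """Return human-readable problems for a Spark property dict and master URL."""
    issues = []
    # Kryo serialization needs Chill, which is not yet packaged for Fedora.
    if conf.get("spark.serializer") == KRYO_SERIALIZER:
        issues.append("Kryo serialization is not supported (Chill is not packaged)")
    # Mesos masters are given as mesos://HOST:PORT; Mesos is not supported yet.
    if master.startswith("mesos://"):
        issues.append("Mesos masters are not supported")
    return issues


if __name__ == "__main__":
    ok = fedora_spark_issues({}, "local[2]")
    bad = fedora_spark_issues({"spark.serializer": KRYO_SERIALIZER},
                              "mesos://example.com:5050")
    print(ok)        # []
    print(len(bad))  # 2
```

Leaving `spark.serializer` unset keeps Spark's default Java serialization, which does not depend on Chill.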

We're working on addressing the limitations and bringing other Spark-related projects to Fedora.

Most of the Scala packages build with tests disabled (due to unavailable test dependencies) or with varying degrees of modification to the upstream build process (due to differing dependency versions or unavailable sbt plugins). If you're looking for an easy way to get involved, packaging some of these missing dependencies would be a great place to start. See the list of other useful Scala and sbt dependencies on the Scala packaging page, or find willb on IRC for more information.

Dependencies

Spark requires Scala and sbt in order to build. (There is also a Maven build option, but it requires artifacts that must be built with sbt.)

The easiest and most up-to-date place to see the dependency list is in the Spark repository itself, but here we will call out some notable dependencies that we needed to add to Fedora:

Dependencies
Project | State | Review BZ | Packager | Notes
sbt | available in F20 and Rawhide | sbt-package | willb |
json4s | available in F20 and Rawhide | 1067664 | willb |
akka | available in F20 and Rawhide | 1069871 | willb (with gil) |
scalaz | available in F20 and Rawhide | 1055809 | willb | dependency of lift-json; no longer strictly necessary but still nice to have for the Scala ecosystem
metrics | available in F20 | 861502 | gil | Coda Hale's metrics (Java/Maven build)
stream-lib | available in Rawhide | | | https://github.com/addthis/stream-lib; required for approximate set cardinality functionality; needed an FPC exception for its Bloom filter implementation (bundled and forked from Cassandra)
chill | not yet available in Fedora | | | https://github.com/twitter/chill; required for Kryo-based serializers and some streaming functionality