Revision as of 23:17, 20 March 2014
Apache Spark
Summary
Apache Spark is a fast and general engine for large-scale data processing. This change brings Spark to Fedora, allowing easy deployment and development of Spark applications on Fedora.
Owner
- Name: William Benton
- Email: willb@redhat.com
- Release notes owner:
Current status
- Targeted release: Fedora 21
- Last updated: 20 March 2014
- Tracker bug: <will be assigned by the Wrangler>
Detailed Description
Apache Spark is a fast and general engine for large-scale data processing. It supports developing custom analytic processing applications over large data sets or streaming data. Because it can cache intermediate results in cluster memory and schedule directed acyclic graphs (DAGs) of computations, Spark programs can run up to 100x faster than equivalent Hadoop MapReduce jobs. Spark applications are easy to develop, parallel, fast, and resilient to failure, and they can operate on data from in-memory collections, local files, Hadoop-compatible filesystems, or a variety of streaming sources. Spark also includes libraries for distributed machine learning and graph algorithms.
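To make the caching idea concrete, here is a minimal single-process sketch in plain Python. It is not the Spark API: the LazyDataset class and all of its method names are hypothetical and exist only for this illustration. It shows how a chain of lazily evaluated steps forms a small computation graph, and how caching one intermediate result lets later steps reuse it instead of recomputing it.

```python
class LazyDataset:
    """Toy node in a computation graph (hypothetical; not the Spark API)."""

    def __init__(self, compute, parent=None):
        self._compute = compute        # function applied to the parent's rows
        self._parent = parent          # upstream node, or None for a source
        self._cache_enabled = False
        self._cached = None
        self.evaluations = 0           # how many times this node was computed

    @classmethod
    def from_list(cls, items):
        data = list(items)
        return cls(lambda _rows: data)

    def map(self, fn):
        # Nothing runs here; we only record a new node in the graph.
        return LazyDataset(lambda rows: [fn(r) for r in rows], parent=self)

    def filter(self, pred):
        return LazyDataset(lambda rows: [r for r in rows if pred(r)], parent=self)

    def cache(self):
        # Ask this node to keep its result in memory after the first evaluation.
        self._cache_enabled = True
        return self

    def collect(self):
        # Evaluation happens only when a result is requested.
        if self._cache_enabled and self._cached is not None:
            return list(self._cached)
        upstream = self._parent.collect() if self._parent is not None else None
        self.evaluations += 1
        result = self._compute(upstream)
        if self._cache_enabled:
            self._cached = result
        return list(result)


numbers = LazyDataset.from_list(range(10))
squares = numbers.map(lambda x: x * x).cache()   # intermediate result to reuse
evens = squares.filter(lambda x: x % 2 == 0)     # depends on squares

total = sum(squares.collect())    # first use: squares is computed and cached
evens_result = evens.collect()    # second use: squares is served from the cache
```

After both results are requested, squares.evaluations is 1 rather than 2. A real Spark job applies the same idea across the memory of a whole cluster, which is where the speedup over recomputing or re-reading intermediate data comes from.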
Benefit to Fedora
Apache Spark is a tremendously exciting project and having it in Fedora makes Fedora a better platform for big data, machine learning, and analytics development, as well as for deploying and distributing these kinds of applications.
Scope
- Proposal owners: Our Spark package has been accepted into Fedora (http://pkgs.fedoraproject.org/cgit/spark.git). It provides nearly all of the functionality available in the upstream release. (The exceptions -- specifically, Python bindings, the Spark REPL, Kryo-based serialization, primitives for approximate cardinalities of very large sets, and Mesos integration -- were left out of the initial packages due to unavailable dependencies and bundling issues; we're working to close the gap with upstream as quickly as possible.) This work depended upon Fedora 21's improved support for the Scala ecosystem.
- Other developers: N/A (not a System Wide Change)
- Release engineering: N/A (not a System Wide Change)
- Policies and guidelines: N/A (not a System Wide Change)
Upgrade/compatibility impact
N/A
How To Test
It should be possible to install Spark from Fedora repositories and develop and run applications against it. I can prepare a simple Fedora-specific example if necessary.
User Experience
Users will be able to develop and deploy applications based on Apache Spark in Fedora without relying on third-party software distributions.
Dependencies
This work partially motivated, and depended upon, Fedora 21's improved support for the Scala ecosystem. The packages listed there are all complete and available in F21.
Contingency Plan
- Contingency mechanism: N/A (not a System Wide Change)
- Contingency deadline: N/A (not a System Wide Change)
- Blocks release? N/A (not a System Wide Change)
- Blocks product? product
Documentation
N/A (not a System Wide Change)
Release Notes
Fedora 21 includes Apache Spark, a fast and general engine for large-scale data processing on clusters.