= Apache Hadoop 2.0 =
This feature has been moved to https://fedoraproject.org/wiki/Changes/Hadoop. History is preserved here for posterity.

== Summary ==
Bring Apache Hadoop, the hottest open source big data platform, to Fedora, the hottest open source distribution. Fedora should be the best distribution for using Apache Hadoop.
 
== Owner ==
* Name: [[User:matt | Matthew Farrellee]]
* Email: matt@fedoraproject.org
 
=== People involved ===
{|
! Name !! IRC !! Focus !! Additional
|-
| [[User:matt | Matthew Farrellee]]
| mattf
| keeping track, integration testing
| UTC-5
|-
| [[User:pmackinn | Peter MacKinnon]]
| pmackinn
| packaging
| UTC-5
|-
| [[User:rrati | Rob Rati]]
| rsquared
| packaging
| UTC-5
|-
| [[User:tstclair | Timothy St. Clair]]
| tstclair
| setup and configuration
| UTC-6
|-
| [[User:skottler | Sam Kottler]]
| skottler
| packaging
| UTC-5
|-
| [[User:gil | Gil Cattaneo]]
| gil
| packaging
| UTC+1
|}
 
== Current status ==
* Targeted release: [[Releases/20 | Fedora 20 ]]
* Last updated: 3 Apr 2013
* Percentage of completion: 5%
 
 
== Detailed Description ==
Apache Hadoop is a widely used, increasingly complete big data platform, with a strong open source community and growing ecosystem. The goal is to package and integrate the core of the Hadoop ecosystem for Fedora, allowing for immediate use and creating a base for the rest of the ecosystem.
 
 
== Benefit to Fedora ==
The Apache Hadoop software will be packaged and integrated with Fedora. The core of the Hadoop ecosystem will be available with Fedora and provide a base for additional packages.
 
 
== Scope ==
* Package the Apache Hadoop 2.0.2 software
* Package all dependencies needed for Apache Hadoop 2.0.2
* Skip package dependencies required only for unit testing; record them in a dependency backlog for later cleanup
 
=== Approach ===
We are taking an iterative, depth-first approach to packaging. We do not have all the dependencies mapped out ahead of time. Dependencies are being tabulated into two groups:
# ''missing'' - the dependency requested by a hadoop-common pom has not yet been packaged, reviewed, or built into the Fedora repos
# ''broken'' - the requested version of the dependency does not match the version currently in Fedora, and patches must be developed for inclusion in a hadoop rpm build to address any build, API, or source code deltas
Note that a dependency may show up in both of these tables.
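
The two groups can be illustrated with a small classification routine. This is only a sketch, not the tooling used for this feature: the helper names <code>parse_version</code> and <code>classify</code> are hypothetical, versions are assumed to be purely numeric and dotted, and any version difference is treated as ''broken'' for simplicity. The sample version numbers are taken from the tables on this page.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.8' into a tuple of ints: (1, 8)."""
    return tuple(int(part) for part in text.split("."))


def classify(requested, packaged):
    """Classify each requested dependency against what Fedora ships.

    requested: {name: version asked for by a hadoop-common pom}
    packaged:  {name: version currently in the Fedora repos}
    Returns {name: 'missing' | 'broken' | 'ok'}.
    """
    states = {}
    for name, wanted in requested.items():
        if name not in packaged:
            # Nothing in the Fedora repos yet: needs packaging and review.
            states[name] = "missing"
        elif parse_version(packaged[name]) != parse_version(wanted):
            # Packaged, but at a different version: may need patches.
            states[name] = "broken"
        else:
            states[name] = "ok"
    return states


# Hypothetical inputs; bookkeeper is treated as not yet in the repos
# because its package is still under review.
requested = {"ant": "1.6", "bookkeeper": "4.0", "gmaven": "1.0"}
packaged = {"ant": "1.8", "gmaven": "1.4"}

for name, state in sorted(classify(requested, packaged).items()):
    print(name, state)
```

A real comparison would also need to handle non-numeric versions such as 6.x and check for actual API incompatibilities, which a version mismatch alone does not prove.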
 
Anyone who wants to help should find an available dependency below and edit the table, changing the state to Active and the packager to yourself.
 
While packaging a dependency, test dependencies can be skipped. Testing will be done via integration testing periodically during packaging and then after packaging completes. Test dependencies that are skipped must be added to the [[#skip|Skipped dependencies]] table below.
 
If you are ''lucky enough'' to pick a dependency that itself has unpackaged dependencies, identify the sub-dependencies and add them to the bottom of the [[#deps|Dependencies]] table below, change your current dependency to Blocked and repeat.
 
If your dependency is already packaged but the version is incompatible, contact the package owner and resolve the incompatibility in a mutually satisfactory way. For instance:
 
* If the version available in Fedora is older, explore updating the package. If that is not possible, explore creating a package that includes a version in its name, e.g. pkgnameXY. Ultimately, the most recent version in Fedora should have the name pkgname while older versions have pkgnameXY. It may take a full Fedora release to rationalize package names. Make a note in the [[#deps|Dependencies]] table.
 
* If the version you need is older than the packaged version, consider creating a patch to use the newer version. If a patch is not viable, proceed by packaging the dependency with a version in its name, e.g. pkgnameXY. Make a note in the [[#deps|Dependencies]] table.
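
The two cases above can be summarised as a decision helper. The sketch below is illustrative only: <code>compat_name</code> and <code>suggest</code> are hypothetical helpers, versions are assumed to be numeric and dotted, and deriving the XY suffix from the first two version components is inferred from the groovy18 example in the Dependencies table rather than from a packaging guideline.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.8' into a tuple of ints: (1, 8)."""
    return tuple(int(part) for part in text.split("."))


def compat_name(pkgname, version):
    """Build a versioned package name, e.g. ('groovy', '1.8') -> 'groovy18'."""
    major_minor = version.split(".")[:2]
    return pkgname + "".join(major_minor)


def suggest(pkgname, requested, packaged):
    """Return (first option to explore, fallback versioned package name)."""
    wanted, have = parse_version(requested), parse_version(packaged)
    if have < wanted:
        # Fedora is behind: explore updating the existing package first.
        return ("update " + pkgname, compat_name(pkgname, requested))
    if have > wanted:
        # Fedora is ahead: explore patching the consumer to use the newer version.
        return ("patch to use " + pkgname + " " + packaged,
                compat_name(pkgname, requested))
    return ("no action", None)


print(suggest("hsqldb", "2.0", "1.8"))
print(suggest("ant", "1.6", "1.8"))
```

Which package finally carries the unversioned name is settled with the package owner; the fallback name here is just the interim versioned name for the requested release.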
 
{| class="wikitable"
|+ Missing dependency legend
! State !! Notes
|-
| '''<span style="color:darkviolet">Available</span>''' || free for someone to take
|-
| '''<span style="color:blue">Active</span>'''    || dependency is actively being packaged if missing, or patch is being developed or tested for inclusion in hadoop-common build
|-
| '''<span style="color:red">Blocked</span>'''  || pending packages for dependencies
|-
| '''<span style="color:orange">Review</span>'''    || under review, include link to review BZ
|-
| '''<span style="color:green">Complete</span>'''  || woohoo!
|}
 
{| class="wikitable"
|+ <div id="deps">Missing Dependencies</div>
! Project !! State !! Review BZ !! Packager !! Notes
|-
| hadoop
| '''<span style="color:blue">Active</span>'''
|
| [[User:rrati|rrati]],[[User:pmackinn|pmackinn]]
|
|-
|
|
|
|
|
|-
| bookkeeper
| '''<span style="color:orange">Review</span>'''
| {{bz|948589}}
| [[User:gil|gil]]
| Version 4.0 requested. packaged 4.2.1. Patch: [https://issues.apache.org/jira/browse/BOOKKEEPER-598 BOOKKEEPER-598]
|-
| glassfish-gmbal
| '''<span style="color:green">Complete</span>'''
| {{bz|859112}}
| [[User:gil|gil]]
| [https://koji.fedoraproject.org/koji/buildinfo?buildID=413470 F18 build]
|-
| glassfish-management-api
| '''<span style="color:green">Complete</span>'''
| {{bz|859110}}
| [[User:gil|gil]]
| [https://koji.fedoraproject.org/koji/buildinfo?buildID=412579 F18 build]
|-
| grizzly
| '''<span style="color:orange">Review</span>'''
| {{bz|859114}}
| [[User:gil|gil]]
|
|-
| groovy
| '''<span style="color:orange">Review</span>'''
| {{bz|858127}}
| [[User:gil|gil]]
| 1.5 requested but 1.8 packaged in Fedora. Going forward, the 1.8 series may be known as groovy18, with groovy becoming 2.x.
|-
| hsqldb
| '''<span style="color:darkviolet">Available</span>'''
|
|
| 1.8 in fedora, 2.0 requested.  2.2.8 packaged by gil, but seemingly no review request.  Needs followup.
|-
| jersey
| '''<span style="color:green">Complete</span>'''
| {{bz|825347}}
| [[User:gil|gil]]
| [https://koji.fedoraproject.org/koji/buildinfo?buildID=407315 F18 build] Should be rebuilt with grizzly2 support enabled.
|-
| jets3t
| '''<span style="color:orange">Review</span>'''
| {{bz|847109}}
| [[User:gil|gil]]
|
|-
| jspc-compiler
| '''<span style="color:blue">Active</span>'''
|
|[[User:pmackinn|pmackinn]]
|jspc specfile developed. Adaptations made for incumbent Tomcat 7 within spec. RPMs packaged in local custom repo. Reviewing fit as part of overall hadoop-common compilation/testing.
|-
| kfs
| '''<span style="color:darkviolet">Available</span>'''
|
|
| gil has packaged 0.5, but no review request. kfs has become Quantcast qfs.
|-
| maven-native
| '''<span style="color:orange">Review</span>'''
| {{bz|864084}}
| [[User:gil|gil]]
| Needs patch to build with java7. NOTE: javac target/source is already set by mojo.java.target option
|-
| zookeeper
| '''<span style="color:orange">Review</span>'''
| {{bz|823122}}
| [[User:gil|gil]]
|
|}
{| class="wikitable"
|+ <div id="deps">Broken Dependencies</div>
! Project !! Packager !! Notes
|-
| ant
|
| Version 1.6 requested, 1.8 currently packaged in Fedora.  Needs to be inspected for API/functional incompatibilities(?)
|-
| apache-commons-collections
|[[User:pmackinn|pmackinn]]
| Java import compilation error with existing package.  Patches for hadoop-common being tracked at https://github.com/pdmack/hadoop-common/tree/fed-master-collections
|-
| apache-commons-math
|[[User:pmackinn|pmackinn]]
| The current apache-commons-math package uses math3 instead of math in its pom, and there are API changes in the code. Patches for hadoop-common being tracked at https://github.com/pdmack/hadoop-common/tree/fed-master-math
|-
| gmaven
| [[User:gil|gil]]
| Version 1.0 requested, available 1.4 (but has broken deps) {{bz|914056}}
|-
| hadoop-hdfs
| [[User:pmackinn|pmackinn]]
| glibc link error in hdfs native build. Patch for hadoop-common being tracked at https://github.com/pdmack/hadoop-common/tree/fed-master-cmake
|-
| jersey
| [[User:pmackinn|pmackinn]]
| Needs jersey-servlet and version. Tracked at https://github.com/pdmack/hadoop-common/tree/fed-master-jersey
|-
| jets3t
| [[User:pmackinn|pmackinn]]
| Requires 0.6.1. With 0.9.x: hadoop-common Jets3tNativeFileSystemStore.java error: incompatible types S3ObjectsChunk chunk = s3Service.listObjectsChunked(bucket.getName(). Patches for hadoop-common being tracked at https://github.com/pdmack/hadoop-common/tree/fed-master-jets3t
|-
| jetty
| [[User:rrati|rrati]]
| jetty8 packaged in Fedora, but 6.x requested. 6 and 8 are incompatible. Patches tracked at https://github.com/pdmack/hadoop-common/tree/fed-master-jetty
|-
| slf4j
|[[User:pmackinn|pmackinn]]
| Package in fedora fails to match in dependency resolution.  jcl104-over-slf4j dep in hadoop-common moved to jcl-over-slf4j as part of jspc/tomcat dep. Patch being tracked at https://github.com/pdmack/hadoop-common/tree/fed-master-jasper
|-
| tomcat-jasper
| [[User:pmackinn|pmackinn]]
| Version 5.5.x requested. Adaptations made for incumbent Tomcat 7 via patches at https://github.com/pdmack/hadoop-common/tree/fed-master-jasper. Reviewing fit as part of overall hadoop-common compilation/testing.
|}
 
{| class="wikitable"
|+ <div id="skip">Skipped dependencies</div>
! JAR !! Project !! State !! Packager !! Notes
|-
| [jar name]
| [package name]
| Available
| [[User:noone|noone]]
| Needed for tests by [[#N]]
|}
 
=== Workflow ===
A repo of dependencies that are already packaged and in review state can be found here:
http://repos.fedorapeople.org/repos/rrati/hadoop/
 
Currently, only Fedora 18 x86_64 packages are available.
 
=== Packager tips ===
* The mvn-rpmbuild utility will ONLY resolve from the system repo
* mvn-local will resolve from the system repo first, then fall back to Maven if unresolved
** can be used to find the delta between the system repo packages available and the missing dependencies, which can be viewed in the .m2 local Maven repo (find *.jar)
* -Dmaven.local.debug=true
** reveals how JPP lookups are executing per dependency -> useful for finding gId,aId mismatches
* -Dmaven.test.skip=true
** tells Maven to skip test compilation
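
The "find *.jar" step can also be scripted to print Maven coordinates instead of file paths. The sketch below assumes the standard Maven local repository layout (group path, then artifactId, then version) and assumes the repository lives at <code>~/.m2/repository</code>; the function name <code>local_repo_coordinates</code> is hypothetical, and the location should be adjusted if the build writes to a different .m2 directory.

```python
import os


def local_repo_coordinates(repo_root):
    """Yield (groupId, artifactId, version) for every jar under a Maven local repo.

    Relies on the standard layout:
    <root>/<group/as/path>/<artifactId>/<version>/<artifactId>-<version>.jar
    """
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        if not any(name.endswith(".jar") for name in filenames):
            continue
        parts = os.path.relpath(dirpath, repo_root).split(os.sep)
        if len(parts) < 3:
            continue  # too shallow to be group/artifact/version
        version = parts[-1]
        artifact = parts[-2]
        group = ".".join(parts[:-2])
        yield group, artifact, version


if __name__ == "__main__":
    # Assumed default location of the local repository.
    root = os.path.expanduser("~/.m2/repository")
    for group, artifact, version in sorted(local_repo_coordinates(root)):
        print("%s:%s:%s" % (group, artifact, version))
```

After a build with mvn-local, each coordinate printed is a candidate for the missing or broken tables, since it was fetched from Maven rather than resolved from the system repo.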
 
'''TODO: Template spec files to work from'''
 
'''TODO: Setup staging repository for sharing packages under review'''
 
'''An alternative to gmaven:'''
* apply a patch with the following content where required
* test support is not guaranteed and is not expected to work
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-antrun-plugin</artifactId>
        <version>1.7</version>
        <dependencies>
          <dependency>
            <groupId>org.codehaus.groovy</groupId>
            <artifactId>groovy</artifactId>
            <version>any</version>
          </dependency>
          <dependency>
            <groupId>antlr</groupId>
            <artifactId>antlr</artifactId>
            <version>any</version>
          </dependency>
          <dependency>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
            <version>any</version>
          </dependency>
          <dependency>
            <groupId>asm</groupId>
            <artifactId>asm-all</artifactId>
            <version>any</version>
          </dependency>
          <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-nop</artifactId>
            <version>any</version>
          </dependency>
        </dependencies>
        <executions>
          <execution>
            <id>compile</id>
            <phase>process-sources</phase>
            <configuration>
              <target>
                <mkdir dir="${basedir}/target/classes"/>
                <taskdef name="groovyc" classname="org.codehaus.groovy.ant.Groovyc">
                  <classpath refid="maven.plugin.classpath"/>
                </taskdef>
                <groovyc destdir="${project.build.outputDirectory}" srcdir="${basedir}/src/main" classpathref="maven.compile.classpath">
                  <javac source="1.5" target="1.5" debug="on"/>
                </groovyc>
              </target>
            </configuration>
            <goals>
              <goal>run</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
 
== How To Test ==
<!-- This does not need to be a full-fledged document.  Describe the dimensions of tests that this feature is expected to pass when it is done.  If it needs to be tested with different hardware or software configurations, indicate them.  The more specific you can be, the better the community testing can be.
 
Remember that you are writing this how to for interested testers to use to check out your feature - documenting what you do for testing is OK, but it's much better to document what *I* can do to test your feature.
 
A good "how to test" should answer these four questions:
 
0. What special hardware / data / etc. is needed (if any)?
1. How do I prepare my system to test this feature? What packages
need to be installed, config files edited, etc.?
2. What specific actions do I perform to check that the feature is
working like it's supposed to?
3. What are the expected results of those actions?
-->
# '''TODO: NEEDS MORE DEFINITION'''
# yum install X Y Z across one or more nodes
# Setup a simple cluster by following '''TBD'''
# Run the GridMix benchmark: http://hadoop.apache.org/docs/stable/gridmix.html
 
 
== User Experience ==
Users who are interested in running Apache Hadoop on Fedora will find it available from the Fedora Project yum repositories.
 
'''TODO: SPECIFICALLY PACKAGES X Y Z'''
 
 
== Dependencies ==
No other packages currently depend on Apache Hadoop.
 
Completion of this feature will involve packaging numerous dependencies; see [[#deps|the Dependencies table]]. Some of the dependencies are already being packaged by others in the Fedora community. Where dependency overlap is found, a negotiation must occur to ensure a satisfactory version and package is available to all parties.
 
'''TODO: Is https://fedoraproject.org/wiki/Hypertable ?'''
 
 
== Contingency Plan ==
With no packages depending on Apache Hadoop, none is necessary. The biggest risk is not completing packages for all dependencies. In that case, the feature can be removed from the release notes. The packaged dependencies should remain in the distribution. The feature can be pushed to the next Fedora release.
 
 
== Documentation ==
* http://wiki.apache.org/hadoop
 
 
== Release Notes ==
<!-- The Fedora Release Notes inform end-users about what is new in the release.  Examples of past release notes are here: http://docs.fedoraproject.org/release-notes/ -->
<!-- The release notes also help users know how to deal with platform changes such as ABIs/APIs, configuration or data file formats, or upgrade concerns.  If there are any such changes involved in this feature, indicate them here.  You can also link to upstream documentation if it satisfies this need.  This information forms the basis of the release notes edited by the documentation team and shipped with the release. -->
* '''TODO'''
 
 
== Comments and Discussion ==
* See [[Talk:Features/Hadoop]]
 
 
[[Category:FeaturePageIncomplete]]
<!-- When your feature page is completed and ready for review -->
<!-- remove Category:FeaturePageIncomplete and change it to Category:FeatureReadyForWrangler -->
<!-- After review, the feature wrangler will move your page to Category:FeatureReadyForFesco... if it still needs more work it will move back to Category:FeaturePageIncomplete-->
<!-- A pretty picture of the page category usage is at: https://fedoraproject.org/wiki/Features/Policy/Process -->

Latest revision as of 21:17, 8 July 2013
