From Fedora Project Wiki
 
Latest revision as of 21:17, 10 January 2016

Overview

Bootstrapping Hadoop on Fedora 22 and later.

See Also

* SIGs/bigdata/packaging/Scala / SIGs/bigdata/packaging/Sbt

Installation and Setup (as root)

Install Hadoop:

# dnf install hadoop-common hadoop-hdfs hadoop-mapreduce hadoop-mapreduce-examples hadoop-yarn maven-* xmvn*

Set the JAVA_HOME environment variable in the Hadoop and YARN environment files (the shipped defaults do not seem to work):

# vi /etc/hadoop/hadoop-env.sh; vi /etc/hadoop/yarn-env.sh

For instance, with OpenJDK, the line should read something like:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-15.b17.fc23.x86_64 # On a Fedora 23

Or with Oracle Java JDK 8, the line would become:

export JAVA_HOME=/usr/java/jdk1.8.0_51
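If you are unsure which path to use, the JDK directory can usually be derived from the java binary found on the PATH. The following is a minimal sketch, assuming java is installed and resolves through symlinks to a path ending in /bin/java or /jre/bin/java. The helper name java_home_from_path is illustrative and not part of Hadoop or Fedora:

```shell
# Hypothetical helper: strip the trailing /bin/java (and an optional /jre)
# from a fully resolved path to the java binary.
java_home_from_path() {
    jh="${1%/bin/java}"
    printf '%s\n' "${jh%/jre}"
}

# Print a candidate export line when a java command is on the PATH.
if command -v java >/dev/null 2>&1; then
    java_bin="$(readlink -f "$(command -v java)")"
    echo "export JAVA_HOME=$(java_home_from_path "$java_bin")"
fi
```

Check the printed line before copying it into hadoop-env.sh and yarn-env.sh; the directory it names should contain bin/java.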

You may want to adjust the amount of memory and the number of cores for the YARN cluster, by adding the following lines to /etc/hadoop/yarn-site.xml (derived from yarn-default.xml):

 <property>
   <description>Number of CPU cores that can be allocated for containers.</description>
   <name>yarn.nodemanager.resource.cpu-vcores</name>
   <value>2</value>
 </property>
 <property>
   <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
   <name>yarn.nodemanager.resource.memory-mb</name>
   <value>2048</value>
 </property>
 <property>
   <description>The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this won't take effect, and will get capped to this value.</description>
   <name>yarn.scheduler.maximum-allocation-mb</name>
   <value>2048</value>
 </property>
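
To try different sizes without hand-editing XML each time, the property blocks can be generated from shell variables. This is only a sketch: yarn_property is a hypothetical helper, the description elements are left out for brevity, and the values are the ones used above. The output is meant to be pasted inside the existing configuration element of yarn-site.xml:

```shell
# Hypothetical helper: print one <property> element for a Hadoop *-site.xml file.
yarn_property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Example sizing for a small single-node setup (values taken from the text above).
cores=2
memory_mb=2048
yarn_property yarn.nodemanager.resource.cpu-vcores "$cores"
yarn_property yarn.nodemanager.resource.memory-mb "$memory_mb"
yarn_property yarn.scheduler.maximum-allocation-mb "$memory_mb"
```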

Format the NameNode:

# runuser hdfs -s /bin/bash /bin/bash -c "hdfs namenode -format"

which should produce something like:

15/08/16 19:09:15 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = myhost.mydomain/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.4.1
STARTUP_MSG:   classpath = /etc/hadoop:/usr/share/hadoop/common/lib/asm-tree-5.0.3.jar:[...]
STARTUP_MSG:   build = Unknown -r Unknown; compiled by 'mockbuild' on 2015-04-21T22:21Z
STARTUP_MSG:   java = 1.8.0_51
[...]
************************************************************/
15/08/16 19:09:16 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/08/16 19:09:16 INFO namenode.NameNode: createNameNode [-format]
15/08/16 19:09:16 INFO namenode.AclConfigFlag: ACLs enabled? false
15/08/16 19:09:16 INFO namenode.FSImage: Allocated new BlockPoolId: BP-393991083-127.0.0.1-1439744956758
15/08/16 19:09:16 INFO common.Storage: Storage directory /var/lib/hadoop-hdfs/hdfs/dfs/namenode has been successfully formatted.
15/08/16 19:09:16 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/08/16 19:09:16 INFO util.ExitUtil: Exiting with status 0
15/08/16 19:09:16 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at myhost.mydomain/127.0.0.1
************************************************************/
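
The line to look for in that output is the one reporting that the storage directory has been successfully formatted. A small sketch of an automated check follows; format_succeeded is a hypothetical helper name, and the sample line is copied from the output above:

```shell
# Hypothetical helper: succeed if the log on stdin reports a formatted storage directory.
format_succeeded() {
    grep -q 'has been successfully formatted'
}

# Example with the line from the output above.
sample='15/08/16 19:09:16 INFO common.Storage: Storage directory /var/lib/hadoop-hdfs/hdfs/dfs/namenode has been successfully formatted.'
if printf '%s\n' "$sample" | format_succeeded; then
    echo "NameNode storage formatted"
fi
```

When piping the real command into the helper, it is probably necessary to add 2>&1, since the log messages may be written to standard error.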

Start the Hadoop services:

# systemctl start hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager tomcat@httpfs

Check that the Hadoop services have been started:

# systemctl status hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager tomcat@httpfs
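
The status output is verbose. For a one-line-per-service summary, systemctl is-active can be called in a loop. A minimal sketch, where check_services is an illustrative helper and the service list is the one used above:

```shell
HADOOP_SERVICES="hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager tomcat@httpfs"

# Hypothetical helper: report each service and return non-zero if any is not active.
check_services() {
    failed=0
    for svc in "$@"; do
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            failed=1
        fi
    done
    return "$failed"
}

# Run the check only where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    # The list is deliberately left unquoted so that it splits into one argument per service.
    check_services $HADOOP_SERVICES || echo "At least one Hadoop service is not running"
fi
```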

If everything went smoothly, enable the Hadoop services so that they start at boot:

# systemctl enable hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager tomcat@httpfs

Create the default HDFS directories:

# hdfs-create-dirs

Web UI:

* NodeManager: http://localhost:8042
* ResourceManager (RM): http://localhost:8088

Setting Up a User's Sandbox (as root)

In the following commands, build is the Unix user name:

# runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -mkdir /user/build"
# runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -chown build /user/build"
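
For a user name other than build, the two commands can be generated rather than edited by hand. The sketch below only prints the commands so that they can be reviewed before being run as root; sandbox_commands is a hypothetical helper, not a Hadoop tool:

```shell
# Hypothetical helper: print (not run) the commands that create an HDFS home
# directory for the given Unix user, in the same form as the commands above.
sandbox_commands() {
    user="$1"
    printf 'runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -mkdir /user/%s"\n' "$user"
    printf 'runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -chown %s /user/%s"\n' "$user" "$user"
}

sandbox_commands build
```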

Running WordCount (as user)

A ready-made WordCount example is available on GitHub; clone it:

$ git clone https://github.com/timothysc/hadoop-tests.git

Once the clone has finished, copy the example .txt file into your HDFS user directory:

$ cd hadoop-tests/WordCount
$ hadoop fs -put constitution.txt /user/build

Now build WordCount against the system-installed JARs:

$ mvn-rpmbuild package 

Finally you can run:

$ hadoop jar wordcount.jar org.myorg.WordCount /user/build /user/build/output

Use hadoop fs -cat on the part-* file in /user/build/output to see the results.
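
To sanity-check the job output, the same counts can be computed locally with standard tools. This sketch assumes the example splits words on whitespace, as the classic Hadoop WordCount does; word_count is an illustrative helper that prints word, tab, count, sorted by word:

```shell
# Hypothetical helper: count whitespace-separated words read from stdin.
word_count() {
    tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c | awk '{print $2 "\t" $1}'
}

# Small inline example.
printf 'we the people\nthe people\n' | word_count
```

Running word_count < constitution.txt in the WordCount directory should give counts comparable to the job output if the assumption about tokenization holds.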

References

Denis Arnaud's page >