Latest revision (by Denisarnaud) as of 21:17, 10 January 2016
Overview
Bootstrapping Hadoop on Fedora 22 and later.
See Also
- Hadoop on Fedora 20
- Bootstrapping Your MapReduce 2.X Programming on Fedora 20
- SIGs/bigdata/packaging/Scala / SIGs/bigdata/packaging/Sbt
Installation and Setup (as root)
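Install Hadoop and the Maven tooling. The package globs are quoted so that the shell passes them to dnf instead of expanding them against files in the current directory:
# dnf install hadoop-common hadoop-hdfs hadoop-mapreduce hadoop-mapreduce-examples hadoop-yarn 'maven-*' 'xmvn*'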
Set the JAVA_HOME environment variable in both the Hadoop and the YARN configuration files (the packaged defaults do not seem to work):
# vi /etc/hadoop/hadoop-env.sh; vi /etc/hadoop/yarn-env.sh
For instance, with OpenJDK, the line should read something like:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.65-15.b17.fc23.x86_64 # on Fedora 23
Or, with Oracle Java JDK 8, it would become:
export JAVA_HOME=/usr/java/jdk1.8.0_51
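The versioned OpenJDK directory changes with every update, so it can help to derive the path from the `java` binary that is actually installed. The following is a minimal sketch under stated assumptions. `java_home_from_path` is a hypothetical helper, not part of the Hadoop packages. It assumes the JDK 8 layout, where the resolved launcher may sit in `<jdk>/jre/bin/java`.

```shell
# Hypothetical helper: turn the resolved path of a java binary into a JAVA_HOME.
# Assumes the JDK 8 layout, where the launcher may live in <jdk>/jre/bin/java.
java_home_from_path() {
  local p="${1%/bin/java}"   # drop the trailing /bin/java
  p="${p%/jre}"              # <jdk>/jre -> <jdk>
  printf '%s\n' "$p"
}
```

For example, you can run it once in a root shell and paste the printed path into the `export JAVA_HOME=` lines. `readlink -f` follows the /etc/alternatives symlinks to the real binary:
# java_home_from_path "$(readlink -f "$(command -v java)")"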
You may want to adjust the amount of memory and the number of CPU cores available to the YARN cluster. To do so, add the following properties inside the <configuration> element of /etc/hadoop/yarn-site.xml (derived from yarn-default.xml):
<property>
  <description>Number of CPU cores that can be allocated for containers.</description>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>
<property>
  <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <description>The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this won't take effect, and will get capped to this value.</description>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
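If you tune these values often, a small script can generate the property elements instead of requiring you to type them by hand. This is only a sketch. `yarn_property` and `write_yarn_site` are hypothetical names, and the values are the example values from this page, not recommendations. The packaged /etc/hadoop/yarn-site.xml may already contain other settings, so merge the output into it rather than overwriting the file.

```shell
# Hypothetical helper: print one <property> element for yarn-site.xml.
# Arguments: name, value, description (no XML escaping is performed).
yarn_property() {
  printf '  <property>\n'
  printf '    <description>%s</description>\n' "$3"
  printf '    <name>%s</name>\n' "$1"
  printf '    <value>%s</value>\n' "$2"
  printf '  </property>\n'
}

# Emit a standalone yarn-site.xml holding only the three settings above.
write_yarn_site() {
  printf '<?xml version="1.0"?>\n<configuration>\n'
  yarn_property yarn.nodemanager.resource.cpu-vcores 2 \
    "Number of CPU cores that can be allocated for containers."
  yarn_property yarn.nodemanager.resource.memory-mb 2048 \
    "Amount of physical memory, in MB, that can be allocated for containers."
  yarn_property yarn.scheduler.maximum-allocation-mb 2048 \
    "The maximum allocation for every container request at the RM, in MBs."
  printf '</configuration>\n'
}
```

For example, write the file to a scratch location and compare it with the installed one:
# write_yarn_site > /tmp/yarn-site.xml; diff /etc/hadoop/yarn-site.xml /tmp/yarn-site.xml
On a single-node setup like this one, keeping yarn.scheduler.maximum-allocation-mb no larger than yarn.nodemanager.resource.memory-mb, as in the example values, is usually sensible.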
Format the NameNode:
# runuser hdfs -s /bin/bash /bin/bash -c "hdfs namenode -format"
which should produce something like:
15/08/16 19:09:15 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = myhost.mydomain/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.4.1
STARTUP_MSG: classpath = /etc/hadoop:/usr/share/hadoop/common/lib/asm-tree-5.0.3.jar:[...]
STARTUP_MSG: build = Unknown -r Unknown; compiled by 'mockbuild' on 2015-04-21T22:21Z
STARTUP_MSG: java = 1.8.0_51
[...]
************************************************************/
15/08/16 19:09:16 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/08/16 19:09:16 INFO namenode.NameNode: createNameNode [-format]
15/08/16 19:09:16 INFO namenode.AclConfigFlag: ACLs enabled? false
15/08/16 19:09:16 INFO namenode.FSImage: Allocated new BlockPoolId: BP-393991083-127.0.0.1-1439744956758
15/08/16 19:09:16 INFO common.Storage: Storage directory /var/lib/hadoop-hdfs/hdfs/dfs/namenode has been successfully formatted.
15/08/16 19:09:16 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/08/16 19:09:16 INFO util.ExitUtil: Exiting with status 0
15/08/16 19:09:16 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at myhost.mydomain/127.0.0.1
************************************************************/
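The log is long, so a quick way to confirm the format succeeded is to save the output and look for the two success lines shown in the sample. This is a sketch that assumes your log uses the same wording as the 2.4.1 sample. `namenode_format_ok` and the log path are hypothetical.

```shell
# Hypothetical helper: succeed only if a saved "hdfs namenode -format" log
# contains both success markers seen in the sample output.
namenode_format_ok() {
  local log="$1"
  grep -q 'has been successfully formatted' "$log" &&
    grep -q 'Exiting with status 0' "$log"
}
```

For example:
# runuser hdfs -s /bin/bash /bin/bash -c "hdfs namenode -format" > /tmp/nn-format.log 2>&1
# namenode_format_ok /tmp/nn-format.log && echo "NameNode formatted"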
Start the Hadoop services:
# systemctl start hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager tomcat@httpfs
Check that the Hadoop services have been started:
# systemctl status hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager tomcat@httpfs
If everything went smoothly, enable the Hadoop services permanently so that they start at boot:
# systemctl enable hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager tomcat@httpfs
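You can combine the "check, then enable only if everything went smoothly" steps into one guarded command. This is a minimal sketch. `report_inactive` and `HADOOP_UNITS` are hypothetical names, and the helper relies only on `systemctl is-active --quiet`, which exits non-zero for a unit that is not active.

```shell
# The units started on this page.
HADOOP_UNITS="hadoop-namenode hadoop-datanode hadoop-nodemanager hadoop-resourcemanager tomcat@httpfs"

# Print every unit that systemd does not report as active; return 1 if any.
report_inactive() {
  local unit rc=0
  for unit in "$@"; do
    if ! systemctl is-active --quiet "$unit"; then
      printf 'inactive: %s\n' "$unit"
      rc=1
    fi
  done
  return "$rc"
}
```

For example (the variable is deliberately left unquoted so that it splits into separate unit names):
# report_inactive $HADOOP_UNITS && systemctl enable $HADOOP_UNITS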
Create the default HDFS directories:
# hdfs-create-dirs
Web UIs:
- NodeManager (NM): http://localhost:8042
- ResourceManager (RM): http://localhost:8088
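To check both UIs without opening a browser, you can use curl. This is a sketch with hypothetical helper names (`ui_up`, `check_uis`). It passes `-L` in case a UI redirects from `/` to its start page, and it treats only a final HTTP 200 as "up".

```shell
# Return 0 if the URL (after following redirects) answers with HTTP 200.
ui_up() {
  local code
  code="$(curl -s -L -o /dev/null -w '%{http_code}' "$1")"
  [ "$code" = "200" ]
}

# Report each web UI as up or down; return 1 if any is down.
check_uis() {
  local url rc=0
  for url in "$@"; do
    if ui_up "$url"; then
      printf 'up:   %s\n' "$url"
    else
      printf 'down: %s\n' "$url"
      rc=1
    fi
  done
  return "$rc"
}
```

For example:
$ check_uis http://localhost:8042 http://localhost:8088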
Setting Up a User's Sandbox (as root)
In the following commands, build is the Unix user name:
# runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -mkdir /user/build"
# runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -chown build /user/build"
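To set up sandboxes for several users, you can wrap the two commands in a function. This is a sketch. `make_hdfs_home` is a hypothetical name. It uses `hadoop fs -mkdir -p` so that rerunning it is harmless. Because the user name is spliced into a `bash -c` string, the function rejects names containing anything other than lowercase letters, digits, `_`, `.` and `-`.

```shell
# Hypothetical helper: create an HDFS home directory for a Unix user and hand
# it over, running the HDFS commands as the hdfs superuser like the commands above.
make_hdfs_home() {
  local user="$1"
  case "$user" in
    ''|*[!a-z0-9_.-]*) echo "invalid user name: '$user'" >&2; return 2 ;;
  esac
  runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -mkdir -p /user/$user" &&
    runuser hdfs -s /bin/bash /bin/bash -c "hadoop fs -chown $user /user/$user"
}
```

For example:
# make_hdfs_home build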
Running WordCount (as user)
For simplicity, you can clone a ready-made WordCount example from GitHub:
$ git clone https://github.com/timothysc/hadoop-tests.git
Once the clone has finished, copy the example text file, constitution.txt, into your HDFS user directory:
$ cd hadoop-tests/WordCount
$ hadoop fs -put constitution.txt /user/build
Now build WordCount against the system-installed JARs:
$ mvn-rpmbuild package
Finally, run the job:
$ hadoop jar wordcount.jar org.myorg.WordCount /user/build /user/build/output
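To see the results, cat the part-00000 file in /user/build/output (the exact name depends on the MapReduce API the job uses; for example, it may be part-r-00000):
$ hadoop fs -cat '/user/build/output/part-*'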
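To list only the most frequent words, you can sort the job output by count. This is a sketch. `top_words` is a hypothetical helper, and it assumes each output line has the form `word<TAB>count`, which is the usual plain-text output layout of WordCount.

```shell
# Print the N (default 10) most frequent words from WordCount output on stdin.
# Sorts by the tab-separated count, descending; ties are ordered by word.
top_words() {
  local n="${1:-10}"
  sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}
```

For example:
$ hadoop fs -cat '/user/build/output/part-*' | top_words 20
MapReduce will not write into an output directory that already exists. Before you run the job again, remove the old output:
$ hadoop fs -rm -r /user/build/output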