
Aim: Set up a single-node Hadoop cluster backed by the Hadoop Distributed File System
(HDFS), running on Ubuntu Linux. After successful installation on one node, configure a
multi-node Hadoop cluster (one master and multiple slaves).

Hadoop installation

First, refresh the package index so that all updates are available:

$ sudo apt-get update

Prerequisites
Java (JDK 7 or later)

$ sudo apt-get install default-jdk

After installation, make a quick check whether the JDK is correctly set up:

$ java -version

Adding a dedicated Hadoop system user


We will use a dedicated Hadoop user account for running Hadoop.
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

This will add the user hduser and the group hadoop to your local machine.

Configuring SSH

$ sudo apt-get install openssh-server

Now, log in as the hduser user:

$ su hduser

The Hadoop control scripts rely on SSH to perform cluster-wide operations. For example, there is
a script for stopping and starting all the daemons in the cluster. To work seamlessly, SSH needs
to be set up to allow password-less login for the Hadoop user from machines in the cluster. The
simplest way to achieve this is to generate a public/private key pair, which is then shared across
the cluster.

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine.
For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for
the hduser user we created earlier.
We have to generate an SSH key for the hduser user.
$ ssh-keygen -t rsa -P ""
-P "" here indicates an empty passphrase.

You have to enable SSH access to your local machine with this newly created key, which is done
with the following command:

$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

$ ssh localhost
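The key-setup steps above can be tried out safely as a self-contained sketch. Here a throwaway directory stands in for $HOME/.ssh, so nothing on the machine is touched; on a real node, use $HOME/.ssh itself and keep the 700/600 permissions shown:

```shell
# Sketch of the password-less SSH key setup from above, against a
# throwaway directory so it can be tested safely. On a real node the
# directory would be $HOME/.ssh instead of the mktemp one.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$KEYDIR/id_rsa" -q        # empty passphrase, no prompts
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys" # authorize our own key
chmod 700 "$KEYDIR" && chmod 600 "$KEYDIR/authorized_keys"
ls "$KEYDIR"
```

The strict permissions matter: sshd refuses keys whose authorized_keys file is group- or world-writable.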

Now, download the Hadoop tar file from the internet:


$ wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.2/hadoop-2.7.2.tar.gz

Now, extract the downloaded tar file:

$ tar xvzf hadoop-2.7.2.tar.gz

Now, let's move hadoop-2.7.2 to a directory of our choice; we will choose /usr/local/hadoop.

$ sudo mv hadoop-2.7.2 /usr/local/hadoop

Let's give hduser ownership of that directory (own only /usr/local/hadoop, not all of /usr/local):

$ sudo chown -R hduser /usr/local/hadoop

Setup Configuration Files

The following files will have to be modified to complete the Hadoop setup:
1. ~/.bashrc
2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
3. /usr/local/hadoop/etc/hadoop/core-site.xml
4. /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
6. /usr/local/hadoop/etc/hadoop/yarn-site.xml
1. ~/.bashrc

Now let's edit the .bashrc file and append the Hadoop paths to the end of the file:

$ nano ~/.bashrc


export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"

$ source ~/.bashrc
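A quick sanity check, sketched here assuming the install path above, confirms the Hadoop directories really were appended to PATH:

```shell
# Re-create the two PATH exports from .bashrc and confirm they took effect.
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
# Print each Hadoop entry now present on PATH, one per line:
echo "$PATH" | tr ':' '\n' | grep '^/usr/local/hadoop'
```

If both /usr/local/hadoop/bin and /usr/local/hadoop/sbin are printed, commands such as start-dfs.sh will resolve without a full path.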

2. hadoop-env.sh
Now let's set the Java path that Hadoop will run with:

$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Before setting the Java path, the line looks like:

export JAVA_HOME=${JAVA_HOME}

After setting the Java path, it looks like:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
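If you are unsure what to put for JAVA_HOME, a small sketch like this can discover it (it assumes java is on PATH; the printed path is what goes into hadoop-env.sh):

```shell
# Resolve the real location of the java binary (following symlinks)
# and strip the /bin/java suffix to get a JAVA_HOME candidate.
if command -v java >/dev/null 2>&1; then
  readlink -f "$(command -v java)" | sed 's:/bin/java$::'
else
  echo "java not found on PATH"
fi
```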

3. core-site.xml

The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop uses when starting
up.
This file can be used to override the default settings that Hadoop starts with.

$ sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following between <configuration>...</configuration>:

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
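For reference, the complete file after the edit would look roughly like this (a sketch; fs.default.name still works in Hadoop 2.7.2 but is deprecated in favour of fs.defaultFS):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>
```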

4. mapred-site.xml

By default, the /usr/local/hadoop/etc/hadoop/ folder contains a
/usr/local/hadoop/etc/hadoop/mapred-site.xml.template
file, which has to be copied/renamed to mapred-site.xml:

$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

The mapred-site.xml file is used to specify which framework is being used for MapReduce.

$ sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
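Note that mapred.job.tracker is the Hadoop 1.x (MRv1) JobTracker property. Since this setup also configures YARN below, guides for Hadoop 2.x commonly set mapreduce.framework.name as well, so that MapReduce jobs are submitted to YARN:

```xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
```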

5. hdfs-site.xml
The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each host in the cluster that is being
used.
It is used to specify the directories which will be used as the namenode and the datanode on that host.

$ sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Add the following between <configuration>...</configuration>:

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>

6. yarn-site.xml

$ sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

Add the following between <configuration>...</configuration>:

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Now, let's create the folders where HDFS will store its data. The paths must match the ones configured in hdfs-site.xml above:

$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode

$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode

Assign hduser the ownership of the folder:

$ sudo chown -R hduser /usr/local/hadoop_store

Formatting the HDFS filesystem via the NameNode

The hdfs namenode -format command should be executed once, before we start using Hadoop.
If this command is executed again after Hadoop has been used, it will destroy all the data on the
Hadoop file system.

$ hdfs namenode -format
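Because a repeat format is destructive, it can be worth wrapping the command in a small guard; this is a sketch, with the NameNode path taken from hdfs-site.xml above and an echo standing in for the real command:

```shell
# Only format when the NameNode directory is empty or absent, to avoid
# destroying an existing filesystem. Path comes from hdfs-site.xml.
NN_DIR=/usr/local/hadoop_store/hdfs/namenode
if [ -z "$(ls -A "$NN_DIR" 2>/dev/null)" ]; then
  echo "would run: hdfs namenode -format"   # replace echo with the real command
else
  echo "refusing: $NN_DIR is not empty"
fi
```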

Starting Hadoop

$ start-dfs.sh

start-dfs.sh starts the HDFS daemons: the NameNode and the DataNode.

$ start-yarn.sh

start-yarn.sh starts the YARN daemons: the ResourceManager and the NodeManager.

$ jps
jps lists the running Java processes, to verify whether the cluster daemons started properly.
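The jps check can also be scripted, as in this sketch. On a healthy single-node Hadoop 2.7.2 setup each daemon below should report OK; on a machine where the daemons are not running, each will report MISSING instead:

```shell
# Check jps output for each daemon expected on a single-node cluster.
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  if jps 2>/dev/null | grep -q "$d"; then
    echo "$d OK"
  else
    echo "$d MISSING"
  fi
done
```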

Now, to check whether your Hadoop installation is complete, follow these steps:
1. Open your browser.
2. Enter localhost:8088 in the address bar; this should show the YARN ResourceManager web UI.
3. Then enter localhost:50070 in the address bar; this should show the NameNode web UI.

If both pages load, your Hadoop installation was successful.