Multi-Node
Installation
Manual
Venkateshwarlu Ankam
venkat@cloudwick.com
Steps for Cloudera Hadoop installation on a single-node
and a multi-node cluster.
Why Linux?
RHEL4/Fedora/SUSE Linux
Ubuntu
Gentoo
On all of the above platforms, a 32/64-bit Hadoop native library will work with the
respective 32/64-bit JVM. Install any of the Linux flavors listed above.
Step 2:
Designing the cluster network.
Create two virtual machines, a master and a slave, using one of the Linux images.
While creating the VMs, provide hadoop as the user name. Once the VMs are ready,
click on VM in the menu and choose Settings. Select Network Adapter and
choose Bridged instead of NAT. Do this on both the master and slave VMs.
Step 3:
Change your Hostname without Rebooting in Linux
Make sure you are logged in as root, move to /etc/sysconfig, and open the
network file in vi.
Look for the HOSTNAME line and replace its value with the new hostname you want
to use; that is, change localhost.localdomain to the required hostname. Here I
am going to change it to master.
When you are done, save your changes and exit vi.
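For example, after the edit the /etc/sysconfig/network file might read (the HOSTNAME value being the one chosen above):

```
NETWORKING=yes
HOSTNAME=master
```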
Next, edit the /etc/hosts file and set the new hostname.
In hosts, edit the line that has the old hostname and replace it with your new one.
If it is not there, add the IP address and the new hostname at the end of the
existing entries.
Replace localhost, or whatever hostname is associated with the host IP, with the
desired one as follows.
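A sketch of the resulting /etc/hosts; the slave address is the one used later in this manual, while the master address is an assumption (use the IPs your VMs actually received on the bridged network):

```
127.0.0.1       localhost
192.168.208.215 master
192.168.208.216 slave
```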
Save your changes and exit vi. The changes to /etc/hosts and
/etc/sysconfig/network are necessary to make your changes persistent
(in the event of an unscheduled reboot).
Now we use the hostname program to change the hostname that is currently set,
and then run it again without any parameters to see whether the hostname changed.
[root@localhost ~]# hostname
master
Finally we will restart the network to apply the changes we made to
/etc/hosts and /etc/sysconfig/network.
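A minimal sketch of this sequence, assuming the Red Hat style init scripts implied by /etc/sysconfig:

```
[root@localhost ~]# hostname master
[root@localhost ~]# service network restart
```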
To verify the hostname has been fully changed, exit from terminal and you should
see your new hostname being used at the login prompt and after you've logged
back in.
Do the same steps on slave machine as well.
If we do not update the hosts file, the systems will not be able to communicate
with each other by hostname, and an error like the following will appear.
iptables requires elevated privileges to operate and must be executed by user root,
otherwise it fails to function. On most Linux systems, iptables is installed as
/usr/sbin/iptables and documented in its man page, which can be opened using
man iptables. It may also be found in /sbin/iptables, but since iptables is more
like a service than an "essential binary", the preferred location remains
/usr/sbin.
TCP Wrapper (a host-based networking ACL system used to filter network access to
Internet Protocol servers on Unix-like operating systems such as Linux or BSD; it
allows host or subnetwork IP addresses, names, and/or ident query replies to be
used as tokens for access-control filtering).
You can also use the setenforce command, as shown below, to disable SELinux.
Possible parameters to setenforce are Enforcing or 1 (enable) and Permissive
or 0 (disable).
# setenforce 0
Following are the possible values for the SELINUX variable in the
/etc/selinux/config file
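The three possible values, as described in the file's own comments, are:

```
SELINUX=enforcing    # SELinux security policy is enforced
SELINUX=permissive   # SELinux prints warnings instead of enforcing
SELINUX=disabled     # no SELinux policy is loaded
```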
Installation Instructions
This procedure installs the Java Development Kit (JDK) for 64-bit Linux, using a
self-extracting binary file. The JDK download includes the Java SE Runtime
Environment (JRE) – you do not have to download the JRE separately.
jdk-6u<version>-linux-x64.bin
For example, for version 6u18 the file name is jdk-6u18-linux-x64.bin.
You can download to any directory that you can write to.
This bundle can be installed by anyone (not only root users), in any location that
the user can write to. However, only the root user can displace the system version
of the Java platform supplied by Linux.
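In outline, running the installer looks like the following (the file name should match the version you actually downloaded):

```
[root@localhost local]# chmod +x jdk-6u<version>-linux-x64.bin
[root@localhost local]# ./jdk-6u<version>-linux-x64.bin
```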
Here:
[root@localhost local]# ./jdk-6u17-linux-x64.bin
inflating: jdk1.6.0_30/README.html
creating: jdk1.6.0_30/include/
inflating: jdk1.6.0_30/include/jni.h
creating: jdk1.6.0_30/include/linux/
inflating: jdk1.6.0_30/include/linux/jawt_md.h
inflating: jdk1.6.0_30/include/linux/jni_md.h
inflating: jdk1.6.0_30/include/jvmti.h
inflating: jdk1.6.0_30/include/jawt.h
inflating: jdk1.6.0_30/include/jdwpTransport.h
inflating: jdk1.6.0_30/include/classfile_constants.h
inflating: jdk1.6.0_30/COPYRIGHT
Creating jdk1.6.0_30/jre/lib/rt.jar
Creating jdk1.6.0_30/jre/lib/jsse.jar
Creating jdk1.6.0_30/jre/lib/charsets.jar
Creating jdk1.6.0_30/lib/tools.jar
Creating jdk1.6.0_30/jre/lib/ext/localedata.jar
Creating jdk1.6.0_30/jre/lib/plugin.jar
Creating jdk1.6.0_30/jre/lib/javaws.jar
Creating jdk1.6.0_30/jre/lib/deploy.jar
Done.
The binary code license is displayed, and you are prompted to agree to its terms.
The Java Development Kit files are installed in a directory called jdk1.6.0_<version>
in the current directory. The directory structure is as follows.
Note about Root Access: Installing the software automatically creates a directory
called jre1.6.0_<version> . Note that if you choose to install the Java SE
Runtime Environment into system-wide location such as /usr/local, you must
first become root to gain the necessary permissions. If you do not have root
access, simply install the Java SE Runtime Environment into your home directory,
or a subdirectory that you have permission to write to.
Note about Overwriting Files: If you install the software in a directory that
contains a subdirectory named jre1.6.0_<version>, the new software overwrites
files of the same name in that jre1.6.0_<version> directory. Please be careful to
rename the old directory if it contains files you would like to keep.
Note about System Preferences: By default, the installation script configures the
system such that the backing store for system preferences is created inside the
JDK's installation directory. If the JDK is installed on a network-mounted drive, it
and the system preferences can be exported for sharing with Java runtime
environments on other machines.
A. ~/.bash_profile is a startup script which generally runs once. This particular file
is used for commands which run when the normal user logs in. Common uses for
.bash_profile are to set environment variables such as PATH, JAVA_HOME, to
create aliases for shell commands, and to set the default permissions for newly
created files.
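For example, appending something like the following to ~/.bash_profile sets up the environment for Hadoop; the JDK path here is an assumption, so use the jdk1.6.0_<version> directory the installer actually created:

```shell
# Assumed JDK install location; adjust to match your extracted jdk1.6.0_<version> directory
export JAVA_HOME=/usr/java/jdk1.6.0_30
# Make the java and javac binaries available on the command line
export PATH=$PATH:$JAVA_HOME/bin
```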
Click the following link to go to the installation package, or copy the link
and open it in the browser.
https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation
Click on Continue Anyway to initiate the installation of the package, then click
Install to start the installation. The package will now be downloaded and installed.
In master
[root@master /]# sudo gpasswd -a hdfs hadoop
[root@master /]# sudo gpasswd -a mapred hadoop
In slave
[root@slave /]# sudo gpasswd -a hdfs hadoop
[root@slave /]# sudo gpasswd -a mapred hadoop
conf/*-site.xml
Note: As of Hadoop 0.20.0, the configuration settings previously found in hadoop-
site.xml were moved to core-site.xml (hadoop.tmp.dir, fs.default.name),
mapred-site.xml (mapred.job.tracker) and hdfs-site.xml (dfs.replication).
In this section, we will configure the directory where Hadoop will store its data
files, the network ports it listens to, etc.
<property>
<name>fs.default.name</name>
<value>hdfs://master:8020</value>
</property>
Create the tmp directory in some location and give it the proper ownership and
permissions, then map the location in the core-site.xml file. This will be the
temporary folder in which data written to HDFS is first saved.
[root@master /]# mkdir /usr/lib/hadoop/tmp
[root@master hadoop]# chmod 750 tmp/
[root@master hadoop]# chown hdfs:hadoop tmp/
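With the directory in place, the mapping in conf/core-site.xml uses the hadoop.tmp.dir property mentioned above:

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/lib/hadoop/tmp</value>
</property>
```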
In file conf/hdfs-site.xml:
<!-- In: conf/hdfs-site.xml -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/storage/name </value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/storage/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
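The dfs.name.dir and dfs.data.dir locations above must exist and be owned by the user running HDFS; a sketch, assuming the CDH3 hdfs user and hadoop group used earlier (repeat the data-directory step on the slave, which also runs a datanode):

```
[root@master /]# mkdir -p /storage/name /storage/data
[root@master /]# chown -R hdfs:hadoop /storage
```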
In file conf/mapred-site.xml:
[root@slave conf]$ gedit mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>master:8021</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>/home/hadoop/mapred/system</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/hadoop/mapred/local</value>
</property>
<property>
<name>mapred.temp.dir</name>
<value>/home/hadoop/mapred/temp</value>
</property>
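The local directories referenced above must exist and be writable by the mapred user. A sketch follows; note that in Hadoop 0.20, mapred.system.dir is resolved against the default filesystem (HDFS), so only the local and temp directories need creating on disk:

```
[root@master /]# mkdir -p /home/hadoop/mapred/local /home/hadoop/mapred/temp
[root@master /]# chown -R mapred:hadoop /home/hadoop/mapred
```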
For core-site.xml
[root@master /]# rsync -avrt /usr/lib/hadoop/conf/core-site.xml
root@<slave IP add>:/usr/lib/hadoop/conf/core-site.xml
The authenticity of host '192.168.208.216 (192.168.208.216)' can't be
established.
RSA key fingerprint is 6c:2a:38:e6:b3:e0:0c:00:88:56:55:df:f6:b9:a3:68.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.208.216' (RSA) to the list of known
hosts.
root@192.168.208.216's password:
sending incremental file list
core-site.xml
For hdfs-site.xml
[root@master /]# rsync -avrt /usr/lib/hadoop/conf/hdfs-site.xml
root@<slave IP add>:/usr/lib/hadoop/conf/hdfs-site.xml
root@192.168.208.216's password:
sending incremental file list
hdfs-site.xml
For mapred-site.xml
[root@master /]# rsync -avrt /usr/lib/hadoop/conf/mapred-site.xml
root@<slave IP add>:/usr/lib/hadoop/conf/mapred-site.xml
root@192.168.208.216's password:
sending incremental file list
mapred-site.xml
sent 341 bytes received 37 bytes 84.00 bytes/sec
total size is 258 speedup is 0.68
[root@master conf]#
In slave
[root@slave /]# export HADOOP_NAMENODE_USER=hdfs
[root@slave /]# export HADOOP_DATANODE_USER=hdfs
[root@slave /]# export HADOOP_JOBTRACKER_USER=mapred
[root@slave /]# export HADOOP_TASKTRACKER_USER=mapred
Step 8:
Formatting the HDFS filesystem via the NameNode
Now move to the bin directory of Hadoop; we are about to format our
HDFS filesystem via the NameNode.
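The CDH3 packages run the HDFS daemons as the hdfs user, so the standard Hadoop 0.20 format command can be run as:

```
[root@master /]# sudo -u hdfs hadoop namenode -format
```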
Now we will start the daemons one by one on the appropriate nodes if
we need a multi-node cluster. Otherwise, start all the daemons on the single
node.
Here the live-node count is 1 because we have enabled all the daemons on a single
node.
For Multi-Node Cluster:
First we will start the
namenode/secondarynamenode/jobtracker/datanode/tasktracker daemons on
the master and the datanode/tasktracker daemons on the slave node.
Namenode Daemon:
Secondarynamenode Daemon:
Tasktracker Daemon:
Datanode Daemon:
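On CDH3, each daemon ships as an init-script service; a typical way to start them (the service names below are those of the CDH3 hadoop-0.20 packages) is:

```
[root@master /]# for s in namenode secondarynamenode jobtracker datanode tasktracker; do
>   service hadoop-0.20-$s start
> done
[root@slave /]# service hadoop-0.20-datanode start
[root@slave /]# service hadoop-0.20-tasktracker start
```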
Or if you want the full details of the ports follow the command:
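For example, netstat (available on most Linux systems) can list the ports the running Hadoop Java processes are listening on:

```
[root@master /]# netstat -tlnp | grep java
```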
Now open the browser and enter master:50070 in the address bar. It should show
the NameNode status page with 2 datanodes running.
Let us check the status of the running nodes: click on Live Nodes and the
following details appear.