
Installation Guide

1. Hadoop
2. Hive
3. Pig
4. MongoDB
5. Cassandra

Hadoop Cluster Setup:


1. Install Java / JDK (Master Node and Slave Nodes)
The primary requirement for running Hadoop on any system is Java, so make sure that Java is installed on every node
using the following command.
# java -version
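If the command reports that Java is missing, it can be installed first. As a sketch, on a RHEL/CentOS-style system (assumed here, to match the OpenJDK path used in the environment variables later):

```
# yum install java-1.7.0-openjdk-devel
```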

2. Disable Firewall
You are required to disable the firewall on the master and slave nodes so that the Hadoop daemons can communicate.
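As a sketch, on a RHEL/CentOS-style system (assumed here), the firewall can be stopped and kept off across reboots as follows:

```
# service iptables stop      // stop the firewall for the current session
# chkconfig iptables off     // keep it off after reboot
```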

3. Map IPaddress and Hostname (On Master Node)


You are required to map the IP address and hostname of every node in /etc/hosts
For Example:
IPaddress hostname
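For instance, with placeholder addresses and hostnames, the /etc/hosts entries might look like:

```
192.168.1.10   master
192.168.1.11   slave1
192.168.1.12   slave2
```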

4. Configuring SSH

On Master:
4.1. Generate an ssh key by issuing the following command
# ssh-keygen -t rsa

4.2. Copy the ssh key to the localhost.


# ssh-copy-id -i ~/.ssh/id_rsa.pub root@localhost

4.3. Similarly, copy the key to the slave machines as well.
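For example, with hypothetical slave hostnames slave1 and slave2 (adjust to your own):

```
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave2
```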


4.4. Authorize the key by issuing the following command.
# chmod 0600 ~/.ssh/authorized_keys

On Slave:
4.5. Generate an ssh key by issuing the following command
# ssh-keygen -t rsa

4.6. Copy the ssh key to the localhost.
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@localhost

4.7. Similarly, copy the key to the master machine.

4.8. Authorize the key by issuing the following command.
# chmod 0600 ~/.ssh/authorized_keys

5. Download Hadoop
You can download the latest version of Hadoop from the Apache mirror below.
http://mirror.sdunix.com/apache/hadoop/common/
Then, you can untar it using the following command.
# tar -xzf hadoop-2.6.0.tar.gz
6. Configuring Hadoop
Set the environment variables used by Hadoop. You can set them by editing the ~/.bashrc file and
appending the following values at the end of the file.

export HADOOP_HOME=/root/Desktop/VMDATA/Hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64
export PATH=${PATH}:${JAVA_HOME}/bin
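After appending these lines, reload the shell configuration and verify that the hadoop command is found:

```
# source ~/.bashrc
# hadoop version      // should print the installed Hadoop release
```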

7. Edit Configuration Files


You are required to edit core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml as shown below on the master
node as well as the slave nodes. The configuration files remain the same on all nodes.

core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://masterhostname:9000</value>
</property>
</configuration>
hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/root/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/root/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>masterhostname:8031</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>masterhostname:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>masterhostname:8032</value>
</property>
</configuration>

slaves
In the slaves file, specify the hostname of each slave node, one per line.
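For example, with hypothetical slave hostnames, the slaves file would contain:

```
slave1
slave2
```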
8. Format the NameNode
Now, you can format the NameNode by issuing the following command.
# hdfs namenode -format

9. Start Hadoop Cluster


Finally, you can start the cluster by issuing the following commands
# start-dfs.sh // to start the HDFS daemons
# start-yarn.sh // to start the YARN daemons
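Once both scripts finish, the jps command can be used to confirm which daemons are running; roughly, you should see:

```
# jps
// On the master: NameNode, SecondaryNameNode, ResourceManager
// On each slave: DataNode, NodeManager
```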

10. Go to the browser and type the following address to check the cluster setup
http://masterhostname:50070
Hive Installation

1. Download Hive
You can download the latest version of Hive from the Apache site below.
http://www.apache.org/dyn/closer.cgi/hive/
Then, you can untar it using the following command.
# tar -xzf apache-hive-0.14-bin.tar.gz

2. Configure MySQL on the same machine

2.1. Start the MySQL server, if it is not already started, using the below command.
service mysqld start

2.2. Create user


CREATE USER 'hiveuser'@'hostname' IDENTIFIED BY '1234';
GRANT ALL PRIVILEGES ON *.* TO 'hiveuser'@'hostname';
FLUSH PRIVILEGES;
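To confirm that the new account works, you can log in with it (the hostname and password match the statements above):

```
# mysql -u hiveuser -p -h hostname
```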

3. Configure hive-site.xml as shown below.


<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hostname/metastore_db?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>user name for connecting to mysql server </description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>1234</value>
<description>password for connecting to mysql server </description>
</property>
</configuration>

4. Add the MySQL JDBC driver jar to the lib folder of Hive.
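Assuming the connector jar has already been downloaded (the filename below is illustrative), copying it into place looks like:

```
# cp mysql-connector-java.jar apache-hive-0.14-bin/lib/
```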

5. Now, you can start Hive using the hive command.

Note: Install Hive on the server (master) node.


Pig Installation

1. Download Pig
You can download the latest version of Pig from the Apache site below.
http://www.apache.org/dyn/closer.cgi/pig
Then, you can untar it using the following command.
# tar -xzf pig-0.14-bin.tar.gz

2. Now you can work with Pig by invoking the grunt shell.
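As a minimal sketch, assuming a small text file already exists in HDFS at a placeholder path:

```
# pig
grunt> lines = LOAD '/user/root/sample.txt' AS (line:chararray);
grunt> DUMP lines;
```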


MongoDB Installation

1. Download MongoDB

You can download MongoDB from the below link.

https://www.mongodb.org/downloads#development

Extract the zipped folder and place it in D:\

2. Create a data\db folder in D:\

3. Go to the command prompt, navigate to the MongoDB bin folder, and type the below command to start the
server.

mongod.exe
You can minimize this window.

4. Open another command prompt, navigate to the MongoDB bin folder, and type the below command to start the
client.

mongo.exe
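Once the client connects, a quick smoke test (the database and collection names are arbitrary):

```
> use testdb
> db.sample.insert({ msg: "hello" })
> db.sample.find()
```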
Cassandra Installation

1. Download Cassandra

You can download Cassandra from the below link.

http://cassandra.apache.org/

Extract the zipped folder.

You can find the Cassandra Windows batch file inside the extracted folder.

Double-click the Cassandra Windows batch file to start the Cassandra server.

2. Configuring CQLSH:

2.1. Start the Cassandra Server.

2.2. Open the command prompt. Navigate to the Cassandra bin folder.

2.3. Set the path for Python as shown below (assuming Python 2.7 is installed at C:\Python27).

set path=C:\Python27

2.4. Start the CQLSH by typing python cqlsh in the command prompt.
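After cqlsh connects, a minimal smoke test (the keyspace and table names are arbitrary):

```
cqlsh> CREATE KEYSPACE demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE demo;
cqlsh> CREATE TABLE users (id int PRIMARY KEY, name text);
cqlsh> INSERT INTO users (id, name) VALUES (1, 'alice');
cqlsh> SELECT * FROM users;
```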
