Hadoop
Introduction & Setup
Prepared By, Nishant M Gandhi.
Certified Network Manager by Nettech. Diploma in Cyber Security (pursuing).
What is Hadoop?
Hadoop is a framework for running applications on large clusters built of commodity hardware.
Open source, written in Java. Google's MapReduce inspired Yahoo!'s Hadoop, which is now part of the Apache group. (Source: Hadoop Wiki)
Other Hadoop-related projects:
- Avro: a data serialization system.
- Cassandra: a scalable multi-master database.
- Chukwa: a data collection system for managing large distributed systems.
- HBase: a scalable, distributed database that supports structured data storage for large tables.
- Hive: a data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout: a scalable machine learning and data mining library.
- Pig: a high-level data-flow language and execution framework for parallel computation.
- ZooKeeper: coordination services for distributed applications.
On February 19, 2008, Yahoo! Inc. launched what it claimed was the world's largest Hadoop production application: the Yahoo! Search Webmap, a Hadoop application that runs on a Linux cluster of more than 10,000 cores and produces data that is now used in every Yahoo! Web search query.
Here are the cluster statistics from the HDFS cluster at Facebook:
- 21 PB of storage in a single HDFS cluster
- 2000 machines
- 12 TB per machine (a few machines have 24 TB each)
- 1200 machines with 8 cores each + 800 machines with 16 cores each
- 32 GB of RAM per machine
- 15 map-reduce tasks per machine
That's a total of more than 21 PB of configured storage capacity! This is larger than the previously known Yahoo! cluster of 14 PB.
Map Reduce
A programming model developed at Google, based on sort/merge distributed computing. It was initially intended for Google's internal search/indexing application, but is now used extensively by other organizations (e.g., Yahoo!, Amazon.com, IBM). It is a functional style of programming (cf. LISP) that is naturally parallelizable across a large cluster of workstations or PCs. The underlying system takes care of partitioning the input data, scheduling the program's execution across several machines, handling machine failures, and managing the required inter-machine communication. (This is the key to Hadoop's success.)
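The map -> sort/merge -> reduce flow described above can be sketched locally with standard Unix tools, no Hadoop required. This is only an illustration of the model (a word count, the classic MapReduce example), not how Hadoop itself executes jobs:

```shell
#!/bin/sh
# Local sketch of the MapReduce flow using a word count:
#   map     - emit one record (word) per line
#   shuffle - sort groups identical keys together
#   reduce  - aggregate (count) the records for each key
printf 'the quick fox\nthe lazy dog\n' |
  tr ' ' '\n' |          # map: split lines into one word per line
  sort |                 # shuffle: bring identical words together
  uniq -c |              # reduce: count occurrences of each word
  awk '{print $2, $1}'   # format as "word count"
```

In real Hadoop, the same three phases run in parallel across many machines, with the framework handling the partitioning, shuffling, and failure recovery.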
Map Reduce
At Google, MapReduce operations are run on a special file system called the Google File System (GFS), which is highly optimized for this purpose. GFS is not open source. Doug Cutting and others at Yahoo! reverse-engineered GFS and called the result the Hadoop Distributed File System (HDFS).
Goals of HDFS
- Very large distributed file system: 10K nodes, 100 million files, 10 PB
- Assumes commodity hardware: files are replicated to handle hardware failure; failures are detected and recovered from
- Optimized for batch processing: data locations are exposed so that computations can move to where the data resides; provides very high aggregate bandwidth
- Runs in user space, on heterogeneous operating systems
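HDFS achieves these goals partly by storing every file as fixed-size blocks (64 MB by default in Hadoop of this era) replicated across DataNodes. The block-splitting idea can be sketched locally with the standard `split` tool; the 8-byte block size here is purely for illustration on a tiny file:

```shell
#!/bin/sh
# Sketch: chop a file into fixed-size "blocks", as HDFS does
# conceptually. Real HDFS blocks default to 64 MB; 8 bytes is
# used here so the effect is visible on a 24-byte file.
tmp=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwx' > "$tmp/file"   # 24-byte file
split -b 8 "$tmp/file" "$tmp/blk_"                # 8-byte blocks
ls "$tmp" | grep -c blk_                          # prints 3
rm -r "$tmp"
```

HDFS then replicates each such block (per `dfs.replication`) on different DataNodes, which is what lets it survive the loss of commodity machines.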
DFShell
The HDFS shell is invoked by: bin/hadoop dfs <args>
core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>TEMPORARY-DIR-FOR-HADOOPDATASTORE</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
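A single-node (pseudo-distributed) Hadoop setup of this generation typically also configured mapred-site.xml so that MapReduce jobs find the local JobTracker. A minimal sketch, assuming the port 54311 commonly used in single-node tutorials:

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>
```

With `dfs.replication` set to 1 (as above) and the JobTracker on localhost, the whole cluster runs on one machine, which is convenient for learning and testing.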
Using Hadoop
1) How to start Hadoop?
   cd hadoop/bin
   ./start-all.sh
2) How to stop Hadoop?
   cd hadoop/bin
   ./stop-all.sh
3) How to copy a file from the local machine to HDFS?
   cd hadoop
   bin/hadoop dfs -put local_machine_path hdfs_path
4) How to list files in HDFS?
   cd hadoop
   bin/hadoop dfs -ls
Thank You..