Вы находитесь на странице: 1из 13

Introduction to

Administration
SpringPeople Software Private Limited, All Rights Reserved.

What is Hadoop?
Hadoop is a free, Java-based programming
framework that supports the processing of
large data sets in a distributed computing
environment. It is part of the Apache
project sponsored by the Apache Software
Foundation.

SpringPeople Software Private Limited, All Rights Reserved.

What is HDFS?
The Hadoop Distributed File System (HDFS) is a distributed file
system designed to run on commodity hardware. It has many
similarities with existing distributed file systems. However, the
differences from other distributed file systems are significant. HDFS
is highly fault-tolerant and is designed to be deployed on low-cost
hardware. HDFS provides high throughput access to application
data and is suitable for applications that have large data sets. HDFS
relaxes a few POSIX requirements to enable streaming access to file
system data. HDFS was originally built as infrastructure for the
Apache Nutch web search engine project. HDFS is now an Apache
Hadoop subproject.

SpringPeople Software Private Limited, All Rights Reserved.

HDFS Architecture

SpringPeople Software Private Limited, All Rights Reserved.

SpringPeople Software Private Limited, All Rights Reserved.

SpringPeople Software Private Limited, All Rights Reserved.

What is Hadoop Cluster?


A Hadoop cluster is a special type of computational cluster
designed specifically for storing and analyzing huge amounts
of unstructured data in a distributed computing environment.
Such clusters run Hadoop's open source distributed
processing software on low-cost commodity computers.
Hadoop clusters are known for boosting the speed of data
analysis applications. They also are highly scalable.
Hadoop clusters also are highly resistant to failure because
each piece of data is copied onto other cluster nodes, which
ensures that the data is not lost if one node fails.

SpringPeople Software Private Limited, All Rights Reserved.

SpringPeople Software Private Limited, All Rights Reserved.

What is MapReduce?
Hadoop MapReduce (Hadoop Map/Reduce) is a
software framework for distributed processing of
large data sets on compute clusters of commodity
hardware. It is a sub-project of the
Apache Hadoop project. The framework takes
care of scheduling tasks, monitoring them and reexecuting any failed tasks.

SpringPeople Software Private Limited, All Rights Reserved.

Apache Hadoop NextGen MapReduce (YARN)


MapReduce has undergone a complete overhaul in hadoop0.23 and we now have, what we call, MapReduce 2.0 (MRv2)
or YARN.
Apache Hadoop YARN is a sub-project of Hadoop at the
Apache Software Foundation introduced in Hadoop 2.0 that
separates the resource management and processing
components. YARN was born of a need to enable a broader
array of interaction patterns for data stored in HDFS beyond
MapReduce. The YARN-based architecture of Hadoop 2.0
provides a more general processing platform that is not
constrained to MapReduce.

SpringPeople Software Private Limited, All Rights Reserved.

How you can master Hadoop


Administration?
Become an expert in 2 days.
World class Hadoop Administration training by the
industry experts.

More Details

SpringPeople Software Private Limited, All Rights Reserved.

Suggested Audience & Other Details


Prerequisites: Basic knowledge of unix and system
administration. Prior knowledge of Hadoop is not required.
Suggested Audience:
Developers
Architects
Duration 2 Days

Syllabus

SpringPeople Software Private Limited, All Rights Reserved.

For further info/assistance contact


training@springpeople.com
+91 80 656 79700
www.springpeople.com

Our Partners

SpringPeople Software Private Limited, All Rights Reserved.

Вам также может понравиться