
Course Topics

• Week 1
– Understanding Big Data
– Introduction to HDFS

• Week 2
– Playing around with Cluster
– Data loading Techniques

• Week 3
– Map-Reduce Basics, types and formats
– Use-cases for Map-Reduce

• Week 4
– Analytics using Pig
– Understanding Pig Latin

• Week 5
– Analytics using Hive
– Understanding HIVE QL

• Week 6
– NoSQL Databases
– Understanding HBASE

• Week 7
– Real world Datasets and Analysis
– Hadoop Project Environment

• Week 8
– Project Reviews
– Planning a career in Big Data

How it works

• Live classes
• Class recordings
• Module-wise Quizzes, Coding Assignments
• 24x7 on-demand technical support
• Project work on large Datasets
• Online certification exam
• Lifetime access to the Learning Management System
• Complimentary Java Classes


What is Big Data?
Facebook Example

• Facebook users spend 10.5 billion minutes (almost 20,000 years) online on the social network.
• An average of 3.2 billion likes and comments are posted on Facebook every day.
Twitter Example
• Twitter has over 500 million registered users.
• The USA has 141.8 million accounts, or 27.4 percent of all Twitter users, well ahead of Brazil, Japan, the UK and Indonesia.
• 79% of US Twitter users are more likely to recommend brands they follow.
• 67% of US Twitter users are more likely to buy from brands they follow.
• 57% of all companies that use social media for business use Twitter.
Other Industrial Use Cases

• Insurance
• Healthcare
• Retail
– Recommendations
– Groupings
• Genome Sequencing
• Utilities
Hadoop Users

http://wiki.apache.org/hadoop/PoweredBy
Data volume is growing exponentially

• Estimated Global Data Volume:


– 2011: 1.8 ZB
– 2015: 7.9 ZB
• The world's information doubles every two years
• Over the next 10 years:
– The number of servers worldwide will grow by 10x
– Amount of information managed by enterprise
data centers will grow by 50x
– Number of “files” enterprise data centers handle
will grow by 75x

Source: http://www.emc.com/leadership/programs/digital-universe.htm, which was based on the 2011 IDC Digital Universe Study
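
The two claims above (1.8 ZB in 2011, 7.9 ZB in 2015, and "doubles every two years") can be checked against each other with a little arithmetic. The sketch below is only a sanity check on the quoted figures; the function names are made up for illustration and are not part of the study.

```python
import math

# Figures quoted on the slide (2011 IDC Digital Universe Study).
volume_2011_zb = 1.8
volume_2015_zb = 7.9


def projected_volume(start_zb, years, doubling_period_years=2.0):
    """Volume after `years` if data doubles every `doubling_period_years`."""
    return start_zb * 2 ** (years / doubling_period_years)


def implied_doubling_period(start_zb, end_zb, years):
    """Doubling period (in years) implied by two observed volumes."""
    return years * math.log(2) / math.log(end_zb / start_zb)


projection_2015 = projected_volume(volume_2011_zb, 2015 - 2011)
doubling = implied_doubling_period(volume_2011_zb, volume_2015_zb, 2015 - 2011)

print(f"Doubling every two years gives {projection_2015:.1f} ZB in 2015")
print(f"1.8 ZB -> 7.9 ZB implies doubling every {doubling:.2f} years")
```

The strict two-year doubling lands a little below the 7.9 ZB estimate, so the quoted estimate corresponds to growth slightly faster than doubling every two years.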
Unstructured Data is exploding
Why DFS?
Read 1 TB Data

1 Machine
• 4 I/O Channels
• Each Channel – 100 MB/s
• 45 Minutes

10 Machines
• 4 I/O Channels
• Each Channel – 100 MB/s
• 4.5 Minutes
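
The read times on this slide follow from simple throughput arithmetic. The sketch below assumes that the 4 I/O channels are per machine, that all channels read in parallel, and that the 1 TB is split evenly across machines; none of this is spelled out on the slide, and the function name is hypothetical.

```python
def read_time_minutes(data_mb, machines, channels_per_machine=4,
                      mb_per_s_per_channel=100):
    """Minutes needed to read `data_mb` when every channel reads in parallel."""
    # Aggregate throughput of the whole setup in MB/s.
    throughput_mb_per_s = machines * channels_per_machine * mb_per_s_per_channel
    return data_mb / throughput_mb_per_s / 60


ONE_TB_MB = 1024 * 1024  # 1 TB expressed in MB (binary units, an assumption)

single = read_time_minutes(ONE_TB_MB, machines=1)
cluster = read_time_minutes(ONE_TB_MB, machines=10)

print(f"1 machine:   {single:.1f} minutes")
print(f"10 machines: {cluster:.1f} minutes")
```

Under these assumptions the result is a little under 44 minutes for one machine and a tenth of that for ten, which the slide rounds to 45 and 4.5 minutes.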


What Is a Distributed File System (DFS)?
What is Hadoop?

• Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.

• Companies using Hadoop:


- Yahoo
- Google
- Facebook
- Amazon
- AOL
- IBM
- And many more at
http://wiki.apache.org/hadoop/PoweredBy
Hadoop Eco-System
Hadoop Core Components:

• HDFS – Hadoop Distributed File System (storage)

• MapReduce (processing)
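
To give a feel for the processing side, here is a minimal sketch of the map, shuffle and reduce steps as a word count in plain Python. It does not use the Hadoop API and runs on a single machine; all function names are illustrative only.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

In a real cluster the map and reduce functions run on many machines at once, and the framework takes care of the grouping step between them.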
What is HDFS?

HDFS - Hadoop Distributed File System

• Highly fault-tolerant

• High throughput

• Suitable for applications with large data sets

• Streaming access to file system data

• Can be built out of commodity hardware


Main Components Of HDFS:

• NameNode:
– master of the system
– maintains and manages the blocks which are present on the DataNodes

• DataNodes:
– slaves which are deployed on each machine and provide the actual storage
– responsible for serving read and write requests for the clients
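
The division of labour between the two components can be sketched as a toy model: the NameNode keeps only a map from blocks to DataNodes, while the DataNodes would hold the actual bytes. The class below is hypothetical; the 64 MB block size, the replication factor of 3 and the round-robin placement are assumptions for illustration and not something these slides specify (real HDFS uses a more elaborate placement policy).

```python
import itertools


class ToyNameNode:
    """Toy model: tracks which DataNodes hold each block of each file."""

    def __init__(self, datanodes, block_size_mb=64, replication=3):
        self.datanodes = list(datanodes)
        if replication > len(self.datanodes):
            raise ValueError("need at least as many DataNodes as replicas")
        self.block_size_mb = block_size_mb
        self.replication = replication
        self.block_map = {}  # (file name, block index) -> DataNode names
        self._next_start = itertools.cycle(range(len(self.datanodes)))

    def add_file(self, name, size_mb):
        """Split a file into blocks and record where each replica lives."""
        n_blocks = -(-size_mb // self.block_size_mb)  # ceiling division
        for index in range(n_blocks):
            start = next(self._next_start)
            # Assumed policy: consecutive DataNodes, starting round-robin.
            replicas = [self.datanodes[(start + r) % len(self.datanodes)]
                        for r in range(self.replication)]
            self.block_map[(name, index)] = replicas
        return n_blocks

    def locate(self, name, index):
        """Answer a client's question: which DataNodes serve this block?"""
        return self.block_map[(name, index)]


namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
blocks = namenode.add_file("logs.txt", size_mb=200)
print(blocks, namenode.locate("logs.txt", 0))
```

A client would ask the NameNode for the block locations and then read or write the data directly on the DataNodes it is given.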
Secondary NameNode:

• NameNode:
– Single Point of Failure

• Secondary NameNode:
– Not a hot standby for the NameNode
– Connects to NameNode every hour*
– Housekeeping, backup of NameNode metadata
– Saved metadata can build a failed NameNode

(Diagram: the Secondary NameNode says to the NameNode, "You give me metadata every hour, I make it secure.")
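
In Hadoop 1.x the "housekeeping" mentioned above is a periodic checkpoint: the Secondary NameNode merges the NameNode's edit log into a saved file-system image. The sketch below is a toy version of that merge, not taken from the slides; the namespace is reduced to a set of paths, and the operation names and functions are hypothetical.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory set of file paths."""
    operation, path = edit
    if operation == "create":
        namespace.add(path)
    elif operation == "delete":
        namespace.discard(path)
    else:
        raise ValueError(f"unknown operation: {operation}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the last saved image."""
    merged = set(fsimage)  # work on a copy, leave the old image untouched
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged


fsimage = {"/data/a.txt"}                 # last saved image
edit_log = [("create", "/data/b.txt"),    # changes made since that image
            ("delete", "/data/a.txt")]

saved = checkpoint(fsimage, edit_log)
print(saved)
```

The merged image is the saved metadata from which a failed NameNode can be rebuilt; changes made after the last checkpoint are not in it, which is one reason the Secondary NameNode is not a hot standby.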
JobTracker and TaskTracker:
HDFS Architecture
Job Tracker
Job Tracker Contd.
HDFS Client Creates a New File
Rack Awareness
Anatomy of a File Write:
Anatomy of a File Read:
Thank You
See You in Class Next Week
