HADOOP 101
Purpose
After completing this module, you will be able to:
1. Discuss the evolution of Data Platforms and why Hadoop was created
2. Discuss the purpose, functionality, and value of Hadoop
3. Describe the various Hadoop components
4. Discuss some of the most common use cases for Hadoop
Agenda
1. What is Hadoop
2. The Evolution of Data Platforms
3. How Hadoop Is Being Used Today
4. Resources and Key Takeaways
Part 1:
What Is Hadoop?
What is Hadoop?
A framework that allows for distributed processing of large data sets across clusters of commodity servers
Stores large amounts of data
Processes the large amounts of data stored
What is Hadoop?
Two Core Components
HDFS: scalable storage in the Hadoop Distributed File System
MapReduce: distributed processing of the data stored in HDFS
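To make the division of labor between storage and processing more concrete, the sketch below imitates the MapReduce flow (map, shuffle, reduce) in a single Python process using a word count. It is an illustration only, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical, and the input list stands in for file splits that would normally be read from HDFS on many servers.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in order over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: each string stands in for one split of a large file.
    splits = ["hadoop stores data", "hadoop processes data"]
    print(run_job(splits))
```

In a real cluster the map and reduce functions would run in parallel on the servers that hold the data; only the shape of the computation is shown here.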
Part 2:
The Evolution of Data Platforms
How it all began
Legacy EDW
New Data Streams
New Delivery Platforms
New Deployment Models & Languages
Expanding Data Volumes
Greater Cost Pressures
Increasing Customer Expectations
Cloudera, founded in 2008 by early Hadoop technologists, was the first commercial vendor to enter the space.
Its founders later brought in Doug Cutting to be head architect.
Other commercial vendors of Hadoop software include Hortonworks (2011, out of Yahoo!), IBM (InfoSphere
BigInsights), MapR (2009), and DataStax (2010, Hadoop and Cassandra).
Organizations that are evaluating Hadoop are typically also looking at NoSQL
databases such as Cassandra, MongoDB, and CouchDB, at hosted services such as Amazon EMR, and at others.
They might also be evaluating scale-out file systems to use for storage, such as Isilon
or GlusterFS. These systems mirror Hadoop's scale-out architecture and are also
capable of handling the volume and unstructured nature of data that could be stored in
Hadoop.
Part 3:
How Hadoop Is Being Used Today
Web 2.0
Telecom
Healthcare
Risk Modeling/Management
Portfolio Analysis
Investment Predictions
Fraud Detection
Compliance Check
Customer Profiling
Social Media Analytics
ETL
Network analysis based on transactions
Product Recommendation Engine
Search Engine Indexing (Search Assist)
Content Optimization
Advertising Optimization
Customer Churn Analysis
POS Transaction Analysis
Data Warehousing
Network Graph Analysis
Call Detail Record (CDR) Analysis
Network Optimization
Service Optimization & Log Processing
User Behavior Analysis
Customer Churn Prediction
Machine-generated data centralization (logs from firewalls, towers, switches, servers, etc.)
Electronic Medical Record Analysis
Claims Fraud Detection
Drug Safety Analysis
Personalized Medicine
Healthcare Service Optimization
Drug Development
Healthcare Information Exchange
Medical Image Processing
Objectives
Reduce EDW Total Cost of Ownership
Enable longer data retention to support analytics and accelerate time to market
Migrate ETL off EDW to free up compute resources
For example, if direct deposits stop coming into a checking account, it is likely that the customer has lost their job, which
affects creditworthiness for other products (credit card, mortgage, etc.)
Data existed in silos across multiple lines of business (LOBs) and acquired bank systems
Data size approached 1 petabyte
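The direct-deposit signal described above can be sketched as a small aggregation: find each customer's most recent direct deposit, then flag customers whose last deposit is older than some threshold. Everything specific here is an assumption made for illustration: the record layout `(customer_id, date, type)`, the `"direct_deposit"` label, the 45-day threshold, and the function names. At the data sizes mentioned, the same grouping would run as a distributed job rather than in one process.

```python
from datetime import date


def last_direct_deposit(transactions):
    """Return the most recent direct-deposit date seen for each customer."""
    latest = {}
    for customer_id, tx_date, tx_type in transactions:
        if tx_type != "direct_deposit":
            continue  # ignore purchases, transfers, and other activity
        if customer_id not in latest or tx_date > latest[customer_id]:
            latest[customer_id] = tx_date
    return latest


def flag_stopped_deposits(transactions, as_of, max_gap_days=45):
    """List customers whose last direct deposit is more than max_gap_days before as_of.

    The 45-day default is a hypothetical threshold, not a value from the source.
    """
    latest = last_direct_deposit(transactions)
    return sorted(
        customer_id
        for customer_id, tx_date in latest.items()
        if (as_of - tx_date).days > max_gap_days
    )


if __name__ == "__main__":
    # Hypothetical sample records: (customer_id, date, transaction type).
    sample = [
        ("c1", date(2013, 1, 15), "direct_deposit"),
        ("c1", date(2013, 3, 15), "direct_deposit"),
        ("c2", date(2013, 1, 15), "direct_deposit"),
        ("c2", date(2013, 3, 20), "card_purchase"),
    ]
    print(flag_stopped_deposits(sample, as_of=date(2013, 4, 1)))
```

A flagged customer is only a candidate for review; the rule says nothing about why the deposits stopped.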
Why Hadoop
Social media/web data is unstructured
Amount of data is immense
New data sources arise weekly
Part 4:
Resources and Key Takeaways
Online Resources
Hadoop Basics: Hadoop Basics from EMC
Pivotal Hadoop online resources: Greenplum Nation Pivotal HD
Selling Guide - Hadoop Selling Guide
Webcast
Hadoop Spotlight Webinar
Video: HAWQ and Pivotal HD Video
Internal Webcast: Internal Webcast
Contact Resources
Advanced Technology Sales
Nick Cayou ncayou@gopivotal.com
Ian Andrews iandrews@gopivotal.com
Product Management
SK Krishnamurthy - SK.Krishnamurthy@emc.com
Thank You
Please note:
It may take up to 24 hours for your transcript to be updated and reflect that you
have successfully completed this course.
If after 24 hours your transcript is not updated, please send an e-mail to
joneill@gopivotal.com describing your issue.
Please include the following:
BADGE NUMBER
ROLE
MODULE TITLE
ISSUE