Академический Документы
Профессиональный Документы
Культура Документы
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
10,000
5,000
2005
Source:
IDC
2011
2015
2010
STRUCTURED
DATA
UNSTRUCTURED DATA
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
Current Solu@ons
10,000
10%
2005
2015
2010
STRUCTURED
DATA
UNSTRUCTURED DATA
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
Schema,
no
schema
High
volume,
low
volume
Leverages
commodity
hardware
to
mi@gate
costs
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
2002
Publishes
MapReduce
and
GFS
Paper
Open
Source
MapReduce
and
HDFS
project
created
by
Doug
Cuang
Runs
4,000-node
Hadoop
cluster
2007
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
2012
HDFS
HDFS breaks incoming les into blocks and stores them redundantly across the cluster.
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
framework
1
2
3
4
5
MR
Processes large jobs in parallel across many nodes and combines the results.
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
You need
Scalability
of
Storage/Compute
Complex
Data
Processing
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
10
Interactive
database
Data export
OLAP load
Oracle,
SAP...
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
11
Interactive
database
New
data
Hadoop
Business
intelligence apps
Oracle,
SAP...
Recommendations, etc...
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
12
Why Cloudera?
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
13
Cloudera
is
in Customers and Users
in
banking,
telecommunica@ons,
mobile
services,
defense
&
intelligence,
media
and
retail
depend
on
Cloudera
all
other
Hadoop
systems
combined
than
for
developers,
administrators
and
managers
in Integrated Partners
across
hardware,
pla`orms,
database
and
business
intelligence
(BI)
for
hardware,
pla`orms,
sokware
and
services
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
14
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
15
STORAGE
COMPUTATION
ACCESS
INTEGRATION
DEPLOYMENT
CONFIGURATION
MONITORING
DIAGNOSTICS AND
REPORTING
ISSUE
RESOLUTION
ESCALATION
PROCESSES
OPTIMIZATION
KNOWLEDGE
BASE
Produc*on
Support
Our
team
of
experts
on
call
to
help
you
meet
your
Service
Level
Agreements
(SLAs)
Partner Ecosystem
Cloudera University
Professional Services
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
16
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
17
2.
3.
Recommenda*on engine
4.
5.
6.
Threat analysis
7.
Search quality
8.
Data sandbox
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
18
19
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
20
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
21
22
23
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
24
7.
Search
Quality
Challenge:
Providing
real
*me
meaningful
search
results
Solu*on
with
Hadoop:
Analyzing
search
aiempts
in
conjunc*on
with
structured
data
Paiern
recogni*on
Browsing
pa=ern
of
users
performing
searches
in
dierent
categories
Typical
Industry:
Web,
Ecommerce
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
25
8.
Data
Sandbox
Challenge:
Data
Deluge
Dont
know
what
to
do
with
the
data
or
what
analysis
to
run
Solu*on
with
Hadoop:
Dump
all
this
data
into
an
HDFS
cluster
Use
Hadoop
to
start
trying
out
dierent
analysis
on
the
data
See
paierns
to
derive
value
from
data
Typical
Industry:
Common
across
all
industries
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
26
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
27
Before
Hadoop
Orbitzs
data
warehouse
contains
a
full
archive
of
all
transac*ons
Every
booking,
refund,
cancella@on
etc.
Non-transac*onal
data
was
thrown
away
because
it
was
uneconomical
to
store
Non-transactional Data
(e.g., Searches)
Transactional Data
(e.g., Bookings)
Data Warehouse
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
28
Aker
Hadoop
Hadoop
was
deployed
late
2009/early
2010
to
begin
collec*ng
this
non-
transac*onal
data
Orbitz
has
been
using
CDH
for
that
en@re
period
with
great
success.
Much
of
this
non-transac*onal
data
is
contained
in
Web
analy*cs
logs
Non-transactional Data
(e.g., Searches)
Transactional Data
(e.g., Bookings)
Hadoop
Data Warehouse
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
29
What
Now?
Access
to
this
non-transac*onal
data
enables
a
number
of
applica*ons
Op@mizing
hotel
search
E.g.,
op@mize
hotel
ranking
and
show
consumers
hotels
more
closely
matching
their
preferences
User
specic
product
Recommenda@ons
Web
page
performance
tracking
Analyses
to
op@mize
search
result
cache
performance
User
segments
analysis,
which
can
drive
personaliza@on
Lots
of
press
coverage
in
June
2012:
company
discovered
that
people
using
Macs
are
willing
to
spend
30%
more
on
hotels
that
PC
users
Mac
users
are
now
presented
with
pricier
hotels
rst
in
the
list
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
30
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
31
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
32
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
33
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
34
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
35
Ne`lix
Before
Hadoop
Nightly
processing
of
logs
Imported
into
a
database
Analysis/BI
As
data
volume
grew,
it
took
more
than
24
hours
to
process
and
load
a
days
worth
of
logs
Today,
an
hourly
Hadoop
job
processes
logs
for
quicker
availability
to
the
data
for
analysis/BI
Currently
inges*ng
approximately
1TB
of
data
per
day
04-36
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
Copyright @ 2011 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
36
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
37
Hadoop Jobs
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
38
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
39
System
Administrators
Required
skills:
Strong
Linux
administra@on
skills
Networking
knowledge
Understanding
of
hardware
Job
responsibili*es
Install,
congure
and
upgrade
Hadoop
sokware
Manage
hardware
components
Monitor
the
cluster
Integrate
with
other
systems
(e.g.,
Flume
and
Sqoop)
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
40
Developers
Required
skills:
Strong
Java
or
scrip@ng
capabili@es
Understanding
of
MapReduce
and
algorithms
Job
responsibili*es:
Write,
package
and
deploy
MapReduce
programs
Op@mize
MapReduce
jobs
and
Hive/Pig
programs
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
41
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
42
Data
Steward
Required
skills:
Data
modeling
and
ETL
Scrip@ng
skills
Job
responsibili*es:
Cataloging
the
data
(analogous
to
a
librarian
for
books)
Manage
data
lifecycle,
reten@on
Data
quality
control
with
SLAs
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
43
Combining
Roles
System
Administrator
+
Steward
analogous
to
DBA
Required
skills:
Data
modeling
and
ETL
Scrip@ng
skills
Strong
Linux
administra@on
skills
Job
responsibili*es:
Manage
data
lifecycle,
reten@on
Data
quality
control
with
SLAs
Install,
congure
and
upgrade
Hadoop
sokware
Manage
hardware
components
Monitor
the
cluster
Integrate
with
other
systems
(e.g.,
Flume
and
Sqoop)
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
44
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
45
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
46
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
47
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
48
Copyright 2010-2012 Cloudera. All rights reserved. Not to be reproduced without prior wri=en consent.
49
Ian
Wrigley
ian@cloudera.com
50
Copyright
2010-2012
Cloudera.
All
rights
reserved.
Not
to
be
reproduced
without
prior
wri=en
consent.