
Demystifying Big Data

Dan McClary, PhD


Principal Product Manager
Big Data and Hadoop
Copyright 2012, Oracle and/or its affiliates. All rights reserved.
Introduction

Oracle MoviePlex is an on-line movie streaming company
Like many other on-line stores, they needed a cost-effective approach to tackle their big data challenges
They recently implemented Oracle's Big Data Platform to better manage their business, identify key opportunities and enhance customer satisfaction

* Movie data provided by IMDb. Links to movie images provided by TMDb
Big Data Challenge

Applications are generating massive volumes of unstructured data that describe user behavior and application performance
Today, most companies are unable to fully capitalize on this potentially valuable information due to cost and complexity
How do you capitalize on this raw data to gain better insights into your customers, enhance their user experience and ultimately improve profitability?

{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
{"custId":1083711,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
{"custId":1010220,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:42","recommended":"Y","activity":6}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8}
{"custId":1253676,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9}
{"custId":1351777,"movieId":608,"genreId":6,"time":"2012-07-01:00:01:03","recommended":"N","activity":7}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:01:18","recommended":"Y","activity":7}
{"custId":1067283,"movieId":1124,"genreId":9,"time":"2012-07-01:00:01:26","recommended":"Y","activity":7}
{"custId":1126174,"movieId":16309,"genreId":9,"time":"2012-07-01:00:01:35","recommended":"N","activity":7}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:01:39","recommended":"Y","activity":7}
{"custId":1067283,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:55","recommended":null,"activity":9}
{"custId":1377537,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:58","recommended":null,"activity":9}
{"custId":1347836,"movieId":null,"genreId":null,"time":"2012-07-01:00:02:03","recommended":null,"activity":8}
{"custId":1137285,"movieId":null,"genreId":null,"time":"2012-07-01:00:03:39","recommended":null,"activity":8}
{"custId":1354924,"movieId":null,"genreId":null,"time":"2012-07-01:00:03:51","recommended":null,"activity":9}
{"custId":1036191,"movieId":null,"genreId":null,"time":"2012-07-01:00:03:55","recommended":null,"activity":8}
{"custId":1143971,"movieId":1017161,"genreId":44,"time":"2012-07-01:00:04:00","recommended":"Y","activity":7}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:04:03","recommended":"Y","activity":5}
{"custId":1273464,"movieId":null,"genreId":null,"time":"2012-07-01:00:04:39","recommended":null,"activity":9}
{"custId":1346299,"movieId":424,"genreId":1,"time":"2012-07-01:00:05:02","recommended":"Y","activity":4}

How can you get answers to...

Derive Value from Big Data

How do we:
Make the right movie offers at the right time?
Better understand the viewing trends of our various customer segments?
Optimize our marketing spend by targeting customers with optimal promotional offers?
Minimize infrastructure spend by understanding bandwidth usage over time?
Prepare to answer questions that we haven't thought of yet!

Oracle's Big Data Platform

Stream, Acquire, Organize & Discover, Analyze, Visualize & Decide

MoviePlex Architecture
[Architecture diagram. Recoverable labels:]
Application Log: log of all activity on site; streamed into HDFS using Flume
Oracle NoSQL DB: customer profile; captures activity necessary for the MoviePlex site (e.g. recommended movies)
Oracle Big Data Appliance: HDFS; MapReduce jobs: ORCH (CF recs.), Pig (sessionize), Hive (activities)
Oracle Big Data Connectors: load recommendations; load session and activity data
Oracle Exadata: Oracle Advanced Analytics (clustering/market basket, mood recommendations)
Oracle Exalytics: Endeca Information Discovery, Oracle Business Intelligence EE

Demonstration:
Oracle MoviePlex Demo

What you will see

Message
The application demonstrates a personalized environment that leverages advanced analytic capabilities
Need low latency - Amazon: every 100ms of latency costs them 1% in sales
Challenge
Massive volumes of unstructured data flowing in
How do you harness it and take advantage of it?

Oracle's Big Data Platform

Stream, Acquire, Organize & Discover

Oracle's Big Data Platform: Acquire
Two Sets of Characteristics

Batch-Oriented              Real-Time
Process data to use         Deliver a service
Bulk storage                Fast access to specific record
Write once, read all        Read, write, delete, update

Oracle's Big Data Platform: Acquire
Two Sets of Characteristics

Hadoop Distributed File System (HDFS)    Oracle NoSQL Database
File system                              Database
Parallel scanning                        Indexed storage
No inherent structure                    Simple data structure
High-volume writes                       High-volume random reads and writes

Oracle NoSQL DB

Stores all key interactions required to drive the application. For example:
User profile
Movie listings
Ratings
Position within paused movie
Scalable, low-latency retrieval & update processing
[Diagram: the NoSQL driver reads and updates Oracle NoSQL DB on the Big Data Appliance]
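The access pattern on this slide (fast reads and updates of one specific record) can be sketched with a plain in-memory dictionary standing in for a key-value store. The `KeyValueStore` class, the key layout and the field names are illustrative assumptions; this is not the Oracle NoSQL Database API.

```python
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, ...]


class KeyValueStore:
    """Toy in-memory stand-in for a key-value store (hypothetical, not Oracle's API)."""

    def __init__(self) -> None:
        self._data: Dict[Key, Any] = {}

    def put(self, key: Key, value: Any) -> None:
        # Insert or overwrite the record stored under this key.
        self._data[key] = value

    def get(self, key: Key) -> Optional[Any]:
        # Return the record for this key, or None if it does not exist.
        return self._data.get(key)


store = KeyValueStore()
cust_id, movie_id = "1363545", "27205"  # IDs borrowed from the sample activity log

# Assumed key layout: ("user", <custId>, <record type>, ...).
store.put(("user", cust_id, "profile"), {"name": "example user"})
paused_key = ("user", cust_id, "paused", movie_id)
store.put(paused_key, {"position_sec": 1842})

# The user resumes, watches a little longer and pauses again: read, then update.
position = store.get(paused_key)["position_sec"]
store.put(paused_key, {"position_sec": position + 258})
```

Each operation touches a single record identified by its key, which is the "real-time" side of the two sets of characteristics described earlier.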

Hadoop Distributed File System

Stores all user activity that will be processed for analytics and reporting. For example:
Recommendation generation
Marketing analysis
Operational reporting
Streamed into HDFS using Flume
Contains all history to support future/unanticipated requirements
[Diagram: activity.out is written by Flume into HDFS on the Big Data Appliance]

Activity Log File

Example Log:
{"custId":1046915,"movieId":null,"genreId":null,"time":"2012-07-01:00:33:18","recommended":null,"activity":9}
{"custId":1144051,"movieId":768,"genreId":9,"time":"2012-07-01:00:33:39","recommended":"N","activity":6}
{"custId":1264225,"movieId":null,"genreId":null,"time":"2012-07-01:00:34:01","recommended":null,"activity":8}
{"custId":1085645,"movieId":null,"genreId":null,"time":"2012-07-01:00:34:18","recommended":null,"activity":8}
{"custId":1098368,"movieId":null,"genreId":null,"time":"2012-07-01:00:34:28","recommended":null,"activity":8}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:35:09","recommended":"Y","activity":11,"price":3.99}
{"custId":1156900,"movieId":20352,"genreId":14,"time":"2012-07-01:00:35:12","recommended":"N","activity":7}
{"custId":1336404,"movieId":null,"genreId":null,"time":"2012-07-01:00:35:27","recommended":null,"activity":9}
{"custId":1022288,"movieId":null,"genreId":null,"time":"2012-07-01:00:35:38","recommended":null,"activity":8}
{"custId":1129727,"movieId":1105903,"genreId":11,"time":"2012-07-01:00:36:08","recommended":"N","activity":1,"rating":3}
{"custId":1305981,"movieId":null,"genreId":null,"time":"2012-07-01:00:36:27","recommended":null,"activity":8}

JSON format
Standard method for data serialization
Captures a user activity (or click) and information about that activity
Example: Customer 1234 started watching Iron Man 2 on 2012-Nov-05. It was recommended. Paid 3.99. Genre is Adventure

How do you turn this into...
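As a rough illustration of how such a log can be read, the sketch below parses one JSON object per line with Python's standard `json` module. The four sample lines are copied from the example log; the function name `parse_log` is an assumption, and the meaning of each numeric `activity` code is not stated in the slides, so the code only counts codes without interpreting them.

```python
import json
from collections import Counter

# Four lines taken from the example log above.
SAMPLE_LOG = """\
{"custId":1144051,"movieId":768,"genreId":9,"time":"2012-07-01:00:33:39","recommended":"N","activity":6}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:35:09","recommended":"Y","activity":11,"price":3.99}
{"custId":1129727,"movieId":1105903,"genreId":11,"time":"2012-07-01:00:36:08","recommended":"N","activity":1,"rating":3}
{"custId":1305981,"movieId":null,"genreId":null,"time":"2012-07-01:00:36:27","recommended":null,"activity":8}
"""


def parse_log(text):
    """Parse one JSON object per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_log(SAMPLE_LOG)

# Count events per activity code; JSON null becomes Python None.
activity_counts = Counter(r["activity"] for r in records)

# Optional fields such as "price" or "rating" appear only on some records.
priced = [r for r in records if "price" in r]
```

Note that the records do not all share the same fields, which is why structure has to be applied to the log before it can be loaded into relational tables.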

Oracle Business Intelligence
Understand & Optimize
Marketing
Highest rented movies, sliced by genre, actors, directors, years, etc.
Demographics of viewers
Time of movie viewing, sliced by started, paused, fully watched
Finance
Total movies watched per time period (day, time blocks, week)
Total cost of movies (i.e., royalties)
Total cost of infrastructure (storage, bandwidth) broken down by demographic or subscriber class
Network Operations
Total storage needed for movies
Total bandwidth used & average per movie, broken down by time blocks, days of week
Number of simultaneous streams per movie
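One of the measures listed above, activity counted per time block, can be sketched as a simple aggregation over parsed log records. The records below are made up for illustration, and because the slides do not say which `activity` code means "watched", the set of codes to count is passed in as a parameter (the value `{4}` here is purely an assumption).

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d:%H:%M:%S"  # matches timestamps such as "2012-07-01:00:33:39"


def events_per_hour(records, activity_codes):
    """Count records whose activity code is in activity_codes, per hour block."""
    counts = Counter()
    for r in records:
        if r["activity"] in activity_codes:
            ts = datetime.strptime(r["time"], TIME_FORMAT)
            counts[ts.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Made-up records in the same shape as the activity log.
records = [
    {"custId": 1, "movieId": 10, "time": "2012-07-01:00:33:39", "activity": 4},
    {"custId": 2, "movieId": 11, "time": "2012-07-01:00:59:59", "activity": 4},
    {"custId": 3, "movieId": 10, "time": "2012-07-01:01:05:00", "activity": 4},
    {"custId": 4, "movieId": None, "time": "2012-07-01:01:06:00", "activity": 8},
]

per_hour = events_per_hour(records, activity_codes={4})
```

In the architecture described in this deck, this kind of aggregation would be expressed in the BI layer over the loaded tables rather than in application code.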

Organize Steps

1. Apply structure to log output
2. Filter, transform and load logs into staging area
3. Load results into Oracle Database 11g

[Diagram: on BDA/Hadoop, (1) a Hive external table and (2) a Hive staging table; (3) ODCH and OLH load into Exadata, with an Oracle external table, a fact table, and Cust and Movie tables]
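The first two steps can be sketched in plain Python. In the deck they are performed with Hive tables, so this is only an illustration of the idea; the column list, the rule "keep only events that reference a movie" and the Y/N to 1/0 conversion are assumptions. Step 3 (loading into the database) is not shown.

```python
import json

# Assumed staging-table column order.
STAGING_COLUMNS = ("custId", "movieId", "genreId", "time", "recommended", "activity")


def apply_structure(raw_lines):
    """Step 1: turn raw JSON text lines into dictionaries with named fields."""
    for line in raw_lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def filter_and_transform(records):
    """Step 2: keep movie-related events and normalize them into staging rows."""
    for r in records:
        if r.get("movieId") is None:
            continue  # assumed filter: skip events that do not reference a movie
        row = dict(r)
        row["recommended"] = 1 if r.get("recommended") == "Y" else 0
        yield tuple(row.get(col) for col in STAGING_COLUMNS)


# Three lines from the sample log shown earlier in the deck.
raw = [
    '{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}',
    '{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}',
    '{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}',
]

staging_rows = list(filter_and_transform(apply_structure(raw)))
```

Keeping the two steps as separate generators mirrors the slide: the raw log stays untouched, and the staging rows are derived from it.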

Demonstration:
Acquire & Organize (Part 1)

What you will see

Message
BDA provides all of the key capabilities to capture and structure huge volumes of unstructured data generated by applications
Data will be streamed into HDFS using Flume
Show Flume configuration
Show how data has landed in HDFS
Show how structure is applied to that data
Filter and transform that data - load into staging table

Oracle Direct Connector for HDFS
Direct Access from Oracle Database

SQL access to HDFS
External table view
Data query or import
[Diagram: a SQL query in Oracle Database reads an external table; DCH clients retrieve the data from HDFS over InfiniBand]

Oracle Loader for Hadoop
Use The Cluster
Last stage in MapReduce workflow
Partitioned and non-partitioned tables
Online and offline loads
[Diagram: Oracle Loader for Hadoop running as map, shuffle/sort and reduce tasks on the cluster]
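The map, shuffle/sort and reduce stages named on this slide can be illustrated with a toy, single-process version that groups rows by a partition key before loading. This is a generic MapReduce sketch, not a description of how Oracle Loader for Hadoop is implemented; the choice of the day as partition key and the sample records are assumptions.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit (partition_key, row) pairs; the day is the assumed partition key."""
    for r in records:
        day = r["time"][:10]  # "2012-07-01:09:00:00" -> "2012-07-01"
        yield day, (r["custId"], r["movieId"], r["activity"])


def shuffle_sort(pairs):
    """Shuffle/sort: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: produce one sorted batch of rows per partition, ready for loading."""
    return {key: sorted(rows) for key, rows in grouped}


# Made-up records in the same shape as the activity log.
records = [
    {"custId": 2, "movieId": 11, "time": "2012-07-02:10:00:00", "activity": 7},
    {"custId": 1, "movieId": 10, "time": "2012-07-01:09:00:00", "activity": 7},
    {"custId": 3, "movieId": 12, "time": "2012-07-01:08:00:00", "activity": 6},
]

batches = reduce_phase(shuffle_sort(map_phase(records)))
```

Doing the grouping and sorting on the cluster means the database receives rows that are already organized by partition, which is the point of the slide's "Use The Cluster" subtitle.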

Demonstration:
Organize (Part 2)

What you will see

Message
Connectors provide simple, fast data throughput from BDA into Exadata
SQL Developer
Show the external table - simple query
Combine that data with other data in the database - join to movie

Oracle's Big Data Platform

Stream, Acquire, Organize & Discover, Analyze, Visualize & Decide

Advanced Analytics

Analytics operationalized to optimize the end user experience
Utilize the power of in-database analytics for ad hoc analysis

Oracle R Connector for Hadoop
Native R Access to Hadoop

Native R MapReduce
Native R HDFS access
[Diagram: on the client host, an R engine with ORE and ORCH; on the Oracle Big Data Appliance, R engines with ORCH on the Hadoop cluster nodes, running MapReduce over HDFS]

Use BDA to Generate Recommendations

ORCH executes R-based collaborative filtering on BDA
Find users with similar interests
Recommends movies based on interest groups' selections
Results fed into NoSQL DB key-value store
[Diagram: activity logs feed the R engine (ORCH), which produces movie recommendations and genre/movie rankings stored in Oracle NoSQL DB on the Big Data Appliance]
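The idea of "find users with similar interests, then recommend what they watched" can be sketched as a small user-based collaborative filter. The deck runs this in R through ORCH; the Python version below, the Jaccard similarity measure, the user names and the viewing histories are all assumptions chosen for illustration (the movie IDs are borrowed from the sample log).

```python
def jaccard(a, b):
    """Similarity of two sets of watched movies: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, watched, top_n=3):
    """Score movies the target has not seen by the similarity of users who watched them."""
    scores = {}
    for user, movies in watched.items():
        if user == target:
            continue
        sim = jaccard(watched[target], movies)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for movie in movies - watched[target]:
            scores[movie] = scores.get(movie, 0.0) + sim
    # Highest score first; ties broken by movie ID for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:top_n]]


# Hypothetical viewing histories keyed by user.
watched = {
    "alice": {11547, 27205, 1124},
    "bob": {11547, 27205, 608},
    "carol": {11547, 1948},
    "dave": {424},
}

recs = recommend("alice", watched)
```

The resulting list per user is the kind of small, precomputed result that suits a key-value store: the application looks it up by customer ID at request time instead of recomputing it.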

Oracle In-Database Analytics Platform

[Diagram layers:]
Analytics: Oracle R Enterprise, Oracle Data Mining, Text and Search, Spatial Analytics, SQL Analytics
Parallel Processing Engine
Data Layer: XML, Relational, OLAP, Spatial, RDF, Media

Oracle Advanced Analytics
Oracle Data Mining
Utilize clustering analysis to determine movie recommendations based on current mood
Use text mining to derive themes from movie plot summaries
Combine themes with cast and crew to yield recommendations
Called at run time by the Oracle MoviePlex application

Close the Loop
Targeted Recommendations for Users

[Diagram: activity logs feed the R engine (ORCH), which writes movie recommendations to Oracle NoSQL DB on the Big Data Appliance]

Oracle Advanced Analytics
Oracle R Enterprise

Models run in-database
Processes large data sets
Uses the power of Oracle Database 11g and Exadata
Same code, much faster

Demonstration:
Advanced Analytics

What you will see

Message
Analytics is an iterative process. As new data arrives, you will be constantly updating your models based on the most recent information
Advanced Analytics is a core capability of Oracle Database 11g, and this integration is key
Reduce latency
Results are saved in the DB, making them easily accessible to *any* application or process, e.g. to update recommendation models or for ad hoc analysis
RStudio
Define an association model and utilize R visualizations
Save results to a table in the DB (to be used by Endeca)
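To make "association model" concrete, the sketch below computes support and confidence for pairwise rules over a handful of baskets. The demo itself uses R; this Python version, the `pair_rules` helper, the 0.5 support threshold and the genre baskets are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # confidence(a -> b) = count(a and b) / count(a)
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


# Made-up baskets: the set of genres each customer watched.
baskets = [
    {"Action", "Sci-Fi"},
    {"Action", "Sci-Fi", "Comedy"},
    {"Action", "Comedy"},
    {"Drama"},
]

rules = pair_rules(baskets)
```

With this toy data, every basket containing Sci-Fi also contains Action, so the rule Sci-Fi -> Action has confidence 1.0, while Action -> Sci-Fi has confidence 2/3.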

Business Intelligence and Information Discovery
More Powerful Together

Analysis Problems: Measure, Analyze, Report
Discovery Problems: Investigate, Explore, Understand

Oracle Business Intelligence: Proven Answers to Known Questions
Structured data: modeled and conforming
Oracle Endeca Information Discovery: Fast Answers to New Questions
Unstructured data: diverse, textual, uncertain quality

New questions require exploration, new information
Insights yield new metrics to monitor, data to integrate
Optimized for Exalytics In-Memory Machine; leverage existing investments

Constructing the Logical Model for OBIEE

Database is accessed by the Semantic Layer
And exposed as Subject Areas for Analysis

Analysis & Reporting via OBIEE Answers & Dashboards

Subject Areas are then available for ad-hoc analysis
Ad-hoc analyses can then be included in standard Dashboards

Demonstration:
Visualize & Decide

Oracle's Big Data Platform

[Cycle diagram: Stream, Acquire, Organize, Discover, Analyze, Visualize, Decide]

Big Data Platform Summary
Big Data for the Enterprise

Optimized and Complete
Everything you need to store and organize big data
Integrated with Oracle's Engineered Systems
Analyze all your data
Easy to Deploy
Risk Free, Quick Installation and Setup
Single Vendor Support
Full Oracle support for the entire system and software set
