Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
* Movie data provided by IMDb. Links to movie images provided by TMDb.
Big Data Challenge
How do we:
Make the right movie offers at the right time?
Better understand the viewing trends of our various customer segments?
Optimize our marketing spend by targeting customers with optimal promotional offers?
Minimize infrastructure spend by understanding bandwidth usage over time?
Prepare to answer questions that we haven't thought of yet!
[Diagram: activity data streamed into HDFS using Flume; clustering/market basket; Oracle Advanced Analytics; mood analytics; Oracle NoSQL DB recommendations]
Message
The application demonstrates a personalized environment that leverages advanced analytic capabilities.
Low latency is essential (Amazon: every 100 ms of latency costs it 1% in sales).
Challenge
Massive volumes of unstructured data are flowing in.
How do you harness that data and take advantage of it?
[Diagram: Stream, Acquire, Organize & Discover; batch-oriented and real-time processing]
Example log (landed in HDFS):
{"custId":1046915,"movieId":null,"genreId":null,"time":"2012-07-01:00:33:18","recommended":null,"activity":9}
{"custId":1144051,"movieId":768,"genreId":9,"time":"2012-07-01:00:33:39","recommended":"N","activity":6}
{"custId":1264225,"movieId":null,"genreId":null,"time":"2012-07-01:00:34:01","recommended":null,"activity":8}
{"custId":1085645,"movieId":null,"genreId":null,"time":"2012-07-01:00:34:18","recommended":null,"activity":8}
{"custId":1098368,"movieId":null,"genreId":null,"time":"2012-07-01:00:34:28","recommended":null,"activity":8}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:35:09","recommended":"Y","activity":11,"price":3.99}
{"custId":1156900,"movieId":20352,"genreId":14,"time":"2012-07-01:00:35:12","recommended":"N","activity":7}
{"custId":1336404,"movieId":null,"genreId":null,"time":"2012-07-01:00:35:27","recommended":null,"activity":9}
{"custId":1022288,"movieId":null,"genreId":null,"time":"2012-07-01:00:35:38","recommended":null,"activity":8}
{"custId":1129727,"movieId":1105903,"genreId":11,"time":"2012-07-01:00:36:08","recommended":"N","activity":1,"rating":3}
{"custId":1305981,"movieId":null,"genreId":null,"time":"2012-07-01:00:36:27","recommended":null,"activity":8}
How do you turn this into…?
JSON format
A standard format for data serialization
Each record captures a user activity (or click) and information about that activity
Example: customer 1234 started watching Iron Man 2 on 2012-Nov-05. It was recommended, the customer paid 3.99, and the genre is Adventure.
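To illustrate what a captured activity looks like once it is given structure, here is a minimal Python sketch, assuming the field names from the sample log. The `Activity` class and `parse_activity` function are hypothetical names, and the meaning of the numeric `activity` codes is not documented here, so they stay plain integers.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Timestamp layout used in the sample log, e.g. "2012-07-01:00:35:09"
TIME_FORMAT = "%Y-%m-%d:%H:%M:%S"


@dataclass
class Activity:
    """One user activity (click); fields mirror the JSON keys in the log."""
    cust_id: int
    movie_id: Optional[int]          # null for events not tied to a movie
    genre_id: Optional[int]
    timestamp: datetime
    recommended: Optional[bool]      # "Y"/"N" in the log, or null
    activity: int                    # numeric activity code (meaning assumed unknown)
    price: Optional[float] = None    # present only on some events
    rating: Optional[int] = None     # present only on some events


def parse_activity(line: str) -> Activity:
    """Turn one JSON log line into a typed Activity record."""
    rec = json.loads(line)
    flag = rec.get("recommended")
    return Activity(
        cust_id=rec["custId"],
        movie_id=rec.get("movieId"),
        genre_id=rec.get("genreId"),
        timestamp=datetime.strptime(rec["time"], TIME_FORMAT),
        recommended=None if flag is None else flag == "Y",
        activity=rec["activity"],
        price=rec.get("price"),
        rating=rec.get("rating"),
    )


example = parse_activity(
    '{"custId":1363545,"movieId":27205,"genreId":9,'
    '"time":"2012-07-01:00:35:09","recommended":"Y",'
    '"activity":11,"price":3.99}'
)
print(example.cust_id, example.movie_id, example.recommended, example.price)
```

Optional fields default to `None`, so browsing events with null movie data parse the same way as purchase events.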
Message
The Big Data Appliance (BDA) provides all of the key capabilities to capture and structure the huge volumes of unstructured data that applications generate.
Data will be streamed into HDFS using Flume
Show the Flume configuration
Show how the data has landed in HDFS
Show how structure is applied to that data
Filter and transform that data, then load it into a staging table
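The filter-and-transform step can be sketched in plain Python. This is not the Flume or Hadoop tooling used in the demo; it is an illustration under stated assumptions: the staging column order (`STAGING_COLUMNS`) is hypothetical, and the filter rule (drop events whose `movieId` is null) is one plausible choice rather than the demo's actual rule.

```python
import json

# A few lines copied from the example log.
SAMPLE_LOG = """\
{"custId":1046915,"movieId":null,"genreId":null,"time":"2012-07-01:00:33:18","recommended":null,"activity":9}
{"custId":1144051,"movieId":768,"genreId":9,"time":"2012-07-01:00:33:39","recommended":"N","activity":6}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:35:09","recommended":"Y","activity":11,"price":3.99}
{"custId":1156900,"movieId":20352,"genreId":14,"time":"2012-07-01:00:35:12","recommended":"N","activity":7}
{"custId":1129727,"movieId":1105903,"genreId":11,"time":"2012-07-01:00:36:08","recommended":"N","activity":1,"rating":3}
{"custId":1305981,"movieId":null,"genreId":null,"time":"2012-07-01:00:36:27","recommended":null,"activity":8}
"""

# Hypothetical staging-table column order.
STAGING_COLUMNS = ("cust_id", "movie_id", "genre_id", "time",
                   "recommended", "activity", "price", "rating")


def to_staging_rows(lines):
    """Filter: keep only events tied to a movie.
    Transform: flatten each JSON record into a tuple in STAGING_COLUMNS order."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        if rec.get("movieId") is None:
            continue  # filter out events with no movie attached
        rows.append((rec["custId"], rec["movieId"], rec["genreId"], rec["time"],
                     rec.get("recommended"), rec["activity"],
                     rec.get("price"), rec.get("rating")))
    return rows


def to_delimited(rows, sep="\t"):
    """Render rows as delimited text for a bulk load; None becomes ''."""
    return "\n".join(sep.join("" if v is None else str(v) for v in row)
                     for row in rows)


rows = to_staging_rows(SAMPLE_LOG.splitlines())
print(to_delimited(rows))
```

Tab-delimited text is just one convenient format for loading a staging table; the same rows could be inserted directly through a database driver.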
Message
Oracle Big Data Connectors provide simple, fast data throughput from BDA into Exadata.
In SQL Developer:
Show the external table with a simple query
Combine that data with other data in the database, e.g. join it to the movie data
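The query-then-join step above can be illustrated with Python's built-in `sqlite3` module standing in for Oracle Database and the external table. The table names `activity_ext` and `movie` and the placeholder titles are hypothetical; only the shape of the queries (a simple select over the activity data, then an inner join on the movie id) reflects the demo steps.

```python
import sqlite3


def build_demo_db():
    """In-memory stand-in: one table for activity data exposed from BDA,
    one for movie data already in the database (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity_ext (cust_id INTEGER, movie_id INTEGER, "
                 "genre_id INTEGER, activity INTEGER, price REAL)")
    conn.execute("CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO activity_ext VALUES (?, ?, ?, ?, ?)", [
        (1144051, 768, 9, 6, None),
        (1363545, 27205, 9, 11, 3.99),
        (1156900, 20352, 14, 7, None),
        (1336404, None, None, 9, None),  # no movie: dropped by the inner join
    ])
    conn.executemany("INSERT INTO movie VALUES (?, ?)", [
        (768, "Title 768"), (20352, "Title 20352"), (27205, "Title 27205"),
    ])
    return conn


def activity_with_titles(conn):
    """Join the activity rows to the movie table on movie_id."""
    return conn.execute(
        "SELECT a.cust_id, m.title, a.price "
        "FROM activity_ext a JOIN movie m ON m.movie_id = a.movie_id "
        "ORDER BY a.cust_id"
    ).fetchall()


conn = build_demo_db()
print(conn.execute("SELECT COUNT(*) FROM activity_ext").fetchone()[0])  # simple query
for row in activity_with_titles(conn):
    print(row)
```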
Activity logs
ORCH (Oracle R Connector for Hadoop) executes R-based collaborative filtering on BDA:
R Engine: find users with similar interests
Movie recommendations: ORCH recommends movies based on the interest groups' selections
Genre/movie rankings
Results are fed into the Oracle NoSQL DB key-value store
[Diagram labels: Big Data Appliance, Oracle NoSQL DB, Oracle R Enterprise, Spatial, Analytics]
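The recommendation flow above (find users with similar interests, recommend what those users chose, store the results in a key-value store) can be sketched with textbook user-based collaborative filtering. This is an assumption for illustration: the demo runs R code through ORCH and does not say which similarity measure or algorithm it uses. Here cosine similarity is computed over toy ratings, a plain Python dict stands in for Oracle NoSQL DB, and the `/recommendations/<custId>` key layout is hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors {movie_id: rating}."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, k=2, n=3):
    """User-based CF: take the k most similar users, then rank movies this
    user has not rated by the similarity-weighted average of their ratings."""
    neighbours = sorted(
        ((cosine(ratings[user], ratings[other]), other)
         for other in ratings if other != user),
        reverse=True,
    )[:k]
    score_sum = defaultdict(float)
    weight_sum = defaultdict(float)
    for sim, other in neighbours:
        if sim <= 0:
            continue  # ignore users with no overlap
        for movie, r in ratings[other].items():
            if movie not in ratings[user]:
                score_sum[movie] += sim * r
                weight_sum[movie] += sim
    ranked = sorted(((score_sum[m] / weight_sum[m], m) for m in score_sum),
                    reverse=True)
    return [m for _, m in ranked[:n]]


def publish(store, ratings, k=2, n=3):
    """Write each user's list into a key-value store (a dict here),
    using the hypothetical key layout /recommendations/<custId>."""
    for user in ratings:
        store[f"/recommendations/{user}"] = recommend(ratings, user, k, n)


# Toy ratings: {custId: {movieId: rating}}
ratings = {
    1: {101: 5, 102: 4, 103: 1},
    2: {101: 4, 102: 5, 104: 4},
    3: {103: 5, 105: 3},
}
store = {}
publish(store, ratings)
print(store)
```

In the toy data, customer 1's closest neighbour is customer 2, so movie 104 ranks ahead of movie 105 for customer 1.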
Message
Analytics is an iterative process. As new data arrives, you constantly update your models based on the most recent information.
Advanced Analytics is a core capability of Oracle Database 11g, and this integration is key:
It reduces latency
Results are saved in the database, making them easily accessible to *any* application or process, e.g. to update recommendation models or for ad hoc analysis
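One way to make "update your models based on the most recent information" concrete is an incrementally maintained genre ranking. The minimal Python sketch below is hypothetical: `DecayedGenreRanking` is an illustrative name, and exponential decay of older counts is an assumed technique for favoring recent data, not something the demo specifies.

```python
from collections import defaultdict


class DecayedGenreRanking:
    """Genre popularity refreshed batch by batch. Before each new batch,
    older scores are multiplied by `decay`, so recent activity counts more."""

    def __init__(self, decay=0.5):
        self.decay = decay
        self.scores = defaultdict(float)

    def update(self, batch):
        for genre in self.scores:
            self.scores[genre] *= self.decay   # age existing evidence
        for event in batch:
            genre = event.get("genreId")
            if genre is not None:              # skip non-movie events
                self.scores[genre] += 1.0

    def top(self, n=3):
        # Highest score first; ties broken by genre id for a stable order
        return sorted(self.scores, key=lambda g: (-self.scores[g], g))[:n]


ranking = DecayedGenreRanking(decay=0.5)
ranking.update([{"genreId": 9}, {"genreId": 9}, {"genreId": 14}, {"genreId": None}])
print(ranking.top())  # [9, 14]
ranking.update([{"genreId": 14}, {"genreId": 14}, {"genreId": 11}])
print(ranking.top())  # [14, 9, 11]
```

A smaller `decay` makes the ranking react faster to new batches; `decay=1.0` reduces it to plain cumulative counts.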
RStudio
Define an association model and use R visualizations
Save the results to a table in the database (to be used by Endeca)
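The association-model step can be approximated with a simplified pairwise association-rule miner (support, confidence, lift), followed by saving the rules to a table. This is a Python sketch, not the R code from the demo: `sqlite3` stands in for the database, and the table name `movie_assoc_rules`, the thresholds, and the toy baskets are assumptions.

```python
import sqlite3
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Mine one-to-one rules X -> Y from viewing baskets.

    support    = share of baskets containing both X and Y
    confidence = support(X, Y) / support(X)
    lift       = confidence / support(Y)
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = both / item_counts[x]
            if confidence >= min_confidence:
                lift = confidence / (item_counts[y] / n)
                rules.append((x, y, support, confidence, lift))
    return sorted(rules)


def save_rules(conn, rules):
    """Persist rules to a (hypothetical) table that other tools can query."""
    conn.execute("CREATE TABLE IF NOT EXISTS movie_assoc_rules ("
                 "antecedent INTEGER, consequent INTEGER, "
                 "support REAL, confidence REAL, lift REAL)")
    conn.executemany("INSERT INTO movie_assoc_rules VALUES (?, ?, ?, ?, ?)", rules)
    conn.commit()


# Toy baskets: the movie ids each customer watched
baskets = [[1, 2, 3], [1, 2], [1, 3], [2, 4]]
rules = pair_rules(baskets)
conn = sqlite3.connect(":memory:")
save_rules(conn, rules)
for row in conn.execute("SELECT * FROM movie_assoc_rules ORDER BY lift DESC"):
    print(row)
```

Only pairs of movies are considered here; a full Apriori-style miner would also grow larger itemsets.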
Oracle Endeca
Unstructured data: diverse, textual, uncertain-quality data
Information discovery: fast answers to new questions
Insights yield new metrics to monitor and new data to integrate
The database is accessed by the … and exposed as a semantic layer of subject areas for analysis.
The subject areas are then available for ad hoc analysis.
[Diagram labels: Decide, Stream, Acquire, Visualize, Organize, Analyze, Discover]