
BIG DATA ANALYTICS

Reference Material
Agenda

▪ Big Data Challenges
▪ Big Data Reference Architectures
▪ Case Studies
▪ Tips for Designing Big Data Solutions
Big Data Challenges

[Figure: data sources (Archives, Docs, Business Apps, Media, Social Networks, Public Web, Data Storages, Machine Log Data, Sensor Data) arranged from structured to unstructured and rated Low / Medium / High on Volume, Variety, Velocity, and Complexity]

▪ Archives: scanned documents, statements, medical records, e-mails, etc.
▪ Docs: XLS, PDF, CSV, HTML, JSON, etc.
▪ Business Apps: CRM, ERP systems, HR, project management, etc.
▪ Media: images, video, audio, etc.
▪ Social Networks: Twitter, Facebook, Google+, LinkedIn, etc.
▪ Public Web: Wikipedia, news, weather, public finance, etc.
▪ Data Storages: RDBMS, NoSQL, Hadoop, file systems, etc.
▪ Machine Log Data: application logs, event logs, server data, CDRs, clickstream data, etc.
▪ Sensor Data: smart electric meters, medical devices, car sensors, road cameras, etc.
Big Data Analytics

Traditional Analytics (BI) vs. Big Data Analytics

Focus on:
▪ Traditional Analytics (BI): descriptive analytics, diagnostic analytics
▪ Big Data Analytics: predictive analytics, data science

Data sets:
▪ Traditional Analytics (BI): limited data sets, cleansed data, simple models
▪ Big Data Analytics: large-scale data sets, more types of data, raw data, complex data models

Supports:
▪ Traditional Analytics (BI): causation (what happened, and why?)
▪ Big Data Analytics: correlation (new insight, more accurate answers)
Big Data Analytics Use Cases

▪ Real Time Intelligence (consumers: Intelligent Agents); drivers: Low Latency, Reliability
▪ Data Discovery (consumers: Data Scientists/Analysts); drivers: Volume, Performance
▪ Business Reporting (consumers: Business Users); drivers: Data Quality, Self Service
Big Data Analytics Reference Architectures

Architecture Drivers:
▪ Volume
▪ Sources
▪ Throughput
▪ Latency
▪ Extensibility
▪ Data Quality
▪ Reliability
▪ Security
▪ Self-Service
▪ Cost

Reference Architectures:
▪ Extended Relational
▪ Non-Relational
▪ Hybrid
Relational Reference Architecture

▪ Data Sources: Structured, Semi-Structured, Unstructured
▪ Integration: ETL, Messaging, API/ODBC, Replication
▪ Data Storages: Data Warehouses, Data Marts, Operational Data Stores
▪ Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
▪ Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Extended Relational Reference Architecture

▪ Data Sources: Structured, Semi-Structured, Unstructured
▪ Integration: ETL, Messaging, API/ODBC, Replication
▪ Data Storages: Data Warehouses, Data Marts, Operational Data Stores
▪ Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
▪ Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services

Legend: key components affected by Big Data challenges
Non-Relational Reference Architecture

▪ Data Sources: Structured, Semi-Structured, Unstructured
▪ Integration: ETL, Messaging, API
▪ Data Storages: NoSQL Databases, Distributed File Systems
▪ Analytics: Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
▪ Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services

Legend: key components introduced with the non-relational movement
Extended Relational vs. Non-Relational Architecture

[Table: each architecture driver below is rated for the Extended Relational and the Non-Relational architecture]
▪ Large data volume
▪ Self-service (ad-hoc reporting)
▪ Unstructured data processing
▪ High data model extensibility
▪ High data quality and consistency
▪ Extensive security
▪ Reliability and fault-tolerance
▪ Low latency (near-real time)
▪ Low cost
▪ Skills availability
Relational vs. Non-Relational Architecture

Relational:
• Rational
• Predictable
• Traditional

Non-Relational:
• Agile
• Flexible
• Modern
Big Data Analytics Use Cases

▪ Real Time Intelligence (consumers: Intelligent Agents)
▪ Data Discovery (consumers: Data Scientists); drivers: Volume, Performance
▪ Business Reporting (consumers: Business Users)
Data Discovery: Non-Relational Architecture

▪ Data Sources: Structured, Semi-Structured, Unstructured
▪ Integration: ETL, Messaging, API
▪ Data Storages: NoSQL Databases, Distributed File Systems
▪ Analytics: Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
▪ Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services
Big Data Analytics Use Cases

▪ Real Time Intelligence (consumers: Intelligent Agents)
▪ Data Discovery (consumers: Data Scientists)
▪ Business Reporting (consumers: Business Users); drivers: Data Quality, Self Service
Business Reporting: Hybrid Architecture

▪ Data Sources: Structured, Semi-Structured, Unstructured
▪ Integration: ETL, Messaging, API
▪ Data Storages: Relational DWH/DM, Distributed File Systems
▪ Analytics: SQL Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
▪ Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services

Legend: Extended Relational components, Non-Relational components
Big Data Analytics Use Cases

▪ Real Time Intelligence (consumers: Intelligent Agents); drivers: Low Latency, Reliability
▪ Data Discovery (consumers: Data Scientists)
▪ Business Reporting (consumers: Business Users)
Lambda Architecture

[Figure: Lambda Architecture diagram]
Source:
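
Because the diagram did not survive, here is a minimal sketch of the idea usually associated with the Lambda Architecture: a batch layer periodically recomputes a view from the full dataset, a speed layer covers events the batch run has not seen yet, and queries merge the two. All names (`batch_view`, `SpeedLayer`, `query`) and the sample events are hypothetical and are not taken from the slides.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute event counts from the full master dataset."""
    return Counter(event["key"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: counts only events that arrived after the last batch run."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        self.realtime_view[event["key"]] += 1

    def reset(self):
        # Called once a new batch view covers the events counted here.
        self.realtime_view.clear()


def query(key, batch, realtime):
    """Serving side: merge the batch view and the real-time view at query time."""
    return batch[key] + realtime[key]


# Made-up events for illustration only.
master = [{"key": "a"}, {"key": "b"}, {"key": "a"}]
batch = batch_view(master)
speed = SpeedLayer()

new_event = {"key": "a"}
master.append(new_event)   # the master dataset is append-only
speed.on_event(new_event)  # the speed layer sees the event immediately
before_rebatch = query("a", batch, speed.realtime_view)

batch = batch_view(master)  # next batch run now includes the new event
speed.reset()
after_rebatch = query("a", batch, speed.realtime_view)
```

The query result is the same before and after the batch recomputation; the speed layer only fills the gap between batch runs, which is how this style of design targets the low-latency driver.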

Case Study #1: Usage & Billing Analysis

Business Goals:
▪ Provide a visual environment for building custom mobile applications
▪ Charge customers based on the platform they are using, the number of consumers' applications, etc.

Business Area:
Cloud-based platform for building, deploying, hosting, and managing mobile applications
Architectural Decisions

Architecture Drivers:
▪ Volume (> 10 TB)
▪ Sources (Semi-structured - JSON)
▪ Throughput (> 10K/sec)
▪ Latency (2 min)
▪ Extensibility (Custom metrics)
▪ Data Quality (Consistency)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Ad-Hoc reports)
▪ Cost (The lower the better)
▪ Constraints (Public Cloud)

Trade-off:
                 Extended Relational   Non-Relational
Extensibility    -                     +
Data Quality     +                     -
Self-Service     +                     -

Decision:
▪ Extended Relational Architecture
▪ Extensibility via Pre-allocated Fields pattern
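
The slide names the Pre-allocated Fields pattern without showing it. The sketch below illustrates one common reading of that pattern, using Python's built-in `sqlite3` as a stand-in for the relational store: the fact table carries a fixed set of spare columns, and a mapping table binds each tenant's custom metric to one of them. The table names, column names, and sample values are hypothetical and are not the schema used in the case study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usage_events (
    tenant_id  TEXT NOT NULL,
    app_id     TEXT NOT NULL,
    event_time TEXT NOT NULL,
    custom_1   REAL,            -- pre-allocated spare columns
    custom_2   REAL,
    custom_3   REAL
);
CREATE TABLE custom_metric_map (
    tenant_id   TEXT NOT NULL,
    metric_name TEXT NOT NULL,
    column_name TEXT NOT NULL,
    PRIMARY KEY (tenant_id, metric_name),
    UNIQUE (tenant_id, column_name)
);
""")

SPARE_COLUMNS = ("custom_1", "custom_2", "custom_3")


def register_metric(tenant_id, metric_name):
    """Bind a tenant's custom metric to the first free spare column."""
    rows = conn.execute(
        "SELECT column_name FROM custom_metric_map WHERE tenant_id = ?",
        (tenant_id,),
    )
    used = {row[0] for row in rows}
    for column in SPARE_COLUMNS:
        if column not in used:
            conn.execute(
                "INSERT INTO custom_metric_map VALUES (?, ?, ?)",
                (tenant_id, metric_name, column),
            )
            return column
    raise RuntimeError("no spare columns left for tenant " + tenant_id)


def column_for(tenant_id, metric_name):
    """Look up which spare column stores the given metric."""
    row = conn.execute(
        "SELECT column_name FROM custom_metric_map "
        "WHERE tenant_id = ? AND metric_name = ?",
        (tenant_id, metric_name),
    ).fetchone()
    if row is None:
        raise KeyError(metric_name)
    return row[0]


def record_event(tenant_id, app_id, event_time, metrics):
    """Insert one usage event, routing custom metrics to their columns."""
    columns = ["tenant_id", "app_id", "event_time"]
    values = [tenant_id, app_id, event_time]
    for name, value in metrics.items():
        # Column names come only from the mapping table, never from input.
        columns.append(column_for(tenant_id, name))
        values.append(value)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f"INSERT INTO usage_events ({', '.join(columns)}) "
        f"VALUES ({placeholders})",
        values,
    )


def total(tenant_id, metric_name):
    """Ad-hoc style aggregate over a custom metric for one tenant."""
    column = column_for(tenant_id, metric_name)
    return conn.execute(
        f"SELECT COALESCE(SUM({column}), 0) FROM usage_events "
        "WHERE tenant_id = ?",
        (tenant_id,),
    ).fetchone()[0]


# Made-up tenant and metrics for illustration only.
register_metric("acme", "push_notifications")
register_metric("acme", "api_calls")
record_event("acme", "app-1", "2014-01-01T00:00:00", {"api_calls": 120})
record_event("acme", "app-2", "2014-01-01T00:01:00",
             {"api_calls": 30, "push_notifications": 5})
```

The schema stays fixed, so reporting tools keep working against ordinary columns, while the number of spare columns caps how many custom metrics a tenant can define. That cap is the cost of keeping the relational side of the trade-off.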

Solution Architecture

Technologies:
• Amazon Redshift
• Amazon SQS
• Amazon S3
• Elastic Beanstalk
• Jaspersoft BI Professional
• Python
Case Study #2: Clickstream for Retail Website

Business Goals:
▪ Build an in-house analytics platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform
▪ Provide the ability to understand how end users interact with service content, products, and features on sites
▪ Do clickstream analysis
▪ Perform A/B testing

Business Area:
Retail. A platform for e-commerce and for collecting feedback from customers
Architectural Decisions

Architecture Drivers:
▪ Volume (45 TB)
▪ Sources (Semi-structured - JSON)
▪ Throughput (> 20K/sec)
▪ Latency (1 hour)
▪ Extensibility (Custom tags)
▪ Data Quality (Not critical)
▪ Reliability (24/7)
▪ Security (Multitenancy)
▪ Self-Service (Canned reports, Data science)
▪ Cost (The lower the better)
▪ Constraints (Public Cloud)

Trade-off:
                      Extended Relational   Non-Relational
Volume/Scalability    +/-                   +
Throughput            +                     +
Self-Service          +                     +/-
Extensibility         -                     +

Decision:
▪ Non-Relational Architecture
▪ Reporting via Materialized View pattern
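
The Materialized View pattern mentioned here can be sketched in a few lines, assuming it means what the term usually means: reports read a precomputed aggregate that is refreshed from the raw clickstream, instead of scanning raw events on every request. The class name, event fields, and sample events below are hypothetical; the case study's actual implementation is not described in the slides.

```python
from collections import Counter


class PageViewsByDay:
    """A precomputed (materialized) aggregate over a raw clickstream log."""

    def __init__(self):
        self.counts = Counter()  # (day, page) -> number of views
        self.processed = 0       # raw events already folded into the view

    def refresh(self, raw_events):
        """Fold in only the events appended since the last refresh."""
        for event in raw_events[self.processed:]:
            day = event["ts"][:10]  # assumes ISO-8601 timestamps
            self.counts[(day, event["page"])] += 1
        self.processed = len(raw_events)

    def report(self, day):
        """Canned report: views per page for one day, served from the view."""
        return {page: n for (d, page), n in self.counts.items() if d == day}


# Made-up raw events for illustration only.
raw_events = [
    {"ts": "2014-03-01T10:00:00", "page": "/home"},
    {"ts": "2014-03-01T10:05:00", "page": "/cart"},
    {"ts": "2014-03-01T11:00:00", "page": "/home"},
]
view = PageViewsByDay()
view.refresh(raw_events)

raw_events.append({"ts": "2014-03-02T09:00:00", "page": "/home"})
view.refresh(raw_events)  # incremental: only the new event is processed

daily = view.report("2014-03-01")
```

A refresh on a schedule is compatible with the one-hour latency driver listed above, and it is one way to address the weaker self-service rating of the non-relational option for canned reports.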
Solution Architecture

Technologies:
• Amazon S3
• Flume
• Hadoop/HDFS, MapReduce
• HBase
• Oozie
• Hive

[Figure: solution diagram showing cluster nodes (Node 1, Node 2, ...)]
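
To make the MapReduce entry in the technology list concrete, the following is a single-process sketch of the map, shuffle, and reduce steps applied to an A/B testing question (visits and conversions per variant). It imitates the programming model only; it does not use Hadoop, and the record fields and sample data are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (variant, (visits, conversions)) for one clickstream record."""
    yield record["variant"], (1, 1 if record["converted"] else 0)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum visits and conversions for one variant."""
    visits = sum(v for v, _ in values)
    conversions = sum(c for _, c in values)
    return key, {"visits": visits, "conversions": conversions}


def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Made-up records for illustration only.
records = [
    {"variant": "A", "converted": False},
    {"variant": "A", "converted": True},
    {"variant": "B", "converted": True},
    {"variant": "B", "converted": True},
    {"variant": "B", "converted": False},
]
result = run_job(records)
```

In a real cluster the map and reduce functions run in parallel on different nodes and the framework performs the shuffle; here all three steps run in one process purely to show the data flow.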

Tips for Designing Big Data Solutions

▪ Understand data users and sources
▪ Discover architecture drivers
▪ Select proper reference architecture
▪ Do trade-off analysis, address cons
▪ Map reference architecture to technology stack
▪ Prototype, re-evaluate architecture
▪ Estimate implementation efforts
▪ Set up devops practices from the very beginning
▪ Advance in solution development through "small wins"
▪ Be ready for changes; big data technologies are evolving rapidly
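
The trade-off analysis step can be illustrated with the ratings from the two case studies. The sketch below is only one possible way to tally them: the numeric scores (+ as 1, +/- as 0, - as -1), the optional weights, and the function name `rank` are assumptions, not a method given in the slides.

```python
SCORES = {"+": 1, "+/-": 0, "-": -1}  # assumed numeric mapping

# Ratings copied from the Case Study #1 and #2 trade-off tables.
CASE_STUDY_1 = {
    "Extensibility": {"Extended Relational": "-", "Non-Relational": "+"},
    "Data Quality": {"Extended Relational": "+", "Non-Relational": "-"},
    "Self-Service": {"Extended Relational": "+", "Non-Relational": "-"},
}
CASE_STUDY_2 = {
    "Volume/Scalability": {"Extended Relational": "+/-", "Non-Relational": "+"},
    "Throughput": {"Extended Relational": "+", "Non-Relational": "+"},
    "Self-Service": {"Extended Relational": "+", "Non-Relational": "+/-"},
    "Extensibility": {"Extended Relational": "-", "Non-Relational": "+"},
}


def rank(tradeoffs, weights=None):
    """Sum the (optionally weighted) ratings per architecture, best first."""
    weights = weights or {}
    totals = {}
    for driver, ratings in tradeoffs.items():
        weight = weights.get(driver, 1)
        for architecture, rating in ratings.items():
            totals[architecture] = (
                totals.get(architecture, 0) + weight * SCORES[rating]
            )
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking_1 = rank(CASE_STUDY_1)
ranking_2 = rank(CASE_STUDY_2)
```

With equal weights, the top-ranked option matches the architecture chosen in each case study. The remaining minus for the chosen option is the "con" to address, which the case studies do with the Pre-allocated Fields and Materialized View patterns.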
