About me
BS in Computer Science and Engineering from the University of Connecticut
In the Healthcare Industry for over 19 years
Programmer most of my career - Architect, Designer
Worked in the SOA space for a number of years
Lead engineer in the mobile application space
Now Lead engineer in the Big Data Analytics Space - Hadoop
In my spare time
Love to travel with the family
Video games, music, movies
Community relations work
Fan of College basketball
The blowback
What we accomplished
Roadmap to the future
Lessons Learned
Questions?
Edge Node for Hadoop Client
Logs
Web
IVR
Portal
Mobile
Realtime Feed
Flume
SQOOP/
Flume
Chronos
NoSQL
Storing
weblogs
Log files to
HDFS
Scalding (Scala)
Cascading (Java)
Hive/Impala (SQL)
Python
SAS
Pentaho
R
?
Analysis/Modeling Tools
Analysis
Live Data
Streams
Web Analytics
Jobs
Tableau
Spotfire
Platfora
Event detection(Storm)
Realtime Data Store or event processing
Data Science Tools
SQOOP
Flume
MapReduce Distributed Programming Framework
* Use Spark Streaming and append to Hadoop output - Realtime events
HDFS
SQOOP
CED/
Claims
Clinical
Data
RDBMS
Cognos
Microstrategy
?
Visualization
filestack
Hadoop Cluster running HDFS and MapReduce
Includes Management, Monitoring and Security
Teradata
RDBMS
External Hadoop Output
Or in HDFS
Teradata
Hadoop Cluster #3
Hadoop Cluster #2
Back up or copy data from HDFS to a redundant cluster for quick recovery
* For future implementation TBD
Use Case 1
Use Case 2
Success!
Ready to tackle tougher, more complicated problems
& Challenges
But Why?
Overuse of the words Big & Data
Building
Service
Being
Brand
Ops efficiency
Customer Centric
Product
a Customer Persona
Efficiency
Impact
Predictive
Data
threat modeling
Archival
Network
Efficiency
Hadoop is complementary
Hadoop excels at processing and analyzing large volumes of distributed unstructured, structured, and semi-structured data in batch or near-real-time fashion
NoSQL databases are adept at storing and serving multi-structured data in near real time for web-based applications
Massively parallel OLAP databases (Teradata, for example) are best at analyzing large volumes of mainly structured data
SAS/R - Modeling and Business Intelligence
Tableau - Visualization
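As a rough illustration of the batch model that Hadoop applies to semi-structured data, the sketch below mimics the map, shuffle, and reduce phases in plain Python on a few made-up log lines. It runs on one machine and does not call Hadoop. The log format, the channel names, and every function name are assumptions made for illustration; none of them come from the deck.

```python
from collections import defaultdict

# Hypothetical log lines in the form "<channel> <status>" (format is assumed).
LOG_LINES = [
    "web 200",
    "mobile 200",
    "web 500",
    "ivr 200",
    "web 200",
]


def map_phase(lines):
    """Emit a (channel, 1) pair for every well-formed line."""
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            yield parts[0], 1


def shuffle_phase(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_channel(lines):
    """Run the three phases in order and return hits per channel."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(count_by_channel(LOG_LINES))
```

On a real cluster the map and reduce steps would run in parallel across nodes and the grouping would be handled by the framework; here everything is sequential so the data flow is easy to follow.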
What we accomplished
Evangelized Hadoop
R on Hadoop
What we accomplished
ETL - Ingest, Transform and Move patterns
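One possible shape for an ingest, transform, and move pattern is sketched below using only the Python standard library and local files. It is a minimal stand-in, not the pipeline from this talk: in a Hadoop setting the ingest and move steps would typically be handled by tools such as SQOOP or Flume writing to HDFS. The feed contents, column names, file name, and function names are all hypothetical.

```python
import csv
import io
import shutil
import tempfile
from pathlib import Path

# Hypothetical raw feed; the column names are assumptions for illustration.
RAW_FEED = "member_id,channel,duration_sec\n 101 ,WEB,30\n102,Mobile,\n103,ivr,45\n"

FIELDS = ["member_id", "channel", "duration_sec"]


def ingest(raw_text):
    """Ingest: parse raw CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(records):
    """Transform: trim ids, lowercase channels, drop rows with no duration."""
    cleaned = []
    for rec in records:
        if not rec["duration_sec"]:
            continue
        cleaned.append({
            "member_id": rec["member_id"].strip(),
            "channel": rec["channel"].strip().lower(),
            "duration_sec": int(rec["duration_sec"]),
        })
    return cleaned


def move(records, staging_dir, target_dir):
    """Move: write a staging file, then relocate it to the target directory."""
    staging_dir, target_dir = Path(staging_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / "events.csv"
    with staged.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
    return Path(shutil.move(str(staged), str(target_dir / "events.csv")))


def run_pipeline(raw_text, work_dir):
    """Chain the three steps under a single working directory."""
    staging = Path(work_dir) / "staging"
    staging.mkdir(parents=True, exist_ok=True)
    return move(transform(ingest(raw_text)), staging, Path(work_dir) / "target")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(RAW_FEED, tmp).read_text())
```

Writing to a staging location first and relocating the finished file afterwards keeps downstream readers from picking up a half-written file, which is one common reason to separate the move step from the transform step.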
Lessons Learned
Overuse of the words Big & Data
The overlap
Vendors
Vendor Partnerships
WWYS
"Difficult to see. Always in motion is the future." - Yoda
Questions?
atif71@gmail.com