Source: Wikipedia
Sources of data
- 12+ TBs of tweet data every day
- 30 billion RFID tags today (1.3B in 2005)
- 4.6 billion camera phones worldwide
- ? TBs of data every day
- 100s of millions of GPS-enabled devices sold annually
- 25+ TBs of log data every day
- 2+ billion people on the Web by end of 2011
- 76 million smart meters in 2009, 200M by 2014
What makes Data Big
Characteristic: Volume
- Description: The amount of data generated, or the data intensity, that must be ingested, analyzed and managed to make decisions based on complete data analysis
- Attributes: Exabyte (EB), Zettabyte (ZB), Yottabyte (YB)
- Drivers: Increase in data sources; higher resolution sensors; scalable infrastructure
Characteristic: Velocity
- Description: How fast the data is being produced and changed, and the speed at which it is transformed into insight
- Attributes: Batch; near real time; real time and streams; interactive
- Drivers: Improved throughput connectivity; competitive advantage; pre-computed information; rapid feedback loop
Figure: data platforms by decade, plotted against Speed and Scale
- 1990s: Data warehouse (RDBMS) and BI reporting: Business Objects, SAS, Informatica, Cognos, other SQL reporting tools
- 2000s: Business intelligence, OLAP and in-memory (speed): QlikView, Tableau, HANA
- 2010s: Big Data, batch processing and distributed data store (scale): Hadoop/Spark; HBase/Cassandra/MongoDB
- 2010s: Big Data, real time and single view (speed): graph databases
Solving business problems with big data
Formulation of big data strategy
People; 31%
Tools; 33%
Customer: our known history
- Social media
- Banking and finance
- Gaming
- Purchase
- Entertainment
Real-Time Analytics/Decision Requirement
- Product recommendations that are relevant and compelling
- Friend invitations to join a game or activity that expands business
- Influence behavior
- Improving the marketing effectiveness of a promotion while it is still in play
- Learning why customers switch to competitors and their offers, in time to counter
- Preventing fraud as it is occurring, and preventing more proactively
IoT + Big Data = IoE (Internet of Everything)
Role of Big Data in M2M/IoT
- Big Data is a factor that will, to a large extent, determine the future growth rate of the M2M industry.
- M2M will connect an increasing number of nodes that provide data from endpoints.
- Data will be more granular, more frequent, and more accurate, with bigger data sets or even live data streams.
- The IPv4 addressing scheme cannot accommodate the large volume of endpoint connections (sensors, smartphones, smart factories, smart grids, smart vehicles, controllers, meters), so IPv6 is required.
- IoE: the convergence of IoT, Big Data analytics, cloud computing and other technologies is collectively called the Internet of Everything.
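The addressing point above can be checked with simple arithmetic. The following is a minimal sketch using Python's standard `ipaddress` module; the endpoint count reuses the 30 billion RFID tags figure quoted earlier in this deck, and the helper `fits` is a hypothetical name introduced only for illustration.

```python
import ipaddress

# Size of the full address space for each protocol version.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2**128

# Endpoint count taken from the "30 billion RFID tags" figure in this deck.
rfid_tags = 30_000_000_000


def fits(total_addresses: int, endpoints: int) -> bool:
    """Return True if every endpoint could receive its own unique address."""
    return endpoints <= total_addresses


print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.3e}")
print("IPv4 can address all tags:", fits(ipv4_total, rfid_tags))
print("IPv6 can address all tags:", fits(ipv6_total, rfid_tags))
```

This ignores reserved ranges and address translation, so it understates how early IPv4 runs short in practice; it only illustrates the order-of-magnitude gap.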
Challenges of Big Data in M2M/IoT
Batch processing
- Gathering of data and processing as a group at one time.
- Jobs run to completion
- Data might be out of date
Real-time processing
- Processing of data takes place as the information is being entered.
- Jobs run forever
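The contrast between the two modes can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not how any particular framework implements it: the function names and the sensor readings are hypothetical.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch: gather all the data first, process it as one group, then finish."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Real time: update the result as each reading arrives.

    With an unbounded source this generator never finishes on its own.
    """
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical meter readings, for illustration only.
sensor_readings = [20.0, 22.0, 24.0, 26.0]

# One answer, available only after all data is in (and possibly out of date).
print(batch_average(sensor_readings))

# An up-to-date answer after every reading.
print(list(streaming_average(sensor_readings)))
```

The batch function cannot return anything until the whole input exists, while the streaming generator always has a current answer, which is the trade-off the two lists above describe.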
Storm
Stream Processing
Fast
Scalable
Fault Tolerant
Reliable
Tuple
Streams
Spouts
Bolts
Topologies
Reliable Processing
Stream Grouping
Groupings are used to decide to which task in the
subscribing bolt (group) a tuple is sent.
Possible Groupings:
- Shuffle
- Fields
- All
- Global
- None
- Direct
- Local or Shuffle
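To make the routing decision concrete, here is a minimal plain-Python simulation of four of the groupings listed above. It is a sketch of the idea only and does not use the Storm API: the function names, `NUM_TASKS`, and the CRC32 hash are all assumptions chosen for illustration, and Storm's own hashing differs.

```python
import random
import zlib
from typing import List, Tuple

NUM_TASKS = 3  # hypothetical number of tasks in the subscribing bolt

WordCount = Tuple[str, int]


def shuffle_grouping(tup: WordCount, rng: random.Random) -> List[int]:
    """Shuffle: pick a task at random so load spreads roughly evenly."""
    return [rng.randrange(NUM_TASKS)]


def fields_grouping(tup: WordCount, field_index: int = 0) -> List[int]:
    """Fields: hash the chosen field so equal values always reach the same task."""
    key = str(tup[field_index]).encode("utf-8")
    return [zlib.crc32(key) % NUM_TASKS]


def all_grouping(tup: WordCount) -> List[int]:
    """All: replicate the tuple to every task."""
    return list(range(NUM_TASKS))


def global_grouping(tup: WordCount) -> List[int]:
    """Global: send the whole stream to a single task (the lowest id)."""
    return [0]


rng = random.Random(42)  # seeded so repeated runs pick the same tasks
for tup in [("alice", 1), ("bob", 1), ("alice", 1)]:
    print(
        tup,
        "shuffle ->", shuffle_grouping(tup, rng),
        "fields ->", fields_grouping(tup),
        "all ->", all_grouping(tup),
        "global ->", global_grouping(tup),
    )
```

Fields grouping is the one to reach for when a bolt keeps per-key state (for example a running count per word), since both `("alice", 1)` tuples land on the same task; shuffle grouping suits stateless work where only even load matters.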
Storm Cluster View
Fault Tolerance
Parallelism
Apache Storm real-time use cases
Segment | Prevent Use Cases | Optimize Use Cases