
IBM Security Systems

Introduction to Big Data

July, 2013

© 2012 IBM Corporation

Simple to start

What is the largest file you have dealt with so far?
– Movies, files, or streaming video that you have used?
– What have you observed?

What is the maximum download speed you get?

Simple computation
– How much time does it take just to transfer the file?
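
The simple computation above can be sketched as follows. The file sizes and the 10 Mbit/s link speed are hypothetical inputs chosen for illustration, and protocol overhead is ignored.

```python
def transfer_time_seconds(size_bytes: float, speed_bits_per_second: float) -> float:
    """Time to move size_bytes over a link of the given speed, ignoring overhead."""
    return size_bytes * 8 / speed_bits_per_second


GB = 10**9    # bytes in a gigabyte (decimal)
TB = 10**12   # bytes in a terabyte (decimal)
MBPS = 10**6  # bits per second in one megabit per second

# Hypothetical inputs: a 4 GB movie and a 1 TB data set on a 10 Mbit/s link.
movie_seconds = transfer_time_seconds(4 * GB, 10 * MBPS)
dataset_seconds = transfer_time_seconds(1 * TB, 10 * MBPS)

print(f"4 GB movie: {movie_seconds / 60:.1f} minutes")
print(f"1 TB data set: {dataset_seconds / 86400:.1f} days")
```

Under these assumptions the movie takes a little under an hour, while the terabyte takes more than a week: moving the data alone becomes the bottleneck.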


What is big data?

• “Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the
world today has been created in the last two years alone. This data comes from everywhere:
sensors used to gather climate information, posts to social media sites, digital pictures
and videos, purchase transaction records, and cell phone GPS signals, to name a few.
This data is ‘big data.’”


Huge amount of data

• There are huge volumes of data in the world:
– From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data.
– In 2011, the same amount was created every two days.
– In 2013, the same amount is created every 10 minutes.
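
To put these figures on a common scale, here is a small sketch that converts them into average creation rates. It takes the slide's numbers at face value and assumes decimal units (1 exabyte = 10^18 bytes).

```python
EXABYTE = 10**18
amount_bytes = 5 * EXABYTE  # 5 billion gigabytes, as stated on the slide

seconds_2011 = 2 * 24 * 60 * 60  # the same amount every two days
seconds_2013 = 10 * 60           # the same amount every 10 minutes

rate_2011 = amount_bytes / seconds_2011  # bytes per second
rate_2013 = amount_bytes / seconds_2013  # bytes per second
speedup = seconds_2011 / seconds_2013    # how much faster 2013 is than 2011

print(f"2011: {rate_2011 / 10**12:.1f} TB per second")
print(f"2013: {rate_2013 / 10**15:.1f} PB per second")
print(f"2013 creates data {speedup:.0f} times faster than 2011")
```

Going from "every two days" to "every 10 minutes" is a 288-fold increase in the rate of data creation.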


Big data spans three dimensions: Volume, Velocity and Variety


• Volume: Enterprises are awash with ever-growing data of all types, easily amassing
terabytes, even petabytes, of information.
– Turn 12 terabytes of Tweets created each day into improved product sentiment analysis
– Convert 350 billion annual meter readings to better predict power consumption
• Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching
fraud, big data must be used as it streams into your enterprise in order to maximize its
value.
– Scrutinize 5 million trade events created each day to identify potential fraud
– Analyze 500 million daily call detail records in real time to predict customer churn faster
– The latest I have heard is that even a 10-nanosecond delay is too much.
• Variety: Big data is any type of data, structured and unstructured, such as text,
sensor data, audio, video, click streams, log files and more. New insights are found when
analyzing these data types together.
– Monitor hundreds of live video feeds from surveillance cameras to target points of interest
– Exploit the 80% data growth in images, video and documents to improve customer satisfaction
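
As a rough illustration of what these daily volumes mean for a streaming system, the sketch below converts three of the figures above into average per-second rates. It assumes the load is spread evenly over the day, which real traffic is not, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(daily_total: float) -> float:
    """Average rate per second for a quantity accumulated over one day."""
    return daily_total / SECONDS_PER_DAY


trade_events_per_s = per_second(5_000_000)    # 5 million trade events per day
call_records_per_s = per_second(500_000_000)  # 500 million call detail records per day
tweet_bytes_per_s = per_second(12 * 10**12)   # 12 terabytes of Tweets per day

print(f"Trade events: {trade_events_per_s:.0f} per second")
print(f"Call detail records: {call_records_per_s:.0f} per second")
print(f"Tweets: {tweet_bytes_per_s / 10**6:.0f} MB per second")
```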

Finally…

‘Big Data’ is similar to ‘small data’, but bigger.

… but because the data is bigger, it requires different approaches:
techniques, tools, architecture

… with the aim of solving new problems

or old problems in a better way.


Big Data Analytics

• Examining large amounts of data
• Appropriate information
• Identification of hidden patterns, unknown correlations
• Competitive advantage
• Better business decisions: strategic and operational
• Effective marketing, customer satisfaction, increased revenue


To whom does it matter?

• Research community
• Business community: new tools, new capabilities, new infrastructure, new business models, etc.
• Industry sectors, e.g. Financial Services


What do the revenues look like?


The Social Layer in an Instrumented Interconnected World


– 30 billion RFID tags today (1.3B in 2005)
– 4.6 billion camera phones worldwide
– 12+ TBs of tweet data every day
– 100s of millions of GPS-enabled devices sold annually
– ? TBs of data every day
– 25+ TBs of log data every day
– 2+ billion people on the Web by end 2011
– 76 million smart meters in 2009… 200M by 2014


What does Big Data trigger?

• From “Big Data and the Web: Algorithms for Data Intensive Scalable Computing”, Ph.D. thesis, Gianmarco


Applications for Big Data Analytics


– Smarter Healthcare
– Multi-channel sales
– Finance
– Log Analysis
– Homeland Security
– Traffic Control
– Telecom
– Search Quality
– Manufacturing
– Trading Analytics
– Fraud and Risk
– Retail: Churn, NBO


Healthcare

• 80% of medical data is unstructured and is clinically relevant
• Data resides in multiple places, such as individual EMRs, lab and imaging systems, physician
notes, medical correspondence, claims, etc.
• Leveraging Big Data
– Build sustainable healthcare systems
– Collaborate to improve care and outcomes
– Increase access to healthcare

Market Size

By 2015, 4.4 million IT jobs in Big Data; 1.9 million of them in the US alone

Source: Wikibon, Taming Big Data


Potential Talent Pool - Big Data

India will require a minimum of 1 lakh (100,000) data scientists in the next couple of years, in
addition to data analysts and data managers, to support the Big Data space.

BIG DATA is not just HADOOP

– Understand and navigate federated big data sources: Federated Discovery and Navigation
– Manage & store huge volume of any data: Hadoop File System, MapReduce
– Structure and control data: Data Warehousing
– Manage streaming data: Stream Computing
– Analyze unstructured data: Text Analytics Engine
– Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM

Types of tools typically used in a Big Data scenario

• Where is the processing hosted?
– Distributed server/cloud
• Where is the data stored?
– Distributed storage (e.g., Amazon S3)
• What is the programming model?
– Distributed processing (MapReduce)
• How is the data stored and indexed?
– High-performance schema-free database
• What operations are performed on the data?
– Analytic/semantic processing (e.g., RDF/OWL)
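
To make the MapReduce programming model mentioned above more concrete, here is a minimal single-process sketch of a word count. It only illustrates the map, shuffle, and reduce steps; it is not how Hadoop or any particular framework is implemented, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. A real system would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a collection of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across machines, which is what makes the model suitable for distributed processing.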

When is dealing with Big Data hard?

• When the operations on data are complex:
– E.g., simple counting is not a complex problem.
– Modeling and reasoning with data of different kinds can get extremely complex.
• Good news with Big Data:
– Often, because of the vast amount of data, modeling techniques can get simpler
(e.g., smart counting can replace complex model-based analytics)…
– …as long as we deal with the scale.


Time for thinking

• What do you do with the data?
– Let's take an example:
• “From application developers to video streamers, organizations of all sizes face the
challenge of capturing, searching, analyzing, and leveraging as much as terabytes of
data per second, too much for the constraints of traditional system capabilities and
database management tools.”


Why Big-Data?

• Key enablers for the appearance and growth of ‘Big Data’ are:
– Increase in storage capabilities
– Increase in processing power
– Availability of data


Big Data Use Cases

Future of Big Data

• $15 billion has been spent on software firms specializing only in data management and
analytics. This industry on its own is worth more than $100 billion and is growing at almost
10% a year, roughly twice as fast as the software business as a whole.
• In February 2012, the open source analyst firm Wikibon released the first market
forecast for Big Data, listing $5.1B revenue in 2012 with growth to $53.4B in 2017.
• The McKinsey Global Institute estimates that data volume is growing 40% per year,
and will grow 44x between 2009 and 2020.
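
The growth figures above can be cross-checked with compound-growth arithmetic. The sketch below is only a sanity check on the quoted numbers, assuming constant annual growth over each period; the helper names are hypothetical.

```python
def compound_growth(rate: float, years: int) -> float:
    """Total growth multiple after compounding a yearly rate for a number of years."""
    return (1 + rate) ** years


def implied_annual_rate(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that takes start to end in the given years."""
    return (end / start) ** (1 / years) - 1


# Data volume: 40% per year from 2009 to 2020 (11 years).
volume_multiple = compound_growth(0.40, 2020 - 2009)

# Yearly rate that would give exactly 44x over the same 11 years.
rate_for_44x = implied_annual_rate(1, 44, 2020 - 2009)

# Wikibon forecast: $5.1B in 2012 growing to $53.4B in 2017 (5 years).
wikibon_rate = implied_annual_rate(5.1, 53.4, 2017 - 2012)

print(f"40% per year for 11 years: {volume_multiple:.1f}x")
print(f"Yearly rate implied by 44x: {rate_for_44x:.1%}")
print(f"Yearly rate implied by the Wikibon forecast: {wikibon_rate:.1%}")
```

Compounding 40% per year for 11 years gives roughly 40x, so the "40% per year" and "44x" figures are broadly consistent, with 44x corresponding to a rate slightly above 40%. The Wikibon forecast implies revenue growing by roughly 60% per year.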


Big Data Analytics Technologies

NoSQL: non-relational, or at least non-SQL, database solutions such as HBase (also a part
of the Hadoop ecosystem), Cassandra, MongoDB, Riak, CouchDB, and many others.

Hadoop: an ecosystem of software packages, including MapReduce, HDFS, and a whole host
of other software packages.


Thank you
