Data Mining with Big Data
Presented By:
Sandip B. Tipayle Patil
Under the Guidance of
Prof. Y.N.Patil
DEPARTMENT OF COMPUTER ENGINEERING
DR. BABASAHEB AMBEDKAR TECHNOLOGICAL UNIVERSITY
Lonere.
Outlines
Introduction
What is Big Data?
How Much Data Really Exists?
Literature Review
4Vs of Big Data
Proposed System
System Architecture
Big Data Mining Framework
Hadoop Framework
Big Data Challenges and Solutions
Conclusion
Introduction
Interesting Facts
The volume of business data worldwide, across all companies, doubles every 1.2 years (previously every 1.5 years).
About 2,500 quadrillion (2.5 quintillion) bytes of data are produced daily, and more than 90 percent of all data was produced within the past two years.
A regular person processes more data daily than a 16th-century individual did in an entire lifetime.
In recent years, the cost of storage and processing power has dropped significantly.
Bad data or poor data quality costs US businesses $600 billion annually.
Facebook processes 10 TB of data every day; Twitter processes 7 TB.
In 2012, Google had over 3 million servers processing over 2 trillion searches per year (only 22 million in 2000).
What is Big Data?
"Big Data is the frontier of a firm's ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers."
-- Forrester
Boring!
"Big data is the data characterized by 3 attributes: volume, variety and velocity."
-- IBM
Random words
Big Data is not about the size of the data; it's about the value within the data.
What Are Data Mining and Big Data?
Data Mining
The computational process of discovering patterns in large data sets.
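As a small illustration of "discovering patterns", the sketch below counts item pairs that occur together in a set of transactions and keeps those above a support threshold. The function name `frequent_pairs`, the `min_support` parameter, and the basket data are all hypothetical, invented for this example; real data mining systems use more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so each unordered pair is counted once per transaction.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical toy data, invented for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "eggs", "milk"],
]
patterns = frequent_pairs(baskets, min_support=3)
print(patterns)
```

Only the pair that appears in at least three baskets survives the threshold; lowering `min_support` admits weaker patterns.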
Big Data
The term "Big Data" describes a massive volume of both structured and unstructured data that is so large that it is difficult to process using traditional database and software techniques.
Big Data is similar to small data, but bigger.
Because the data is bigger, it requires different approaches: techniques, tools, and architecture, with the aim of solving new problems, or old problems in a better way.
How Much Data Exists?
2.5 quintillion bytes of data are created every day.
IBM: 90 percent of the data in the world today was produced within the past two years.
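To put the daily figure in more familiar units, the conversion can be worked out directly. This is plain arithmetic on the 2.5 quintillion bytes quoted above, assuming decimal (SI) units where 1 TB is 10^12 bytes and 1 EB is 10^18 bytes.

```python
BYTES_PER_DAY = 2.5e18          # 2.5 quintillion bytes, the figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
exabytes_per_day = BYTES_PER_DAY / 1e18         # decimal (SI) exabytes
terabytes_per_second = bytes_per_second / 1e12  # decimal (SI) terabytes

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

In other words, the quoted rate is 2.5 exabytes per day, or roughly 29 terabytes every second.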
Forms of Data
Examples: Boeing jets, scientific data, sensor data, Internet data
Literature Review
Data has grown tremendously.
This large amount of data is beyond the ability of existing software tools to manage.
Exploring the large volume of data and extracting useful information and knowledge is a challenge, and sometimes it is almost infeasible.
Most people don't know what to do with all the data that they already have.
Giant Elephant
Main characteristics of Big Data:
Huge volume of data represented by heterogeneous and diverse dimensionality
Autonomous sources with distributed and decentralized control
Complex and evolving relationships
4 Vs of Big Data
Volume: data quantity
Velocity: data speed
Variety: data types
Veracity: authenticity
Proposed System:
Identifies relationships between different ideas.
Is capable of handling a huge volume of data.
Uses distributed parallel computing with the help of Hadoop.
Provides a platform for processing data in different dimensions and summarizing the results.
The system architecture is to be flexible enough that the components built on top of it for expressing the various kinds of processing tasks can tune it to run these different workloads efficiently.
The system will process this data within reasonable cost and time limits.
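Hadoop's processing model is MapReduce: a map step emits key-value pairs, the framework groups them by key, and a reduce step aggregates each group. The sketch below imitates those three phases in a single Python process to show the data flow. It does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all values emitted for one key."""
    return key, sum(values)

def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

# In a real cluster each line (or file block) could be mapped on a different machine.
lines = ["big data mining", "data mining with big data"]
counts = word_count(lines)
print(counts)
```

Because each call to `map_phase` depends only on its own input line, the map work can be spread across many machines, which is what makes the model suitable for large volumes of data.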
Gap due to Lack of analysis
System Architecture:
Hadoop framework:
Big Data Mining framework
Big Data Mining Platform
Big Data Semantics and Application Knowledge
I. Information Sharing and Data Privacy
II. Domain and Application Knowledge
Big Data Mining Algorithm
I. Local Learning and Model Fusion for Multiple Information Sources
II. Mining from Sparse, Uncertain, and Incomplete Data
III. Mining Complex and Dynamic Data
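The idea behind local learning and model fusion can be sketched in its simplest form: each information source computes a small summary of its own data, and only the summaries are combined centrally. The example below fuses per-site counts and totals into a global mean. The site names, the data, and the functions `local_summary` and `fuse` are hypothetical; fusing real learned models is considerably more involved.

```python
def local_summary(values):
    """Summarize one site's data without exposing the raw records."""
    return {"count": len(values), "total": sum(values)}

def fuse(summaries):
    """Combine the local summaries into a single global mean."""
    count = sum(s["count"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return total / count

# Hypothetical data held at three autonomous sites.
sites = {"site_a": [2, 4, 6], "site_b": [10], "site_c": [3, 5]}
summaries = [local_summary(values) for values in sites.values()]
global_mean = fuse(summaries)
print(global_mean)
```

The fused result equals the mean of the pooled data, yet no site had to ship its raw records, which also helps with the information sharing and privacy concerns listed above.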
Big Data mining Framework
Challenges
Location of Big Data sources: Big Data is commonly stored in different locations.
Volume of the Big Data: the size of the Big Data grows continuously.
Hardware resources: RAM capacity.
Privacy: medical reports, bank transactions.
Having domain knowledge.
Getting meaningful information.
Solutions
Parallel computing programming.
An efficient computing platform will not have centralized data storage; instead, the platform will be distributed across large-scale storage.
Restricting access to the data.
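One common way to avoid centralized storage is to spread records across nodes by hashing a key. The sketch below shows generic hash partitioning under that assumption. It is not how any particular system (such as HDFS) places its data, and the names `node_for` and `partition`, the record keys, and the node count are hypothetical.

```python
import hashlib

def node_for(key, num_nodes):
    """Map a record key to a storage node using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def partition(records, num_nodes):
    """Distribute (key, value) records over num_nodes storage nodes."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in records:
        nodes[node_for(key, num_nodes)].append((key, value))
    return nodes

# Hypothetical records; the same key always lands on the same node.
records = [("user1", "a"), ("user2", "b"), ("user3", "c"), ("user1", "d")]
nodes = partition(records, num_nodes=3)
for index, stored in enumerate(nodes):
    print(index, stored)
```

A stable hash is used instead of Python's built-in `hash` so that every machine computes the same placement for a given key.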
Advantages:
Fast response
Extract useful information
Prediction of required data from a large amount of data.
Better presentation of results in the form of visualization.
Conclusion
We have entered an era of Big Data. Through better analysis of the large volumes of data that are becoming available, there is the potential for making faster advances in many scientific disciplines and for improving the profitability and success of many enterprises by using technologies such as Hadoop and Pig.
The proposed system will be serviceable across a large variety of application domains, since these problems are not cost-effective to address in the context of one domain alone.
Furthermore, this system will provide transformative solutions and will address the needs of the next generation of industrial applications. We must support and encourage this proposed framework towards addressing the technical challenges of unstructured data if we are to achieve the promised benefits of Big Data.
