Why do we need it?
We need it to predict outcomes relevant to our goals, increase profit, and use our resources efficiently.
Where do we need it?
We need it everywhere:
Daily life: before buying groceries, before filling up on fuel
Business: to increase sales
Computer systems: various algorithms (LRU, MRU, command-queuing algorithms)
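As a minimal sketch of one of the algorithms mentioned above, an LRU (Least Recently Used) cache can be built on Python's OrderedDict, which remembers insertion order. The class name, capacity, and method names here are illustrative assumptions, not part of any particular operating system's implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity      # illustrative fixed capacity
        self.store = OrderedDict()    # keys kept in recency order

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)   # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # drop least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now most recently used
cache.put("c", 3)      # evicts "b", the least recently used
print(cache.get("b"))  # None
print(cache.get("a"))  # 1
```

The same eviction idea underlies page replacement in operating systems; MRU simply flips the choice and evicts the most recently used entry instead.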
Have you ever wondered what Domino's is giving on Thursday, and why it may be giving a CHEESY DIP absolutely free?
If you have such questions in your mind, then dive into the ocean of BIG DATA analysis.
Structured Data
This covers all data that can be stored in a SQL database, in tables with rows and columns.
Such data has relational keys and can easily be mapped into pre-designed fields.
Today, this is the most processed kind of data in development and the simplest way to manage information.
Unstructured data
Unstructured data represents around 80% of all data.
It often includes text and multimedia content.
Examples include e-mail messages, word processing
documents, videos, photos, audio files, presentations,
webpages and many other kinds of business documents.
Unstructured data is everywhere.
In fact, most individuals and organizations conduct their lives
around unstructured data.
Just as with structured data, unstructured data is either
machine generated or human generated.
Unstructured data
Mobile data: This includes data such as text messages and location information.
Website content: This comes from any site delivering unstructured content, like YouTube, Flickr, or Instagram.
BIG DATA
Big Data may well be the Next Big Thing in the IT world.
Big data burst upon the scene in the first decade of the 21st
century.
Smartphones and the data they create and consume, along with sensors embedded in everyday objects, will soon result in billions of new, constantly updated data feeds containing environmental, location, and other information, including video.
HADOOP
Hadoop is not a type of database, but rather a software ecosystem that
allows for massively parallel computing.
It is an enabler of certain types of NoSQL distributed databases (such as
HBase), which can allow for data to be spread across thousands of
servers with little reduction in performance.
Hadoop is an Apache Software Foundation project that importantly
provides two things:
A distributed filesystem called HDFS (Hadoop Distributed File System)
A framework and API for building and running MapReduce jobs
HDFS is structured similarly to a regular Unix filesystem except that
data storage is distributed across several machines.
HADOOP
Google solved this problem using an algorithm called MapReduce.
This algorithm divides the task into small parts and assigns those
parts to many computers connected over the network, and collects
the results to form the final result dataset.
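The divide-process-collect flow described above can be sketched on a single machine in plain Python. Word counting is the classic illustration; the function names and the way "chunks" stand in for worker machines are assumptions for this sketch, not Hadoop's actual API:

```python
from collections import defaultdict

def map_phase(chunk):
    # Emit a (word, 1) pair for every word in one chunk of the input.
    return [(word, 1) for word in chunk.split()]

def shuffle_phase(mapped_chunks):
    # Group values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            groups[word].append(count)
    return groups

def reduce_phase(groups):
    # Combine each key's grouped values into the final result.
    return {word: sum(counts) for word, counts in groups.items()}

# Each chunk stands in for the part of the task assigned to one machine.
chunks = ["big data is big", "hadoop handles big data"]
mapped = [map_phase(chunk) for chunk in chunks]  # runs in parallel in Hadoop
result = reduce_phase(shuffle_phase(mapped))     # shuffle, then reduce
print(result["big"])  # 3
```

In a real cluster the map calls run on different machines, the shuffle moves data over the network, and the reduce output is written back to the distributed filesystem; the logic, however, is exactly this.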
Doug Cutting, Mike Cafarella, and their team took the solution provided by Google and started an open-source project called HADOOP in 2005; Doug named it after his son's toy elephant.
Hadoop is an Apache open-source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.
Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.