Submitted By: POOJA MISHRA (12609071) S Wat I G U P Ta (1 2 6 0 9 0 4 8) Section B'

S U B M I T T E D B Y: POOJA MISHRA(12609071) S WAT I G U P TA ( 1 2 6 0 9 0 4 8 ) SECTION B
Integrating Data Sources (Chapter 15)
12/3/2013
BIG DATA FOR DUMMIES
Introduction
Identifying the data you need Understanding the fundamentals of big data integration Using Hadoop as ETL Knowing best practices for data integration
12/3/2013
Identifying the data you need

Before you can begin to plan for integration of your big data, you
need to take stock of the type of data you are dealing with. By leveraging new tools, organizations are gaining new insight from previously untapped sources of unstructured data in e-mails, customer service records, sensor data, and security logs. As you begin your big data analysis, you probably do not know exactly what you will find. Your analysis will go through several stages. Exploratory stage Codifying stage Integration and incorporation stage
12/3/2013
Exploratory Stage
In the early stages of your analysis, you will want to search for
patterns in the data. It is only by examining very large volumes (terabytes and petabytes) of data that new and unexpected relationships and correlations among elements may become apparent. You will need a platform such as Hadoop for organizing your big data to look for these patterns. In the exploratory stage, you are not so concerned about integration with operational data.
12/3/2013
Using FlumeNG for big data integration

Flume is used to collect large amounts of log data from
distributed servers. Flume is designed for scalability and can continually add more resources to a system to handle extremely large amounts of data in an efficient way. Flumes output can be integrated with Hadoop and Hive for analysis of the data. Flume also has transformation elements to use on the data and can turn your Hadoop infrastructure into a streaming source of unstructured data.
12/3/2013
Looking for patterns in big data

In the exploratory stage, technology can be used to rapidly search
through huge amounts of streaming data and pull out the trending patterns that relate to specific products or customers. As companies search for patterns in big data, the huge data volumes are narrowed down as if they are passed through a funnel. You may start with petabytes of data and then, as you look for data with similar characteristics or data that forms a particular pattern, you eliminate data that does not match up.
12/3/2013
Codifying stage
After you find something interesting in your big data analysis,
you need to codify it and make it a part of your business process.

You need to make the connection between your big data analytics
and your inventory and product systems.

To codify the relationship between your big data analytics and
your operational data, you need to integrate the data.
12/3/2013
Integration and incorporation stage

Once big data analysis is complete, an approach is needed that
will allow to integrate or incorporate the results of big data analysis into business process and real-time business actions. Technologies for high-speed transport of very large and fast data are a requirement for integrating across distributed big data sources and between big data and operational data. A company that uses big data to predict customer interest in new products needs to make a connection between the big data and the operational data on customers and products to take action. If the company wants to use this information to buy new products or change pricing it needs to integrate its operational data with the results of its big data analysis.
12/3/2013
Understanding the Fundamentals of Big Data Integration

You must create a common understanding of data definitions. You must develop of a set of data services to qualify the data and
make it consistent and ultimately trustworthy.

You need a streamlined way to integrate your big data sources
and systems of record.
12/3/2013
10
Defining Traditional ETL

Traditionally ETL has been used with batch processing in data
warehouse environments.
ETL tools are used to transform the data into the format required
by the data warehouse.

However, ETL is evolving to support integration across much
more than traditional data warehouses. ETL can support integration across transactional systems, operational data stores, BI platforms, MDM hubs, the cloud, and Hadoop platforms
12/3/2013
11
Extract
Read data database.
from
the
source
Transform
Convert the format of the extracted data so that it conform to the requirements of the target database.
Load
Write data to the target database.
12/3/2013
12
Data transformation
Data transformation is the process of changing the format of data
so that it can be used by different applications. This process also includes mapping instructions so that applications are told how to get the data they need to process. The process of data transformation is made far more complex because of the staggering growth in the amount of unstructured data. Data transformation tools are not designed to work well with unstructured data. As a result, companies faced with a significant amount of manual coding to accomplish the required data integration.
12/3/2013
13
Prioritizing Big Data Quality

You should follow a two-phase approach to data quality: Phase 1: Look for patterns in big data without concern for data
quality. Phase 2: After you locate your patterns and establish results that are important to the business, apply the same data quality standards that you apply to your traditional data sources. You want to avoid collecting and managing big data that is not important to the business and will potentially corrupt other data elements in Hadoop or other big data platforms.
12/3/2013
14
Using Hadoop as ETL

Hadoop can be used to handle some of the transformation process
and to otherwise improve on the ETL and data-staging processes. You can speed up the data integration process by loading both unstructured data and traditional operational and transactional data directly into Hadoop, regardless of the initial structure of the data. After the data is loaded into Hadoop, it can be further integrated using traditional ETL tools. When Hadoop is used as an aid to the ETL process, it speeds the analytics process.
12/3/2013
15
Best Practices for Data Integration in a Big Data World

Keep data quality in perspective. Consider real-time data requirements. Dont create new silos of information.
12/3/2013
16
Dealing With Real-time Data Streams And Complex Event Processing

(Chapter 16)
12/3/2013
17
Introduction
Explaining Streaming Data

Meaning Principles Uses Products for Streaming data.
Explaining Complex Event Processing

Meaning Uses Vendors
Differentiating CEP from Streams
Understanding the Impact of Streaming Data and CEP on
Business
12/3/2013
18
Data Streaming
MEANING :Streaming data is an analytic computing platform that is focused on speed. This is because these applications require a continuous stream of often unstructured data to be processed. o Therefore, data is continuously analyzed and transformed in memory before it is stored on a disk. o Processing streams of data works by processing time windows of data in memory across a cluster of servers. o It is a single-pass analysis i.e the analyst cannot reanalyze the data after it is streamed. o Streaming data is useful when analytics need to be done in real time while the data is in motion.
12/3/2013
19
PRINCIPLES :-
When it is necessary to determine a retail buying opportunity at

the point of engagement, either via social media or via permission-based messaging Collecting information about the movement around a secure site To be able to react to an event that needs an immediate response, such as a service outage or a change in a patients medical condition Real-time calculation of costs that are dependent on variables such as usage and available resources
20
12/3/2013
USES :A power plant A power plant needs to be a highly secure environment. Companies often place sensors around the perimeter of a site to detect movement. Therefore, the vast amount of data coming from these sensors needs to be analyzed in real time so that an alarm is sounded only when an actual threat exists.
A teleco mmuni cations compa ny
It is a highly competitive market. Communications systems generate huge volumes of data that have to be analyzed in real time to take th appropriate action. A delay in detecting an error can seriously impact customer satisfaction.
12/3/2013
21
Continued.
Oil explo ration comp any
This needs to know exactly the sources of oil, environmental factors impacting their Operations, water depth, temperature, ice flows etc. This massive amount of data needs to be analyzed and computed so that mistakes are avoided.
Medic al diagn ostic group

12/3/2013
These are required to be able to take massive amounts of data from brain scans and analyze the results in real time to determine where the source of a problem is and what type of action needed to be taken to help the patient.
22
PRODUCTS FOR STREAMING DATA : IBM Infosphere Streams InfoSphere Streams provides continuous analysis of massive data volumes. It is intended to perform complex analytics of heterogeneous data types. It can perform real-time and look-ahead analysis of regularly generated data, using digital filtering, pattern/correlation analysis, and decomposition as well as geospacial analysis.
12/3/2013
23
Twitters Storm Twitters Storm is an open source real-time analytics engine. Twitter uses Storm internally. It is still available as open source and has been gaining significant
traction among emerging companies. It can be used with any programming language for applications Storm is designed to work with existing queuing and database technologies.
12/3/2013
24
Apache S4 The four Ss in S4 stand for Simple Scalable Streaming System. It allows programmers to easily develop applications for
processing continuous streams of data. S4 is designed as a highly distributed system. The S4 design is best suited for large-scale applications for data mining and machine learning in a production environment.
12/3/2013
25
Complex Event Processing

MEANING :CEP is an advanced approach based on simple event processing that collects and combines data from different relevant sources to discover events and patterns that can result in action. o It is a technique for tracking, analyzing, and processing data as an event happens. o It unable companies to establish the correlation between streams of information and match the resulting pattern with defined behaviors such as mitigating a threat or seizing an opportunity.
12/3/2013
26
USES :Ret ail chai n Cred it card com pany

Appl icati ons
12/3/2013
It creates a tiered loyalty program to increase repeat sales. Using a CEP platform, the system triggers a process that offers the customer an extra discount on a related product.
These uses CEP to better manage fraud.. The underlying system will correlate the incoming transactions, track the stream of event data, and trigger a process.
CEP is also implemented in financial trading applications, weather-reporting applications, and sales management Applications.
27
VENDORS OF CEP : Esper (open source vendor), IBM with IBM Operational Decision Manager, Informatica with RulePoint, Oracle with its Complex Event Processing Solution, Microsofts StreamInsights, SAS DataFlux Event Stream Processing Engine, Streambases CEP
12/3/2013
28
Differentiating CEP from streams

Streaming
computing is typically applied to analyzing vast amounts of data in real time, while CEP is focused on solving a specific use case based on events and actions. In many situations CEP is dependent on data streams; however, CEP is not required for streaming data. Streaming computing is used to handle unstructured data, while CEP deals with variables correlated with specific business process. Streaming data is managed in a highly distributed clustered environment, while CEP often run on less complex hardware.
12/3/2013
29
Impact of streaming data and CEP on business

With streaming data, companies are able to process and analyze
big data in real time to gain an immediate insight.

With CEP approaches, companies can stream data and then
leverage a business process engine to apply business rules to the results of that streaming data analysis.
12/3/2013
30
12/3/2013
31

Submitted By: POOJA MISHRA (12609071) S Wat I G U P Ta (1 2 6 0 9 0 4 8) Section B'

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Submitted By: POOJA MISHRA (12609071) S Wat I G U P Ta (1 2 6 0 9 0 4 8) Section B'

Загружено:

Авторское право:

Доступные форматы

S U B M I T T E D B Y: POOJA MISHRA(12609071) S WAT I G U P TA ( 1 2 6 0 9 0 4 8 ) SECTION B

Integrating Data Sources (Chapter 15)

BIG DATA FOR DUMMIES

BIG DATA FOR DUMMIES

Identifying the data you need

BIG DATA FOR DUMMIES

Using FlumeNG for big data integration

BIG DATA FOR DUMMIES

Looking for patterns in big data

BIG DATA FOR DUMMIES

you need to codify it and make it a part of your business process.

and your inventory and product systems.

your operational data, you need to integrate the data.

BIG DATA FOR DUMMIES

Integration and incorporation stage

Understanding the Fundamentals of Big Data Integration

make it consistent and ultimately trustworthy.

and systems of record.

BIG DATA FOR DUMMIES

Defining Traditional ETL

by the data warehouse.

BIG DATA FOR DUMMIES

Read data database.

Write data to the target database.

BIG DATA FOR DUMMIES

Prioritizing Big Data Quality

BIG DATA FOR DUMMIES

Using Hadoop as ETL

BIG DATA FOR DUMMIES

Best Practices for Data Integration in a Big Data World

BIG DATA FOR DUMMIES

Dealing With Real-time Data Streams And Complex Event Processing

Meaning Principles Uses Products for Streaming data.

Explaining Complex Event Processing

Meaning Uses Vendors

Differentiating CEP from Streams

Understanding the Impact of Streaming Data and CEP on

When it is necessary to determine a retail buying opportunity at

BIG DATA FOR DUMMIES

A teleco mmuni cations compa ny

Oil explo ration comp any

Medic al diagn ostic group

BIG DATA FOR DUMMIES

BIG DATA FOR DUMMIES

BIG DATA FOR DUMMIES

Complex Event Processing

BIG DATA FOR DUMMIES

USES :Ret ail chai n Cred it card com pany

BIG DATA FOR DUMMIES

Differentiating CEP from streams

BIG DATA FOR DUMMIES

Impact of streaming data and CEP on business

big data in real time to gain an immediate insight.

BIG DATA FOR DUMMIES

BIG DATA FOR DUMMIES

Вам также может понравиться