Вы находитесь на странице: 1из 12

A

PAPER PRESENTATION
ON

Data Warehousing & Data


Mining

SRI VENKATESWARA COLLEGE OF


ENGINEERING &TECHNOLOGY
R.V.S.Nagar, chittoor - 517127

Presented by:

M.SIVA PRASAD

S.SHRAVAN

III B.TECH, CSE

III B.TECH, CSE

e-mail:siva181988@gmail.com

e-mail:shravan930@gmail.com

ph.no.9347549466

ph.no.9885382040

ABSTRACT
Organisations are today suffering from a malaise of data overflow.
The developments in the transaction processing technology has given rise
to a situation where the amount and rate of data capture is very high, but
the processing of this data into information that can be utilised for
decision making, is not developing at the same pace. Data warehousing
and data mining (both data & text) provide a technology that enables the
decision-maker in the corporate sector/govt. to process this huge amount
of data in a reasonable amount of time, to extract intelligence/knowledge
in a near real time.
The data warehouse allows the storage of data in a format that
facilitates its access, but if the tools for deriving information and/or
knowledge and presenting them in a format that is useful for decision
making are not provided, then the whole rationale for the existence of the
warehouse disappears. Various technologies for extracting new insight
from the data warehouse have come up which we classify loosely as "Data
Mining Techniques".
Our paper focuses on the need for information repositories and
discovery of knowledge and hence the overview of, the so hyped, Data
Warehousing and Data Mining.

Introduction:
Organizations have lately realized that just processing transactions
and/or informations faster and more efficiently, no longer provides them
with a competitive advantage vis--vis their competitors for achieving
business excellence. Information technology (IT) tools that are oriented
towards knowledge processing can provide the edge that organizations
need to survive and thrive in the current era of fierce competition. The

increasing competitive pressures and the desire to leverage information


technology techniques have led many organizations to explore the benefits
of new emerging technology viz. "Data Warehousing and Data Mining".
What is needed today is not just the latest and updated to the nano-second
information, but the cross-functional information that can help decisions
making activity as "on-line" process.

Problems with the Present Database


The problems with the present Database reporting are:
They do not provide full-fledged support for the following:

Accessibility-Getting required

Timelines-time taken to submit the report

Formats-Formats like spreadsheets, graphs, maps etc.,

Integrity-Accuracy and Reliability of data

information when ever needed

In order to achieve the above characteristics, which are the main


requirements of the Present Information Systems department, Data
Warehousing and Mining is Preferred over the present Database systems.

Evolution of Information Technology Tools :


The evolution of the information systems characterize the evolution
of systems from data maintenance systems, to systems that transform
the data into "information" for use in the decision making process.
These systems supported the information acquisition from the database
of transactional data. The managerial knowledge acquisition function
is/was not directly supported by these systems. The evolution of new
patterns in the changing scenario could not be provided by these
systems directly, the planner was supposed to do this from experience.

Processing

Data

Information

Transaction
s
processi
ng

Processin
Knowledge
g

Management
Information

Data Mining Tools &


On-Line Analytical

The Transformation of Data into Knowledge and associated


tools.

Warehouse with a database:


The only thing that remains constant, especially in corporate
world, is Change. And, these days, change is occurring at an everincreasing rate. A key challenge is implementing an information
infrastructure that allows your company to rapidly respond to change. One
solution to this challenge is the data warehouse.
Data warehousing is an information infrastructure based on detail
data that supports the decision-making process and provides businesses the
ability to access and analyze data to increase an organization's competitive
advantage.
Data warehousing is a process, not an off-the-shelf solution you
buy, but hardware--database and tools integrated into an evolving
information infrastructure--that changes with the dynamics of the business.
The data warehouse makes an attempt to figure out "what we
need", before we know we need it.

Data Warehousing:
Data Warehouse is a database of data gathered from many systems and
intended to support management reporting and decision making. This
process of gathering data is called Data Warehousing.

What it actually is?


A data warehouse stores current and historical data
This data is taken from various, perhaps incompatible,
sources and stored in a uniform format
Several tools transform this data into meaningful business
information for the purpose of comparisons, trends and
forecasting
Data in a warehouse is not updates or changed in any way,
but is only loaded and accessed later on
Data is organized according to subject instead of
application.

In general a database is not a data warehouse unless it has the


following two features:

It collects information from a number of different disparate


sources and is the place where this disparity is reconciled, and

It allows several different applications to make use of the same


information.
Conceptually, a Data Warehouse looks like this:

Information Sources always include the core operational systems which


form the backbone of day-to-day activities. It is these systems which have
traditionally provided management information to support decision
making.
Decision Support Tools are used to analyze the information stored in the
warehouse, typically to identify trends and new business opportunities.
The Data Warehouse itself is the bridge between the operational systems
and the decision support tools. It holds a copy of much of the operational
system data in a logical structure which is more conducive to analysis. The
Data Warehouse, which will be refreshed in scheduled bursts from
operational systems and from relevant external data sources, provides a
single, consistent view of corporate data, leaving operational systems
unaffected.

Data Warehouse Functions:

Increasing the speed and flexibility of analysis.

Providing a foundation for enterprise-wide integration and access.

Improving or re-inventing business processes.

Gaining a clear understanding of customer behavior.

Characteristics of Data Warehouse


It is subject oriented, integrated, non-volatile, time variant
collection of data

in support of management decisions.

Subject oriented: Data Warehouse deals

corporate data
Integrated: Integrates data from different Database systems
(heterogeneous data) to

with all the subjects of

single homogeneous data.

Non-volatile: The Data Warehouse is a read only database. It


cannot be overwritten or deleted. So, its Non-volatile.

Time variant: Historical data with chronological importance, i.e.


Historical Data is maintained and analyzed for future analysis.

Architecture Of Data Warehouse


Each implementation of a data warehouse is different in its detailed
design (a schematic high-level of the architecture and its components is
given in the figure below), but all are characterized by a handful of the
following key components:

A data model to define the warehouse contents.

A carefully designed warehouse database, whether hierarchical,


relational, or multidimensional. While choosing a DBMS it must be
kept in view that the database management system should be
powerful enough to handle huge amount of data running up to
terabytes.

A front end for Decision Support System (DSS) for reporting and
for structured and unstructured analysis.

Fig: Architecture of Data Warehouse

Future Evolution of Data warehousing:


Distributed Data warehouse
Extranet Data warehouse
Biotech Data warehouse
Real-time Analysis

DATAMINING:
Data base mining or Data mining (DM) (formally termed
Knowledge Discovery in Databases KDD) is a process that aims to use
existing data to invent new facts and to uncover new relationships
previously unknown even to experts thoroughly familiar with the data.
Data mining helps extracting hidden predictive
information from databases by recognizing patterns
and trends in data. The features of the Data Mining
products are

Decision Tree
Neural Network
Traditional statistical Techniques
Prediction and Time series.

Data Mining Benefits:

In the Insurance industry, Predicting

Bankruptcy,

risk analysis, credit and Collection models.


In the Finance industry, Analysis and forecasting of
business Performance, stock and bond performance.
In the Market Research, Media selection, broadcasting
analysis, product segmentation.
In
the
Banking,
mortgage

approval,

loan

underwriting, fraud analysis and detection.

Data mining process is diagrammatically exemplified in Figure below


Transformed Data
Data Sources

Select
ed
Data

sele

Extracted
Informatio

Transfor

Assimilated
Information

Data
Warehou
se

Mine

The Data Mining Process.

Assimilat

Data Mining as a Part of the Knowledge Discovery Process


Knowledge Discovery in Databases, frequently abbreviated as KDD, typically
encompasses more than data mining. The knowledge discovery process comprises
six phases:
Data selection.
Data cleansing.
Enrichment Data transformation and encoding.

Goals of Data Mining and Knowledge Discovery


The goals of data mining fall into the following classes:

Prediction: Data mining can show how certain attributes within the data
will behave in the future.

Identification: Data patterns can be used to identify the existence of an


item, an event, or an activity.

Classification: Data mining can partition the data so that different classes
or categories can be identified based on combinations of parameters.

Optimization: One eventual goal of data mining may be to optimize the


use of limited resources such as time, space, money, or materials and to
maximize output variables such as sales or profits under a given set of
constraints.

Goals Of Data Warehousing And Mining

To provide a reliable, single, integrated source of information


To give end users access to their data

without a reliance on reports

produced by Information System (IS) department.

Allows analyzing corporate data, predictive models and improving


Business Intelligence.

Conclusion
Data Warehousing has evolved rapidly and continues
to be a very fast moving and fast changing market
segment. Given an appropriate architecture and a suitable
approach to the goal of the enterprise data warehouse, any
team can deliver a high impact, high value, high ROI and
sustainable data warehouse system that will entirely
change the range of potential outcomes for the enterprise.
Data mining is a best-in-class approach to leveraging
corporate

information.

An

Organizations

inherent

requirement for reporting, analysis, modeling and planning


applications is to use information from the data warehouse
to

help

drive

improved

business

performance.

To

successfully meet this requirement, an organization must


first understand past performance and second, have the
ability to prepare for and manage the future.
REFERENCES
Michael Corey, Michael Abbey, Ian Abramson and Ben Taub,
ORACLE8i Data Warehousing.
Kimbal, Data Warehouse Life Cycle Tool Kit.

Data Base System Concepts by Silberschatz,Korth and Sudharshan

www.datawarehouse.com
www.data-warehouse.com
http://www.google.com/
http://www.datawarehousingonline.com/

Вам также может понравиться