
BHARATHIDASAN COLLEGE OF ARTS AND SCIENCE
ELLISPETTAI, ERODE.

PAPER ON DATA MINING & DATA WAREHOUSING
PRESENTED BY

II-Year B.Sc Computer Science.,


Email-ID:

ABSTRACT
Organizations are growing larger and amassing ever-increasing amounts of data. With the
widespread use of information technology, interest in data warehousing and data mining
has increased rapidly. Data is a collection of facts about entities, and a database is an
organized collection of data. A data warehouse brings together data from a group of
databases: it is a centralized location where information gathered from various sources
is placed together. Data mining is the process of analyzing data to find useful patterns,
and it works alongside the data warehouse. Data warehousing provides the enterprise with
memory, and data mining provides the enterprise with intelligence. Data mining is
becoming an increasingly important tool for transforming data into information; it is
the extraction of hidden predictive information from large databases. OLAP (On-Line
Analytical Processing) tools, query languages, and data mining algorithms help in the
extraction of information from the data. The size of a data warehouse typically ranges
from gigabytes to terabytes.

DATA WAREHOUSE DEFINITION


A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile
collection of data.
Subject-oriented: The data are organized so that all data relating to the same real-world
subject or event are linked together.
Integrated: The data warehouse contains data from all of an organization's operational
systems, and this data is made consistent.
Time-variant: Changes to the data are tracked and recorded over time, so that reports can
be produced showing how the data has changed.
Non-volatile: Data is never overwritten or deleted; once committed, the data is static,
read-only, and retained for future reporting.
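
The time-variant and non-volatile properties can be illustrated with a minimal sketch,
assuming a single append-only history table in an in-memory SQLite database. The table
name customer_history, its columns, and the helper functions are hypothetical and chosen
only for illustration; a real warehouse would use a far richer schema.

```python
import sqlite3


def make_warehouse():
    """Create an in-memory database with one history table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_history ("
        " customer_id INTEGER, city TEXT, valid_from TEXT)"
    )
    return conn


def record_change(conn, customer_id, city, valid_from):
    """Non-volatile: a change is appended as a new row; nothing is updated or deleted."""
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        (customer_id, city, valid_from),
    )


def city_as_of(conn, customer_id, day):
    """Time-variant: return the city that was valid on the given ISO date."""
    row = conn.execute(
        "SELECT city FROM customer_history"
        " WHERE customer_id = ? AND valid_from <= ?"
        " ORDER BY valid_from DESC LIMIT 1",
        (customer_id, day),
    ).fetchone()
    return row[0] if row else None


conn = make_warehouse()
record_change(conn, 1, "Erode", "2009-01-01")
record_change(conn, 1, "Chennai", "2010-06-01")

print(city_as_of(conn, 1, "2009-12-31"))  # Erode
print(city_as_of(conn, 1, "2010-12-31"))  # Chennai
```

Because the old row is kept rather than overwritten, a report can still show where the
customer was located before the change.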

DATA WAREHOUSING ARCHITECTURE

Data is extracted from operational databases and external sources, cleaned to minimize
errors and fill in missing information where possible, and transformed to reconcile
semantic mismatches, typically by defining views over the source data. Loading the data
consists of materializing such views and storing them in the warehouse, so the cleaned
and transformed data is finally loaded into the warehouse. After loading, additional
measures must be taken to ensure that the data in the warehouse is periodically refreshed
to reflect updates to the data sources, and that data that is too old is periodically
purged from the warehouse. An important task in maintaining a warehouse is keeping track
of the data currently stored in it; this bookkeeping is done by storing information about
the warehouse data in system catalogs. The system catalogs associated with a warehouse
are very large and are often stored and managed in a separate database called a metadata
repository.
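
The extract, clean, transform, load, and purge steps described above can be sketched as
follows. This is only an illustration under stated assumptions: the warehouse is a plain
Python list, the catalog is a small dictionary, and the field names, the function names,
and the exchange rates are all hypothetical.

```python
from datetime import date


def clean(rows):
    """Drop rows that have no key and fill in missing amounts with 0.0."""
    cleaned = []
    for row in rows:
        if row.get("id") is None:
            continue  # a row without a key cannot be repaired
        cleaned.append({**row, "amount": row.get("amount") or 0.0})
    return cleaned


def transform(rows, rates):
    """Reconcile a semantic mismatch: express every amount in one currency."""
    return [
        {"id": r["id"], "day": r["day"], "amount_inr": r["amount"] * rates[r["currency"]]}
        for r in rows
    ]


def load(warehouse, rows):
    """Store the cleaned and transformed rows in the warehouse."""
    warehouse.extend(rows)


def purge(warehouse, cutoff):
    """Remove rows that are older than the cutoff date."""
    warehouse[:] = [r for r in warehouse if r["day"] >= cutoff]


# Rows as they might arrive from two operational sources (made-up data).
source = [
    {"id": 1, "day": date(2009, 1, 5), "amount": 100.0, "currency": "INR"},
    {"id": 2, "day": date(2010, 3, 1), "amount": 2.0, "currency": "USD"},
    {"id": 3, "day": date(2010, 3, 2), "amount": None, "currency": "INR"},
    {"id": None, "day": date(2010, 3, 3), "amount": 5.0, "currency": "USD"},
]
rates = {"INR": 1.0, "USD": 45.0}  # hypothetical conversion rates

warehouse = []
load(warehouse, transform(clean(source), rates))
purge(warehouse, date(2010, 1, 1))

# A tiny stand-in for the system catalog: metadata about what is stored.
catalog = {"sales": {"rows": len(warehouse), "oldest_day": min(r["day"] for r in warehouse)}}
print(catalog)
```

In this run the row without a key is dropped during cleaning, the 2009 row is purged as
too old, and the catalog records that two rows remain.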

DRAWBACKS OF DATA WAREHOUSES


Over their life, data warehouses can have high costs. The warehouse is usually not static
and must be adapted as requirements change, so maintenance costs are high.

Data warehouses can get outdated relatively quickly.

Duplicate, expensive functionality may be developed.

TYPES OF DATA WAREHOUSING APPLICATIONS


Personal productivity applications, such as spreadsheets, statistical packages, and
graphics tools, are useful for manipulating and presenting data on individual PCs.

Data query and reporting applications deliver warehouse-wide data access through
simple, list-oriented queries and the generation of basic reports. These reports provide a
view of historical data.

Planning and analysis applications address such essential business requirements as
budgeting, forecasting, and customer profitability.

DATA MINING
Data mining is the automated process of extracting patterns from data, and it
is becoming an increasingly important tool for transforming data into information.
It is commonly used in a wide range of profiling practices, such as marketing,
surveillance, fraud detection, and scientific discovery. It involves the development
of computational algorithms for the identification or extraction of structure from
data, in order to help reduce, model, understand, or analyze the data. Tasks
supported by data mining include prediction, segmentation, dependency modeling,
summarization, and change and deviation detection. The statistical analysis
techniques it relies on search through large amounts of data to discover trends or
patterns, which makes data mining an especially powerful tool in the examination
and analysis of huge databases. Data mining is often used together with OLAP for
data analysis. Data mining tools predict future trends and behaviors, allowing
businesses to make proactive, knowledge-driven decisions. The automated,
prospective analyses offered by data mining move beyond the analyses of past
events provided by the retrospective tools typical of decision support systems. Data
mining tools can answer business questions that traditionally were too
time-consuming to resolve.
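
As one small illustration of discovering patterns in data, the sketch below counts which
pairs of items occur together in a set of transactions and keeps the pairs whose support
(the fraction of transactions containing both items) reaches a threshold. The function
name frequent_pairs, the sample baskets, and the threshold are assumptions made for this
example; it is a simple form of dependency modeling, not a complete mining algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs found in enough transactions."""
    counts = Counter()
    for basket in transactions:
        # Sort and de-duplicate so each pair is counted once per transaction.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Made-up shopping baskets used only to exercise the function.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "eggs"],
]

patterns = frequent_pairs(baskets, min_support=0.5)
# bread and milk appear together in 3 of 4 baskets, butter and milk in 2 of 4.
print(patterns)
```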

TYPES OF DATA MINING:


Pattern mining:

"Pattern mining" is a data mining technique that involves finding existing patterns in data. In security
applications, for example, pattern-based data mining looks for patterns that might be associated with
terrorist activity.
Subject-based data mining:

"Subject-based data mining" is a data mining technique involving the search for associations between
individuals in data. Subject-based data mining starts from an initiating individual or other datum that is
considered, based on other information, to be of high interest, and the goal is to determine what other
persons, financial transactions, or movements are related to that initiating datum.
Spatial data mining:

Spatial data mining is the application of data mining techniques to spatial data. It performs the same
functions as other data mining, with the end objective of finding patterns in geography.
Descriptive data mining:

Data that is not numerical (e.g., colors, names, opinions) is called qualitative data. To analyze this
kind of information, classification analysis is often the most suitable approach. This model of data
mining is also known as the descriptive model.
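
A minimal sketch of classification on qualitative data is given below, assuming a very
simple rule: for each value of one categorical attribute, predict the label seen most
often with that value. The functions train and classify, the attribute names, and the
sample records are hypothetical; practical classifiers use many attributes and more
careful methods.

```python
from collections import Counter, defaultdict


def train(records, attribute, label):
    """Map each attribute value to the label most often seen with it."""
    votes = defaultdict(Counter)
    for record in records:
        votes[record[attribute]][record[label]] += 1
    return {value: counter.most_common(1)[0][0] for value, counter in votes.items()}


def classify(model, value, default=None):
    """Predict a label for a value; fall back to default for unseen values."""
    return model.get(value, default)


# Made-up survey records with purely qualitative fields.
records = [
    {"color": "red", "opinion": "like"},
    {"color": "red", "opinion": "like"},
    {"color": "red", "opinion": "dislike"},
    {"color": "blue", "opinion": "dislike"},
    {"color": "blue", "opinion": "dislike"},
]

model = train(records, "color", "opinion")
print(classify(model, "red"))               # like
print(classify(model, "blue"))              # dislike
print(classify(model, "green", "unknown"))  # unknown
```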

CONCLUSION
Data warehouse systems enable us to store large volumes of data from a variety of interrelated
databases and process them together. A data warehouse thus answers the complex OLAP queries made by
the analyst and gives the required information. Hence, a data warehousing system provides an effective
way to access large amounts of data in a fraction of the time. Data mining is the extraction of hidden
predictive information from large databases. It is a powerful new technology with great
potential to help companies focus on the most important information in their data warehouses. Data
mining tools predict future trends and behaviors, allowing businesses to make proactive,
knowledge-driven decisions.