DATA WAREHOUSING AND DATA MINING
MCA-III Semester
By:
Dr. Asha Ambhaikar
Professor & Dean Students Welfare
Dr. Asha Ambhaikar, Professor and Dean Students Welfare, Kalinga University, Naya Raipur Page 1
Kalinga University, Naya Raipur
Data Warehousing & Data Mining
UNIT- I
The Compelling Need for Data Warehousing: escalating need for strategic information, failures of past
decision-support systems, operational versus decision-support systems, data warehousing as the only viable
solution, data warehouse defined. Data Warehouse, the Building Blocks: defining features, data warehouses
and data marts, overview of the components, metadata in the data warehouse. Defining the Business
Requirements: dimensional analysis. OLAP operations: drill-down and roll-up, slice-and-dice or rotation.
UNIT- II
Principles of dimensional modeling: the STAR schema, STAR schema keys, advantages of the STAR schema.
Dimensional modeling: updates to the dimension tables, miscellaneous dimensions, the snowflake schema,
aggregate fact tables, families of STARS. Steps for the design and construction of a data warehouse:
framework, architecture. Types of OLAP servers: ROLAP, MOLAP. Data warehouse implementation tools and
techniques.
UNIT- III
Data mining, data mining on what kind of data, knowledge discovery process (KDD), what kind of patterns can
be mined, OLAP versus data mining, data mining and the data warehouse, data mining functionalities,
classification systems. Data processing: cleaning, integration and transformation, reduction.
Data mining primitives: what defines a data mining task.
UNIT- IV
Data Mining Query Language (DMQL). Cluster analysis: partitioning, hierarchical, density, grid and
model-based methods. Major data mining techniques: cluster detection, decision trees, memory-based
reasoning, link analysis, neural networks, genetic algorithms. Moving into data mining, data mining
applications, benefits of data mining and applications.
Text books:
1. Paulraj Ponniah, “Fundamentals of Data Warehousing”, John Wiley & Sons, 2003.
2. W. H. Inmon, “Building the Operational Data Store”, 2nd Ed., John Wiley, 1999.
3. Sam Anahory, “Data Warehousing in the Real World: A Practical Guide for Building Decision Support
Systems”, John Wiley, 2004.
4. Jarke, “Fundamentals of Data Warehouse”, Springer.
Reference books:
1. Kamber and Han, “Data Mining Concepts and Techniques”, Harcourt India P. Ltd., 2001.
2. G. K. Gupta, “Introduction to Data Mining with Case Studies”, PHI, 2006.
3. A. B. M. Shawkat Ali, Saleh A. Wasimi, “Data Mining Methods and Techniques”, Cengage Learning, 2009.
4. Pang-Ning Tan, Michael Steinbach, “Introduction to Data Mining”, Pearson, 4th Ed., 2009.
UNIT-I
Data warehousing:
A process of transforming data into information and making it available to users in a timely
enough manner to make a difference.
A technique for assembling and managing data from various sources for the purpose of answering
business questions, thus enabling decisions that were not previously possible.
Subject oriented:
Organized around major subjects, such as customer, product, sales.
Focusing on the modeling and analysis of data for decision makers, not on
daily operations or transaction processing.
Integrated:
Time variant:
The time horizon for the data warehouse is significantly longer than that of
operational systems.
The key of operational data, by contrast, may or may not contain a “time element”.
Non volatile:
Does not require transaction processing, recovery, and concurrency
control mechanisms
heterogeneous sites involved, and the results are integrated into
a global answer set
Time saving
Competitive advantage
Goals of Data Warehouse:
Makes an organization’s information accessible.
Makes the organization’s information consistent.
Is an adaptive and durable source of information.
Is a secure support that protects the organization’s information asset.
Is the foundation for decision making.
OLTP vs Data Warehouse
OLTP                     Warehouse (DSS)
Application oriented     Subject oriented
Used to run business     Used to analyze business
Detailed data            Summarized and refined
Current, up to date      Snapshot data
Isolated data            Integrated data
Repetitive access        Ad hoc access
Clerical user            Knowledge user (manager)
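The contrast above can be made concrete with a small sketch. The `orders` table, its columns, and the sample rows are hypothetical and chosen only for illustration; the point is that an OLTP access touches one detailed, current record, while a decision-support query summarizes many records.

```python
import sqlite3

# Hypothetical order data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, "
    "quarter TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "North", "Q1", 100.0),
        (2, "North", "Q2", 150.0),
        (3, "South", "Q1", 200.0),
        (4, "South", "Q2", 50.0),
    ],
)

# OLTP-style access: repetitive lookup of one detailed record by its key.
oltp_row = conn.execute(
    "SELECT order_id, amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

# DSS-style access: ad hoc summary across many records.
dss_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(dss_rows)
```

In a real installation the two kinds of access would normally run against separate systems; a single in-memory database is used here only to keep the sketch short.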
To summarize ...
Data Warehousing
– Semantic integration.
OLAP:
Data Mining
Apart from the basic trends above, four significant factors drove many companies to move into
data warehousing in the early stages:
Fierce competition
Government deregulation
Significant Factors:
These significant factors reflect the new
Data Visualization
Parallel Processing
Query Tools
Browser Tools
Data Fusion
Multidimensional Analysis
Agent Technology
[Figure: data warehouse architecture. Operational DBs are extracted, transformed, loaded, and
refreshed into the data warehouse and data marts, which serve analysis, query/reports, and data mining.]
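The flow in the figure (extract, transform, load, then serve data marts) can be sketched in a few lines. Everything below is an assumption made for illustration: the two source systems, their field names, and the helper functions `extract`, `transform`, `load`, and `build_mart` are hypothetical, not part of any particular tool.

```python
# Two hypothetical operational sources with different conventions.
billing_db = [
    {"cust": "C1", "amt_cents": 12500, "date": "2024-01-15"},
    {"cust": "C2", "amt_cents": 8000, "date": "2024-02-03"},
]
orders_db = [
    {"customer_id": "C1", "amount": 40.0, "order_date": "2024-02-20"},
]


def extract():
    """Extract: pull raw rows from each operational source."""
    return billing_db, orders_db


def transform(billing_rows, order_rows):
    """Transform: map both sources onto one common record layout."""
    unified = []
    for row in billing_rows:
        unified.append({
            "customer": row["cust"],
            "amount": row["amt_cents"] / 100,  # cents to currency units
            "month": row["date"][:7],
        })
    for row in order_rows:
        unified.append({
            "customer": row["customer_id"],
            "amount": row["amount"],
            "month": row["order_date"][:7],
        })
    return unified


def load(warehouse, rows):
    """Load: append rows to the warehouse; nothing is updated in place."""
    warehouse.extend(rows)
    return warehouse


def build_mart(warehouse):
    """Derive a small data mart: total amount per customer."""
    mart = {}
    for row in warehouse:
        mart[row["customer"]] = mart.get(row["customer"], 0.0) + row["amount"]
    return mart


warehouse = load([], transform(*extract()))
customer_mart = build_mart(warehouse)
print(customer_mart)
```

A refresh step would rerun the same pipeline periodically on newly arrived source rows; it is left out here to keep the sketch minimal.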
Working of Data Warehouse:
Bottom Layer:
Top Layer:
The top layer is a client
Project Planning and Management:
Objectives:
Review the essentials of planning for a data warehouse.
While planning a data warehouse, key issues to be considered include:
setting proper expectations, assessing risk, deciding between top-down
and bottom-up approaches, and choosing from vendor solutions.
Consider the warning signs and success factors; in the final analysis,
adopt a practical approach to build a successful data warehouse.
Project Manager: It is a serious mistake to have a project manager who
is more technology-oriented than user-oriented and business-oriented.
Fig. 1.2 Data Warehouse Deployment Phases
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
Dimension tables, such as item (item_name, brand, type) or time (day, week, month,
quarter, year), describe each dimension.
The fact table contains measures (such as dollars_sold) and keys to each of the related
dimension tables.
In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D
cuboid, which holds the highest level of summarization, is called the apex cuboid. The
lattice of cuboids forms a data cube.
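The lattice of cuboids can be enumerated directly: each subset of the dimensions defines one cuboid, so three dimensions give 2^3 = 8 cuboids. The sketch below assumes a tiny, made-up fact table with the measure dollars_sold; the rows and the helper `cuboid` are hypothetical.

```python
from itertools import combinations

# Hypothetical fact rows: (product, date, country, dollars_sold).
DIMENSIONS = ("product", "date", "country")
facts = [
    ("TV", "1Qtr", "U.S.A.", 100),
    ("TV", "2Qtr", "U.S.A.", 120),
    ("PC", "1Qtr", "Canada", 80),
    ("VCR", "1Qtr", "Mexico", 30),
]


def cuboid(group_by):
    """Aggregate dollars_sold over the dimensions named in group_by."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = {}
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] = totals.get(key, 0) + row[-1]
    return totals


# Every subset of the dimensions defines one cuboid of the lattice.
lattice = {
    dims: cuboid(dims)
    for k in range(len(DIMENSIONS) + 1)
    for dims in combinations(DIMENSIONS, k)
}

apex = lattice[()]          # 0-D cuboid: a single grand total
base = lattice[DIMENSIONS]  # 3-D cuboid: no summarization
print(len(lattice), apex)
print(lattice[("product",)])
```

Real systems usually materialize only some of these cuboids, since the number grows exponentially with the number of dimensions.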
Multidimensionality:
Data can be organized the way managers like to see them, rather than the way that the
system analysts do
Different presentations of the same data can be arranged easily and quickly
Dimensions: products, salespeople, market segments, business units, geographical
locations, distribution channels, country, or industry
Measures: money, sales volume, head count, inventory profit, actual versus forecast
Multidimensional Data
Sales volume as a function of product, month,
and region
Dimensions: Product, Location, Time
[Figure: hierarchical summarization paths for each dimension; surviving labels: Office (Location),
Day and Month (Time)]
A Sample Data Cube
[Figure: sales data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum),
and Country (U.S.A., Canada, Mexico, sum). The highlighted cell is the total annual sales of TV in U.S.A.]
Lattice of cuboids:
0-D (apex) cuboid: all
1-D cuboids: product; date; country
2-D cuboids: product, date; product, country; date, country
3-D (base) cuboid: product, date, country
OLAP Operations
OLAP stands for Online Analytical Processing.
It is used to analyze data and transform it into information for decision-making
purposes.
– Find total sales.
Pivot : rotate
[Figure: location hierarchy from Region to Country, State, Location Address, and Sales Representative
(low-level details)]
Slicing and Dicing
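[Figure: a cube with product categories (Household, Telecomm, Video, Audio) and regions (Europe,
Far East, India), and a Product axis with Juice, Cola, Milk, Cream and cell values 10, 47, 30, 12]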
Drill down:
Navigates from a higher-level summary to a lower-level summary or detailed data, or introduces
new dimensions.
This can be realized by either stepping down a concept hierarchy for a dimension or
introducing additional dimensions.
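Stepping down a concept hierarchy, and its reverse (roll-up, which climbs the hierarchy), can be sketched as follows. The city-to-country hierarchy, the sales figures, and the function names are hypothetical and serve only as an illustration.

```python
# Hypothetical concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Chicago": "U.S.A.", "New York": "U.S.A.", "Toronto": "Canada"}

# Detailed data: sales per (city, quarter).
sales_by_city = {
    ("Chicago", "Q1"): 40,
    ("New York", "Q1"): 60,
    ("Toronto", "Q1"): 25,
    ("Chicago", "Q2"): 55,
}


def roll_up_to_country(detail):
    """Climb the hierarchy: replace each city by its country and sum."""
    summary = {}
    for (city, quarter), amount in detail.items():
        key = (CITY_TO_COUNTRY[city], quarter)
        summary[key] = summary.get(key, 0) + amount
    return summary


def drill_down(detail, country, quarter):
    """Step down the hierarchy: city-level rows behind one summary cell."""
    return {
        city: amount
        for (city, q), amount in detail.items()
        if CITY_TO_COUNTRY[city] == country and q == quarter
    }


by_country = roll_up_to_country(sales_by_city)
usa_q1_detail = drill_down(sales_by_city, "U.S.A.", "Q1")
print(by_country)
print(usa_q1_detail)
```

Drill-down is only possible here because the city-level data is retained; a summary alone cannot be expanded back into detail.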
The slice operation performs a selection on one dimension of the given cube, resulting
in a subcube.
The dice operation defines a subcube by performing a selection on two or more
dimensions.
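A minimal sketch of slice and dice, assuming a cube stored as a dictionary of cells keyed by (product, quarter, country). The cell values and the helper `select` are hypothetical.

```python
# Hypothetical cube cells keyed by (product, quarter, country).
AXES = ("product", "quarter", "country")
cube = {
    ("TV", "Q1", "U.S.A."): 100,
    ("TV", "Q2", "U.S.A."): 120,
    ("PC", "Q1", "Canada"): 80,
    ("PC", "Q2", "U.S.A."): 90,
    ("VCR", "Q1", "Mexico"): 30,
}


def select(cells, **allowed):
    """Keep cells whose value on each named axis is in the allowed set."""
    def keep(key):
        return all(
            key[AXES.index(axis)] in values for axis, values in allowed.items()
        )
    return {key: value for key, value in cells.items() if keep(key)}


# Slice: selection on one dimension.
q1_slice = select(cube, quarter={"Q1"})

# Dice: selection on two or more dimensions.
diced = select(cube, product={"TV", "PC"}, country={"U.S.A."})

print(q1_slice)
print(diced)
```

Both operations return a subcube with the same key layout, so they can be chained with each other or with further aggregation.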
Pivot (rotate):
It is a visualization operation that rotates the data axes in view in order to provide
an alternative presentation of the data.
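For a two-dimensional view, the rotation amounts to swapping rows and columns. The sketch below assumes a small made-up view of sales by product and quarter; the function name `pivot` is chosen here for illustration.

```python
# Hypothetical 2-D view: rows are products, columns are quarters.
view = {
    "TV": {"Q1": 100, "Q2": 120},
    "PC": {"Q1": 80, "Q2": 90},
}


def pivot(table):
    """Rotate the axes: rows become columns and columns become rows."""
    rotated = {}
    for row_label, cells in table.items():
        for col_label, value in cells.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


by_quarter = pivot(view)
print(by_quarter)
```

The data is unchanged by the rotation; applying `pivot` twice returns the original view.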
Other operations
drill across: Executes queries involving (across) more than one fact table
drill through: Operation uses relational SQL facilities to drill through the bottom level
of the data cube to its back-end relational tables
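Both operations can be illustrated with plain SQL against relational tables. The sketch below uses Python's built-in `sqlite3` module; the two fact tables `sales_fact` and `shipping_fact`, their columns, and the rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_fact (product TEXT, quarter TEXT, dollars_sold REAL);
    CREATE TABLE shipping_fact (product TEXT, quarter TEXT, units_shipped INTEGER);
    INSERT INTO sales_fact VALUES ('TV', 'Q1', 100), ('TV', 'Q1', 50), ('PC', 'Q1', 80);
    INSERT INTO shipping_fact VALUES ('TV', 'Q1', 3), ('PC', 'Q1', 2);
    """
)

# Drill across: combine measures from two fact tables on a shared dimension.
drill_across = conn.execute(
    """
    SELECT s.product, s.total_sold, h.total_shipped
    FROM (SELECT product, SUM(dollars_sold) AS total_sold
          FROM sales_fact GROUP BY product) AS s
    JOIN (SELECT product, SUM(units_shipped) AS total_shipped
          FROM shipping_fact GROUP BY product) AS h
      ON s.product = h.product
    ORDER BY s.product
    """
).fetchall()

# Drill through: go from the summary cell (TV, Q1) to its underlying rows.
drill_through = conn.execute(
    "SELECT dollars_sold FROM sales_fact "
    "WHERE product = 'TV' AND quarter = 'Q1' ORDER BY dollars_sold"
).fetchall()

print(drill_across)
print(drill_through)
```

Each fact table is aggregated before the join so that rows of one table do not multiply the measures of the other.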
OLAP Is FASMI
Fast
Analysis of
Shared
Multidimensional
Information
--------------------
Important Questions:
1. What is Data Warehouse? Explain in detail.
11. What are the various OLAP operations? Explain each with a neat diagram.