Lecture 4:
Data Warehousing and OLAP
What is a data warehouse?
[Figure: example data cube; recoverable dimension labels: region, ug/pg, semester]
33 CSE 3124Y BI&BDA 2019- 2020
A Sample Data Cube
(based on a slide from book by J. Han and M. Kamber)
[Figure: the same data shown with dimensions = 2 and with dimensions = 3; recoverable labels: Country (Norway, Australia, sum), Office, Day, Month]
A Sample Data Cube
[Figure: 3-D sales cube. Date axis: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum. Product axis: TV, PC, VCR, sum. Country axis: U.S.A., Canada, Mexico, sum. Highlighted cell: total annual sales of TV in U.S.A.]
[Figure: lattice of cuboids]
  0-D (apex) cuboid: all
  1-D cuboids: product; date; country
  3-D (base) cuboid: (product, date, country)
Sales by product and customer:
         c1   c2   c3
  p1     56    4   50
  p2     11    8

Roll-up (sum over products):
         c1   c2   c3
  sum    67   12   50

Roll-up (sum over customers):
  p1    110
  p2     19

Roll-up (sum over both): 129

Drill-down (add the day dimension):
  day 1  c1   c2   c3
  p1     12        50
  p2     11    8
  day 2  c1   c2   c3
  p1     44    4
  p2

Cube aggregation: "*" stands for all values of a dimension, and cells are addressed as sale(customer, product, day).
  all days (*)
         c1   c2   c3    *
  p1     56    4   50  110
  p2     11    8        19
  *      67   12   50  129
  day 1
         c1   c2   c3    *
  p1     12        50   62
  p2     11    8        19
  *      23    8   50   81
  day 2
         c1   c2   c3    *
  p1     44    4        48
  p2
  *      44    4        48

  sale(c1,*,*)  = 67
  sale(c2,p2,*) = 8
  sale(*,p2,*)  = 19
  sale(*,*,*)   = 129
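The roll-up and "*" lookups in the tables above can be sketched in a few lines of Python. This is only an illustration: the dictionary `SALES` and the helpers `sale` and `roll_up` are names made up for this sketch, and the figures are the day 1 / day 2 cells from the slide.

```python
from collections import defaultdict

# Base facts keyed by (customer, product, day); values are the sales
# figures from the day 1 / day 2 tables on the slide.
SALES = {
    ("c1", "p1", "day 1"): 12,
    ("c3", "p1", "day 1"): 50,
    ("c1", "p2", "day 1"): 11,
    ("c2", "p2", "day 1"): 8,
    ("c1", "p1", "day 2"): 44,
    ("c2", "p1", "day 2"): 4,
}


def sale(customer="*", product="*", day="*"):
    """Sum every base fact matching the pattern; "*" matches any value."""
    pattern = (customer, product, day)
    return sum(
        amount
        for key, amount in SALES.items()
        if all(p == "*" or p == k for p, k in zip(pattern, key))
    )


def roll_up(keep):
    """Aggregate away every dimension whose index is not listed in `keep`."""
    totals = defaultdict(int)
    for key, amount in SALES.items():
        totals[tuple(key[i] for i in keep)] += amount
    return dict(totals)


print(sale("c1", "*", "*"))   # 67
print(sale("c2", "p2", "*"))  # 8
print(sale("*", "p2", "*"))   # 19
print(sale())                 # 129
print(roll_up(keep=(1,)))     # {('p1',): 110, ('p2',): 19}
```

Drill-down is the reverse direction: `roll_up(keep=(0, 1, 2))` returns the base cells again, because no dimension is aggregated away.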
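Dimension hierarchy: customer, region, country

Sales by product, customer, and day:
  day 1  c1   c2   c3
  p1     12        50
  p2     11    8
  day 2  c1   c2   c3
  p1     44    4
  p2

Sales by product and customer:
         c1   c2   c3
  p1     56    4   50
  p2     11    8

Roll-up from customer to region
(customer c1 in Region A; customers c2, c3 in Region B):
         region A   region B
  p1        56         54
  p2        11          8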
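Rolling up along a dimension hierarchy amounts to replacing each customer by its parent region before summing. A minimal sketch, assuming the customer-to-region mapping stated on the slide; `REGION_OF`, `SALES_BY_CUSTOMER`, and `roll_up_to_region` are illustrative names, not part of the lecture.

```python
from collections import defaultdict

# Customer -> region mapping as stated on the slide.
REGION_OF = {"c1": "region A", "c2": "region B", "c3": "region B"}

# Sales keyed by (product, customer), from the product/customer table.
SALES_BY_CUSTOMER = {
    ("p1", "c1"): 56, ("p1", "c2"): 4, ("p1", "c3"): 50,
    ("p2", "c1"): 11, ("p2", "c2"): 8,
}


def roll_up_to_region(sales, region_of):
    """Replace each customer by its region and sum the merged cells."""
    totals = defaultdict(int)
    for (product, customer), amount in sales.items():
        totals[(product, region_of[customer])] += amount
    return dict(totals)


by_region = roll_up_to_region(SALES_BY_CUSTOMER, REGION_OF)
print(by_region[("p1", "region A")])  # 56
print(by_region[("p1", "region B")])  # 54
print(by_region[("p2", "region B")])  # 8
```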
CUBE
Base data (model, year, color, sales):
  Chevy  1991  white   95
  Chevy  1991  blue    49
  Chevy  1992  red     31
  Chevy  1992  white   54
  Chevy  1992  blue    71
  Ford   1990  red     64
  Ford   1990  white   62
  Ford   1990  blue    63
  Ford   1991  red     52
  Ford   1991  white    9
  Ford   1991  blue    55
  Ford   1992  red     27
  Ford   1992  white   62
  Ford   1992  blue    39
Aggregate rows added by CUBE (model, year, color, sales):
  ALL    ALL   blue   339
  Chevy  1990  ALL    154
  Chevy  1991  ALL    199
  Chevy  1992  ALL    157
  Ford   1990  ALL    189
  Ford   1991  ALL    116
  Ford   1992  ALL    128
  Chevy  ALL   red     91
  Chevy  ALL   white  236
  Chevy  ALL   blue   183
  Ford   ALL   red    144
  Ford   ALL   white  133
  Ford   ALL   blue   156
  ALL    1990  red     69
  ALL    1990  white  149
  ALL    1990  blue   125
  ALL    1991  red    107
  ALL    1991  white  104
  ALL    1991  blue   104
  ALL    1992  red     59
  ALL    1992  white  116
  ALL    1992  blue   110
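The idea behind the CUBE operator, one group-by for every subset of the dimensions with ALL standing in for each aggregated-away dimension, can be sketched as follows. This is a plain-Python illustration, not how a database implements it; the function `cube` and the constant names are assumptions of this sketch. Only the Ford rows are used because they are the only model whose base rows are all listed above, so rows with model = ALL will not match the slide, which also includes Chevy.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("model", "year", "color")

# Ford rows from the base table: (model, year, color, sales).
ROWS = [
    ("Ford", 1990, "red", 64), ("Ford", 1990, "white", 62), ("Ford", 1990, "blue", 63),
    ("Ford", 1991, "red", 52), ("Ford", 1991, "white", 9), ("Ford", 1991, "blue", 55),
    ("Ford", 1992, "red", 27), ("Ford", 1992, "white", 62), ("Ford", 1992, "blue", 39),
]


def cube(rows, n_dims):
    """Sum the measure for every subset of dimensions, writing "ALL" for
    each dimension that is aggregated away."""
    result = defaultdict(int)
    for size in range(n_dims + 1):
        for kept in combinations(range(n_dims), size):
            for row in rows:
                key = tuple(row[i] if i in kept else "ALL" for i in range(n_dims))
                result[key] += row[n_dims]
    return dict(result)


ford_cube = cube(ROWS, len(DIMENSIONS))
print(ford_cube[("Ford", 1990, "ALL")])  # 189
print(ford_cube[("Ford", 1991, "ALL")])  # 116
print(ford_cube[("Ford", 1992, "ALL")])  # 128
print(len(ford_cube))                    # 32
```

With three dimensions there are 2^3 = 8 group-bys; the 32 result rows are the 9 base cells plus 23 aggregate cells.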
Benefits of Multidimensional Analysis
[Figure: warehouse architecture; recoverable labels: Client, Client, Metadata, Warehouse, Integration]
Data Models
  relations
  stars & snowflakes
  cubes
Operators
  slice & dice
  roll-up, drill down
  pivoting
  other
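The slice, dice, and pivot operators listed above can be illustrated on a small in-memory cube. The sketch below follows the usual textbook reading of these operators (slice fixes one dimension, dice restricts several, pivot swaps axes); the names `CUBE`, `slice_`, `dice`, and `pivot` are invented for this example, and the cells reuse the customer/product/day sales figures from the lecture's example tables.

```python
# Cells keyed by (customer, product, day).
CUBE = {
    ("c1", "p1", "day 1"): 12,
    ("c3", "p1", "day 1"): 50,
    ("c1", "p2", "day 1"): 11,
    ("c2", "p2", "day 1"): 8,
    ("c1", "p1", "day 2"): 44,
    ("c2", "p1", "day 2"): 4,
}


def slice_(cube, axis, value):
    """Slice: fix one dimension to a single value and drop that dimension."""
    return {
        key[:axis] + key[axis + 1:]: amount
        for key, amount in cube.items()
        if key[axis] == value
    }


def dice(cube, allowed):
    """Dice: keep cells whose coordinate on each listed axis is in the
    allowed set; the result keeps all dimensions."""
    return {
        key: amount
        for key, amount in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


def pivot(table):
    """Pivot a 2-D table by swapping its two axes."""
    return {(col, row): amount for (row, col), amount in table.items()}


day1 = slice_(CUBE, axis=2, value="day 1")
sub_cube = dice(CUBE, {0: {"c1", "c2"}, 1: {"p1"}})
by_product_then_customer = pivot(day1)

print(day1[("c1", "p1")])                      # 12
print(sorted(sub_cube.values()))               # [4, 12, 44]
print(by_product_then_customer[("p2", "c2")])  # 8
```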
[Figure: star schema]
  sale: orderId, date, custId, prodId, storeId, qty, amt
  customer: custId, name, address, city
  product: prodId, name, price
  store: storeId, city
Fact table: sale
Dimension tables: customer, product, store
Measures: qty, amt
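To make the fact/dimension split concrete, the sketch below builds the sale, customer, product, and store tables in an in-memory SQLite database and joins the fact table to one dimension. The column names come from the slide; the column types and every inserted row are invented sample values, not data from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per entity.
    CREATE TABLE customer (custId TEXT PRIMARY KEY, name TEXT, address TEXT, city TEXT);
    CREATE TABLE product  (prodId TEXT PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE store    (storeId TEXT PRIMARY KEY, city TEXT);

    -- Fact table: foreign keys into each dimension plus the measures qty, amt.
    CREATE TABLE sale (
        orderId TEXT,
        date    TEXT,
        custId  TEXT REFERENCES customer(custId),
        prodId  TEXT REFERENCES product(prodId),
        storeId TEXT REFERENCES store(storeId),
        qty     INTEGER,
        amt     REAL
    );

    -- Hypothetical sample rows, for illustration only.
    INSERT INTO customer VALUES ('c1', 'Ann', '1 Main St', 'Oslo'),
                                ('c2', 'Bob', '2 High St', 'Sydney');
    INSERT INTO product  VALUES ('p1', 'bolt', 10.0), ('p2', 'nut', 5.0);
    INSERT INTO store    VALUES ('s1', 'Oslo'), ('s2', 'Sydney');
    INSERT INTO sale VALUES ('o1', '2020-01-05', 'c1', 'p1', 's1', 1, 10.0),
                            ('o2', '2020-01-05', 'c2', 'p2', 's1', 2, 10.0),
                            ('o3', '2020-01-06', 'c1', 'p2', 's2', 3, 15.0);
""")

# Join the fact table to the store dimension and aggregate a measure.
amt_by_city = conn.execute("""
    SELECT store.city, SUM(sale.amt)
    FROM sale JOIN store ON sale.storeId = store.storeId
    GROUP BY store.city
    ORDER BY store.city
""").fetchall()

print(amt_by_city)  # [('Oslo', 20.0), ('Sydney', 15.0)]
```

Each analytical query has the same shape: join the fact table to whichever dimensions supply the grouping attributes, then aggregate a measure.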
[Figure: dimension hierarchy over sType, store, city, region]
snowflake schema
constellations
region (regId, name):
  north   cold region
  south   warm region
[Figure: rollup and drill-down between a view with dimensions = 2 and a view with dimensions = 3; recoverable labels: Office, Day, Month]
Aggregates
Operators: sum, count, max, min, median, avg
“Having” clause
Cube (& Rollup) operator
Using dimension hierarchy
  average by region (within store)
  maximum by month (within date)
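The aggregate operators, the "Having" clause, and aggregation along a dimension hierarchy can be tried out with SQLite from Python. In this sketch the table layout and the dates are invented for illustration, and the month is derived from the date with `substr` as a stand-in for a proper date dimension; median is left out of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (date TEXT, prodId TEXT, amt INTEGER);
    -- Hypothetical rows; the dates are made up for this example.
    INSERT INTO sale VALUES
        ('2020-01-05', 'p1', 12), ('2020-01-20', 'p1', 50),
        ('2020-01-05', 'p2', 11), ('2020-02-03', 'p2', 8),
        ('2020-02-03', 'p1', 44), ('2020-02-17', 'p1', 4);
""")

# Aggregate per product, then use HAVING to filter whole groups.
big_products = conn.execute("""
    SELECT prodId, SUM(amt), COUNT(*), MAX(amt), MIN(amt), AVG(amt)
    FROM sale
    GROUP BY prodId
    HAVING SUM(amt) > 20
    ORDER BY prodId
""").fetchall()

# Dimension hierarchy date -> month: maximum sale amount within each month.
max_by_month = conn.execute("""
    SELECT substr(date, 1, 7) AS month, MAX(amt)
    FROM sale
    GROUP BY month
    ORDER BY month
""").fetchall()

print(big_products)  # [('p1', 110, 4, 50, 4, 27.5)]
print(max_by_month)  # [('2020-01', 50), ('2020-02', 44)]
```

WHERE filters individual rows before grouping, whereas HAVING filters groups after aggregation, which is why product p2 (total 19) drops out of the first result.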
[Figure: data warehouse architecture. Recoverable labels: Operational DBs and other sources; Extract, Transform, Load, Refresh; Monitor & Integrator; Metadata; Data Warehouse; Data Marts; OLAP Server (Serve); Analysis, Query, Reports, Data mining]
[Figure: integration example; recoverable labels: service DB, customer2(Joe)]
ETL (Extract, Transform, and Load)
OLTP vs. OLAP
Data warehouse
A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process
A multi-dimensional model of a data warehouse
Star schema, snowflake schema, fact constellations