Академический Документы
Профессиональный Документы
Культура Документы
1. Objectives ............................................................................... 3
2. What is Data Warehouse?....................................................... 4
2.1. Definitions ...................................................................... 4
2.2. Data Warehouse—Subject-Oriented............................... 5
2.3. Data Warehouse—Integrated.......................................... 5
2.4. Data Warehouse—Time Variant .................................... 6
2.5. Data Warehouse—Non-Volatile..................................... 6
2.6. Data Warehouse vs. Heterogeneous DBMS ................... 7
2.7. Data Warehouse vs. Operational DBMS ........................ 7
2.8. OLTP vs. OLAP ............................................................. 8
2.9. Why Separate Data Warehouse?..................................... 9
3. Multidimensional Data Model .............................................. 10
3.1. Definitions .................................................................... 10
4. Conceptual Modeling of Data Warehousing......................... 12
4.1. Star Schema .................................................................. 13
4.2. Snowflake Schema........................................................ 14
4.3. Fact Constellation ......................................................... 15
5. A Data Mining Query Language: DMQL............................. 16
5.1. Definitions and syntax .................................................. 16
5.2. Defining a Star Schema in DMQL ............................... 17
5.3. Defining a Snowflake Schema in DMQL..................... 18
5.4. Defining a Fact Constellation in DMQL ...................... 19
5.5. Measures: Three Categories.......................................... 21
5.6. How to compute data cube measures? .......................... 22
6. A Concept Hierarchy ............................................................ 24
7. OLAP Operations in a Multidimensional Data..................... 26
8. OLAP Operations ................................................................. 29
9. Starnet Query Model for Multidimensional Databases ........ 33
10. Data warehouse architecture............................................. 34
10.1. DW Design Process ...................................................... 35
A. Bellaachia Page: 1
10.2. Three Data Warehouse models ..................................... 37
10.3. OLAP Server Architectures .......................................... 39
11. Data Warehouse Implementation...................................... 40
11.1. Materialization of data cube ......................................... 40
11.2. Cube Operation............................................................. 41
11.3. Cube Computation Methods ......................................... 43
11.4. Multi-way Array Aggregation for Cube Computation
Error! Bookmark not defined.
11.5. Indexing OLAP Data: Bitmap Index ............................ 44
11.6. Indexing OLAP Data: Join Indices............................... 45
11.7. Efficient Processing OLAP Queries ............................. 46
11.8. Data Warehouse Usage................................................. 46
11.9. Why online analytical mining? ..................................... 47
12. An OLAM Architecture.................................................... 48
A. Bellaachia Page: 2
1. Objectives
A. Bellaachia Page: 3
2. What is Data Warehouse?
2.1. Definitions
A. Bellaachia Page: 4
2.2. Data Warehouse—Subject-Oriented
A. Bellaachia Page: 5
2.4. Data Warehouse—Time Variant
A. Bellaachia Page: 6
2.6. Data Warehouse vs. Heterogeneous DBMS
A. Bellaachia Page: 7
o Access patterns: update vs. read-only but complex
queries
OLTP OLAP
Users Clerk, IT professional Knowledge worker
Function Day to day operations Decision support
DB design Application-oriented Subject-oriented
Data Current, up-to-date Historical, Summarized,
Detailed, flat relational multidimensional
Isolated Integrated, consolidated
Usage Repetitive Ad-hoc
Access Read/write, Index/hash on Lots of scans
prim. Key
Unit of work Short, simple transaction Complex query
# records Tens Millions
accessed
#users Thousands Hundreds
DB size 100MB-GB 100GB-TB
Metric Transaction throughput Query throughput, response
A. Bellaachia Page: 8
2.9. Why Separate Data Warehouse?
A. Bellaachia Page: 9
3. Multidimensional Data Model
3.1. Definitions
• Examples:
Product
Dates
Locations
A. Bellaachia Page: 10
• Cube: A lattice of cuboids
time
1-D cuboids
2-D cuboids
time,supplier item,supplier
time,location,supplie
time,item,location
3-D cuboids
time,item,supplie item,location,supplier
location
item
time
location
item
A. Bellaachia Page: 12
4.1. Star Schema
time
time_key item
day Sales Fact Table
day_of_the_week item_key
month time_key item_name
quarter brand
year item_key type
supplier_type
branch_key
branch_key location_key
units_sold street
branch_name
city
branch_type dollars_sold state_or_province
country
avg_sales
Measures
A. Bellaachia Page: 13
4.2. Snowflake Schema
item
time
item_key
time_key item_name
day Sales Fact Table brand
day_of_the_week type
month time_key supplier_type
quarter
year item_key
supplier
branch_key supplier_key
supplier_typ
location
branch location_key
location_key
branch_key units_sold street
branch_name city_key
branch_type dollars_sold
city
avg_sales
city_key
Measures city
state_or_province
country
A. Bellaachia Page: 14
4.3. Fact Constellation
time item
Shipping Fact Table
time_key item_key
day Sales Fact Table item_name time key
day_of_the_week brand
month
time_key
type item key
quarter supplier type Shipper key
year item_key
from location
branch_key to location
dollars cost
branch location_key
location units shipped
branch_key units_sold location_key
branch_name street shipper
branch_type city
dollars_sold
state_or_ shipper_key
province shipper_name
avg_sales country location_key
Measures shipper_type
A. Bellaachia Page: 15
5. A Data Mining Query Language: DMQL
Syntax:
define cube <cube_name> [<dimension_list>]:
<measure_list>
Example
define cube sales_star [time, item, branch, location]:
dollars_sold = sum(sales_in_dollars),
avg_sales = avg(sales_in_dollars),
units_sold = count(*)
Syntax:
define dimension <dimension_name>
as (<attribute_or_subdimension_list>)
Example:
define dimension item
as (item_key, item_name, brand, type,
supplier_type)
A. Bellaachia Page: 16
• Special Case (Shared Dimension Tables)
Syntax:
define dimension <dimension_name>
as <dimension_name_first_time>
in cube <cube_name_first_time>
Example:
A. Bellaachia Page: 17
5.3. Defining a Snowflake Schema in DMQL
A. Bellaachia Page: 18
5.4. Defining a Fact Constellation in DMQL
A. Bellaachia Page: 19
define dimension shipper
as ( shipper_key,
shipper_name,
location as location in cube sales,
shipper_type)
A. Bellaachia Page: 20
5.5. Measures: Three Categories
• Distributive:
• Algebraic:
A. Bellaachia Page: 21
o Example: avg(), min_N(), standard_deviation().
• Holistic:
o If there is no constant bound on the storage size
needed to describe a subaggregate.
A. Bellaachia Page: 22
• Let us then compute the two measures we have in our data
cube: dollars_sold and units_sold
A. Bellaachia Page: 23
6. A Concept Hierarchy
all
all
...
country Germany ... Spain Canada ... Mexico
office L. Chan
... M. Wind
A. Bellaachia Page: 24
• The order can be either partial or total:
country year
state quarter
week
city month
street day
• Set-grouping hierarchy:
A. Bellaachia Page: 25
7. OLAP Operations in a Multidimensional Data
Week
region
Product
Month
A. Bellaachia Page: 26
• A Sample data cube:
Total annual
Date sales
Product 1Qtr 2Qtr 3Qtr sum
TV 4Qtr
PC U.S.A
VCR
Country
sum
Canada
Mexico
sum
date, country
product,date product,country
2-D cuboids
3-D(base) cuboid
product, date, country
A. Bellaachia Page: 27
• Querying a data cube
A. Bellaachia Page: 28
8. OLAP Operations
• Objectives:
o OLAP is a powerful analysis tool:
• Forecasting
• Statistical computations,
• aggregations,
• etc.
c1 10 3 21
New Orleans
c2 12 5 9
c3 11 7 7
Virginia Date of
c4 12 11 15 sale
Camera
NO 22 8 30
roll up VA 23 18 22
A. Bellaachia Page: 29
• Drill down (roll down):
o It is the reverse of roll-up
o It is performed by stepping down a concept
hierarchy for a dimension or introducing new
dimensions.
• Pivot (rotate):
o Re-orient the cube for an alternative
presentation of the data
o Transform 3D view to series of 2D planes.
• Other operations
o Drill across: involving (across) more than one
fact table.
o Drill through: through the bottom level of the
cube to its back-end relational tables (using
SQL)
A. Bellaachia Page: 30
A. Bellaachia Page: 31
A. Bellaachia Page: 32
9. Starnet Query Model for Multidimensional Databases
Customer Orders
Shipping Customer
CONTRACTS
AIR-EXPRESS
ORDER
TRUCK
Product
Time PRODUCT LINE
REGION
DISTRICT
Location DIVISION
Organization
Each circle is called a footprint Promotion
A. Bellaachia Page: 33
10. Data warehouse architecture
A. Bellaachia Page: 34
10.1. DW Design Process
A. Bellaachia Page: 35
• Multi-Tiered Architecture
OLAP
Metadata Monitor Server
&
other Integrator
sources
Analysis
Query
Extract Reports
Operational Transform Data Serve Data
DBs Load Warehouse
Refresh
Data Marts
A. Bellaachia Page: 36
10.2. Three Data Warehouse models
• Enterprise warehouse
o Collect all of the information about subjects
spanning the entire organization.
• Data Mart
o a subset of corporate-wide data that is of value to
a specific groups of users. Its scope is confined to
specific, selected groups, such as marketing data
mart
Independent vs. dependent (directly from
warehouse) data mart.
• Virtual warehouse
o A set of views over operational databases
o Only some of the possible summary views may
be materialized
A. Bellaachia Page: 37
• A Recommended Approach
Multi-Tier Data
Warehouse
Distributed Data
Marts
Enterprise Data
Data Data Warehouse
Mart Mart
A. Bellaachia Page: 38
10.3. OLAP Server Architectures
o Two methods:
Base cuboid data is stored in a `base fact
table
Aggregate data:
► Data can be stored in the base fact
table (Summary Fact table), or
► Data can be stored in a separate
summary fact tables to store each
level of abstraction.
A. Bellaachia Page: 39
11. Data Warehouse Implementation
• Objectives:
• Cube Computation
⎧ 2n If no hierarchy
⎪
⎪ if hierarchy and
number of cuboids = ⎨ n
⎪∏ i =1
( Li + 1) Li is number of levels
⎪⎩ associated with d dim ension i
A. Bellaachia Page: 40
• This is equivalent to the define new warehouse relations
using SQL expressions
• Materialize every (cuboid) (full materialization), none (no
materialization), or some (partial materialization)
• Selection of which cuboids to materialize
Based on size, sharing, access frequency, etc.
Define new warehouse relations using SQL
expressions
A. Bellaachia Page: 41
()
A. Bellaachia Page: 42
11.3. Cube Computation Methods
• ROLAP-based cubing
o Sorting, hashing, and grouping operations
are applied to the dimension attributes in
order to reorder and cluster related tuples
o Grouping is performed on some
subaggregates as a “partial grouping step”
o Aggregates may be computed from
previously computed aggregates, rather than
from the base fact table
• MOLAP Approach
o Uses Array-based algorithm
o The base cuboid is stored as
multidimensional array.
o Read in a number of cells to compute partial
cuboids
A. Bellaachia Page: 43
11.4. Indexing OLAP Data: Bitmap Index
• Approach:
o Index on a particular column
o Each value in the column has a bit vector: bit-op is
fast
o The length of the bit vector: # of records in the base
table
o The i-th bit is set if the i-th row of the base table has
the value for the indexed column
o Not suitable for high cardinality domains
• Example:
Base Table:
Cust Region Type
C1 Asia Retail
C2 Europe Dealer
C3 Asia Dealer
C4 America Retail
C5 Europe Dealer
Index on Region:
RecID Asia Europe America
1 1 0 0
2 0 1 0
3 1 0 0
4 0 0 1
5 0 1 0
Index on Type:
RecID Retail Dealer
1 1 0
2 0 1
3 0 1
4 1 0
5 0 1
A. Bellaachia Page: 44
11.5. Indexing OLAP Data: Join Indices
• Join index:
JI(R-id, S-id)
A. Bellaachia Page: 45
11.6. Efficient Processing OLAP Queries
• Determine which operations should be performed on
the available cuboids:
o transform drill, roll, etc. into corresponding SQL
and/or OLAP operations, e.g, dice = selection +
projection
A. Bellaachia Page: 46
11.8. Why online analytical mining?
• Architecture of OLAM
A. Bellaachia Page: 47
12. An OLAM Architecture
Layer4
User Interface
Layer2
MDDB MDDB
Meta Data
Filtering&Integration Database API Filtering
Layer1
Data cleaning Data Data Repository
Databases Warehouse
Data integration
A. Bellaachia Page: 48