2
Icons Used
Questions
Tools
Hands-on Exercise
A Welcome Break
Try it Out
Contacts
3
Data Warehousing Basics Session 06-08:
Overview
Introduction:
This chapter explains data modeling techniques for a data
warehouse.
4
Data Warehousing Basics Session 06-08:
Objective
Objective:
After completing this session, you will be able to:
» Explain the concept of ER Modeling and Dimensional
Modeling
5
Data Modeling for DWH
6
Data Modeling Techniques
Dimensional Modeling:
Analyzing business measures in the specific business context
End users can easily understand and navigate the data structure
7
ER Model
8
ER Modeling-Logical Design
Entity:
Characterized by a noun
Example: Product, Employee, Sales
[Figure: the Sales Organization entity, shown with Sales Org ID and Distribution Channel]
9
ER Modeling-Logical Design (Contd.)
Entity Types:
Associative Entity:
10
ER Modeling-Logical Design (Contd.)
about its own characteristics, that are not properties of the Super
11
ER Modeling-Logical Design (Contd.)
Attributes:
Characteristics and properties of entities
[Figure: the Sales Organization entity with its attributes, Sales Org ID and Distribution Channel]
12
ER Modeling-Logical Design (Contd.)
Identifier:
[Figure: the Sales Organization entity (Sales Org ID, Distribution Channel) with its identifier marked]
13
ER Modeling-Logical Design (Contd.)
Relationship:
Cardinality: 1-1, 1-M, M-M
[Figure: a relationship between Sales Detail (Sales Record ID) and Sales Rep (Sales Rep ID)]
14
Logical Data Model
15
Moving from Logical to Physical Design
Attributes to Columns
16
ER Model - Physical Design
17
Physical Design - Example
18
Logical Vs. Physical
Logical      Physical
Entity       Table
Attribute    Column
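The logical-to-physical mapping can be illustrated with a minimal Python sketch that turns an entity description into a table definition. The `Entity` class, the snake_case naming rule, the `TEXT` column type, and the choice of Sales Org ID as the identifier that becomes the primary key are all assumptions made for illustration; the slides only state that entities become tables and attributes become columns.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Entity:
    """Logical-model entity: a name, its attributes, and its identifier."""
    name: str
    attributes: list[str]
    identifier: str  # assumed to become the primary key


def to_column(name: str) -> str:
    # Assumed naming convention: lower case with underscores.
    return name.lower().replace(" ", "_")


def to_ddl(entity: Entity) -> str:
    """Map entity -> table and attribute -> column."""
    columns = []
    for attribute in entity.attributes:
        column = f"{to_column(attribute)} TEXT"  # TEXT is an assumed type
        if attribute == entity.identifier:
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {to_column(entity.name)} ({', '.join(columns)})"


sales_org = Entity(
    name="Sales Organization",
    attributes=["Sales Org ID", "Distribution Channel"],
    identifier="Sales Org ID",
)
ddl = to_ddl(sales_org)

# Create the physical table in an in-memory database and read back its columns.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
physical_columns = [
    row[1] for row in conn.execute("PRAGMA table_info(sales_organization)")
]
print(ddl)
print(physical_columns)
```

A real physical design would also choose data types, indexes, and constraints per column, which this sketch leaves out.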
19
Why Not ER Model ?
20
Dimensional Model
21
Types of Schema
22
Star Schema
The smaller, outer tables are the points of the star. The larger table
is the center from which the points radiate.
A single, large, central fact table and one table for each
dimension.
23
Star Schema (Contd.)
[Figure: a central Fact Table holding Key 1, Key 2, Key 3, Key 4 and data columns, surrounded by Dimension tables 1 to 4; each dimension table has its own key (Key 1 to Key 4) and attributes]
24
Example of Star Schema
25
Example of Star Schema
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
Store Dimension: STORE KEY, Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr., Level
Time Dimension: PERIOD KEY, Period Desc, Year, Quarter, Month, Day, Current Flag, Resolution, Sequence
Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer, Level
Example:
Select A.STORE_KEY, A.PERIOD_KEY, A.dollars
from Fact_Table A
where A.STORE_KEY in
  (select B.STORE_KEY from Store_Dimension B
   where B.region = 'North' and B.Level = 2)
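To try the query above, here is a small sketch using Python's built-in sqlite3 module. The table and column names follow the query; the reduced column set and all sample rows are hypothetical and exist only so the query has something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Store_Dimension (
        STORE_KEY INTEGER PRIMARY KEY, region TEXT, Level INTEGER
    );
    CREATE TABLE Fact_Table (
        STORE_KEY INTEGER, PRODUCT_KEY INTEGER, PERIOD_KEY INTEGER, dollars REAL
    );
    -- Hypothetical sample rows, for illustration only.
    INSERT INTO Store_Dimension VALUES
        (1, 'North', 2), (2, 'South', 2), (3, 'North', 1);
    INSERT INTO Fact_Table VALUES
        (1, 10, 100, 250.0), (2, 10, 100, 300.0),
        (3, 11, 101, 75.0),  (1, 11, 101, 125.0);
""")

# The dimension table filters the stores; the fact table supplies the measure.
rows = conn.execute("""
    SELECT A.STORE_KEY, A.PERIOD_KEY, A.dollars
    FROM Fact_Table A
    WHERE A.STORE_KEY IN (
        SELECT B.STORE_KEY FROM Store_Dimension B
        WHERE B.region = 'North' AND B.Level = 2
    )
    ORDER BY A.PERIOD_KEY
""").fetchall()
print(rows)
```

Only store 1 is both in the North region and at level 2, so only its fact rows are returned.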
26
Star Schema: Dimension Table
Geography_dim
De-normalized structure
Easy navigation within the dimension
[Figure: the Geography_dim table with its primary key (PK); the Attributes and Elements columns both list Region, State, City, Employee]
27
Star Schema: Fact Table
Sales_fact
day_code  prod_code  cust_code  empl_code  units sold  revenue
1211      345        1231123    1232       23          7935
1211      22         1245223    3554       12          264
1211      112        1522342    3963       6           672
1212      233        1524665    2924       34          7922
1212      112        1366454    2673       76          8512
1212      22         1403453    3554       22          484
Dimension keys: day_code, prod_code, cust_code, empl_code
Measures: units sold, revenue
Contains columns for measures and dimensions
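As an illustration of how dimension keys and measures work together, the following Python sketch groups the Sales_fact rows from the slide by one dimension key and sums the two measures. The helper name `totals_by` is hypothetical.

```python
from collections import defaultdict

# Rows copied from the Sales_fact slide:
# (day_code, prod_code, cust_code, empl_code, units_sold, revenue)
sales_fact = [
    (1211, 345, 1231123, 1232, 23, 7935),
    (1211, 22, 1245223, 3554, 12, 264),
    (1211, 112, 1522342, 3963, 6, 672),
    (1212, 233, 1524665, 2924, 34, 7922),
    (1212, 112, 1366454, 2673, 76, 8512),
    (1212, 22, 1403453, 3554, 22, 484),
]


def totals_by(rows, key_index):
    """Sum units_sold and revenue for each value of one dimension key."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        totals[row[key_index]][0] += row[4]  # units_sold
        totals[row[key_index]][1] += row[5]  # revenue
    return {key: tuple(value) for key, value in totals.items()}


by_day = totals_by(sales_fact, 0)      # group by day_code
by_product = totals_by(sales_fact, 1)  # group by prod_code
print(by_day)
print(by_product)
```

The same measures can be rolled up along any dimension key without changing the fact rows.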
28
What is Dimension?
29
What is Dimension?
30
Dimension Elements
[Figure: example dimensions: Geography, Time, Product]
Components of a dimension
Represents the natural elements in the business dimension
Directly related to the dimension
Facilitates analysis from different perspectives of a dimension
Often referred to as levels of a dimension
31
Dimension Hierarchy
[Figure: Time Dimension hierarchy with Drill Down and Drill Up arrows; the level shown is Year 1999]
32
Examples of Dimensions
» Service Dimension
» Geographic Dimension
» Time Dimension
33
Surrogate Keys
34
Why Existing Keys Should not be Used?
35
Types of Dimensions
Conformed Dimension
Degenerate Dimension
Demographic Dimension
Junk Dimension
Causal Dimension
36
Conformed Dimensions
[Figure: the DATE dimension shared by SALES and INVENTORY]
37
Causal Dimensions
38
Causal Dimension: Example
Example:
39
What is a Slowly Changing Dimension?
40
Slowly Changing Dimension: Classification
TYPE I
TYPE II
TYPE III
41
Slowly Changing Dimensions Type I
Example 1:
[Figure: two Source/Target table pairs, each with the columns Emp id, Name, Email; the only surviving cell value is Shane@xyz.com]
42
Slowly Changing Dimensions Type I (Contd.)
Example 2:
Source
Emp id   Name    Email
10       Shane   Shane@xyz.com
Target
PM_PRIMARYKEY   Emp id   Name    Email           PM_VERSION_NUMBER
1000            10       Shane   Shane@xyz.com   0
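A Type I change is commonly implemented as an overwrite: the target row is updated in place and no history is kept. The sketch below shows that behaviour in plain Python. The function name, the dictionary layout, and the changed email value (borrowed from the later Type II slides) are assumptions for illustration.

```python
def apply_scd_type1(target, source_rows):
    """Overwrite each target row with the latest source values (no history)."""
    for row in source_rows:
        target[row["emp_id"]] = dict(row)
    return target


# Target dimension keyed by Emp id, holding the value from the slide.
target = {10: {"emp_id": 10, "name": "Shane", "email": "Shane@xyz.com"}}

# Hypothetical change arriving from the source system.
source = [{"emp_id": 10, "name": "Shane", "email": "Shane@abc.co.in"}]

apply_scd_type1(target, source)
print(target)
```

After the load the old email is gone, which is the defining trade-off of Type I: simple to maintain, but past values cannot be reported.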
43
Types of SCD Type 2
Versioning
Flag
Date
44
Slowly Changing Dimensions II: Versioning
Example:
Source
Emp id   Name    Email
10       Shane   Shane@abc.co.in
Target
PM_PRIMARYKEY   Emp id   Name   Email   PM_VERSION_NUMBER
45
Slowly Changing Dimensions II: Versioning (Contd.)
Example:
Source
Emp id   Name    Email
10       Shane   Shane@abc.com
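The versioning approach can be sketched as follows: every change inserts a new target row, so the full history stays queryable. The column names mirror PM_PRIMARYKEY and PM_VERSION_NUMBER from the slides, but the rules used here (next key is the highest key plus one, version number increases by one per change) are assumptions, not a description of any particular ETL tool.

```python
def apply_scd_type2_versioning(target, source_row):
    """Insert a new version row when a tracked attribute has changed."""
    versions = [r for r in target if r["emp_id"] == source_row["emp_id"]]
    latest = max(versions, key=lambda r: r["pm_version_number"], default=None)
    if latest is not None and all(
        latest[col] == source_row[col] for col in ("name", "email")
    ):
        return target  # nothing changed, keep the current version
    # Assumed surrogate key rule: highest existing key plus one.
    next_key = max((r["pm_primarykey"] for r in target), default=999) + 1
    target.append({
        "pm_primarykey": next_key,
        **source_row,
        "pm_version_number": 0 if latest is None else latest["pm_version_number"] + 1,
    })
    return target


# Starting target row, as on the earlier slide.
target = [{"pm_primarykey": 1000, "emp_id": 10, "name": "Shane",
           "email": "Shane@xyz.com", "pm_version_number": 0}]

# The two successive source changes shown on the versioning slides.
apply_scd_type2_versioning(
    target, {"emp_id": 10, "name": "Shane", "email": "Shane@abc.co.in"})
apply_scd_type2_versioning(
    target, {"emp_id": 10, "name": "Shane", "email": "Shane@abc.com"})

for row in target:
    print(row)
```

The row with the highest version number for an Emp id is the current one; older rows preserve the earlier email addresses.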
46
Slowly Changing Dimensions Type II: Flag
Example:
Source
10     Shane   Shane@xyz.com
Target
1000   10      Shane   Shane@xyz.com   1
47
Slowly Changing Dimensions Type II: Flag
Example:
Source
Emp id   Name    Email
10       Shane   Shane@abc.co.in
Target
48
Slowly Changing Dimensions Type II: Date
Example:
Source
10     Shane   Shane@xyz.com
Target
1000   10      Shane   Shane@xyz.com   01/01/00
49
Slowly Changing Dimensions II: Effective Date
Example:
Source
Emp id   Name    Email
10       Shane   Shane@abc.co.in
Target
50
Slowly Changing Dimensions II: Effective Date (Contd.)
Example:
Source
Emp id   Name    Email
10       Shane   Shane@abc.com
Target
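One common way to realise the effective-date variant is to give every target row a begin date and an end date, closing the current row when a change arrives. The sketch below assumes hypothetical column names `begin_date` and `end_date`, reads the slide's 01/01/00 as 1 January 2000, and invents the two change dates purely for illustration.

```python
from datetime import date


def apply_scd_type2_dates(target, source_row, change_date):
    """Close the current row and open a new one effective from change_date."""
    current = next(
        (r for r in target
         if r["emp_id"] == source_row["emp_id"] and r["end_date"] is None),
        None,
    )
    if current is not None:
        if (current["name"], current["email"]) == (source_row["name"], source_row["email"]):
            return target  # no change
        current["end_date"] = change_date
    # Assumed surrogate key rule: highest existing key plus one.
    next_key = max((r["pm_primarykey"] for r in target), default=999) + 1
    target.append({"pm_primarykey": next_key, **source_row,
                   "begin_date": change_date, "end_date": None})
    return target


def email_as_of(target, emp_id, day):
    """Return the email that was effective for emp_id on the given day."""
    for r in target:
        if (r["emp_id"] == emp_id and r["begin_date"] <= day
                and (r["end_date"] is None or day < r["end_date"])):
            return r["email"]
    return None


target = [{"pm_primarykey": 1000, "emp_id": 10, "name": "Shane",
           "email": "Shane@xyz.com",
           "begin_date": date(2000, 1, 1), "end_date": None}]

# Hypothetical change dates for the two source updates on the slides.
apply_scd_type2_dates(
    target, {"emp_id": 10, "name": "Shane", "email": "Shane@abc.co.in"},
    date(2000, 6, 1))
apply_scd_type2_dates(
    target, {"emp_id": 10, "name": "Shane", "email": "Shane@abc.com"},
    date(2001, 1, 1))

print(email_as_of(target, 10, date(2000, 3, 1)))
print(email_as_of(target, 10, date(2002, 1, 1)))
```

Storing the date range makes point-in-time questions answerable directly, at the cost of one extra update per change.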
51
Slowly Changing Dimensions Type III
Example:
Source Target
52
Slowly Changing Dimensions Type III (Contd.)
Example:
Source
Emp id   Name    Email
10       Shane   Shane@abc.co.in
Target
53
Slowly Changing Dimensions Type III (Contd.)
Example:
Source
Emp id   Name    Email
10       Shane   Shane@abc.com
Target
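Type III is usually described as keeping the current value plus one previous value in an extra column of the same row. The sketch below follows that description; the column name `prev_email` and the function name are hypothetical, and the starting value Shane@xyz.com is taken from the earlier slides.

```python
def apply_scd_type3(target, source_row):
    """Keep the current email and exactly one previous email per employee."""
    row = target.get(source_row["emp_id"])
    if row is None:
        target[source_row["emp_id"]] = {**source_row, "prev_email": None}
        return target
    if row["email"] != source_row["email"]:
        row["prev_email"] = row["email"]   # shift current value to "previous"
        row["email"] = source_row["email"]
    row["name"] = source_row["name"]       # untracked attribute: overwritten
    return target


target = {10: {"emp_id": 10, "name": "Shane",
               "email": "Shane@xyz.com", "prev_email": None}}

# The two successive source changes shown on the Type III slides.
apply_scd_type3(target, {"emp_id": 10, "name": "Shane", "email": "Shane@abc.co.in"})
apply_scd_type3(target, {"emp_id": 10, "name": "Shane", "email": "Shane@abc.com"})
print(target[10])
```

After the second change the original address is no longer stored, which shows the limit of this approach: only one step of history survives.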
54
Facts and Measures
Numeric, summarized
55
Types of Facts
Semi-Additive: Measures that can be added across some dimensions but not
across others.
56
Types of Fact Tables
Based on the fact classifications, there are two types of fact tables:
Cumulative: This type of fact table describes what has happened over a
period of time. For example, it may describe the total sales by
product by store by day. The facts in this type of fact table are mostly
additive:
» For example, the sum of Sales_Amount for all 7 days in a week represents the total
sales amount for that week.
Snapshot: This type of fact table describes the state of things at a particular
instant in time, and usually includes more semi-additive and non-additive
facts:
» For example, Current_Balance is a semi-additive fact: it makes sense to add it
up across all accounts, but it does not make sense to add it up through time.
» Profit_Margin is a non-additive fact: it does not make sense to add it up at the
account level or the day level.
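The difference between additive and semi-additive facts can be shown with a short Python sketch. All rows below are hypothetical; only the measure names Sales_Amount and Current_Balance come from the text.

```python
from collections import defaultdict

# Hypothetical cumulative rows: (day, Sales_Amount) for one 7-day week.
sales = [(day, 10 * day) for day in range(1, 8)]

# Hypothetical snapshot rows: (day, account, Current_Balance).
snapshot = [
    (1, "A", 100), (1, "B", 50),
    (2, "A", 120), (2, "B", 30),
]

# Additive: summing Sales_Amount over the days gives the weekly total.
weekly_sales = sum(amount for _, amount in sales)

# Semi-additive: Current_Balance can be summed across accounts for one day...
balance_by_day = defaultdict(int)
for day, _, balance in snapshot:
    balance_by_day[day] += balance

# ...but across time the meaningful figure is the latest balance, not a sum.
last_day = max(day for day, _, _ in snapshot)
closing_balance = {
    account: balance for day, account, balance in snapshot if day == last_day
}
misleading_sum_a = sum(b for _, account, b in snapshot if account == "A")

print(weekly_sales, dict(balance_by_day), closing_balance, misleading_sum_a)
```

Account A never held 220, so the sum through time is not a usable number, while the per-day totals across accounts are.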
57
Factless Fact Tables
58
Factless Fact Tables: Coverage Tables
59
Factless Fact Tables: Event Tracking
60
Snowflake Schema
Each dimension has one key for each level of the dimension’s
hierarchy.
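A minimal sqlite3 sketch of a snowflaked geography dimension follows, with one table and one key per hierarchy level. The column names (city_code, cityname, state_code, statename, region_code, regionname, units, revenue) follow the snowflake example in this session; the table names and all sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per hierarchy level, each with its own key.
    CREATE TABLE region (region_code INTEGER PRIMARY KEY, regionname TEXT);
    CREATE TABLE state (
        state_code INTEGER PRIMARY KEY, statename TEXT,
        region_code INTEGER REFERENCES region(region_code)
    );
    CREATE TABLE city (
        city_code INTEGER PRIMARY KEY, cityname TEXT,
        state_code INTEGER REFERENCES state(state_code)
    );
    CREATE TABLE sales_fact (
        city_code INTEGER REFERENCES city(city_code),
        units INTEGER, revenue INTEGER
    );
    -- Hypothetical sample rows.
    INSERT INTO region VALUES (1, 'North'), (2, 'South');
    INSERT INTO state VALUES (10, 'State A', 1), (11, 'State B', 1), (12, 'State C', 2);
    INSERT INTO city VALUES (100, 'City X', 10), (101, 'City Y', 11), (102, 'City Z', 12);
    INSERT INTO sales_fact VALUES (100, 5, 500), (101, 3, 300), (102, 2, 200), (100, 1, 100);
""")

# Rolling up to region level needs one join per level of the hierarchy.
revenue_by_region = conn.execute("""
    SELECT r.regionname, SUM(f.revenue)
    FROM sales_fact f
    JOIN city c   ON c.city_code = f.city_code
    JOIN state s  ON s.state_code = c.state_code
    JOIN region r ON r.region_code = s.region_code
    GROUP BY r.regionname
    ORDER BY r.regionname
""").fetchall()
print(revenue_by_region)
```

In a star schema the same rollup would need a single join to one denormalized geography table, which is the usual argument for avoiding snowflakes.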
61
Snowflake Schema
[Figure: a central Fact Table surrounded by four Dim Tables]
62
Snowflake Schema
[Figure: snowflake schema example. Column names recovered from the diagram, grouped by apparent hierarchy:
Fact measures: units, revenue
Customer: cust_code, cust_name, age_code, age, sex_code, sex
Employee: emp_code, emp_name
Geography: city_code, cityname, state_code, statename, region_code, regionname, country_code, countryname
Time: day_code, day_name, week_code, week_name, month_code, month_name, quarter_code, year
Product: prod_code, prod_name, brand_code, brand_name, color_code, color_name]
63
Avoid Snowflakes
64
Star vs Snowflake Schema
Star: Denormalized
Snowflake: Normalized
65
Fact Constellation
66
Example of Fact Constellation
67
Fact Constellation
68
ER vs Dimensional
ER: User access more complex. Useful in Enterprise-wide DW implementations.
Dimensional: Simplified data model for user access. Most often used in Data Marts.
69
Helper Tables
70
Helper Tables: Example
71
Allow time for questions from participants
72
Test Your Understanding
73
Data Warehousing Basics Session 06-08:
Summary
The ER modeling technique is a discipline used to illuminate
the microscopic relationships among data elements.
Star schema: A fact table in the middle connected to a set
of dimension tables.
Snowflake schema: A refinement of star schema where
some dimensional hierarchy is normalized into a set of
smaller dimension tables, forming a shape similar to
snowflake.
Fact constellations: Multiple fact tables share dimension
tables, viewed as a collection of stars, therefore called
galaxy schema or fact constellation.
74
Data Warehousing Basics Session 06-08: Source
Ralph Kimball, Data Warehousing
Disclaimer: Parts of the content of this course are based on the materials available from the Web sites and
books listed above. The materials that can be accessed from linked sites are not maintained by
Cognizant Academy, and we are not responsible for their contents. All trademarks, service marks,
and trade names in this course are the marks of the respective owner(s).
75
You have completed Session 06-08 of Data Warehousing Basics.