Data modeling
Data modeling is a method used to define and analyze the data requirements needed to support the business processes of an organization.
According to ANSI (1975), a data model instance may be one of three kinds:
1. Conceptual modeling
2. Logical modeling
3. Physical modeling
Flat model
Hierarchical model
Network Model
ER Model
Dimensional model
Modeling Tools
ERwin
ER/Studio (Embarcadero)
Rational Rose
PowerDesigner
Oracle Designer
Xcase
ER Modeling
[Figure: ER diagram labeling an entity and a relationship, with Factory and Factory ID as the example]
Dimensional Modeling
Facts or Measures
Are the key performance indicators of an enterprise
Factual data about the subject area
Numeric, summarized
Dimension
What was sold?
Whom was it sold to?
When was it sold?
Where was it sold?
Service Dimension
Geographic Dimension
Time Dimension
Dimension Elements
Geography
Product
Time
Components of a dimension
Represent the natural elements in the business dimension
Directly related to the dimension
Facilitate analysis from different perspectives of a dimension
Often referred to as levels of a dimension
Dimension Hierarchy
Represents the natural business hierarchy within dimension elements
Clarifies the drill up and drill down directions
Time Dimension
Year: 1999
  Month: April
    Date: 9/4/99, 28/4/99
  Month: May
    Date: 5/5/99, 17/5/99
Drill Down: Year to Month to Date
Drill Up: Date to Month to Year
Multi-Dimensional Analysis
[Chart: values by quarter for each Geography (East, West, North) and Product (A, B), with Drill Up and Drill Down between the Geography and Product levels]
Geography  Product  1st Qtr  2nd Qtr  3rd Qtr  4th Qtr
East       A        20.4     27.4     90.0     20.4
East       B        19.8     26.6     87.3     19.8
West       A        30.6     38.6     34.6     31.6
West       B        29.7     37.4     33.6     30.7
North      A        45.9     46.9     45.0     43.9
North      B        44.5     45.5     43.7     42.6
Drill down is the process of requesting more detailed information.
Drill up is the process of summarizing the existing information.
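As a rough illustration of drill down and drill up, the sketch below totals sales at each level of the Year, Month, Date hierarchy. The `SALES` amounts, the `LEVELS` mapping, and the `summarize` helper are assumptions made for this example; only the dates are taken from the time dimension shown earlier.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales: the dates mirror the time-dimension example,
# the amounts are invented for illustration.
SALES = [
    (date(1999, 4, 9), 120.0),
    (date(1999, 4, 28), 80.0),
    (date(1999, 5, 5), 50.0),
    (date(1999, 5, 17), 150.0),
]

# Each level of the Year, Month, Date hierarchy maps a date to a group key.
LEVELS = {
    "year": lambda d: (d.year,),
    "month": lambda d: (d.year, d.month),
    "date": lambda d: (d.year, d.month, d.day),
}


def summarize(level):
    """Total the sales at the requested level of the time hierarchy."""
    totals = defaultdict(float)
    for day, amount in SALES:
        totals[LEVELS[level](day)] += amount
    return dict(totals)


by_year = summarize("year")    # most summarized view
by_month = summarize("month")  # drill down from year to month
by_date = summarize("date")    # drill down from month to date
```

Moving from `by_year` to `by_date` is a drill down; moving back toward `by_year` is a drill up. The totals stay consistent because every level aggregates the same atomic rows.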
Dimensional Modeling
Subject Area: What do you want to know about?
Atomic Detail
Dimensions
Facts
Measures
Frequency of Update: How fresh do you need it?
Depth of History: How far back do you need to know it?
Clean, current, accurate logical models
Physical models
A subject area model
Star / Snowflake schema design
External
Snow-flake Schema
Hybrid Schema
Optimal Snow-flake Schema
[Figure: star schema with a central Fact Table joined to the Employee, Product, Time, and Customer dimensions]
Time_dim columns: day_code, date, day_of_week, month_seq, month_num, month_long_name, month_short_name, qtr_seq, qtr_num, quarter, year
A star schema is a highly denormalized, query-centric model where the basic premise is that information can be broken into two groups: facts and dimensions.
In a star schema, facts are in a single place (the fact table) and the descriptions (or elements) that lead to those facts are in dimension tables. The star schema is built for simplicity and speed. The assumption behind it is that the database is static, with no updates being performed online.
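The fact and dimension split can be sketched with Python's built-in `sqlite3` module. This is a minimal example under stated assumptions: the `sales_fact` and `product_dim` tables, their columns, and all row values are hypothetical, and `time_dim` uses only a few of the Time_dim columns listed earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold the descriptions.
    CREATE TABLE time_dim (
        day_code INTEGER PRIMARY KEY,
        date TEXT,
        month_num INTEGER,
        year INTEGER
    );
    CREATE TABLE product_dim (
        product_code INTEGER PRIMARY KEY,
        product_name TEXT
    );
    -- The fact table holds numeric measures plus one key per dimension.
    CREATE TABLE sales_fact (
        day_code INTEGER REFERENCES time_dim(day_code),
        product_code INTEGER REFERENCES product_dim(product_code),
        units_sold INTEGER,
        revenue REAL
    );
""")

conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)", [
    (1, "1999-04-09", 4, 1999),
    (2, "1999-05-05", 5, 1999),
])
conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(10, "A"), (20, "B")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    (1, 10, 3, 30.0),
    (1, 20, 1, 25.0),
    (2, 10, 2, 20.0),
])

# A typical star query: join the fact table to its dimensions, then group.
rows = conn.execute("""
    SELECT t.month_num, p.product_name, SUM(f.units_sold), SUM(f.revenue)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.day_code = f.day_code
    JOIN product_dim AS p ON p.product_code = f.product_code
    GROUP BY t.month_num, p.product_name
    ORDER BY t.month_num, p.product_name
""").fetchall()
```

Every dimension is one join away from the fact table, which is what keeps queries against this layout simple.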
Snow-flake schema
[Figure: snow-flake schema with a fact table (Customer, Time, Revenue, Units Sold, Net Profit) surrounded by the Employee, City, Region, Country, Product, Brand, and Color tables]
Columns shown on the Employee branch: emp_code, city_code, cityname, state_code, statename, region_code, regionname, country_code, countryname
Columns shown on the Color table: color_code, color_name
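In a snow-flake layout, a dimension is normalized into a chain of tables linked by codes. The sketch below follows the employee, city, state part of that chain using `sqlite3`. How the columns are grouped into the `employee_dim`, `city_dim`, and `state_dim` tables is an assumption, and the place names and the `state_of` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state_dim (
        state_code INTEGER PRIMARY KEY,
        statename TEXT
    );
    CREATE TABLE city_dim (
        city_code INTEGER PRIMARY KEY,
        cityname TEXT,
        state_code INTEGER REFERENCES state_dim(state_code)
    );
    CREATE TABLE employee_dim (
        emp_code INTEGER PRIMARY KEY,
        city_code INTEGER REFERENCES city_dim(city_code)
    );
    INSERT INTO state_dim VALUES (1, 'State X');
    INSERT INTO city_dim VALUES (100, 'City P', 1), (101, 'City Q', 1);
    INSERT INTO employee_dim VALUES (10, 100), (11, 101), (12, 101);
""")


def state_of(emp_code):
    """Walk the normalized chain employee -> city -> state."""
    row = conn.execute("""
        SELECT s.statename
        FROM employee_dim AS e
        JOIN city_dim AS c ON c.city_code = e.city_code
        JOIN state_dim AS s ON s.state_code = c.state_code
        WHERE e.emp_code = ?
    """, (emp_code,)).fetchone()
    return row[0] if row else None


# Rolling employees up to state level needs one join per level.
headcount = conn.execute("""
    SELECT s.statename, COUNT(*)
    FROM employee_dim AS e
    JOIN city_dim AS c ON c.city_code = e.city_code
    JOIN state_dim AS s ON s.state_code = c.state_code
    GROUP BY s.statename
""").fetchall()
```

Compared with a star schema, each city and state name is stored once, at the cost of extra joins per query.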
Although dimension tables are typically static lists, most dimension tables do change over time.
Since these changes are smaller in magnitude compared to changes in fact tables, these dimensions are known as slowly growing or slowly changing dimensions.
TYPE I (the changed value overwrites the target row)
Load    Source row                      Target row
First   1001  Shane  Shane@xyz.com      1001  Shane  Shane@xyz.com
Second  1001  Shane  Shane@abc.co.in    1001  Shane  Shane@abc.co.in

TYPE II with PM_VERSION_NUMBER (each change adds a target row)
Source: Emp id 10, Name Shane
PM_PRIMARYKEY  Emp id  Name   Email            PM_VERSION_NUMBER
1000           10      Shane  Shane@xyz.com
1001           10      Shane  Shane@abc.co.in
1003           10      Shane  Shane@abc.com

TYPE II with PM_CURRENT_FLAG (each change adds a target row)
Source: Emp id 10, Name Shane
PM_PRIMARYKEY  Emp id  Name   PM_CURRENT_FLAG
1000           10      Shane
1001           10      Shane
1003           10      Shane

TYPE II with PM_BEGIN_DATE and PM_END_DATE (each change adds a target row)
Source: Emp id 10, Name Shane
PM_PRIMARYKEY  Emp id  Name   Email            PM_BEGIN_DATE  PM_END_DATE
1000           10      Shane  Shane@xyz.com    01/01/00       03/01/00
1001           10      Shane  Shane@abc.co.in  03/01/00       05/02/00
1003           10      Shane  Shane@abc.com    05/02/00

TYPE III (the previous value is kept in a column of the same row)
Target columns: PM_PRIMARYKEY, Emp id, Name, Email, PM_Prev_ColumnName, PM_EFFECT_DATE
Load    Emp id  Name   Email            PM_Prev_ColumnName  PM_EFFECT_DATE
First   10      Shane  Shane@xyz.com                        01/01/00
Second  10      Shane  Shane@abc.co.in  Shane@xyz.com       01/02/00
Third   10      Shane  Shane@abc.com    Shane@abc.co.in     01/03/00
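One way to keep history with begin and end dates, in the spirit of the PM_BEGIN_DATE and PM_END_DATE rows above, can be sketched in plain Python. The `apply_change` function, the list-of-dicts target, and the key numbering are assumptions for illustration, not the behavior of any particular tool.

```python
def apply_change(target, emp_id, name, email, load_date):
    """Apply one source row to a history-keeping dimension (a list of dicts)."""
    # The current version of an employee is the row whose end date is open.
    current = next(
        (r for r in target if r["emp_id"] == emp_id and r["PM_END_DATE"] is None),
        None,
    )
    if current is not None:
        if (current["name"], current["email"]) == (name, email):
            return target  # nothing changed, keep the open row as is
        current["PM_END_DATE"] = load_date  # close the old version

    # Surrogate keys simply increment; 1000 as the first key is arbitrary.
    next_key = max((r["PM_PRIMARYKEY"] for r in target), default=999) + 1
    target.append({
        "PM_PRIMARYKEY": next_key,
        "emp_id": emp_id,
        "name": name,
        "email": email,
        "PM_BEGIN_DATE": load_date,
        "PM_END_DATE": None,  # open-ended: this is the current version
    })
    return target


dim = []
apply_change(dim, 10, "Shane", "Shane@xyz.com", "01/01/00")
apply_change(dim, 10, "Shane", "Shane@abc.co.in", "03/01/00")
apply_change(dim, 10, "Shane", "Shane@abc.com", "05/02/00")
```

After the three loads, `dim` holds three rows for employee 10, and only the last one has an open end date. The keys in this sketch increment by one, so they need not match the key values shown in the tables.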
Conformed Dimensions
Conformed dimensions are those that are consistent across data marts.
They are essential for integrating the data marts into an enterprise data warehouse.
Causal Dimensions
Causal dimensions can be used to explain why a record exists in a fact table.
Causal dimensions should not change the grain of the fact table.
Helper Tables
Helper tables are used when there are multi-valued dimensions, that is, when there is a many-to-many relationship between a fact table and a dimension table.
A helper table can be placed between two dimension tables or between a dimension table and a fact table.
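A helper table placed between a fact table and a dimension table can be sketched with `sqlite3`. Everything here is hypothetical: the `emp_group_helper` table, the `weight` column used to split a shared sale, and the sample rows are assumptions chosen only to show the many-to-many link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_dim (
        emp_code INTEGER PRIMARY KEY,
        emp_name TEXT
    );
    -- Each fact row points at a group, not at a single employee.
    CREATE TABLE sales_fact (
        sale_id INTEGER PRIMARY KEY,
        group_code INTEGER,
        revenue REAL
    );
    -- The helper table resolves the many-to-many relationship:
    -- one row per (group, employee) pair, with a share of the credit.
    CREATE TABLE emp_group_helper (
        group_code INTEGER,
        emp_code INTEGER REFERENCES employee_dim(emp_code),
        weight REAL
    );
    INSERT INTO employee_dim VALUES (10, 'Shane'), (11, 'Lee');
    INSERT INTO sales_fact VALUES (1, 1, 100.0), (2, 2, 40.0);
    INSERT INTO emp_group_helper VALUES (1, 10, 0.5), (1, 11, 0.5), (2, 10, 1.0);
""")

# Revenue per employee, routed through the helper table.
per_employee = conn.execute("""
    SELECT e.emp_code, e.emp_name, SUM(f.revenue * h.weight)
    FROM sales_fact AS f
    JOIN emp_group_helper AS h ON h.group_code = f.group_code
    JOIN employee_dim AS e ON e.emp_code = h.emp_code
    GROUP BY e.emp_code, e.emp_name
    ORDER BY e.emp_code
""").fetchall()
```

Sale 1 is shared by two employees, so the fact table cannot carry a single employee key without changing its grain. With the weights in each group summing to 1, the per-employee figures add back up to the total revenue.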