
Data Modeling

Data modeling
Data modeling is a method used to define and analyze the data requirements needed to support the business processes of an organization.

According to ANSI (1975), a data model instance may be one of three kinds:
1. Conceptual modeling
2. Logical modeling
3. Physical modeling

Types of data models

Flat model

Hierarchical model

Network Model

ER Model

Object-oriented model

Dimensional model

Modeling Tools
ERwin
ER/Studio (Embarcadero)
Rational Rose
PowerDesigner
Oracle Designer
Xcase

ER Modeling

Review of Logical Modeling Terms & Symbols


Entities define specific groups of information.
Figure: the Sales Organization entity (Sales Org ID, Distribution Channel), labeled "Entity".

Entities are made up of attributes.
Figure: Sales Organization with Sales Org ID and Distribution Channel labeled "Attributes".

One or more attributes uniquely identify an instance of an entity.
Figure: Sales Organization (Sales Org ID, Distribution Channel) with the identifier marked.

The logical model identifies relationships between entities.
Figure: Sales Detail (Sales Record ID) connected to Sales Rep (Sales Rep ID), with the connection labeled "Relationship".
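
The four terms above (entity, attribute, identifier, relationship) can be sketched in Python. This is only an illustration: the class names, the `name` attribute, the sample values, and the direction of the relationship (one sales rep to many sales records) are assumptions, not something the slides state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesRep:                 # entity
    sales_rep_id: int           # identifier: uniquely identifies one sales rep
    name: str                   # ordinary attribute (hypothetical, not on the slide)


@dataclass(frozen=True)
class SalesDetail:              # entity
    sales_record_id: int        # identifier of the sales record
    sales_rep_id: int           # relationship: refers to a SalesRep identifier


# Invented sample instances.
reps = {rep.sales_rep_id: rep for rep in (SalesRep(1, "Ann"), SalesRep(2, "Bob"))}
details = [SalesDetail(100, 1), SalesDetail(101, 1), SalesDetail(102, 2)]


def records_for(rep_id):
    """Follow the relationship from a sales rep to its sales records."""
    return [d.sales_record_id for d in details if d.sales_rep_id == rep_id]
```

Calling `records_for(1)` walks the relationship from one Sales Rep instance to the Sales Detail instances that refer to it.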

Logical Data Model


Entities in the model (identifier after the colon):
Suppliers: Supplier ID
Customer: Customer ID (Retail Market, Wholesale Industry)
Manufacturing Group: Manufacturing Org ID
Sales Detail: Sales Record ID
Sales Rep: Sales Rep ID
Sales Organization: Sales Org ID, Distribution Channel
Factory: Factory ID
Product: Product SKU
Product Sales Plan: Plan ID

Dimensional Modeling

Facts and Measures

Facts or measures are the key performance indicators of an enterprise
Factual data about the subject area
Numeric, summarized

Dimension

What was sold?
Whom was it sold to?
When was it sold?
Where was it sold?

Dimensions put measures in perspective
What, when and where qualifiers to the measures
Dimensions could be products, customers, time, geography, etc.

Some Examples of Data warehousing Dimensions


The following dimensions are common to all data warehouses in various forms:
Product dimension
Service dimension
Geographic dimension
Time dimension

Dimension Elements
Example dimensions: Geography, Product, Time

Components of a dimension
Represent the natural elements in the business dimension
Directly related to the dimension
Facilitate analysis from different perspectives of a dimension
Often referred to as levels of a dimension

Dimension Hierarchy
Time Dimension
Year: 1999
Month: April, May
Date: 9/4/99 and 28/4/99 (April); 5/5/99 and 17/5/99 (May)

Drill down: Year to Month to Date
Drill up: Date to Month to Year

Represents the natural business hierarchy within dimension elements
Clarifies the drill up, drill down directions
Each element represents a different level of aggregation
End users may need custom hierarchies

Multi-Dimensional Analysis
[Chart: quarterly values (scale 0.0 to 100.0) by geography and product, 1st Qtr to 4th Qtr]

Geography  Product  1st Qtr  2nd Qtr  3rd Qtr  4th Qtr
East       A        20.4     27.4     90.0     20.4
East       B        19.8     26.6     87.3     19.8
West       A        30.6     38.6     34.6     31.6
West       B        29.7     37.4     33.6     30.7
North      A        45.9     46.9     45.0     43.9
North      B        44.5     45.5     43.7     42.6

Characteristic of online analytical processing (OLAP)

Drill Up & Drill Down


Current Result Set

        1st Qtr  2nd Qtr  3rd Qtr  4th Qtr
East    20.4     27.4     90       20.4
West    30.6     38.6     34.6     31.6
North   45.9     46.9     45       43.9

Up

        1999
East    158.2
West    135.4
North   181.7

Down

        Jan     Feb     Mar    Apr     May     Jun
East    5.712   6.528   8.16   7.672   8.768   10.96
West    8.568   9.792   12.24  10.808  12.352  15.44
North   12.852  14.688  18.36  13.132  15.008  18.76

Drill down is the process of requesting more detailed information.
Drill up is the process of summarizing the existing information.
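
Drill up can be sketched as a roll-up along the time hierarchy. The snippet below is a minimal illustration using the monthly and quarterly figures from the slide; the function name `drill_up` and the dictionary layout are assumptions made for the example.

```python
from collections import defaultdict

# Monthly figures (Jan to Jun) copied from the "Down" table.
monthly = {
    ("East", "Jan"): 5.712, ("East", "Feb"): 6.528, ("East", "Mar"): 8.16,
    ("East", "Apr"): 7.672, ("East", "May"): 8.768, ("East", "Jun"): 10.96,
    ("West", "Jan"): 8.568, ("West", "Feb"): 9.792, ("West", "Mar"): 12.24,
    ("West", "Apr"): 10.808, ("West", "May"): 12.352, ("West", "Jun"): 15.44,
    ("North", "Jan"): 12.852, ("North", "Feb"): 14.688, ("North", "Mar"): 18.36,
    ("North", "Apr"): 13.132, ("North", "May"): 15.008, ("North", "Jun"): 18.76,
}
MONTH_TO_QUARTER = {"Jan": "1st Qtr", "Feb": "1st Qtr", "Mar": "1st Qtr",
                    "Apr": "2nd Qtr", "May": "2nd Qtr", "Jun": "2nd Qtr"}

# Quarterly figures copied from the "Current Result Set" table.
quarterly = {
    ("East", "1st Qtr"): 20.4, ("East", "2nd Qtr"): 27.4,
    ("East", "3rd Qtr"): 90.0, ("East", "4th Qtr"): 20.4,
    ("West", "1st Qtr"): 30.6, ("West", "2nd Qtr"): 38.6,
    ("West", "3rd Qtr"): 34.6, ("West", "4th Qtr"): 31.6,
    ("North", "1st Qtr"): 45.9, ("North", "2nd Qtr"): 46.9,
    ("North", "3rd Qtr"): 45.0, ("North", "4th Qtr"): 43.9,
}
QUARTER_TO_YEAR = {q: "1999" for q in ("1st Qtr", "2nd Qtr", "3rd Qtr", "4th Qtr")}


def drill_up(values, parent_of):
    """Summarize values keyed by (region, member) to the parent level."""
    totals = defaultdict(float)
    for (region, member), value in values.items():
        totals[(region, parent_of[member])] += value
    # Round to hide floating-point noise from the additions.
    return {key: round(total, 3) for key, total in totals.items()}


months_to_quarters = drill_up(monthly, MONTH_TO_QUARTER)
quarters_to_year = drill_up(quarterly, QUARTER_TO_YEAR)
```

Rolling the months up reproduces the first two quarters of the current result set, and rolling the quarters up reproduces the 1999 totals. Drill down is the reverse direction: it needs the more detailed rows to be stored, since they cannot be derived from the totals.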

Dimensional Modeling
Subject Area: What do you want to know about?
Atomic Detail (Dimensions): What level of detail do you need?
Facts (Measures): Analyze key performance indicators
Frequency of Update: How fresh do you need it?
Depth of History: How far back do you need to know it?

Clean, current, accurate logical models
Physical models
A subject area model
Star / Snowflake schema design

Requirements for a Dimensional model

Dimensional Modeling Methodology

Data sources: External, OLTP System

Techniques for Implementing a Dimensional model


Star Schema

Snow-flake Schema
Hybrid Schema

Optimal Snow-flake Schema

Star schema- Logical structure


Fact Table (center of the star): Employee, Product, Customer, Day, Units sold, Revenue
Dimensions (points of the star): Employee, Product, Time, Customer

Star schema: Physical view

Geography_dim: emp_code, emp_name, city_code, city, state_code, state, region_code, region
Product_dim: prod_code, prod_name, brand, color_code
Time_dim: day_code, date, day_of_week, month_seq, month_num, month_long_name, month_short_name, qtr_seq, qtr_num, quarter, year
Customer_dim: cust_code, cust_name, age_code, age, sex_code, sex, city_code, city
Fact table: emp_code, prod_code, day_code, cust_code, units, revenue

Star schema characteristics

A star schema is a highly denormalized, query-centric model where the basic premise is that information can be broken into two groups: facts and dimensions.

In a star schema, facts are in a single place (the fact table), and the descriptions (or elements) that lead to those facts are in dimension tables.

The star schema is built for simplicity and speed. The assumption behind it is that the database is static, with no updates being performed online.

Star schema: Dimension Table


Geography_dim (PK: Empl_Code)

Empl_Code  empl_name      city_code  city           state_code  state         region_code  region
2341       Mike King      101        Atlantic city  NJ          New Jersey    1            New Jersey
3424       Jim McCann     106        Chicago        IL          Illinois      2            Illinois
1232       Kitty Stokes   104        Austin         PA          Pennsylvania  1            New Jersey
3554       Clem Akins     102        Medford        NJ          New Jersey    1            New Jersey
3963       Duncan Moore   101        Atlantic city  NJ          New Jersey    1            New Jersey
2924       Dawn McGuire   103        Englewood      NJ          New Jersey    1            New Jersey
2673       Joe Becker     105        Alverton       PA          Pennsylvania  1            New Jersey
3253       Geoff Bergren  107        Springfield    IL          Illinois      2            Illinois
234        Garth Boyd     106        Chicago        IL          Illinois      2            Illinois
2342       Lin Cepele     104        Austin         PA          Pennsylvania  1            New Jersey

Elements (each with its attributes): Region, State, City, Employee

De-normalized structure
Easy navigation within the dimension

Star schema: Fact Table


sales_fact

day_code  prod_code  cust_code  empl_code  units sold  revenue
1211      345        1231123    1232       23          7935
1211      22         1245223    3554       12          264
1211      112        1522342    3963       6           672
1212      233        1524665    2924       34          7922
1212      112        1366454    2673       76          8512
1212      22         1403453    3554       22          484

Dimension keys: day_code, prod_code, cust_code, empl_code
Measures: units sold, revenue

Contains columns for measures and dimensions
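
A star-schema query joins the fact table to a dimension table on a dimension key and aggregates a measure. The sketch below loads the sample fact rows and the matching Geography_dim rows into an in-memory SQLite database; the use of SQLite, the reduced column set, and the column name `units_sold` are assumptions made to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE geography_dim (
    empl_code INTEGER PRIMARY KEY,
    empl_name TEXT, city TEXT, state TEXT
);
CREATE TABLE sales_fact (
    day_code INTEGER, prod_code INTEGER, cust_code INTEGER,
    empl_code INTEGER REFERENCES geography_dim(empl_code),
    units_sold INTEGER, revenue INTEGER
);
""")

# Dimension rows for the employees that appear in the sample fact table.
conn.executemany("INSERT INTO geography_dim VALUES (?, ?, ?, ?)", [
    (1232, "Kitty Stokes", "Austin", "Pennsylvania"),
    (3554, "Clem Akins", "Medford", "New Jersey"),
    (3963, "Duncan Moore", "Atlantic city", "New Jersey"),
    (2924, "Dawn McGuire", "Englewood", "New Jersey"),
    (2673, "Joe Becker", "Alverton", "Pennsylvania"),
])

# Fact rows copied from the sales_fact sample.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (1211, 345, 1231123, 1232, 23, 7935),
    (1211, 22, 1245223, 3554, 12, 264),
    (1211, 112, 1522342, 3963, 6, 672),
    (1212, 233, 1524665, 2924, 34, 7922),
    (1212, 112, 1366454, 2673, 76, 8512),
    (1212, 22, 1403453, 3554, 22, 484),
])

# One join from the fact table to the dimension, then aggregate the measure.
revenue_by_state = conn.execute("""
    SELECT g.state, SUM(f.revenue)
    FROM sales_fact AS f
    JOIN geography_dim AS g ON g.empl_code = f.empl_code
    GROUP BY g.state
    ORDER BY g.state
""").fetchall()
```

Because the dimension is de-normalized, grouping by state (or city, or region) needs only this single join.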

Snow-flake schema
Fact table: Customer, Time, Revenue, Units Sold, Net Profit
Dimension and sub-dimension tables around it: City, Region, Country, Product, Brand, Color

Snow-flake: Physical view


Fact table: emp_code, cust_code, prod_code, day_code, units, revenue

Employee: emp_code, emp_name
City: emp_code, city_code, cityname
State: city_code, state_code, statename
Region: state_code, region_code, regionname
Country: region_code, country_code, countryname

Day: day_code, day_name, week_code
Week: week_code, week_name, month_code
Month: month_code, month_name, quarter_code, year

Customer: cust_code, cust_name, age_code, age, sex_code, sex, city_code, city

Product: prod_code, brand_code, prod_name
Brand: brand_code, brand_name, color_code
Color: color_code, color_name
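
In a snow-flake schema a query has to follow the chain of normalized tables. The sketch below uses the Product, Brand and Color column layout from the physical view in an in-memory SQLite database; the table names, the reduced fact table, and all sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE color (color_code INTEGER PRIMARY KEY, color_name TEXT);
CREATE TABLE brand (
    brand_code INTEGER PRIMARY KEY, brand_name TEXT,
    color_code INTEGER REFERENCES color(color_code)
);
CREATE TABLE product (
    prod_code INTEGER PRIMARY KEY,
    brand_code INTEGER REFERENCES brand(brand_code),
    prod_name TEXT
);
CREATE TABLE sales_fact (
    prod_code INTEGER REFERENCES product(prod_code),
    units INTEGER
);

-- Invented sample rows.
INSERT INTO color VALUES (1, 'Red'), (2, 'Blue');
INSERT INTO brand VALUES (1, 'BrandA', 1), (2, 'BrandB', 2), (3, 'BrandC', 1);
INSERT INTO product VALUES (10, 1, 'P10'), (11, 2, 'P11'), (12, 3, 'P12');
INSERT INTO sales_fact VALUES (10, 5), (11, 7), (12, 3), (10, 2);
""")

# Three joins are needed to reach color_name from the fact table.
units_by_color = conn.execute("""
    SELECT c.color_name, SUM(f.units)
    FROM sales_fact AS f
    JOIN product AS p ON p.prod_code = f.prod_code
    JOIN brand AS b ON b.brand_code = p.brand_code
    JOIN color AS c ON c.color_code = b.color_code
    GROUP BY c.color_name
    ORDER BY c.color_name
""").fetchall()
```

The normalized tables avoid repeating brand and color names on every product row, at the cost of extra joins per query compared with the star schema.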

Hybrid schema: Physical view


Fact table: emp_code, cust_code, prod_code, day_code, units, revenue

Employee / Geography (de-normalized): emp_code, emp_name, city_code, city, state_code, state, region_code, region

Day: day_code, day_name, week_code
Week: week_code, week_name, month_code
Month: month_code, month_name, quarter_code, year

Customer (de-normalized): cust_code, cust_name, age_code, age, sex_code, sex, city_code, city

Product: prod_code, brand_code, prod_name
Brand: brand_code, brand_name, color_code
Color: color_code, color_name

Optimal Snow-flake schema


Fact table: emp_code, cust_code, prod_code, day_code, brand_code, units, revenue

Employee / Geography (de-normalized): emp_code, emp_name, city_code, city, state_code, state, region_code, region

Day: day_code, day_name, week_code
Week: week_code, week_name, month_code
Month: month_code, month_name, quarter_code, year

Customer (de-normalized): cust_code, cust_name, age_code, age, sex_code, sex, city_code, city

Product: prod_code, brand_code, prod_name
Brand: brand_code, brand_name, color_code
Color: color_code, color_name

What is a Slowly Changing Dimension?

Although dimension tables are typically static lists, most dimension tables do change over time.

Because these changes are smaller in magnitude than the changes in fact tables, these dimensions are known as slowly growing or slowly changing dimensions.

Slowly Changing Dimension Classification


Slowly changing dimensions are classified into three different types:
TYPE I
TYPE II
TYPE III

Slowly Changing Dimensions Type I


Initial load

Source:
Emp id  Name   Email
1001    Shane  Shane@xyz.com

Target:
Emp id  Name   Email
1001    Shane  Shane@xyz.com

After the email changes in the source

Source:
Emp id  Name   Email
1001    Shane  Shane@abc.co.in

Target (the previous value Shane@xyz.com is overwritten):
Emp id  Name   Email
1001    Shane  Shane@abc.co.in
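
The Type I behaviour shown above, where the target row is overwritten and no history is kept, can be sketched in a few lines. The function name `apply_type1` and the dictionary-based target are assumptions for the example.

```python
def apply_type1(target, source_row):
    """Type I: overwrite the target row for this key; no history is kept."""
    target[source_row["emp_id"]] = dict(source_row)


target = {}  # target dimension, keyed by Emp id

# Initial load, then the changed source row from the slide.
apply_type1(target, {"emp_id": 1001, "name": "Shane", "email": "Shane@xyz.com"})
apply_type1(target, {"emp_id": 1001, "name": "Shane", "email": "Shane@abc.co.in"})
```

After the second call the target still holds one row for employee 1001, and the old email can no longer be recovered from it.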

Slowly Changing Dimensions Type II - Versioning

Initial load

Source:
Emp id  Name   Email
10      Shane  Shane@xyz.com

Target:
PM_PRIMARYKEY  Emp id  Name   Email          PM_VERSION_NUMBER
1000           10      Shane  Shane@xyz.com

After the email changes to Shane@abc.co.in

Source:
Emp id  Name   Email
10      Shane  Shane@abc.co.in

Target:
PM_PRIMARYKEY  Emp id  Name   Email            PM_VERSION_NUMBER
1000           10      Shane  Shane@xyz.com
1001           10      Shane  Shane@abc.co.in

After the email changes to Shane@abc.com

Source:
Emp id  Name   Email
10      Shane  Shane@abc.com

Target:
PM_PRIMARYKEY  Emp id  Name   Email            PM_VERSION_NUMBER
1000           10      Shane  Shane@xyz.com
1001           10      Shane  Shane@abc.co.in
1003           10      Shane  Shane@abc.com
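
A minimal sketch of Type II with version numbers: every change inserts a new row with a new surrogate key, and older rows are left untouched. The slide does not show the version values, so numbering from 0 is an assumption, as are the function name and the simple counter used for `pm_primarykey` (its values need not match the keys on the slide).

```python
from itertools import count

TRACKED = ("name", "email")   # columns whose changes create a new version
key_gen = count(1000)         # hypothetical surrogate-key generator


def apply_type2_version(target, source_row):
    """Insert a new version row when a tracked column changed."""
    versions = [r for r in target if r["emp_id"] == source_row["emp_id"]]
    if versions:
        latest = max(versions, key=lambda r: r["pm_version_number"])
        if all(latest[c] == source_row[c] for c in TRACKED):
            return            # unchanged: nothing to insert
        version = latest["pm_version_number"] + 1
    else:
        version = 0           # assumed starting version
    target.append({"pm_primarykey": next(key_gen), **source_row,
                   "pm_version_number": version})


target = []
for email in ("Shane@xyz.com", "Shane@abc.co.in", "Shane@abc.com"):
    apply_type2_version(target, {"emp_id": 10, "name": "Shane", "email": email})
```

The row with the highest `pm_version_number` for an employee is the current one; all earlier emails remain available as history.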

Slowly Changing Dimensions Type II - Flag Current

Initial load

Source:
Emp id  Name   Email
10      Shane  Shane@xyz.com

Target:
PM_PRIMARYKEY  Emp id  Name   Email          PM_CURRENT_FLAG
1000           10      Shane  Shane@xyz.com

After the email changes to Shane@abc.co.in

Source:
Emp id  Name   Email
10      Shane  Shane@abc.co.in

Target:
PM_PRIMARYKEY  Emp id  Name   Email            PM_CURRENT_FLAG
1000           10      Shane  Shane@xyz.com
1001           10      Shane  Shane@abc.co.in

After the email changes to Shane@abc.com

Source:
Emp id  Name   Email
10      Shane  Shane@abc.com

Target:
PM_PRIMARYKEY  Emp id  Name   Email            PM_CURRENT_FLAG
1000           10      Shane  Shane@xyz.com
1001           10      Shane  Shane@abc.co.in
1003           10      Shane  Shane@abc.com
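
The flag variant of Type II can be sketched the same way: a change inserts a new row marked as current and clears the flag on the previously current row. The flag values are not shown on the slide, so using 1 for current and 0 for historical is an assumption, as are the function name and the key counter.

```python
from itertools import count

key_gen = count(1000)   # hypothetical surrogate-key generator


def apply_type2_flag(target, source_row):
    """Insert a new current row and clear the flag on the old one."""
    current = next((r for r in target
                    if r["emp_id"] == source_row["emp_id"]
                    and r["pm_current_flag"] == 1), None)
    if current is not None:
        unchanged = (current["name"] == source_row["name"]
                     and current["email"] == source_row["email"])
        if unchanged:
            return
        current["pm_current_flag"] = 0   # assumed value for "historical"
    target.append({"pm_primarykey": next(key_gen), **source_row,
                   "pm_current_flag": 1})   # assumed value for "current"


target = []
for email in ("Shane@xyz.com", "Shane@abc.co.in", "Shane@abc.com"):
    apply_type2_flag(target, {"emp_id": 10, "name": "Shane", "email": email})

current_rows = [r for r in target if r["pm_current_flag"] == 1]
```

Unlike the versioning variant, this one updates an existing row (to clear its flag) as well as inserting a new one, in exchange for a simple filter that finds the current row.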

Slowly Changing Dimensions Type II - Effective Date

Initial load

Source:
Emp id  Name   Email
10      Shane  Shane@xyz.com

Target:
PM_PRIMARYKEY  Emp id  Name   Email          PM_BEGIN_DATE  PM_END_DATE
1000           10      Shane  Shane@xyz.com  01/01/00

After the email changes to Shane@abc.co.in

Source:
Emp id  Name   Email
10      Shane  Shane@abc.co.in

Target:
PM_PRIMARYKEY  Emp id  Name   Email            PM_BEGIN_DATE  PM_END_DATE
1000           10      Shane  Shane@xyz.com    01/01/00       03/01/00
1001           10      Shane  Shane@abc.co.in  03/01/00

After the email changes to Shane@abc.com

Source:
Emp id  Name   Email
10      Shane  Shane@abc.com

Target:
PM_PRIMARYKEY  Emp id  Name   Email            PM_BEGIN_DATE  PM_END_DATE
1000           10      Shane  Shane@xyz.com    01/01/00       03/01/00
1001           10      Shane  Shane@abc.co.in  03/01/00       05/02/00
1003           10      Shane  Shane@abc.com    05/02/00
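
With effective dates, a change closes the open row by setting its end date and opens a new row that begins on the same date. The sketch below assumes the slide dates are in dd/mm/yy form, treats each row as valid from its begin date up to but not including its end date, and uses hypothetical function names and a simple key counter.

```python
from datetime import date
from itertools import count

key_gen = count(1000)   # hypothetical surrogate-key generator


def apply_type2_dates(target, source_row, load_date):
    """Close the open row and open a new one when name or email changed."""
    current = next((r for r in target
                    if r["emp_id"] == source_row["emp_id"]
                    and r["pm_end_date"] is None), None)
    if current is not None:
        if (current["name"], current["email"]) == (source_row["name"],
                                                   source_row["email"]):
            return
        current["pm_end_date"] = load_date
    target.append({"pm_primarykey": next(key_gen), **source_row,
                   "pm_begin_date": load_date, "pm_end_date": None})


def email_as_of(target, emp_id, day):
    """Return the email that was valid on the given day, or None."""
    for r in target:
        if (r["emp_id"] == emp_id and r["pm_begin_date"] <= day
                and (r["pm_end_date"] is None or day < r["pm_end_date"])):
            return r["email"]
    return None


target = []
loads = [(date(2000, 1, 1), "Shane@xyz.com"),      # 01/01/00
         (date(2000, 1, 3), "Shane@abc.co.in"),    # 03/01/00, read as dd/mm/yy
         (date(2000, 2, 5), "Shane@abc.com")]      # 05/02/00, read as dd/mm/yy
for load_date, email in loads:
    apply_type2_dates(target, {"emp_id": 10, "name": "Shane", "email": email},
                      load_date)
```

The date range is what makes point-in-time questions possible: `email_as_of(target, 10, date(2000, 1, 2))` looks up the row whose range covers 2 January 2000.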

Slowly Changing Dimensions Type III

Initial load

Source:
Emp id  Name   Email
10      Shane  Shane@xyz.com

Target:
PM_PRIMARYKEY  Emp id  Name   Email          PM_Prev_ColumnName  PM_EFFECT_DATE
               10      Shane  Shane@xyz.com                      01/01/00

After the email changes to Shane@abc.co.in

Source:
Emp id  Name   Email
10      Shane  Shane@abc.co.in

Target:
PM_PRIMARYKEY  Emp id  Name   Email            PM_Prev_ColumnName  PM_EFFECT_DATE
               10      Shane  Shane@abc.co.in  Shane@xyz.com       01/02/00

After the email changes to Shane@abc.com

Source:
Emp id  Name   Email
10      Shane  Shane@abc.com

Target:
PM_PRIMARYKEY  Emp id  Name   Email          PM_Prev_ColumnName  PM_EFFECT_DATE
               10      Shane  Shane@abc.com  Shane@abc.co.in     01/03/00
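
Type III keeps a single row per employee and stores only the previous value in an extra column. In the sketch below, `pm_prev_email` stands in for the slide's generic PM_Prev_ColumnName; that name, the function name, and keeping the dates as the slide's strings are assumptions.

```python
def apply_type3(row, new_email, effect_date):
    """Type III: move the current value to the 'previous' column, then overwrite."""
    if new_email == row["email"]:
        return                              # unchanged: nothing to do
    row["pm_prev_email"] = row["email"]     # only one previous value is kept
    row["email"] = new_email
    row["pm_effect_date"] = effect_date


# Target row after the initial load (dates kept as the strings on the slide).
row = {"emp_id": 10, "name": "Shane", "email": "Shane@xyz.com",
       "pm_prev_email": None, "pm_effect_date": "01/01/00"}

apply_type3(row, "Shane@abc.co.in", "01/02/00")
apply_type3(row, "Shane@abc.com", "01/03/00")
```

After the second change the row holds the current and the immediately preceding email; the original Shane@xyz.com value is gone, which is the trade-off of Type III against Type II.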

Conformed Dimensions

Conformed dimensions are those which are consistent across data marts.

They are essential for integrating the data marts into an enterprise data warehouse.

Causal Dimensions
Causal dimensions can be used to explain why a record exists in a fact table.

Causal dimensions should not change the grain of the fact table.

Causal Dimension - Example

Examples:
Why did a customer buy a particular product?
Why did a customer use a particular ATM?

Factless Fact Tables


A factless fact table holds only dimension keys and no measures. The two types of factless fact tables are:
Coverage tables
Event tracking tables

Factless Fact Tables - Coverage Tables


Coverage tables are required when a primary fact table is sparse.
Example: tracking products in a store that did not sell
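
The "did not sell" question can only be answered by comparing the coverage table with the sales fact table, because the fact table has no rows for things that did not happen. A minimal sketch with invented store and product codes:

```python
# Coverage table: one row per (store, product) on offer. Keys only, no measures.
coverage = {("S1", "P1"), ("S1", "P2"), ("S2", "P1"), ("S2", "P3")}

# Sales fact rows: (store, product, units). A row exists only when something sold.
sales_fact = [("S1", "P1", 4), ("S2", "P3", 1), ("S1", "P1", 2)]

# Combinations that appear in the sales fact table.
sold = {(store, product) for store, product, _units in sales_fact}

# Products that were covered in a store but never sold there.
not_sold = sorted(coverage - sold)
```

Here `not_sold` lists the covered combinations that are missing from the sales fact table.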

Factless Fact Tables - Event Tracking


These tables are used for tracking an event.
Example: tracking student attendance
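
In an event tracking table each row records that an event happened, so analysis is done by counting rows rather than summing a measure. A small sketch of the attendance example, with invented dates, courses and students:

```python
from collections import Counter

# Attendance fact rows: (date, course, student). Keys only; the row is the event.
attendance_fact = [
    ("2024-01-08", "Math", "alice"),
    ("2024-01-08", "Math", "bob"),
    ("2024-01-08", "Physics", "alice"),
    ("2024-01-09", "Math", "alice"),
]

# Count events per course and per student.
attendance_by_course = Counter(course for _day, course, _student in attendance_fact)
attendance_by_student = Counter(student for _day, _course, student in attendance_fact)
```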

Helper Tables
Helper tables are used when there are multi-valued dimensions, that is, when there is a many-to-many relationship between a fact table and a dimension table.

A helper table can be placed between two dimension tables or between a dimension table and a fact table.

Helper Tables - Example


Example: a customer who has more than one bank account
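
A helper table for this example can be sketched as a list of (customer, account) pairs sitting between an account-level fact table and the customer dimension. All identifiers and balances below are invented, and the joint account A2 is an assumption added to show the many-to-many case.

```python
from collections import defaultdict

# Fact rows at account grain: (account_id, balance).
account_balance_fact = [("A1", 100), ("A2", 250), ("A3", 40)]

# Helper table: one row per (customer, account) pair.
# C1 has two accounts; A2 is a joint account shared by C1 and C2.
customer_account_helper = [("C1", "A1"), ("C1", "A2"), ("C2", "A2"), ("C2", "A3")]

balance_of = dict(account_balance_fact)

# Resolve the many-to-many relationship through the helper table.
balance_by_customer = defaultdict(int)
for customer_id, account_id in customer_account_helper:
    balance_by_customer[customer_id] += balance_of[account_id]
```

Note that adding up the per-customer figures counts the joint account twice, so totals across customers should be computed from the fact table directly.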
