
Data Modeling Overview

A data model is a conceptual representation of the data structures (tables) required for a database and is a powerful way of expressing and communicating business requirements.

A data model visually represents the nature of the data, the business rules governing the data, and how the data will be organized in the database. A data model comprises two parts: the logical design and the physical design.

A data model helps the functional and technical teams design the database. The functional team normally consists of business analysts, business managers, subject matter experts, end users, and so on, while the technical team consists of programmers, DBAs, and so on. Data modelers are responsible for designing the data model; they communicate with the functional team to get the business requirements and with the technical team to implement the database.

The concept of data modeling can be better understood by comparing the development cycle of a data model to the construction of a house. For example, Company ABC plans to build a guest house (the database) and calls in a building architect (the data modeler), presenting its building requirements (the business requirements). The building architect (data modeler) develops the plan (the data model) and gives it to Company ABC. Finally, Company ABC calls in civil engineers (the DBAs) to construct the guest house (the database).

Need for developing a Data Model:

A new application for OLTP (Online Transaction Processing), ODS (Operational Data Store), a data warehouse, or data marts.
Rewriting data models from existing systems whose reports may need to change.
Incorrect data modeling in the existing systems.
A database that has no data model.

Advantages and Importance of Data Model

The goal of a data model is to make sure that all data objects provided by the functional team are completely and accurately represented.
The data model is detailed enough to be used by the technical team to build the physical database.
The information contained in the data model is used to define the significance of the business, relational tables, primary and foreign keys, stored procedures, and triggers.
A data model can be used to communicate the business within and across organizations.

Data Modeling Tools

There are a number of data modeling tools that transform business requirements into a logical data model and a logical data model into a physical data model. From the physical data model, these tools can be instructed to generate the SQL code for creating the database.

Popular Data Modeling Tools



Tool Name          Company Name

Erwin              Computer Associates
Embarcadero        Embarcadero Technologies
Rational Rose      IBM Corporation
Power Designer     Sybase Corporation
Oracle Designer    Oracle Corporation
Xcase              RESolution LTD.

Data Modeling Tools: What to Learn?


Data modeling tools make it practical to build and maintain powerful data models. The following are the features you should know and learn in a data modeling tool before you start building data models.

Software:
How to install the data modeling tool on server/client?

Logical Data Model:


How to create an entity and add a definition and business rule?
How to create domains?
How to create an attribute and add a definition, business rule, and validation rules like default values
and check constraints?
How to create supertypes and subtypes?
How to create primary keys, unique constraints, foreign key relationships, and recursive
relationships?
How to create identifying and non-identifying relationships?
How to assign relationship cardinality?
How to phrase the relationship connecting two tables?
How to assign role names?
How to create key groups?
How to create sequence numbers?

Physical Data Model:


How to rename a table?
How to rename a column and its validation rules like default and check constraints?
How to assign NULL and NOT NULL to columns?
How to name foreign key constraints?
How to connect to databases like MS Access, Oracle, Sybase, Teradata, etc.?
How to generate SQL code from the data model to run against databases like MS Access, Oracle,
Sybase, Teradata, etc.?
How to create a data model from an existing database like MS Access, Oracle, Sybase, Teradata,
etc.?
How to add database-related properties to tables and indexes?
How to check different versions of the data model?
How many data modelers can concurrently work on the same version of a data model?

Dimensional Data Model:


Is there any specific notation to identify data warehouse/data mart data models?

Subject Area:
How to create a subject area and assign relevant entities to it?

Reports:
How to generate reports from the data model and export them to .XLS, .DOC, and .XML file formats?

Naming Options:
Is there any method to change entity/table and attribute/column names from upper case to lower
case or from lower case to upper case?

Import & Export:


How to create data models from .xls, .txt files, etc.?
How to import and export metadata into ETL tools?

Abbreviation Document:
How to create/attach a standard abbreviation document (for naming tables, columns, etc.)?

Print:
How to send data models to printer/plotter/Acrobat Reader?

Backup:
How to take a backup of the data model?

Others:
How to split a data model into logical and physical data models?
How to copy and paste objects within a data model and across data models?
How to search for an object within a data model?
How to change the font size and color of entities, attributes, and relationship lines?
How to create a legend?
How to show a data model at different levels like entity level, attribute level, and definition level?

Erwin Tutorial

All Fusion Erwin Data Modeler commonly known as Erwin, is a powerful and leading data modeling
tool from Computer Associates. Computer Associates delivers several softwares for enterprise
management, storage management solutions, security solutions, application life cycle management,
data management and business intelligence.

Erwin makes database creation very simple by generating the DDL(sql) scripts from a data model by
using its Forward Engineering technique or Erwin can be used to create data models from the
existing database by using its Reverse Engineering technique.
Erwin workplace consists of the following main areas:
Logical: In this view, data model represents business requirements like entities, attributes etc.
Physical: In this view, data model represents physical structures like tables, columns, datatypes etc.
Modelmart: Many users can work with a same data model concurrently.

What can be done with Erwin?

Logical, physical, and dimensional data models can be created.
Data models can be created from existing systems (RDBMS, DBMS, files, etc.).
Different versions of a data model can be compared.
A data model and a database can be compared.
SQL scripts can be generated to create databases from the data model.
Reports can be generated in different file formats like .html, .rtf, and .txt.
Data models can be opened and saved in several different file types like .er1, .ert, .bpx, .xml, .ers,
.sql, .cmt, .df, .dbf, and .mdb files.
By using ModelMart, concurrent users can work on the same data model.

In order to create data models in Erwin, you need to have All Fusion Erwin Data Modeler installed
on your system. If you have installed ModelMart, then more than one user can work on the same
model.

How to create a Logical Data Model:

The following section explains, with a simple step-by-step example, how to create a logical data
model with two entities and the relationship between them.
1: Open All Fusion Erwin Data Modeler.
2: Select the view as "Logical" from the drop-down list. By default, Logical will be your workplace.
3: Click New from the File menu. Select the option "Logical/Physical" from the displayed wizard. Click OK.

4: To create an entity, click the "Entity" icon and drop it on the workplace. By default, E/1 will be
displayed as the entity name. Change it to "Country".
5: To create an attribute, place the cursor on the entity "Country" and right-click it. From the
displayed menu, click Attributes, which takes you to the attribute wizard. Click the "New" button on the
wizard and type the attribute name "Country Code". Select the data type "String" and click OK.
Select the Primary Key option to identify the attribute "Country Code" as the primary key. Follow the
same approach and create another attribute "Country Name" without selecting the primary key
option. Click OK; you will now have two attributes, Country Code and Country Name, under the
entity "Country" in the current logical workplace.
6: Create another entity "Bank" with two attributes, Bank Code and Bank Name, by following
steps 4 and 5.
7: To relate these two entities, Country and Bank, a foreign key relationship must be created as follows.
(a) Click the "Non Identifying Relationship" symbol.
(b) Place the cursor on the entity "Country".
(c) Place the cursor on the entity "Bank". Now you can see the relationship (a line drawn from Bank
to Country) between "Country" and "Bank". Double-click that relationship line to open the
"Relationships wizard" and change the option from "Nulls Allowed" to "No Nulls", since a bank must
have a country code.

The Logical Data Model created by following the above steps looks similar to the following diagram.

How to create a Physical Data Model:

1: Change the view from "Logical" to "Physical" in the drop-down list.
2: Click "Database" on the main menu and then click "Choose Database" from the sub-menu. Select
the target database server where the database has to be created. Click OK.

3: Place the cursor on the table "Country" and right-click it. From the displayed menu, click Columns,
which takes you to the column wizard. Click the "Database" tab, which is next to the "General" tab,
and assign the datatypes VARCHAR2(10) and VARCHAR2(50) to the columns COUNTRY_CODE and
COUNTRY_NAME respectively. Change the default NULL to NOT NULL for the column
COUNTRY_NAME. Repeat the above step for the BANK table. Once you have done all of
this, you can see the physical version of the logical data model in the current workplace.

The Physical Data Model created by following the above steps looks similar to the following diagram.

How to generate DDL (SQL) scripts to create a database:

1: Select the view as Physical from the drop-down list.

2: Click "Tools" on the main menu and then click "Forward Engineer/Schema Generation" from the sub-menu,
which takes you to the "Schema Generation Wizard". Select the properties that satisfy your
database requirements, like schema, table, primary key, etc. Click Preview to see your scripts. You can
either generate the tables in a database directly or store the scripts and run them against the database later.

The DDL (SQL) scripts generated by Erwin by following the above steps look similar to the following
script.

CREATE TABLE Country (
    Country_Code VARCHAR2(10) NOT NULL,
    Country_Name VARCHAR2(50) NOT NULL,
    CONSTRAINT PK_Country PRIMARY KEY (Country_Code));

CREATE TABLE Bank (
    Bank_Code    VARCHAR2(10) NOT NULL,
    Bank_Name    VARCHAR2(50) NOT NULL,
    Country_Code VARCHAR2(10) NOT NULL,
    CONSTRAINT PK_Bank PRIMARY KEY (Bank_Code));

ALTER TABLE Bank ADD (CONSTRAINT FK_Bank
    FOREIGN KEY (Country_Code) REFERENCES Country);
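
As a quick sanity check of the generated schema (a hypothetical example, not output produced by Erwin; the bank names and codes are made up), a few INSERT statements show the NOT NULL foreign key enforcing the rule that every bank must belong to an existing country:

INSERT INTO Country (Country_Code, Country_Name) VALUES ('US', 'United States');
INSERT INTO Bank (Bank_Code, Bank_Name, Country_Code) VALUES ('B001', 'First National', 'US');
-- This statement fails: country code 'ZZ' has no parent row in Country.
INSERT INTO Bank (Bank_Code, Bank_Name, Country_Code) VALUES ('B002', 'Orphan Bank', 'ZZ');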

Data Modeling Development Cycle



Gathering Business Requirements - First Phase: Data modelers have to interact with business
analysts to get the functional requirements and with end users to find out the reporting needs.

Conceptual Data Modeling (CDM) - Second Phase: This data model includes all major entities and
relationships; it does not contain much detail about attributes and is often used in the
INITIAL PLANNING PHASE.

Logical Data Modeling (LDM) - Third Phase: This is the actual implementation of the conceptual
model as a logical data model. A logical data model is the version of the model that represents all of
the business requirements of an organization.

Physical Data Modeling (PDM) - Fourth Phase: This is a complete model that includes all
required tables, columns, relationships, and database properties for the physical implementation of the
database.

Database - Fifth Phase:

DBAs instruct the data modeling tool to create SQL code from the physical data model. The SQL
code is then executed on the server to create the databases.


Standardization Needs:

Several data modelers may work on different subject areas of a data model, and all of them should
use the same naming conventions and the same way of writing definitions and business rules.

Nowadays, business-to-business (B2B) transactions are quite common, and standardization helps in
understanding the business in a better way. Inconsistency across column names and definitions would
create chaos across the business.

For example, when a data warehouse is designed, it may get data from several source systems, and
each source may have its own names, data types, etc. These anomalies can be eliminated if proper
standardization is maintained across the organization.

Table Names Standardization:

Giving a table a full name gives an idea of the data it holds. Generally, do not abbreviate table
names; however, this may differ according to an organization's standards. If a table name's length
exceeds the database limit, then abbreviate it. Some general guidelines are listed below that may be
used as a prefix or suffix for the table.

Examples:
Lookup - LKP: Used for code and type tables through which a fact table can be directly accessed.
e.g. Credit Card Type Lookup - CREDIT_CARD_TYPE_LKP

Fact - FCT: Used for transaction tables.
e.g. Credit Card Fact - CREDIT_CARD_FCT

Cross Reference - XREF: Tables that resolve many-to-many relationships.
e.g. Credit Card Member XREF - CREDIT_CARD_MEMBER_XREF

History - HIST: Tables that store history.
e.g. Credit Card Retired History - CREDIT_CARD_RETIRED_HIST

Statistics - STAT: Tables that store statistical information.
e.g. Credit Card Web Statistics - CREDIT_CARD_WEB_STAT

Column Names Standardization:

Some general guidelines are listed below that may be used as a prefix or suffix for the column.

Examples:
Key - KEY: System-generated surrogate key.
e.g. Credit Card Key - CRDT_CARD_KEY

Identifier - ID: Character column that is used as an identifier.
e.g. Credit Card Identifier - CRDT_CARD_ID

Code - CD: Numeric or alphanumeric column that is used as an identifying attribute.
e.g. State Code - ST_CD

Description - DESC: Description for a code, identifier, or key.
e.g. State Description - ST_DESC

Indicator - IND: Denotes indicator (flag) columns.
e.g. Gender Indicator - GNDR_IND

Database Parameters Standardization:

Some general guidelines are listed below that may be used for other physical parameters.

Examples:
Index - IDX: For index names.
e.g. Credit Card Fact IDX01 - CRDT_CARD_FCT_IDX01

Primary Key - PK: For primary key constraint names.
e.g. Credit Card Fact PK01 - CRDT_CARD_FCT_PK01

Alternate Key - AK: For alternate key names.
e.g. Credit Card Fact AK01 - CRDT_CARD_FCT_AK01

Foreign Key - FK: For foreign key constraint names.
e.g. Credit Card Fact FK01 - CRDT_CARD_FCT_FK01
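
As a minimal sketch of these conventions in practice (assuming an Oracle-style database; the credit card tables, columns, and constraint names below are illustrative, not taken from an actual model):

CREATE TABLE CREDIT_CARD_TYPE_LKP (
    CRDT_CARD_TYPE_CD   VARCHAR2(10) NOT NULL,   -- code column (CD)
    CRDT_CARD_TYPE_DESC VARCHAR2(50),            -- description column (DESC)
    CONSTRAINT CRDT_CARD_TYPE_LKP_PK01 PRIMARY KEY (CRDT_CARD_TYPE_CD));

CREATE TABLE CREDIT_CARD_FCT (
    CRDT_CARD_KEY       NUMBER(10)   NOT NULL,   -- surrogate key (KEY)
    CRDT_CARD_TYPE_CD   VARCHAR2(10) NOT NULL,
    ACTIVE_IND          CHAR(1),                 -- indicator column (IND)
    CONSTRAINT CRDT_CARD_FCT_PK01 PRIMARY KEY (CRDT_CARD_KEY),
    CONSTRAINT CRDT_CARD_FCT_FK01 FOREIGN KEY (CRDT_CARD_TYPE_CD)
        REFERENCES CREDIT_CARD_TYPE_LKP (CRDT_CARD_TYPE_CD));

CREATE INDEX CRDT_CARD_FCT_IDX01 ON CREDIT_CARD_FCT (CRDT_CARD_TYPE_CD);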

Steps to create a Data Model

These are general guidelines for creating a standard data model; in practice, a data model may not be
created in exactly the sequential manner shown below. Based on the enterprise's requirements, some
of these steps may be excluded, or additional steps may be included.

Sometimes a data modeler may be asked to develop a data model based on an existing database. In
that situation, the data modeler has to reverse engineer the database and create the data model.

1 Get business requirements.
2 Create a high-level conceptual data model.
3 Create the logical data model.
4 Select the target DBMS where the data modeling tool will create the physical schema.
5 Create a standard abbreviation document according to business standards.
6 Create domains.
7 Create entities and add definitions.
8 Create attributes and add definitions.
9 Based on the analysis, create surrogate keys, supertypes, and subtypes where needed.
10 Assign a datatype to each attribute. If a domain is already present, the attribute should be
attached to the domain.
11 Create primary or unique keys for attributes.
12 Create check constraints or defaults for attributes.
13 Create unique or bitmap indexes for attributes.
14 Create foreign key relationships between entities.
15 Create the physical data model.
16 Add database properties to the physical data model.
17 Create SQL scripts from the physical data model and forward them to the DBA.
18 Maintain the logical and physical data models.
19 For each release (version of the data model), compare the present version with the previous
version of the data model. Similarly, compare the data model with the database to find out the
differences.
20 Create a change log document for differences between the current version and the previous version
of the data model.
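
A minimal DDL sketch of how steps 9 through 14 might surface in the resulting physical schema (assuming an Oracle target; the customer tables, columns, and constraint names are illustrative, not part of any model described here):

CREATE TABLE CUSTOMER (
    CUSTOMER_KEY NUMBER(10)   NOT NULL,                            -- surrogate key (step 9)
    CUSTOMER_ID  VARCHAR2(20) NOT NULL,                            -- natural identifier
    STATUS_CD    VARCHAR2(3)  DEFAULT 'ACT' NOT NULL,              -- default value (step 12)
    CONSTRAINT CUSTOMER_PK   PRIMARY KEY (CUSTOMER_KEY),           -- primary key (step 11)
    CONSTRAINT CUSTOMER_AK01 UNIQUE (CUSTOMER_ID),                 -- unique key (step 11)
    CONSTRAINT CUSTOMER_CK01 CHECK (STATUS_CD IN ('ACT','INA')));  -- check constraint (step 12)

CREATE TABLE CUSTOMER_ACCOUNT (
    ACCOUNT_KEY  NUMBER(10) NOT NULL,
    CUSTOMER_KEY NUMBER(10) NOT NULL,
    CONSTRAINT CUSTOMER_ACCOUNT_PK   PRIMARY KEY (ACCOUNT_KEY),
    CONSTRAINT CUSTOMER_ACCOUNT_FK01 FOREIGN KEY (CUSTOMER_KEY)
        REFERENCES CUSTOMER (CUSTOMER_KEY));                       -- foreign key (step 14)

CREATE INDEX CUSTOMER_ACCOUNT_IDX01 ON CUSTOMER_ACCOUNT (CUSTOMER_KEY);  -- index (step 13)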

Data Modeler Role

Business Requirement Analysis:

Interact with business analysts to get the functional requirements.
Interact with end users and find out the reporting needs.
Conduct interviews and brainstorming discussions with the project team to gather additional requirements.
Gather accurate data through data analysis and functional analysis.

Development of data model:

Create a standard abbreviation document for logical, physical, and dimensional data models.
Create logical, physical, and dimensional data models (data warehouse data modeling).
Document the logical, physical, and dimensional data models (data warehouse data modeling).

Reports:
Generate reports from the data model.

Review:
Review the data model with the functional and technical teams.

Creation of database:
Create SQL code from the data model and coordinate with DBAs to create the database.
Check that the data models and databases are in sync.

Support & Maintenance:

Assist developers, the ETL and BI teams, and end users in understanding the data model.
Maintain a change log for each data model.

Data Modeling Report

From data modeling tools, reports can be easily generated for technical and business needs. Reports
generated from the logical data model and the physical data model are called business reports and
technical reports respectively. Most data modeling tools provide default reports like subject area
reports, entity reports, attribute reports, table reports, column reports, indexing reports, relationship
reports, etc. The advantage of these reports is that everybody, whether technical or non-technical,
can understand what is going on within the organization.

Other than the default reports provided by data modeling tools, a data modeler can also create
customized reports as per the needs of an organization. For example, if someone asks for both the
logical and physical reports of a particular subject area in one file (e.g. in .xls), the logical and
physical reports can easily be merged and generated accordingly. Data modeling tools provide
sorting and filtering options, and the reports can be exported into file formats like .xls, .doc,
.xml, etc.

Logical Data Model Report:

The Logical Data Model Report describes business information such as entity names, attribute
names, definitions, business rules, mapping information, etc.

Logical Data Model Report Example:

Physical Data Model Report:

The Physical Data Model Report describes information such as the ownership of the database, the physical
characteristics of the database (in Oracle: tablespaces, extents, segments, blocks, partitions, etc.),
performance tuning (processors, indexing), table names, column names, data types, relationships
between the tables, constraints, abbreviations, derivation rules, glossary, data dictionary, etc., and is
used by the technical team.

Physical Data Model Report Example:

Conceptual Data Modeling

A conceptual data model includes all major entities and relationships and does not contain much
detailed information about attributes; it is often used in the INITIAL PLANNING PHASE.

The conceptual data model is created by gathering business requirements from various sources like
business documents, discussions with functional teams, business analysts, subject matter
experts, and end users who do the reporting on the database. Data modelers create the conceptual data
model and forward it to the functional team for review.

Conceptual Data Model - Highlights

CDM is the first step in constructing a data model in the top-down approach and is a clear and accurate
visual representation of the business of an organization.
CDM visualizes the overall structure of the database and provides high-level information about the
subject areas or data structures of an organization.
CDM discussion starts with the main subject areas of an organization, and then all the major entities of
each subject area are discussed in detail.
CDM comprises entity types and relationships. The relationships between the subject areas and the
relationships between the entities in a subject area are drawn using a symbolic notation (IDEF1X or IE).
In a data model, cardinality represents the relationship between two entities, i.e. a one-to-one,
one-to-many, or many-to-many relationship between the entities.
CDM contains data structures that have not yet been implemented in the database.
In the CDM discussion, the technical as well as the non-technical team project their ideas for building
a sound logical data model.

See Figure 1.1 below.

Consider the example of a bank that has different lines of business like savings, credit card,
investment, loans, and so on. In the example (Figure 1.1), the conceptual data model contains major entities
from savings, credit card, investment, and loans. Conceptual data modeling gives the functional and
technical teams an idea of how the business requirements will be projected into the logical data model.

Figure 1.1 : Example of Conceptual Data Model

Enterprise Data Modeling

The development of a common, consistent view and understanding of data elements and their
relationships across the enterprise is referred to as Enterprise Data Modeling. This type of data
modeling provides access to information scattered throughout an enterprise under the control of
different divisions or departments with different databases and data models.

Enterprise Data Modeling is sometimes called a global business model; the entire information
about the enterprise is captured in the form of entities.

Data Model Highlights

When an enterprise logical data model is transformed into a physical data model, supertypes and subtypes
may not be carried over as is, i.e. the logical and physical structures of supertypes and subtypes may be
entirely different. A data modeler has to change them according to the physical and reporting
requirements.

When an enterprise logical data model is transformed into a physical data model, the lengths of table names,
column names, etc. may exceed the maximum number of characters allowed by the database. In that case
the data modeler has to manually edit the physical names according to database or organization standards.

One of the important things to note is the standardization of the data model. Since the same attribute
may be present in several entities, attribute names and data types should be standardized, and a
conformed dimension should be used to connect to the same attribute present in several tables.

A standard abbreviation document is a must so that all data structure names are consistent
across the data model.

See Figure 1.4 below:

Consider the example of a bank that has different lines of business like savings, credit card,
investment, loans, and so on. In the example (Figure 1.4), the enterprise data model contains all entities,
attributes, and relationships from the lines of business savings, credit card, investment, and loans.

Figure 1.4 : Example of Enterprise Data Model

Logical Data Modeling

This is the actual implementation and extension of a conceptual data model. A logical data model is
the version of a data model that represents the business requirements (entire or part) of an
organization and is developed before the physical data model.

As soon as the conceptual data model is accepted by the functional team, development of the logical
data model gets started. Once the logical data model is completed, it is forwarded to the functional
teams for review. A sound logical design should streamline the physical design process by clearly
defining data structures and the relationships between them. A good data model is created by clearly
thinking about the current and future business requirements. A logical data model includes all required
entities, attributes, key groups, and relationships that represent business information and
define business rules.

Example of Logical Data Model: Figure 1.2

In the example, we have identified the entity names, attribute names, and relationships. For a detailed
explanation, refer to relational data modeling.

Physical Data Modeling

A physical data model includes all required tables, columns, relationships, and database properties
for the physical implementation of the database. Database performance, indexing strategy, physical
storage, and denormalization are important parameters of a physical model.

Once the logical data model is approved by the functional team, development of the physical data
model gets started. Once the physical data model is completed, it is forwarded to the technical
teams (developers, group leads, DBAs) for review. The transformations from logical model to physical
model include imposing database rules, implementing referential integrity, supertypes and subtypes, etc.

Example of Physical Data Model: Figure 1.3

In the example, the entity names have been changed to table names, the attribute names have been
changed to column names, nulls and not nulls have been assigned, and a datatype has been given to each column.

When a data modeler works with a client, the title may be logical data modeler, physical data
modeler, or a combination of both. A logical data modeler designs the data model to suit the business
requirements, creates and maintains the lookup data, compares versions of the data model,
maintains the change log, and generates reports from the data model, whereas a physical data modeler
also has to know the properties of the source and target databases.

A physical data modeler should have the technical know-how to create data models from existing
databases and to tune the data models with referential integrity, alternate keys, and indexes, and should
know how to match indexes to SQL code. It also helps if the physical data modeler knows about replication,
clustering, and so on.

The differences between a logical data model and a physical data model are shown below.

Logical vs Physical Data Modeling

Logical Data Model                               Physical Data Model
Represents business information and              Represents the physical implementation of the
defines business rules                            model in a database
Entity                                            Table
Attribute                                         Column
Primary Key                                       Primary Key Constraint
Alternate Key                                     Unique Constraint or Unique Index
Inversion Key Entry                               Non Unique Index
Rule                                              Check Constraint, Default Value
Relationship                                      Foreign Key
Definition                                        Comment

Relational (OLTP) Data Modeling

The relational data model is a data model that views the real world as entities and relationships. Entities
are concepts, real or abstract, about which information is collected. Entities are associated with each
other by relationships, and attributes are properties of entities. Business rules determine the
relationships between the entities in a data model.

The goal of relational data modeling is to normalize the data (avoid redundancy) and to present it in a good
normal form. While working with relational data modeling, a data modeler has to understand first
normal form through fifth normal form to design a good data model.

Following are some of the questions that arise during the development of an entity relationship data
model. A complete business and data analysis leads to a good data model.

1 What will be the future scope of the data model?
2 How to group attributes into entities?
3 How to name entities, attributes, key groups, and relationships?
4 How to connect one entity to another? What sort of relationship is it?
5 How to validate the data?
6 How to normalize the data?
7 How to present reports?

The sample source data shown in the table below provides information about employees, their
residential state, county, and city, and their employer and manager names. It describes employees
working for an "American Bank" that has many branches in several states. From a data modeler's
point of view, analysis of the source data raises the following questions.

How to group and organize the data?
How to avoid denormalization, since the employees' residential data like state name, county name, and
city name are repeated in most of the records?
What sort of relationship is there between employer and employee?
What sort of relationship is there between the employee and the state, city, and county?

Sample Source Data

State Name   County Name   City Name     Emp First Name   Emp Last Name   Emp Full Name    Manager Name    Employer Name                 DateTime Stamp
New York     Shelby        Manhattan     Paul             Young           Paul Young                       American Bank of New York     1/1/2005 11:23:31 AM
Florida      Jefferson     Panama City   Chris            Davis           Chris Davis      Paul Young      American Bank of Florida      1/1/2005 11:23:31 AM
California   Montgomery    San Hose      Louis            Johnson         Louis Johnson    Paul Young      American Bank of California   1/1/2005 11:23:31 AM
New Jersey   Hudson        Jersey City   Sam              Mathew          Sam Mathew       Paul Young      American Bank of New Jersey   1/1/2005 11:23:31 AM
New York     Shelby        Manhattan     Nancy            Robinson        Nancy Robinson   Paul Young      American Bank of New York     1/1/2005 11:23:31 AM
Florida      Jefferson     Panama City   Sheela           Shellum         Sheela Shellum   Chris Davis     American Bank of Florida      1/1/2005 11:23:31 AM
California   Montgomery    Shelby        Jeff             Bill            Jeff Bill        Louis Johnson   American Bank of California   1/1/2005 11:23:31 AM
New Jersey   Hudson        Jersey City   John             Burrell         John Burrell     Sam Mathew      American Bank of New Jersey   1/1/2005 11:23:31 AM

In the following section, we discuss how to resolve these problems in order to design a good relational
data model.

After discussion with business analysts, the data modeler can come up with the following conclusions
regarding the grouping of and relationships between the data. These conclusions play a vital role in
designing the data model as well as in allowing for future scope.

Many cities can be in one county. City names will be unique across the country.
Many counties can be in one state. County names will be unique across the country.
Many states can be in the USA. State names will be unique across the country (USA).
One employee can work in many branches at the same time.
Some employees may not have a manager.

In order to implement the above decisions, relational data modeling is done in the following manner.

To achieve normalization, the relevant attributes of the employee, employer lookup, state lookup, county
lookup, and city lookup tables are grouped and the tables created.
In order to validate the data in the employee table, the employee table is connected to the state, county,
and city lookups. Whenever state, county, or city data is entered in the employee table, it is checked
against the respective lookup tables so that only correct data is stored. Hence there is no need to carry
the redundant state, county, and city data in the employee table.
All tables are identified by primary keys (PK), so data can be uniquely identified in each table.
Records can be inserted or updated directly in the respective lookup table. For example, if a state
name changes, the change is made only in the state lookup and does not affect other tables like employee.
Since one employee can work in many branches at the same time, the table EmployeeEmployerXREF has
been created; it resolves the many-to-many relationship.
Since an employee can also be a manager, the column "manager identifier" has been added as a foreign
key to the column "employee identifier". The "manager identifier" column contains the same values as the
employee identifier. Sometimes it may contain null values: for example, Paul Young is the topmost person
and doesn't have a manager.
A new column, DateTimeStamp, has been added to all tables. This column gives the date and time when
the row was inserted or updated.
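
As a sketch of how these decisions could translate into DDL (Oracle-style types are assumed, only the state lookup is shown for brevity, and all names and sizes are illustrative), the employee table references the lookups, the manager column is a self-referencing foreign key, and the XREF table resolves the many-to-many relationship:

CREATE TABLE STATE_LKP (
    STATE_CD        VARCHAR2(2)  NOT NULL,
    STATE_NAME      VARCHAR2(50) NOT NULL,
    DATE_TIME_STAMP DATE         NOT NULL,
    CONSTRAINT STATE_LKP_PK PRIMARY KEY (STATE_CD));

CREATE TABLE EMPLOYEE (
    EMP_ID          NUMBER(10)   NOT NULL,
    STATE_CD        VARCHAR2(2)  NOT NULL,
    MANAGER_ID      NUMBER(10),                  -- nullable: the topmost person has no manager
    EMP_FIRST_NAME  VARCHAR2(50) NOT NULL,
    EMP_LAST_NAME   VARCHAR2(50) NOT NULL,
    DATE_TIME_STAMP DATE         NOT NULL,
    CONSTRAINT EMPLOYEE_PK   PRIMARY KEY (EMP_ID),
    CONSTRAINT EMPLOYEE_FK01 FOREIGN KEY (STATE_CD)   REFERENCES STATE_LKP (STATE_CD),
    CONSTRAINT EMPLOYEE_FK02 FOREIGN KEY (MANAGER_ID) REFERENCES EMPLOYEE (EMP_ID));

CREATE TABLE EMPLOYEE_EMPLOYER_XREF (
    EMP_ID          NUMBER(10) NOT NULL,
    EMPLOYER_ID     NUMBER(10) NOT NULL,
    DATE_TIME_STAMP DATE       NOT NULL,
    CONSTRAINT EMP_EMPLOYER_XREF_PK   PRIMARY KEY (EMP_ID, EMPLOYER_ID),
    CONSTRAINT EMP_EMPLOYER_XREF_FK01 FOREIGN KEY (EMP_ID) REFERENCES EMPLOYEE (EMP_ID));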


Example of Relational Data Model: Figure 1.5

The completed relational data model is shown in Figure 1.5, and the corresponding data stored in the
database are shown in the separate tables below.

State Lookup
State Code State Name DateTimeStamp
NY New York 1/1/2005 11:23:31 AM
FL Florida 1/1/2005 11:23:31 AM
CA California 1/1/2005 11:23:31 AM
NJ New Jersey 1/1/2005 11:23:31 AM

County Lookup
County Code County Name DateTimeStamp
NYSH Shelby 1/1/2005 11:23:31 AM
FLJE Jefferson 1/1/2005 11:23:31 AM
CAMO Montgomery 1/1/2005 11:23:31 AM
NJHU Hudson 1/1/2005 11:23:31 AM

City Lookup
City Code City Name DateTimeStamp
NYSHMA Manhattan 1/1/2005 11:23:31 AM
FLJEPC Panama City 1/1/2005 11:23:31 AM
CAMOSH San Hose 1/1/2005 11:23:31 AM
NJHUJC Jersey City 1/1/2005 11:23:31 AM

Employee
Emp Id   State Code   County Code   City Code   Manager Id   Emp First Name   Emp Last Name   Emp Full Name    DateTimeStamp
1        NY           NYSH          NYSHMA                   Paul             Young           Paul Young       1/1/2005 11:23:31 AM
2        FL           FLJE          FLJEPC      1            Chris            Davis           Chris Davis      1/1/2005 11:23:31 AM
3        CA           CAMO          CAMOSH      1            Louis            Johnson         Louis Johnson    1/1/2005 11:23:31 AM
4        NJ           NJHU          NJHUJC      1            Sam              Mathew          Sam Mathew       1/1/2005 11:23:31 AM
5        NY           NYSH          NYSHMA      1            Nancy            Robinson        Nancy Robinson   1/1/2005 11:23:31 AM
6        FL           FLJE          FLJEPC      2            Sheela           Shellum         Sheela Shellum   1/1/2005 11:23:31 AM
7        CA           CAMO          CAMOSH      3            Jeff             Bill            Jeff Bill        1/1/2005 11:23:31 AM
8        NJ           NJHU          NJHUJC      4            John             Burrell         John Burrell     1/1/2005 11:23:31 AM

Employer Lookup
Employer Id   Employer Name                 DateTimeStamp
1001          American Bank of New York     1/1/2005 11:23:31 AM
1002          American Bank of Florida      1/1/2005 11:23:31 AM
1003          American Bank of California   1/1/2005 11:23:31 AM
1004          American Bank of New Jersey   1/1/2005 11:23:31 AM

Employee Employer XREF


Employee Id Employer Id DateTimeStamp
1 1001 1/1/2005 11:23:31 AM
2 1002 1/1/2005 11:23:31 AM
3 1003 1/1/2005 11:23:31 AM
4 1004 1/1/2005 11:23:31 AM
5 1001 1/1/2005 11:23:31 AM
6 1002 1/1/2005 11:23:31 AM
7 1003 1/1/2005 11:23:31 AM
8 1004 1/1/2005 11:23:31 AM

Dimensional Data Modeling

Dimensional data modeling comprises one or more dimension tables and fact tables. Good
examples of dimensions are location, product, time, promotion, organization, etc. Dimension tables
store records related to that particular dimension, and no facts (measures) are stored in these tables.

For example, the product dimension table will store information about products (product category,
product sub-category, product, and product features), and the location dimension table will store
information about location (country, state, county, city, zip). A fact (measure) table contains
measures (sales gross value, total units sold) and dimension columns. These dimension columns are
actually foreign keys from the respective dimension tables.

Example of Dimensional Data Model: Figure 1.6

In the example in Figure 1.6, the sales fact table is connected to the dimensions location, product, time,
and organization. It shows that data can be sliced across all dimensions, and it is also possible for the
data to be aggregated across multiple dimensions. "Sales Dollar" in the sales fact table can be calculated
across all dimensions independently or in a combined manner, as explained below.

Sales Dollar value for a particular product
Sales Dollar value for a product in a location
Sales Dollar value for a product in a year within a location
Sales Dollar value for a product in a year within a location sold or serviced by an employee
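
As a sketch of how such a roll-up might be expressed in SQL against the star schema of Figure 1.6 (the table and column names below are assumptions based on the description, not names taken from the model):

SELECT p.product_name,
       l.state_name,
       t.year_no,
       SUM(f.sales_dollar) AS total_sales_dollar
FROM   sales_fct f
       JOIN product_dim  p ON f.product_dim_id  = p.product_dim_id
       JOIN location_dim l ON f.location_dim_id = l.location_dim_id
       JOIN time_dim     t ON f.time_dim_id     = t.time_dim_id
GROUP BY p.product_name, l.state_name, t.year_no;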

In dimensional data modeling, the hierarchies for the dimensions are stored in the dimension table
itself. For example, the location dimension will have all of its hierarchy from country and state
through county to city. There is no need for individual hierarchical lookups like a country lookup,
state lookup, county lookup, and city lookup to be shown in the model.

Uses of Dimensional Data Modeling

Dimensional data modeling is used for calculating summarized data. For example, sales data could
be collected on a daily basis and then aggregated to the week level, the week data could be
aggregated to the month level, and so on. The data can then be referred to as aggregate data.
Aggregation is synonymous with summarization, and aggregate data is synonymous with summary
data. The performance of dimensional data models can be significantly increased when
materialized views are used. A materialized view is a pre-computed table comprising aggregated or
joined data from the fact table and possibly dimension tables; it is also known as a summary or
aggregate table.
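
In Oracle, for example, such a summary could be pre-computed with a materialized view (a sketch only; the fact and dimension names are the same assumed names used in the query sketch above):

CREATE MATERIALIZED VIEW sales_by_product_month_mv
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT p.product_name,
       t.year_no,
       t.month_no,
       SUM(f.sales_dollar) AS sales_dollar
FROM   sales_fct f
       JOIN product_dim p ON f.product_dim_id = p.product_dim_id
       JOIN time_dim    t ON f.time_dim_id    = t.time_dim_id
GROUP BY p.product_name, t.year_no, t.month_no;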


Relational data modeling is used in OLTP systems, which are transaction oriented, and dimensional
data modeling is used in OLAP systems, which are analytical. In a data warehouse
environment, the staging area is designed on OLTP concepts, since data has to be normalized, cleansed,
and profiled before being loaded into a data warehouse or data mart. In an OLTP environment, lookups are
stored as independent, detailed tables, whereas these independent tables are merged into a single
dimension in an OLAP environment like a data warehouse.

Relational vs Dimensional

Relational Data Modeling                          Dimensional Data Modeling
Data is stored in an RDBMS                        Data is stored in an RDBMS or in multidimensional databases
Tables are units of storage                       Cubes are units of storage
Data is normalized and used for OLTP;             Data is denormalized and used in data warehouses and data
optimized for OLTP processing                     marts; optimized for OLAP
Several tables and chains of relationships        Few tables; fact tables are connected to dimension tables
among them
Volatile (several updates) and time variant       Non-volatile and time invariant
SQL is used to manipulate data                    MDX is used to manipulate data
Detailed level of transactional data              Summary of bulky transactional data (aggregates and
                                                  measures) used in business decisions
Normal reports                                    User-friendly, interactive, drag-and-drop multidimensional
                                                  OLAP reports

Dimension Table

A dimension table is one that describes the business entities of an enterprise, represented as
hierarchical, categorical information such as time, departments, locations, and products. Dimension
tables are sometimes called lookup or reference tables.

Location Dimension

In relational data modeling, for normalization purposes, the country lookup, state lookup, county
lookup, and city lookup are not merged into a single table. In dimensional data modeling (star
schema), these tables are merged into a single table called the LOCATION DIMENSION for
performance and data-slicing requirements. This location dimension helps to compare the sales in
one region with another region. We may see a good sales profit in one region and a loss in another
region. If it is a loss, the reasons for that may be a new competitor in that area, failure of our
marketing strategy, etc.

Example of Location Dimension: Figure 1.8

Country Lookup
Country Code   Country Name               DateTimeStamp
USA            United States Of America   1/1/2005 11:23:31 AM

State Lookup
State Code State Name DateTimeStamp
NY New York 1/1/2005 11:23:31 AM
FL Florida 1/1/2005 11:23:31 AM
CA California 1/1/2005 11:23:31 AM
NJ New Jersey 1/1/2005 11:23:31 AM

County Lookup
County Code County Name DateTimeStamp
NYSH Shelby 1/1/2005 11:23:31 AM
FLJE Jefferson 1/1/2005 11:23:31 AM
CAMO Montgomery 1/1/2005 11:23:31 AM
NJHU Hudson 1/1/2005 11:23:31 AM

City Lookup
City Code City Name DateTimeStamp
NYSHMA Manhattan 1/1/2005 11:23:31 AM
FLJEPC Panama City 1/1/2005 11:23:31 AM
CAMOSH San Hose 1/1/2005 11:23:31 AM
NJHUJC Jersey City 1/1/2005 11:23:31 AM

Location Dimension
Location Dimension Id   Country Name   State Name   County Name   City Name     DateTimeStamp
1                       USA            New York     Shelby        Manhattan     1/1/2005 11:23:31 AM
2                       USA            Florida      Jefferson     Panama City   1/1/2005 11:23:31 AM
3                       USA            California   Montgomery    San Hose      1/1/2005 11:23:31 AM
4                       USA            New Jersey   Hudson        Jersey City   1/1/2005 11:23:31 AM
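
As a sketch of the merged structure (Oracle-style DDL is assumed, and the sample row follows the table above), each denormalized row carries the whole country-state-county-city hierarchy, so no joins to the lookups are needed at query time:

CREATE TABLE LOCATION_DIM (
    LOCATION_DIM_ID NUMBER(10)   NOT NULL,
    COUNTRY_NAME    VARCHAR2(50) NOT NULL,
    STATE_NAME      VARCHAR2(50) NOT NULL,
    COUNTY_NAME     VARCHAR2(50) NOT NULL,
    CITY_NAME       VARCHAR2(50) NOT NULL,
    DATE_TIME_STAMP DATE         NOT NULL,
    CONSTRAINT LOCATION_DIM_PK PRIMARY KEY (LOCATION_DIM_ID));

INSERT INTO LOCATION_DIM
VALUES (1, 'USA', 'New York', 'Shelby', 'Manhattan',
        TO_DATE('01/01/2005 11:23:31', 'MM/DD/YYYY HH24:MI:SS'));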

Product Dimension

In a relational data model, for normalization purposes, the product category lookup, product sub-category
lookup, product lookup, and product feature lookup are not merged into a single table. In dimensional
data modeling (star schema), these tables are merged into a single table called the PRODUCT DIMENSION
for performance and data-slicing requirements.

Example of Product Dimension: Figure 1.9

Product Category Lookup


Product Category Code Product Category Name DateTimeStamp
1 Apparel 1/1/2005 11:23:31 AM
2 Shoe 1/1/2005 11:23:31 AM

Product Sub-Category Lookup

Product Sub-Category Code   Product Sub-Category Name   DateTimeStamp
11                          Shirt                       1/1/2005 11:23:31 AM
12                          Trouser                     1/1/2005 11:23:31 AM
13                          Casual                      1/1/2005 11:23:31 AM
14                          Formal                      1/1/2005 11:23:31 AM

Product Lookup
Product Code Product Name DateTimeStamp
1001 Van Heusen 1/1/2005 11:23:31 AM
1002 Arrow 1/1/2005 11:23:31 AM
1003 Nike 1/1/2005 11:23:31 AM
1004 Adidas 1/1/2005 11:23:31 AM

Product Feature Lookup


Product Feature Code Product Feature Description DateTimeStamp
10001 Van-M 1/1/2005 11:23:31 AM
10002 Van-L 1/1/2005 11:23:31 AM
10003 Arr-XL 1/1/2005 11:23:31 AM
10004 Arr-XXL 1/1/2005 11:23:31 AM
10005 Nike-8 1/1/2005 11:23:31 AM
10006 Nike-9 1/1/2005 11:23:31 AM
10007 Adidas-10 1/1/2005 11:23:31 AM
10008 Adidas-11 1/1/2005 11:23:31 AM

Product Dimension
Product Dimension Id   Product Category Name   Product Sub-Category Name   Product Name   Product Feature Desc   DateTimeStamp
100001                 Apparel                 Shirt                       Van Heusen     Van-M                  1/1/2005 11:23:31 AM
100002                 Apparel                 Shirt                       Van Heusen     Van-L                  1/1/2005 11:23:31 AM
100003                 Apparel                 Shirt                       Arrow          Arr-XL                 1/1/2005 11:23:31 AM
100004                 Apparel                 Shirt                       Arrow          Arr-XXL                1/1/2005 11:23:31 AM
100005                 Shoe                    Casual                      Nike           Nike-8                 1/1/2005 11:23:31 AM
100006                 Shoe                    Casual                      Nike           Nike-9                 1/1/2005 11:23:31 AM
100007                 Shoe                    Casual                      Adidas         Adidas-10              1/1/2005 11:23:31 AM
100008                 Shoe                    Casual                      Adidas         Adidas-11              1/1/2005 11:23:31 AM

Organization Dimension

In a relational data model, for normalization purposes, the corporate office lookup, region lookup, branch
lookup, and employee lookup are not merged into a single table. In dimensional data modeling (star
schema), these tables are merged into a single table called the ORGANIZATION DIMENSION for
performance and data slicing.

This dimension helps us to find the products sold or serviced within the organization by the
employees. In any industry, we can calculate the sales on a region, branch, or employee basis. Based on
the performance, an organization can provide incentives to employees and subsidies to the branches to
increase further sales.

Example of Organization Dimension: Figure 1.10

Corporate Lookup
Corporate Code Corporate Name DateTimeStamp
CO American Bank 1/1/2005 11:23:31 AM

Region Lookup
Region Code Region Name DateTimeStamp
SE South East 1/1/2005 11:23:31 AM
MW Mid West 1/1/2005 11:23:31 AM

Branch Lookup
Branch Code Branch Name DateTimeStamp
FLTM Florida-Tampa 1/1/2005 11:23:31 AM
ILCH Illinois-Chicago 1/1/2005 11:23:31 AM

Employee Lookup
Employee Code Employee Name DateTimeStamp
E1 Paul Young 1/1/2005 11:23:31 AM
E2 Chris Davis 1/1/2005 11:23:31 AM

Organization Dimension
Organization Dimension Id   Corporate Name   Region Name   Branch Name        Employee Name   DateTimeStamp
1                           American Bank    South East    Florida-Tampa      Paul Young      1/1/2005 11:23:31 AM
2                           American Bank    Mid West      Illinois-Chicago   Chris Davis     1/1/2005 11:23:31 AM

Time Dimension

In a relational data model, for normalization purposes, the year lookup, quarter lookup, month lookup,
and week lookup are not merged into a single table. In dimensional data modeling (star schema),
these tables are merged into a single table called the TIME DIMENSION for performance and data
slicing.

This dimension helps to find the sales done on a daily, weekly, monthly, and yearly basis. We can do
trend analysis by comparing this year's sales with the previous year's, or this week's sales with the
previous week's.

Example of Time Dimension: Figure 1.11

Year Lookup
Year Id Year Number DateTimeStamp
1 2004 1/1/2005 11:23:31 AM
2 2005 1/1/2005 11:23:31 AM

Quarter Lookup
Quarter Number Quarter Name DateTimeStamp
1 Q1 1/1/2005 11:23:31 AM
2 Q2 1/1/2005 11:23:31 AM
3 Q3 1/1/2005 11:23:31 AM
4 Q4 1/1/2005 11:23:31 AM

Month Lookup

Month Number Month Name DateTimeStamp


1 January 1/1/2005 11:23:31 AM
2 February 1/1/2005 11:23:31 AM
3 March 1/1/2005 11:23:31 AM
4 April 1/1/2005 11:23:31 AM
5 May 1/1/2005 11:23:31 AM
6 June 1/1/2005 11:23:31 AM
7 July 1/1/2005 11:23:31 AM
8 August 1/1/2005 11:23:31 AM
9 September 1/1/2005 11:23:31 AM
10 October 1/1/2005 11:23:31 AM
11 November 1/1/2005 11:23:31 AM
12 December 1/1/2005 11:23:31 AM

Week Lookup
Week Number Day of Week DateTimeStamp
1 Sunday 1/1/2005 11:23:31 AM
1 Monday 1/1/2005 11:23:31 AM
1 Tuesday 1/1/2005 11:23:31 AM
1 Wednesday 1/1/2005 11:23:31 AM
1 Thursday 1/1/2005 11:23:31 AM
1 Friday 1/1/2005 11:23:31 AM
1 Saturday 1/1/2005 11:23:31 AM
2 Sunday 1/1/2005 11:23:31 AM
2 Monday 1/1/2005 11:23:31 AM
2 Tuesday 1/1/2005 11:23:31 AM
2 Wednesday 1/1/2005 11:23:31 AM
2 Thursday 1/1/2005 11:23:31 AM
2 Friday 1/1/2005 11:23:31 AM
2 Saturday 1/1/2005 11:23:31 AM

Time Dimension
Time Dim Id   Year No   Day Of Year   Quarter No   Month No   Month Name   Month Day No   Week No   Day Of Week   Cal Date   DateTime Stamp
1             2004      1             Q1           1          January      1              1         5             1/1/2004   1/1/2005 11:23:31 AM
2             2004      32            Q1           2          February     1              5         1             2/1/2004   1/1/2005 11:23:31 AM
3             2005      1             Q1           1          January      1              1         7             1/1/2005   1/1/2005 11:23:31 AM
4             2005      32            Q1           2          February     1              5         3             2/1/2005   1/1/2005 11:23:31 AM

Slowly Changing Dimensions

Dimensions that change over time are called slowly changing dimensions. For instance, a product
price changes over time; people change their names for some reason; country and state names may
change over time. These are a few examples of slowly changing dimensions, since some changes
happen to them over a period of time.

Slowly changing dimensions are often categorized into three types, namely Type 1, Type 2, and
Type 3. The following section deals with how to capture and handle these changes over time.

The "Product" table mentioned below contains a product named, Product1 with Product ID being the
primary key. In the year 2004, the price of Product1 was $150 and over the time, Product1's price
changes from $150 to $350. With this information, let us explain the three types of Slowly Changing
Dimensions.

Product Price in 2004:


Product ID(PK) Year Product Name Product Price
1 2004 Product1 $150

Type 1: Overwriting the old values.

In the year 2005, if the price of the product changes to $250, then the old values of the columns
"Year" and "Product Price" have to be updated and replaced with the new values. With Type 1,
there is no way to find out the old 2004 value for "Product1", since the table now
contains only the new price and year information.

Product
Product ID(PK)   Year   Product Name   Product Price
1                2005   Product1       $250
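
In SQL terms, a Type 1 change is just an UPDATE that overwrites the existing row (a sketch following the table structure above; underscored column names are assumed):

UPDATE Product
SET    Year          = 2005,
       Product_Price = 250
WHERE  Product_ID    = 1;
-- The 2004 price of $150 is overwritten; no history is kept.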

Type 2: Creating an additional record.

With Type 2, the old values are not replaced; instead, a new row containing the new values is
added to the product table. So at any point in time, the difference between the old values and the new
values can be retrieved and easily compared. This is very useful for reporting purposes.

Product
Product ID(PK)   Year   Product Name   Product Price
1                2004   Product1       $150
1                2005   Product1       $250

The problem with the above data structure is that "Product ID" cannot store duplicate values
for Product1, since "Product ID" is the primary key. Also, the current data structure doesn't clearly
specify the effective date and expiry date of Product1, i.e. when the change to its price happened. So
it is better to change the data structure to overcome the primary key violation.

Product
Product ID(PK)   Effective DateTime(PK)   Year   Product Name   Product Price   Expiry DateTime
1                01-01-2004 12.00AM       2004   Product1       $150            12-31-2004 11.59PM
1                01-01-2005 12.00AM       2005   Product1       $250

In the changed Product table structure, "Product ID" and "Effective DateTime" form a composite
primary key, so there is no violation of the primary key constraint. The new columns
"Effective DateTime" and "Expiry DateTime" provide information about the product's effective
date and expiry date, which adds more clarity and enhances the scope of this table. The Type 2 approach
may need additional space in the database, since an additional row has to be stored for every changed
record. Since dimensions are not that big in the real world, the additional space is negligible.
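
A sketch of handling the 2005 price change as a Type 2 update (assuming the composite-key structure above, with underscored column names): the current row is expired and a new row is inserted.

UPDATE Product
SET    Expiry_DateTime = TO_DATE('12/31/2004 23:59', 'MM/DD/YYYY HH24:MI')
WHERE  Product_ID = 1
AND    Expiry_DateTime IS NULL;

INSERT INTO Product (Product_ID, Effective_DateTime, Year, Product_Name, Product_Price)
VALUES (1, TO_DATE('01/01/2005 00:00', 'MM/DD/YYYY HH24:MI'), 2005, 'Product1', 250);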

Type 3: Creating new fields.

With Type 3, only the latest change to the values can be seen. The example below illustrates
how to add new columns and keep track of the change. From it, we can see the current price and the
previous price of the product Product1.

Product
Product ID(PK)   Current Year   Product Name   Current Product Price   Old Product Price   Old Year
1                2005           Product1       $250                    $150                2004

The problem with the Type 3 approach is that, over the years, if the product price changes continuously,
the complete history is not stored; only the latest change is kept. For example, in the year
2006, if Product1's price changes to $350, we will no longer be able to see the 2004 price,
since the old values will have been overwritten with the 2005 product information.

Product
Product ID(PK)   Year   Product Name   Product Price   Old Product Price   Old Year
1                2006   Product1       $350            $250                2005
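
As a sketch, a Type 3 change moves the current values into the "old" columns and overwrites the current ones in place (underscored column names following the first Type 3 table above are assumed):

UPDATE Product
SET    Old_Product_Price     = Current_Product_Price,
       Old_Year              = Current_Year,
       Current_Product_Price = 350,
       Current_Year          = 2006
WHERE  Product_ID = 1;
-- In standard SQL the right-hand sides use the row's values as of the start of the
-- statement, so Old_Product_Price correctly receives the pre-update current price.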