
Day 1

Data Warehousing Concepts
P. Srinivas

Agenda:
Definition of a data warehouse

What is a data warehouse?


A data warehouse is an integrated system in which data from different parts of an organization is stored and shared to support business decisions and operational excellence.
Bill Inmon, often called the father of data warehousing, defines a data warehouse as
"a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decisions."
http://www.inmoncif.com
Ralph Kimball is a well-known US data warehousing specialist who has authored many books and has practised in the field for a very long time.
http://www.kimballgroup.com

What does subject-oriented mean?


Loading the data of each subject area into the database server and analysing it by subject area is termed subject-oriented.
Example:
An enterprise is like a factory with several departments working together to achieve its quality-of-service targets and goals.
OLTP application data         | Data warehouse storage (by subject area)
Human Resource system         | HR
Customer Information System   | Customer
Finance                       | Finance
Sales and Distribution        | SD

What does integrated mean?
Legal business data from different countries of the world is gathered and stored in one warehouse.
For example,
a corporation has international subsidiaries in different locations.
The Human Resources data of its companies in the UK, USA, UAE and India may be collected on one server and used to create KPI analytical reports that support business management, from tactical SBU managers up to strategic management.
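Integration usually means mapping each source's own column names and formats onto one common schema before reporting. The sketch below is a minimal illustration under assumptions: the three country extracts, their column names and date formats are all hypothetical, and headcount per country stands in for a KPI.

```python
from collections import Counter
from datetime import datetime

# Hypothetical country-specific HR extracts: each source system uses its own
# column names and date format.
uk_hr = [{"emp_id": "UK-1", "join_date": "12/06/1995"}]        # DD/MM/YYYY
usa_hr = [{"EmployeeID": "US-7", "HireDate": "06/12/1995"}]     # MM/DD/YYYY
india_hr = [{"empno": "IN-3", "doj": "1995-06-12"},
            {"empno": "IN-4", "doj": "2001-01-15"}]             # ISO 8601

def conform(rows, id_col, date_col, date_fmt, country):
    """Map one source's rows onto a single, integrated HR schema."""
    return [
        {
            "employee_id": r[id_col],
            "join_date": datetime.strptime(r[date_col], date_fmt).date(),
            "country": country,
        }
        for r in rows
    ]

integrated_hr = (
    conform(uk_hr, "emp_id", "join_date", "%d/%m/%Y", "UK")
    + conform(usa_hr, "EmployeeID", "HireDate", "%m/%d/%Y", "USA")
    + conform(india_hr, "empno", "doj", "%Y-%m-%d", "India")
)

# A simple KPI computed on the integrated data: headcount per country.
headcount = Counter(row["country"] for row in integrated_hr)
print(dict(headcount))  # {'UK': 1, 'USA': 1, 'India': 2}
```

After conforming, the UK and US joining dates, written differently in their sources, compare as the same date.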

What does time-variant mean?
Time-variant means that corporate OLTP data is loaded into the warehouse at regular intervals, for example once a day, or eight times a day (every three hours), and each load is stored against the time it was taken.
For example:
The load frequency is tailored to each subject area:
Yearly
Half-yearly
Quarterly
Monthly
Weekly
Daily
Hourly
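One way to picture time-variance: each load appends a snapshot stamped with its load time instead of overwriting the previous values, so the warehouse can answer "what was the value at time T?". A minimal sketch; the account, balances and timestamps are hypothetical.

```python
from datetime import datetime

# Hypothetical warehouse table: every load appends a snapshot of the OLTP
# values together with the load timestamp; earlier loads are never overwritten.
warehouse = []

def load_snapshot(load_ts, oltp_rows):
    for row in oltp_rows:
        warehouse.append({**row, "load_ts": load_ts})

def as_of(account, ts):
    """Return the balance of `account` from the latest load at or before `ts`."""
    rows = [r for r in warehouse if r["account"] == account and r["load_ts"] <= ts]
    return max(rows, key=lambda r: r["load_ts"])["balance"] if rows else None

# Two of the eight daily loads, three hours apart.
load_snapshot(datetime(2015, 7, 15, 0, 0), [{"account": "A1", "balance": 100}])
load_snapshot(datetime(2015, 7, 15, 3, 0), [{"account": "A1", "balance": 250}])

print(as_of("A1", datetime(2015, 7, 15, 1, 30)))  # 100
print(as_of("A1", datetime(2015, 7, 15, 4, 0)))   # 250
```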

What is non-volatile business data?

As customer data from the source systems/applications is legal data, it must not be updated in the warehouse; changes are made only by front-end application users in the source systems. Warehouse users are limited to reading the data (SELECT).
Data-protection laws of the land are strict, even when companies come to India for development.

For example,
the USA has permitted its citizens' data to be accessed in India, where people working as production support personnel or analysts view this valuable data only through the privileges granted to them.
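A minimal sketch of "load once, then SELECT only", using SQLite from the Python standard library as a stand-in for the warehouse database (an assumption; real warehouses enforce this through user privileges). SQLite's `query_only` pragma makes the connection reject any data change after the load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Loyang')")  # the ETL load
conn.commit()

# From now on this connection may only read: with query_only enabled, any
# INSERT/UPDATE/DELETE fails with a "readonly database" error.
conn.execute("PRAGMA query_only = ON")

rows = conn.execute("SELECT cust_id, name FROM customer").fetchall()
print(rows)  # [(1, 'Loyang')]

try:
    conn.execute("UPDATE customer SET name = 'Changed' WHERE cust_id = 1")
    update_rejected = False
except sqlite3.OperationalError:
    update_rejected = True
print(update_rejected)  # True
```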

BASIC COMPONENTS OF A DATA WAREHOUSE

REQUIREMENT GATHERING Materialization as a Project

Requirement gathering is key to DW projects


Interviews with front-office personnel for inputs
Interviews for access privileges on the source system(s)
Review the interviews and prepare the Business Requirement Document (BRD)

Project Planning

After the estimates and tendering process, the manager sends a mail to the DM asking for the infrastructure required to start the work.
A brainstorming session about the BRD is held with the client.

What are data marts?


Physical
Logical

Physical Data Mart


Physical tables are designed for one particular subject area, or for one manager to view his or her everyday business reports.
Indexes are created for fast retrieval, regardless of how fast the database engine itself retrieves data.
A denormalized model is designed and loaded with suitably transformed data for quick results.
E. F. Codd's third normal form (3NF) is not used.
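A minimal sketch of a physical data mart, using the standard-library `sqlite3` module as a stand-in database (an assumption). The table and index names and the rows are hypothetical: descriptive names are copied into every row (denormalized), and an index supports the manager's daily filter, so the report needs no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized mart: region and product names are stored in every row
# instead of being kept in separate normalized tables.
conn.executescript("""
CREATE TABLE sales_mart (
    sale_date    TEXT,
    region_name  TEXT,
    product_name TEXT,
    qty          INTEGER,
    amount       REAL
);
CREATE INDEX ix_sales_mart_region_date ON sales_mart (region_name, sale_date);
""")
conn.executemany(
    "INSERT INTO sales_mart VALUES (?, ?, ?, ?, ?)",
    [
        ("2015-07-15", "South", "Laptop", 2, 1200.0),
        ("2015-07-15", "South", "Mouse", 5, 50.0),
        ("2015-07-15", "North", "Laptop", 1, 600.0),
    ],
)
# The everyday report reads one table; the index serves the WHERE clause.
report = conn.execute(
    "SELECT product_name, SUM(amount) FROM sales_mart "
    "WHERE region_name = ? AND sale_date = ? "
    "GROUP BY product_name ORDER BY product_name",
    ("South", "2015-07-15"),
).fetchall()
print(report)  # [('Laptop', 1200.0), ('Mouse', 50.0)]
```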

Logical Datamarts

The data mart is created on top of third-normal-form (3NF) database tables.
Entity-relationship diagrams and complete, foolproof details are available.
The presentation layer is created using views.
The three-dimensional data model is created in the front-end applications.
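By contrast, a logical data mart can be sketched as a view over normalized tables: the report-ready shape exists only as a query, and no second copy of the data is stored. Again `sqlite3` stands in for the database, and all table, view and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (3NF) base tables
CREATE TABLE region  (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sale (
    sale_id    INTEGER PRIMARY KEY,
    sale_date  TEXT,
    region_id  INTEGER REFERENCES region(region_id),
    product_id INTEGER REFERENCES product(product_id),
    amount     REAL
);
INSERT INTO region  VALUES (1, 'South'), (2, 'North');
INSERT INTO product VALUES (10, 'Laptop'), (20, 'Mouse');
INSERT INTO sale VALUES (100, '2015-07-15', 1, 10, 1200.0),
                        (101, '2015-07-15', 1, 20, 50.0),
                        (102, '2015-07-15', 2, 10, 600.0);

-- Presentation layer: a view joining the 3NF tables into the mart's shape
CREATE VIEW v_sales_mart AS
SELECT s.sale_date, r.region_name, p.product_name, s.amount
FROM sale s
JOIN region r  ON r.region_id  = s.region_id
JOIN product p ON p.product_id = s.product_id;
""")
totals = conn.execute(
    "SELECT region_name, SUM(amount) FROM v_sales_mart "
    "GROUP BY region_name ORDER BY region_name"
).fetchall()
print(totals)  # [('North', 600.0), ('South', 1250.0)]
```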

Data Design

What is Entity Relationship modelling?


Conceptual Data Modelling (CDM)
Logical Data Modelling (LDM)
Physical Data Modelling (PDM)

Entity = table
Attribute = column
Row = tuple
Cardinality = number of records (rows) in a table

What is a fact table?


A fact is a real observation (measurement) in the marketplace.
The latest (today's) data and future-dated data act as indicators, but in practice they are treated as objects for data mining.
A fact table is a large table designed to hold the measures. Measures are attributes (columns) that usually have numeric data types.
Given a functional scenario, the modeller creates the model and supports the design team.
A fact table may have a large composite primary key to ensure that records are unique.
A fact table is generally considered a "cooked" table, because its attribute data is derived using database utilities or UNIX procedures and the transformed data is then loaded.
The measures are additive or semi-additive numbers with meaningful units of measurement.
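A minimal sketch of a fact table in `sqlite3` (standing in for the warehouse database). The table, its dimension keys and the values are hypothetical: the composite primary key built from the dimension keys rejects a duplicate record, and the numeric measures can be summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key of dimension keys + numeric measures.
conn.execute("""
CREATE TABLE fact_sales (
    date_key    INTEGER,
    product_key INTEGER,
    store_key   INTEGER,
    qty_sold    INTEGER,
    sales_amt   REAL,
    PRIMARY KEY (date_key, product_key, store_key)
)""")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (20150715, 10, 1, 2, 1200.0),
    (20150715, 20, 1, 5, 50.0),
    (20150716, 10, 1, 1, 600.0),
])

try:  # a second row with the same key combination violates uniqueness
    conn.execute("INSERT INTO fact_sales VALUES (20150715, 10, 1, 9, 9.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

total_qty, total_amt = conn.execute(
    "SELECT SUM(qty_sold), SUM(sales_amt) FROM fact_sales").fetchone()
print(duplicate_rejected, total_qty, total_amt)  # True 8 1850.0
```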

What is a factless fact table?

When the designer finds no measures in a fact table that can be added, subtracted, multiplied or divided, it is called a factless fact table (FLFT).
E.g. logs, event and coverage tables, which typically hold only keys such as a student number or employee number.
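A factless fact table only records that an event happened, so it is analysed by counting rows rather than summing measures. A minimal sketch with a hypothetical attendance table of (date, student number, course) keys:

```python
from collections import Counter

# Hypothetical factless fact table: each row is only a combination of keys.
attendance = [
    ("2015-07-15", "S101", "DW101"),
    ("2015-07-15", "S102", "DW101"),
    ("2015-07-16", "S101", "DW101"),
    ("2015-07-16", "S101", "SQL201"),
]

# No numeric measure exists, so the analysis counts events per student.
per_student = Counter(student for _, student, _ in attendance)
print(per_student["S101"], per_student["S102"])  # 3 1
```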

What is a conformed fact table?

The data of one or more fact tables may be used to load another fact table.
To generate reports, the presentation layer may be designed by joining two or more fact tables.
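Joining two fact tables row by row tends to double-count, so a common approach (often called drill-across) first aggregates each fact table to a shared grain, here a product key assumed to be conformed across both, and then combines the results. A minimal sketch with hypothetical sales and returns facts:

```python
from collections import defaultdict

# Two hypothetical fact tables sharing the same product key: (product, qty, amount)
fact_sales = [("P10", 2, 1200.0), ("P20", 5, 50.0), ("P10", 1, 600.0)]
fact_returns = [("P10", 1, 600.0)]

def totals_by_product(fact_rows):
    """Aggregate a fact table to the common grain (product)."""
    out = defaultdict(float)
    for product, _qty, amount in fact_rows:
        out[product] += amount
    return out

sales = totals_by_product(fact_sales)
returns = totals_by_product(fact_returns)

# Combine on the shared key: (sales, returns, net) per product.
report = {p: (sales[p], returns.get(p, 0.0), sales[p] - returns.get(p, 0.0))
          for p in sorted(sales)}
print(report)  # {'P10': (1800.0, 600.0, 1200.0), 'P20': (50.0, 0.0, 50.0)}
```

The resulting `report` could itself be loaded as another fact table, which is the first case the slide describes.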

What are Additive, Semiadditive and Non-additive facts?
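In short: additive facts (e.g. sales or deposit amounts) can be summed across every dimension; semi-additive facts (e.g. account balances) can be summed across some dimensions but not across time; non-additive facts (e.g. ratios and percentages) cannot be summed and must be recomputed from their additive parts. A minimal sketch with hypothetical account snapshots and orders:

```python
# Hypothetical daily account snapshot: (date, account, balance, deposits)
snapshots = [
    ("2015-07-14", "A1", 100.0, 40.0),
    ("2015-07-14", "A2", 300.0, 0.0),
    ("2015-07-15", "A1", 150.0, 50.0),
    ("2015-07-15", "A2", 300.0, 0.0),
]

# Additive: deposits can be summed across accounts and across days.
total_deposits = sum(d for _, _, _, d in snapshots)

# Semi-additive: balances sum across accounts for ONE day, not across days;
# for a period, take the balance on the last day instead.
last_day = max(day for day, _, _, _ in snapshots)
closing_balance = sum(b for day, _, b, _ in snapshots if day == last_day)
naive_balance = sum(b for _, _, b, _ in snapshots)  # meaningless total

# Non-additive: a ratio such as profit margin is recomputed from its
# additive components instead of being summed or averaged across rows.
orders = [(1000.0, 200.0), (100.0, 50.0)]  # (revenue, profit)
avg_of_ratios = sum(p / r for r, p in orders) / len(orders)
margin = sum(p for _, p in orders) / sum(r for r, _ in orders)

print(total_deposits, closing_balance, naive_balance)  # 90.0 450.0 850.0
print(round(avg_of_ratios, 2), round(margin, 4))       # 0.35 0.2273
```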

What is a dimensional table?

A dimension table is one of a set of companion tables to a fact table.
Each dimension is defined by a primary key that serves as the basis for referential integrity with any fact table to which it is joined.
Most dimension tables contain many textual attributes (fields) that are the basis for constraining and grouping fact data in data warehouses.
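A minimal star-join sketch in `sqlite3` (standing in for the warehouse database) with hypothetical tables: the dimension's primary key backs a foreign key from the fact table, so an orphan fact row is rejected, and a textual attribute (`category`) is used to group the measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    sales_amt   REAL
);
INSERT INTO dim_product VALUES (10, 'Laptop', 'Computers'),
                               (20, 'Mouse',  'Accessories'),
                               (30, 'Tablet', 'Computers');
INSERT INTO fact_sales VALUES (10, 1200.0), (20, 50.0), (30, 400.0);
""")

try:  # product_key 99 has no dimension row: referential integrity fails
    conn.execute("INSERT INTO fact_sales VALUES (99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Group the fact data by a textual dimension attribute.
by_category = conn.execute("""
SELECT d.category, SUM(f.sales_amt)
FROM fact_sales f JOIN dim_product d ON d.product_key = f.product_key
GROUP BY d.category ORDER BY d.category""").fetchall()
print(orphan_rejected, by_category)  # True [('Accessories', 50.0), ('Computers', 1600.0)]
```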

What is a slowly changing dimension (SCD)?

Type 0
Type 1
Type 2
Type 3
SCD Type 0 means the value of the attribute is never updated under any circumstances.
E.g., the date of joining (DOJ) below stays the same even when the manager and the date of promotion (DOP) change:

Empno | MgrNo | DOJ        | DOP
7369  | 3345  | 12/06/1995 | NULL
7369  | 3344  | 12/06/1995 | 12/05/2005
2323  | 7369  | 15/05/2005 |
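A minimal sketch of the Type 0 rule, assuming a hypothetical set of Type 0 attributes (`doj` here) and silently ignoring incoming changes to them; some loaders instead reject such changes with an error. How the other attributes are versioned (overwritten or added as a new row) is a separate choice, and this sketch simply overwrites them.

```python
# Hypothetical set of attributes treated as SCD Type 0 (never updated).
TYPE0_ATTRS = {"doj"}

def apply_change(row, changes):
    """Return an updated copy of `row`; Type 0 attributes keep their original value."""
    updated = dict(row)
    for attr, value in changes.items():
        if attr in TYPE0_ATTRS:
            continue  # ignore incoming changes to Type 0 attributes
        updated[attr] = value
    return updated

emp = {"empno": 7369, "mgrno": 3345, "doj": "12/06/1995", "dop": None}
emp2 = apply_change(emp, {"mgrno": 3344, "dop": "12/05/2005", "doj": "01/01/2000"})
print(emp2)  # {'empno': 7369, 'mgrno': 3344, 'doj': '12/06/1995', 'dop': '12/05/2005'}
```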

SCD Type 1

The record is overwritten when a change/update arrives. No history information is stored.

For example, consider an update of an employee's residential details.

Before the update:
Name   | Residence_Addr      | Contract_start_dt | Contract_close_dt
Loyang | #223, RT Nagar, BLR | 12th Dec 2015     | NULL

After the update (the same row is overwritten):
Name   | Residence_Addr      | Contract_start_dt | Contract_close_dt
Loyang | #235, RT Nagar, BLR | 15th July 2015    | NULL
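A minimal Type 1 sketch: the dimension row is updated in place, so the old address is lost and the row count stays the same. Keying the hypothetical dimension by name is a simplification for illustration.

```python
# Hypothetical employee dimension keyed by name (simplified natural key).
dim_employee = {
    "Loyang": {"residence_addr": "#223, RT Nagar, BLR",
               "contract_start_dt": "12th Dec 2015", "contract_close_dt": None},
}

def scd_type1_update(dim, key, **changes):
    """SCD Type 1: overwrite the attributes in place; no history is kept."""
    dim[key].update(changes)

scd_type1_update(dim_employee, "Loyang",
                 residence_addr="#235, RT Nagar, BLR",
                 contract_start_dt="15th July 2015")
print(len(dim_employee), dim_employee["Loyang"]["residence_addr"])  # 1 #235, RT Nagar, BLR
```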

SCD Type 2
A new record is created when a change is made.

For example, consider an update of an employee's residential details:

Name   | Residence_Addr      | Contract_start_dt | Contract_close_dt
Loyang | #223, RT Nagar, BLR | 12th Dec 2015     | 14th July 2015
Loyang | #235, RT Nagar, BLR | 15th July 2015    | NULL

Note: The NULL value may be changed later, e.g. as per the state government (BBMP) 11-month agreement (14th June 2015).
Some companies instead store a far-future date such as 9999-12-31 as the close date of the active record.
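A minimal Type 2 sketch using the 9999-12-31 convention for the active row: on a change, the active row is closed the day before the change date and a new version is appended with the next surrogate key. The initial start date 2015-01-01 and the column names are hypothetical.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # close date of the active (current) row

# Hypothetical Type 2 dimension: one row per version, with a surrogate key.
dim_employee = [
    {"sk": 1, "name": "Loyang", "residence_addr": "#223, RT Nagar, BLR",
     "start_dt": date(2015, 1, 1), "end_dt": HIGH_DATE},
]

def scd_type2_update(dim, name, new_addr, change_dt):
    """SCD Type 2: close the active row and insert a new version."""
    active = next(r for r in dim if r["name"] == name and r["end_dt"] == HIGH_DATE)
    if active["residence_addr"] == new_addr:
        return  # nothing changed, so no new version
    active["end_dt"] = change_dt - timedelta(days=1)
    dim.append({"sk": max(r["sk"] for r in dim) + 1, "name": name,
                "residence_addr": new_addr, "start_dt": change_dt, "end_dt": HIGH_DATE})

scd_type2_update(dim_employee, "Loyang", "#235, RT Nagar, BLR", date(2015, 7, 15))
for r in dim_employee:
    print(r["sk"], r["residence_addr"], r["start_dt"], r["end_dt"])
# 1 #223, RT Nagar, BLR 2015-01-01 2015-07-14
# 2 #235, RT Nagar, BLR 2015-07-15 9999-12-31
```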

SCD Type 3
Both the old (previous) and the new (current) value of the attribute are maintained, in separate columns of the dimension table, so only limited history is kept.
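A minimal Type 3 sketch with hypothetical column names: each change shifts the current value into a "previous" column, so after two moves the first address is no longer available.

```python
# Hypothetical Type 3 row: current and previous value side by side.
row = {"name": "Loyang", "current_addr": "#223, RT Nagar, BLR", "previous_addr": None}

def scd_type3_update(row, new_addr):
    """SCD Type 3: shift the current value into the 'previous' column."""
    if new_addr != row["current_addr"]:
        row["previous_addr"] = row["current_addr"]
        row["current_addr"] = new_addr

scd_type3_update(row, "#235, RT Nagar, BLR")
scd_type3_update(row, "#12, Jayanagar, BLR")  # hypothetical second move
print(row["current_addr"], "|", row["previous_addr"])
# #12, Jayanagar, BLR | #235, RT Nagar, BLR
```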

What is a degenerate dimension?

The Webster dictionary defines degenerate as:

-declined from the standard norm
-mathematically simple

Some line items in a fact table, typically transaction identifiers such as an invoice or order number, look like dimension keys but have no reference to any dimension table.
Such a single-column/attribute dimension, kept in the fact table itself, is called a degenerate dimension.
This type of dimension comes into existence when transactional data is used.
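A minimal sketch of a degenerate dimension: in this hypothetical line-item fact table, `invoice_no` is stored directly in the fact rows with no invoice dimension table behind it, yet it still groups the line items back into transactions.

```python
from collections import defaultdict

# Hypothetical line-item fact rows: (invoice_no, product_key, qty, amount).
fact_sales_line = [
    ("INV-1001", 10, 1, 600.0),
    ("INV-1001", 20, 2, 20.0),
    ("INV-1002", 10, 1, 600.0),
]

# Group by the degenerate dimension to get invoice totals.
invoice_totals = defaultdict(float)
for invoice_no, _product, _qty, amount in fact_sales_line:
    invoice_totals[invoice_no] += amount
print(dict(invoice_totals))  # {'INV-1001': 620.0, 'INV-1002': 600.0}
```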

What is a junk dimension?

It is a convenient grouping of random flags and attributes that takes them out of a fact table and into a useful dimensional framework.
For example,
the open-ended comments field often attached to a fact table record can be moved into a junk dimension; it is meaningless for measuring the business but important for text analytics.
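A minimal junk-dimension sketch. It uses hypothetical low-cardinality flags rather than a free-text comments field, because flags can be enumerated in advance: one junk row exists per combination of flag values, and the fact row carries a single `junk_key` instead of three flag columns.

```python
from itertools import product

# Hypothetical flags that would otherwise clutter the fact table.
payment_types = ["Cash", "Card"]
gift_wrap = ["Y", "N"]
order_channel = ["Web", "Store"]

# Junk dimension: one row (key) per combination of flag values.
junk_dim = {
    combo: key
    for key, combo in enumerate(product(payment_types, gift_wrap, order_channel), start=1)
}

# The fact row references the combination through one surrogate key.
fact_row = {"sales_amt": 50.0, "junk_key": junk_dim[("Card", "N", "Web")]}
print(len(junk_dim), fact_row["junk_key"])  # 8 7
```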

What are foreign keys, primary keys and surrogate keys?

What is a quarterly snapshot?

Different Source Systems data

ETL Extract types


Full Load/refresh
Incremental Load
Transactional events
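A minimal sketch contrasting a full load with an incremental load. It assumes the source rows carry a `last_modified` timestamp and that the ETL job remembers a watermark (the latest timestamp already extracted); both, and all values, are hypothetical.

```python
from datetime import datetime

# Hypothetical source table with a last-modified timestamp on every row.
source = [
    {"id": 1, "amount": 100.0, "last_modified": datetime(2015, 7, 14, 9, 0)},
    {"id": 2, "amount": 250.0, "last_modified": datetime(2015, 7, 15, 10, 0)},
    {"id": 3, "amount": 75.0, "last_modified": datetime(2015, 7, 15, 18, 0)},
]

def full_load(rows):
    """Full load/refresh: extract every row, replacing the previous copy."""
    return list(rows)

def incremental_load(rows, watermark):
    """Incremental load: extract only rows changed after the last watermark."""
    delta = [r for r in rows if r["last_modified"] > watermark]
    new_watermark = max((r["last_modified"] for r in delta), default=watermark)
    return delta, new_watermark

delta, wm = incremental_load(source, watermark=datetime(2015, 7, 15, 0, 0))
print(len(full_load(source)), [r["id"] for r in delta], wm)
# 3 [2, 3] 2015-07-15 18:00:00
```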

What is purging of data?

Cleansing, De-duping and Merge

Transformation rules
Integration
History maintenance of a person's events
De-normalization
Referential integrity checks
Data type conversions
Calculations and derivations
Aggregation for quick retrieval
Handling of NULL values
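A minimal sketch of a few of these rules applied to hypothetical customer records: cleansing (trimming, case), data type conversion, a NULL/empty default, and de-duplication with merge on `cust_id`. The "UNKNOWN" default and the merge policy (fill unknown fields from later duplicates) are assumptions for illustration, not fixed rules.

```python
from decimal import Decimal

# Hypothetical raw customer records arriving from two source systems.
raw = [
    {"cust_id": " 101 ", "name": "loyang ", "city": None, "amount": "1,200.50"},
    {"cust_id": "101", "name": "Loyang", "city": "BLR", "amount": "1200.50"},
    {"cust_id": "102", "name": "ravi", "city": "", "amount": None},
]

def clean(rec):
    """Cleansing, data type conversion and NULL handling for one record."""
    amount = rec["amount"]
    return {
        "cust_id": int(rec["cust_id"].strip()),        # text -> integer
        "name": rec["name"].strip().title(),            # trim and standardize case
        "city": rec["city"] or "UNKNOWN",               # NULL/empty -> default value
        "amount": Decimal(amount.replace(",", "")) if amount else Decimal("0"),
    }

def dedupe_merge(records):
    """De-duplicate on cust_id, filling UNKNOWN fields from later duplicates."""
    merged = {}
    for rec in map(clean, records):
        current = merged.setdefault(rec["cust_id"], rec)
        for field, value in rec.items():
            if current[field] == "UNKNOWN" and value != "UNKNOWN":
                current[field] = value
    return list(merged.values())

customers = dedupe_merge(raw)
print(len(customers), customers[0]["city"], customers[1]["city"])  # 2 BLR UNKNOWN
```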

Staged data load job control services


Job Definition
Run decks
Proc Libs
Job scheduling
Monitoring
Log files

What is soft delete of Corporate data?

What are upstream applications?

What are downstream applications?

Metadata Repository design and storage of data

What is presentation layer?


Logical Design
Physical Design

What is drill-down (detailed) and drill-up (high-level summary) of data?

What is the ANSI standard? (ANSI: American National Standards Institute)

What is the BSA standard? (BSA: Business Software Alliance)
And what is EAR? (EAR: Export Administration Regulations)

Roles in a typical data warehousing project (the back office)

Business analyst(s)
Subject matter experts (SMEs)
ETL developer(s)
Reports developer(s)
System analyst(s)
Project manager(s)
Designer
Data modeller
Data steward(s)
Project leader(s)
ETL pre-prod support person(s)
Report tool pre-prod support person(s)
Prod server support person(s)
Database administrator(s)
System administrator(s)
System engineer(s)

THANK YOU!
Any questions?

Clarifications required?
