What is metadata
• Continuous availability
operations
Finance Payroll
Operational
Periodic
Data Refresh
Strategic
Tactical Information
Information
• Subject Oriented
• Integrated
• Time variant
Billing Usage
Accounting Revenue
Historical data
Meta data
Subject
quick decisions
Data Warehouse Training - Series 1 23
Data Warehouse - Application Areas
Some business applications of a data warehouse:
• Risk management
• Financial analysis
• Marketing programs
• Profit trends
• Procurement analysis
• Inventory analysis
DATA MARTS
Data Mart
Enterprise
Data Warehouse
Low cost
user group.
Rapid response
warehouse.
Data Mart advantages:
Typically single subject area and fewer dimensions
Limited feeds
Limited scope
Demonstrates ROI
Disadvantages of Data Mart
Data Mart disadvantages:
• Does not provide an integrated view of business information.
Integrated
Subject oriented
Volatile, where updates can be done
Current valued, containing data that is a day or perhaps a month old
Contains detailed data only.
• Data physically separated from the production environment to insulate it from the processing demands of reporting and analysis
Flat files
60,5.2,"JOHN"
72,6.2,"DAVID"
Operational
Data Store
Relational
Database
Excel files
Operational Data Store - Update schedule
ODS
Data warehouse
[Figure: data warehouse architecture. Labels: Operational & External data, Data Staging layer, ODS, Data warehouse, Data Marts, Information Servers, Information Access (Reporting tools, Mining, OLAP, Web Browsers), Administration]
Operational & External Data layer
• The database-of-record
• Consists of system-specific reference data and event data
• Source of data for the data warehouse.
• Contains detailed data
• Continually changes due to updates
• Stores data up to the last transaction.
• This includes decoding production data and merging of records from multiple DBMS formats.
Data Staging layer
• To build and maintain the data warehouse and data directory information
• To keep the data warehouse up-to-date.
as summarized data
• Used for short- and long-term business planning and decision making covering multiple business units.
Staging
Enterprise Datawarehouse
DM 2 DM 1 DM 3
Manufacturing, etc.
1. Coordinated environment
2. Analysis paralysis
3. Scope control
4. Time to market
Staging
DM 2 DM 1 DM 3
Enterprise Datawarehouse
The EDMA is expanded in this area to include the full range of detail
required for the design and development of the incremental ADM.
Bottom up approach
[Figure: bottom-up approach. Labels: Business Requirement, Refine Model, Map Data sources (OLTP System, External Data), ETL, Data Warehouse, Data Storage, Info Access, OLAP, Web Browsers]
Entity
04/29/11 Data Warehouse Training - Series 1
Review of Logical Modeling
Terms & Symbols
Entities are made up of attributes
Sales Organization
Sales Org ID
Distribution Channel
Attributes
Identifier
Relationship
Logical Data Model
Suppliers (Supplier ID), Customer (Customer ID), Retail (Market), Wholesale (Industry)
Factory (Factory ID), Product (Product SKU)
Product
Time
Components of a dimension
Represents the natural elements in the business dimension
Directly related to the dimension
Facilitates analysis from different perspectives of a dimension
Often referred to as levels of a dimension.
Dimension Hierarchy
Time Dimension
Year 1999
Drill Down
Drill Up
Month April May
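The drill-down and drill-up operations on the time hierarchy can be sketched in a few lines of Python. This is only an illustration: the revenue figures and the `roll_up` helper are hypothetical and are not part of the training material.

```python
from collections import defaultdict

# Hypothetical facts recorded at month level: (year, month, revenue).
monthly_sales = [
    (1999, "April", 120.0),
    (1999, "April", 80.0),
    (1999, "May", 150.0),
]

def roll_up(rows, level):
    """Aggregate revenue at the requested level of the Year > Month hierarchy."""
    totals = defaultdict(float)
    for year, month, revenue in rows:
        key = (year,) if level == "year" else (year, month)
        totals[key] += revenue
    return dict(totals)

by_month = roll_up(monthly_sales, "month")  # drill down: Year to Month
by_year = roll_up(monthly_sales, "year")    # drill up: Month to Year
```

Drilling down only re-aggregates the same facts with a finer key, so the detail must be stored at the lowest level the users need.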
[Figure: 3-D chart of values (0.0 to 80.0) by Product (A, B), Geography (East, West, North) and Time (1st to 4th Qtr)]
Facts Measures
Physical models
Logical Modeling
Map Req. to OLTP
External
Data Sources
OLTP System
Snow-flake Schema
Hybrid Schema
[Figure: star schema. A central Fact Table (Employee, Product, Customer, Day; Units sold, Revenue) joined to the Employee, Product, Time and Customer dimensions]
Attributes, Elements
Easy navigation within the dimension structure (State, City, Employee)
sales_fact
Dimension Keys
Measures
Revenue
Units Sold
Net Profit
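A fact table such as sales_fact holds dimension keys plus measures, and queries join it to the dimensions. The sketch below uses Python's built-in sqlite3 module. The table layouts, key names, and sample rows are assumptions made for illustration; only the sales_fact name and its three measures come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimension tables
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day TEXT);
-- Fact table: dimension keys followed by the measures
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    time_key INTEGER REFERENCES time_dim(time_key),
    revenue REAL,
    units_sold INTEGER,
    net_profit REAL
);
INSERT INTO product_dim VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO time_dim VALUES (1, '1999-04-01'), (2, '1999-05-01');
INSERT INTO sales_fact VALUES
    (1, 1, 100.0, 10, 20.0),
    (1, 2, 50.0, 5, 10.0),
    (2, 1, 70.0, 7, 14.0);
""")

# Summarize the measures by one dimension attribute.
revenue_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.revenue), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```

Every analytical query follows the same shape: join the fact table to one or more dimensions, group by dimension attributes, and aggregate the measures.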
Product
City
Region
Brand
Country Color
Time tables: (week_code, week_name, month_code), (month_code, month_name, quarter_code, year)
Product tables: (brand_code, brand_name, color_code), (color_code, color_name)
changing dimensions.
TYPE I
TYPE II
TYPE III
Source Target
Emp id Name Email Emp id Name Email
Source (Emp id, Name, Email): 10 Shane Shane@xyz.com
Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_VERSION_NUMBER): 1000 10 Shane Shane@xyz.com 0
10 Shane Shane@abc.co.in
10 Shane Shane@abc.com
Source: 10 Shane Shane@xyz.com
Target: 1000 10 Shane Shane@xyz.com 1
10 Shane Shane@abc.co.in
Slowly Changing Dimensions - Flag Current
Source
Emp id Name Email
10 Shane Shane@abc.com
10 Shane Shane@xyz.com
1000 10 Shane Shane@xyz.com 01/01/00
Source
Target
10 Shane Shane@abc.co.in
Target
Slowly Changing Dimensions - Effective Date
Source
Emp id Name Email
10 Shane Shane@abc.com
Source Target
Target
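The history-keeping variants above (version number, current flag, effective date) can be combined in one small routine. This is a minimal sketch, not the tool behaviour shown on the slides. The column names, the second surrogate key 1001, the change date, and the reading of 01/01/00 as 1 January 2000 are all assumptions.

```python
from datetime import date

def apply_type2(target, source_row, effective, next_key):
    """Add a new version row when the tracked attribute (email) changes."""
    current = next(
        (r for r in target
         if r["emp_id"] == source_row["emp_id"] and r["is_current"]),
        None,
    )
    if current is not None and current["email"] == source_row["email"]:
        return target  # no change, history stays as it is
    version = 0
    if current is not None:
        current["is_current"] = False    # flag the old row as not current
        current["end_date"] = effective  # close its effective-date range
        version = current["version"] + 1
    target.append({
        "pk": next_key, **source_row,
        "version": version, "is_current": True,
        "begin_date": effective, "end_date": None,
    })
    return target

target = []
apply_type2(target, {"emp_id": 10, "name": "Shane", "email": "Shane@xyz.com"},
            date(2000, 1, 1), 1000)
apply_type2(target, {"emp_id": 10, "name": "Shane", "email": "Shane@abc.com"},
            date(2000, 6, 1), 1001)
```

After the second call the target holds two rows for employee 10: the closed original and the current row, each with its own surrogate key.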
Slowly Changing Dimensions Type III
Source
Emp id Name Email
10 Shane Shane@abc.com
Target
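Type III keeps limited history in the same row by storing the previous value in an extra column. A minimal sketch follows, assuming a hypothetical `prev_email` column and the same sample employee.

```python
def apply_type3(target_row, source_row):
    """Keep only the current and the previous email in one row (Type III)."""
    if target_row["email"] != source_row["email"]:
        target_row["prev_email"] = target_row["email"]  # shift old value aside
        target_row["email"] = source_row["email"]       # overwrite with new value
    return target_row

row = {"pk": 1000, "emp_id": 10, "name": "Shane",
       "email": "Shane@xyz.com", "prev_email": None}
apply_type3(row, {"emp_id": 10, "name": "Shane", "email": "Shane@abc.com"})
```

A further change would overwrite `prev_email`, which is why Type III preserves only one prior value.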
Conformed Dimensions
Coverage tables
E - Extraction
T - Transformation
L - Loading
• Rule 1
• Rule 2
• Rule 3
[Figure: ETL flow. Labels: Transformation Engine, Integrator, Error View, Check, Correct, Loader, Warehouse]
Format differences
Data Variations
Context
Across locations the same code could represent different customers
Across periods of time a product code could have been reused
Extraction
[Figure: extraction with a filter. Oracle (80 tables): data from 30 tables. Sybase (50 tables): data from 10 tables where Date < 10/12/99. Files: data from files. All feed the Target.]
Name = Concat(First Name, Last Name)
Indiana Jones
Sherlock Holmes
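The date filter from the extraction step and the Concat rule can be sketched together. The row layout, the sample dates, and the reading of 10/12/99 as 10 December 1999 are assumptions made only for this illustration.

```python
from datetime import date

# Hypothetical source rows; only the two names come from the slide.
source_rows = [
    {"first_name": "Indiana", "last_name": "Jones", "date": date(1999, 11, 30)},
    {"first_name": "Sherlock", "last_name": "Holmes", "date": date(1999, 12, 25)},
]

CUTOFF = date(1999, 12, 10)  # assumed reading of 10/12/99

def extract(rows, cutoff):
    """Extraction filter: keep rows dated before the cutoff."""
    return [r for r in rows if r["date"] < cutoff]

def transform(row):
    """Name = Concat(First Name, Last Name), joined with a space."""
    return {"name": f'{row["first_name"]} {row["last_name"]}', "date": row["date"]}

target_rows = [transform(r) for r in extract(source_rows, CUTOFF)]
```

Filtering during extraction keeps the transformation step from processing rows that would never be loaded.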
Staging Area
Cleaning, Transformation & Integration of Raw data
Load clean, transformed & integrated data
Enterprise Data Warehouse
60 to 80% of the work
• Design • Mapping
• Extract • Scrub • Transform
• Load • Index • Aggregation
• Replication • Data Set Distribution
• Access & Analysis • Resource Scheduling & Distribution
Meta Data
System Monitoring
Factors Influencing ETL Architecture
Extraction
Full Extract or Periodic/Incremental Extract
Full Extract: Source System to Data Mart
Incremental Extract: Source System to Changed data
Existing data updated using changed data
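The difference between a full extract and an incremental extract can be sketched as follows. The `updated_at` counter used to detect changed data is an assumption. Real sources may rely on timestamps, logs, or triggers instead.

```python
def full_extract(source):
    """Copy every source row, regardless of when it changed."""
    return {key: dict(row) for key, row in source.items()}

def incremental_extract(source, last_run):
    """Pick up only rows changed since the previous extract."""
    return {k: dict(r) for k, r in source.items() if r["updated_at"] > last_run}

def apply_changes(existing, changed):
    """Update existing data using the changed data (insert or overwrite)."""
    merged = dict(existing)
    merged.update(changed)
    return merged

# Hypothetical source system keyed by record id.
source = {1: {"name": "A", "updated_at": 1}, 2: {"name": "B", "updated_at": 1}}
mart = full_extract(source)                    # initial load

source[2] = {"name": "B2", "updated_at": 5}    # an update in the source
source[3] = {"name": "C", "updated_at": 6}     # a new record in the source
changed = incremental_extract(source, last_run=1)
mart = apply_changes(mart, changed)
```

Only two rows travel in the incremental run, while a second full extract would have moved all three.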
Classifications
Changing continuous values to discrete ranges (e.g. temperatures to temperature ranges)
Splitting of fields
Merging of fields
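Changing a continuous value into a discrete range is a lookup against a sorted list of boundaries. The boundaries and range labels below are hypothetical, chosen only to illustrate the temperature example.

```python
import bisect

# Hypothetical range boundaries in degrees Celsius, with one label per range.
BOUNDS = [0, 15, 30]
LABELS = ["freezing", "cold", "mild", "hot"]

def classify(temperature):
    """Map a temperature to its range: <0, 0 to <15, 15 to <30, >=30."""
    return LABELS[bisect.bisect_right(BOUNDS, temperature)]

ranges = [classify(t) for t in (-5, 0, 22, 30)]
```

Keeping the boundaries in one table means a change of ranges needs no change to the transformation logic.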
[Figure: aggregation. Daily productivity figures in OLTP are averaged into average productivity figures in the data warehouse]
Splitting Transformation
Source Schema: “15-10-1992”
Target Schema: 15 10 1992
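Splitting a date field into separate day, month, and year columns could look like the sketch below. The `split_date` helper is hypothetical, and the day-month-year format is inferred from the sample value.

```python
from datetime import datetime

def split_date(text):
    """Split a 'DD-MM-YYYY' string into (day, month, year) integers."""
    parsed = datetime.strptime(text, "%d-%m-%Y")  # also rejects invalid dates
    return parsed.day, parsed.month, parsed.year

day, month, year = split_date("15-10-1992")
```

Parsing with `strptime` instead of a plain string split means a malformed value fails loudly during transformation instead of reaching the target.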
Multiply by 1/43: Rs. 10000 (Revenue in Rupees) to $232.56 (Revenue in Dollars)
Multiply by 0.4536: 1000 lbs. (Production in Pounds) to 453.6 kgs. (Production in Kilograms)
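Both conversions are a multiplication by a fixed factor. The sketch below reuses the factors from the slide, 1/43 and 0.4536. The function names and the rounding to two decimals are assumptions. Note that 1000 × 0.4536 gives 453.6.

```python
RUPEES_PER_DOLLAR = 43   # exchange rate used on the slide
KG_PER_POUND = 0.4536    # conversion factor used on the slide

def rupees_to_dollars(rupees):
    """Revenue in Rupees to Revenue in Dollars."""
    return round(rupees / RUPEES_PER_DOLLAR, 2)

def pounds_to_kg(pounds):
    """Production in Pounds to Production in Kilograms."""
    return round(pounds * KG_PER_POUND, 2)

revenue_usd = rupees_to_dollars(10000)
production_kg = pounds_to_kg(1000)
```

In practice an exchange rate changes over time, so it would normally be looked up by date, not hard-coded.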
Source Target
Schema Schema
Target
Gender
Male – M
Female – F
Joseph R Smith
123 Maine St.
MA - 70127
Data Warehouse
New data
OR
Point-in-Time Snapshot (e.g. Monthly)
New Data Added to Existing Data
Changed
data
operational systems.
data warehouse
and
Meta Data
System Monitoring
Technical Operational
Processing Information
This metadata stores information about the activities involved in processing data, such as scheduling and archiving.
Specifications
Implementation