Data Warehouse Development Approach

Data Warehouse Development Approaches
Fundamental Questions
Before deciding to build a data warehouse for your organization, you need to ask the following basic and fundamental questions and address the relevant issues:

Top-down or bottom-up approach?
Enterprise-wide or departmental?
Which first: data warehouse or data mart?
Build a pilot or go with a full-fledged implementation?
Dependent or independent data marts?
Data Warehouse Development Approaches

Inmon Model: EDW approach
Kimball Model: Data mart approach
Which model is better?
There is no one-size-fits-all strategy for data warehousing. One alternative is the hosted warehouse.
General Data Warehouse Development Approaches
Big bang approach
Incremental approach:
Top-down incremental approach
Bottom-up incremental approach
ISQS 6339, Data Mgmt & BI, Zhangxi Lin
Big Bang Approach

Analyze enterprise requirements
Build enterprise data warehouse
Report in subsets or store in data marts
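The last step can be sketched as follows, assuming the enterprise data warehouse is modeled as an in-memory list of rows and each data mart is simply the subset of rows for one department. The `department` column and all data are hypothetical; a real warehouse would do this with views or ETL jobs.

```python
from collections import defaultdict

# Hypothetical enterprise-wide rows: one central store for all departments.
edw_rows = [
    {"department": "sales", "product": "A", "amount": 120.0},
    {"department": "sales", "product": "B", "amount": 80.0},
    {"department": "finance", "product": "A", "amount": 50.0},
    {"department": "marketing", "product": "C", "amount": 30.0},
]


def build_data_marts(rows, key="department"):
    """Split the central warehouse into subsets, one per department.

    Every mart is derived from the same EDW rows, so the marts share
    one set of definitions and stay consistent with each other.
    """
    marts = defaultdict(list)
    for row in rows:
        marts[row[key]].append(row)
    return dict(marts)


marts = build_data_marts(edw_rows)
for name, mart in sorted(marts.items()):
    total = sum(r["amount"] for r in mart)
    print(f"{name}: {len(mart)} rows, total amount {total}")
```

Because the marts are cut from one enterprise store, they are dependent data marts; this is what distinguishes the big bang (and top-down) approach from building independent marts first.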
Incremental Approach to Warehouse Development

Multiple iterations
Shorter implementations
Validation of each phase
[Figure: Increment 1 phases: Strategy, Definition, Analysis, Design, Build (iterative), Production]
Top-Down Approach
Analyze requirements at the enterprise level
Develop conceptual information model
Identify and prioritize subject areas
Complete a model of selected subject area
Map to available data
Perform a source system analysis
Implement base technical architecture
Establish metadata, extraction, and load processes for the initial subject area
Create and populate the initial subject area data mart within the overall warehouse framework
Top-Down
The advantages of this approach are:
A truly corporate effort, an enterprise view of data

Inherently architected: not a union of disparate data marts
Single, central storage of data about the content
Centralized rules and control
May see quick results if implemented with iterations
The disadvantages are:

Takes longer to build, even with an iterative method
High exposure/risk to failure
Needs high level of cross-functional skills

High outlay without proof of concept
Bottom-Up Approach
Define the scope and coverage of the data warehouse and analyze the source systems within this scope
Define the initial increment based on political pressure, assumed business benefit, and data volume
Implement base technical architecture and establish metadata, extraction, and load processes as required by the increment
Create and populate the initial subject areas within the overall warehouse framework
Bottom-Up
The advantages of this approach are:
Faster and easier implementation of manageable pieces

Favorable return on investment and proof of concept
Less risk of failure
Inherently incremental; can schedule important data marts first
Allows project team to learn and grow
The disadvantages are:

Each data mart has its own narrow view of data
Redundant data permeates every data mart
Perpetuates inconsistent and irreconcilable data

Proliferates unmanageable interfaces
Dimensional Modeling Process
High-level dimensional model design:
Choosing the business model
Declaring the grain
Choosing dimensions
Identifying the facts
Detailed dimensional model development
Dimensional model review and validation (IS, core users, business community)
Final design iteration
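The four high-level design steps can be illustrated with a small star schema, sketched here with Python dataclasses. The sales example, the table and column names, and the chosen grain (one row per product, per store, per day) are all assumptions for illustration, not part of the original slides.

```python
from dataclasses import dataclass
from datetime import date


# Dimension tables: descriptive context, keyed by surrogate keys.
@dataclass(frozen=True)
class DateDim:
    date_key: int
    calendar_date: date
    month: str


@dataclass(frozen=True)
class ProductDim:
    product_key: int
    name: str
    category: str


@dataclass(frozen=True)
class StoreDim:
    store_key: int
    city: str


# Fact table. Declared grain (assumed for this sketch):
# one row per product, per store, per day.
@dataclass(frozen=True)
class SalesFact:
    date_key: int
    product_key: int
    store_key: int
    units_sold: int      # additive fact
    sales_amount: float  # additive fact


dates = {1: DateDim(1, date(2024, 1, 5), "2024-01")}
products = {10: ProductDim(10, "Widget", "Hardware")}
stores = {100: StoreDim(100, "Austin")}
facts = [SalesFact(1, 10, 100, 3, 29.97)]


def check_grain(rows):
    """Reject fact rows that duplicate the declared grain."""
    seen = set()
    for f in rows:
        grain = (f.date_key, f.product_key, f.store_key)
        if grain in seen:
            raise ValueError(f"duplicate row at grain {grain}")
        seen.add(grain)
    return True


check_grain(facts)
```

Declaring the grain first makes the later steps checkable: every dimension must be meaningful at that grain, and every fact must be measured at it.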
Supplemental Slides: Data Warehouse Design Phases
Defining the Business Requirements
The concept of business dimensions is fundamental to the requirements definition for a data warehouse.
Information package
Your primary goal in the requirements definition phase is to compile information packages.
Once you have firmed up the information packages, you'll be able to proceed to the other phases. Essentially, information packages enable you to:
Define the common subject areas
Design key business metrics
Decide how data must be presented
Determine how users will aggregate or roll up
Decide the data quantity for user analysis or query
Decide how data will be accessed
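To make "how users will aggregate or roll up" concrete, here is a minimal sketch of rolling daily values up a time hierarchy (day, month, year). The data and the `LEVELS` mapping are hypothetical; in an information package the hierarchy levels would come from the business dimension being documented.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales at the lowest level of the time hierarchy.
daily_sales = {
    date(2024, 1, 5): 100.0,
    date(2024, 1, 20): 50.0,
    date(2024, 2, 3): 70.0,
    date(2025, 1, 1): 10.0,
}

# Each hierarchy level maps a day to the member it rolls up into.
LEVELS = {
    "month": lambda d: (d.year, d.month),
    "year": lambda d: d.year,
}


def roll_up(facts, level):
    """Aggregate daily values up to the requested hierarchy level."""
    totals = defaultdict(float)
    for day, amount in facts.items():
        totals[LEVELS[level](day)] += amount
    return dict(totals)


print(roll_up(daily_sales, "month"))
print(roll_up(daily_sales, "year"))
```

The levels users expect to roll up to determine which hierarchy attributes the dimension table must carry.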
Supplemental Slides: The Others
Snowflake Schema Model

[Figure: snowflaked location hierarchy: Country, State, County, City]
Advantages:
Direct use by some tools
More flexible to change
Provides for speedier data loading
Disadvantages:
Can become large and unmanageable
Degrades query performance
More complex metadata
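The trade-off can be sketched with the Country, State, County, City hierarchy, assuming each level of the snowflaked dimension is a separate normalized table (a dict here) that points to its parent by key. All names and keys are hypothetical.

```python
# Snowflaked location dimension: each level is its own normalized table
# pointing to its parent by key. Names and keys are hypothetical.
country = {1: {"name": "USA"}}
state = {10: {"name": "Texas", "country_key": 1}}
county = {100: {"name": "Lubbock County", "state_key": 10}}
city = {1000: {"name": "Lubbock", "county_key": 100}}


def snowflake_lookup(city_key):
    """Resolve the full hierarchy: three joins beyond the city table."""
    ci = city[city_key]
    co = county[ci["county_key"]]
    st = state[co["state_key"]]
    cn = country[st["country_key"]]
    return (cn["name"], st["name"], co["name"], ci["name"])


# Equivalent star-schema dimension: one denormalized row per city.
# A single lookup answers the same question, but parent values repeat
# on every city row.
location_star = {
    1000: ("USA", "Texas", "Lubbock County", "Lubbock"),
}

assert snowflake_lookup(1000) == location_star[1000]
```

Renaming a state touches one row in the snowflake but every matching city row in the star, which is the "more flexible to change" point; the chain of lookups is the extra join work behind "degrades query performance".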
Degenerate Dimensions
order_number and order_line in the fact table
For example, you may be looking for the average number of products per order. You would then have to relate the products to the order number to calculate the average. Attributes such as order_number and order_line in the example are called degenerate dimensions, and they are kept as attributes of the fact table.
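A minimal sketch of that calculation, assuming fact rows at the order-line grain with `order_number` and `order_line` stored directly in the fact table (no separate order dimension). The rows are hypothetical, and "products per order" is interpreted here as distinct products per order.

```python
from collections import defaultdict

# Fact rows at order-line grain. order_number and order_line are
# degenerate dimensions: stored in the fact table, no dimension table.
order_facts = [
    {"order_number": "A100", "order_line": 1, "product_key": 10, "quantity": 2},
    {"order_number": "A100", "order_line": 2, "product_key": 11, "quantity": 1},
    {"order_number": "A101", "order_line": 1, "product_key": 10, "quantity": 5},
    {"order_number": "A102", "order_line": 1, "product_key": 12, "quantity": 1},
    {"order_number": "A102", "order_line": 2, "product_key": 10, "quantity": 1},
    {"order_number": "A102", "order_line": 3, "product_key": 13, "quantity": 4},
]


def avg_products_per_order(rows):
    """Average count of distinct products per order, grouped on order_number."""
    products_by_order = defaultdict(set)
    for r in rows:
        products_by_order[r["order_number"]].add(r["product_key"])
    counts = [len(p) for p in products_by_order.values()]
    return sum(counts) / len(counts)


print(avg_products_per_order(order_facts))
```

Here the orders contain 2, 1, and 3 distinct products, so the average is 2.0. The grouping works only because order_number is kept in the fact table.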
Storage and Performance Considerations

Database sizing
Data partitioning
Indexing
Star query optimization
Database Sizing - Test Load Sampling

Analyze a representative sample of the data, chosen using proven statistical methods. Ensure that the sample reflects:
Test loads for different periods
Day-to-day operations
Seasonal data and worst-case scenarios
Indexes and summaries
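The extrapolation step can be sketched as below. This assumes simple random sampling as the statistical method, per-row byte sizes as the measured quantity, and a flat `overhead` factor for indexes and summaries; the data, the projected row count, and the 50% overhead are placeholders, not recommended values.

```python
import random
import statistics

# Hypothetical source: byte length of each row in a test load.
# In practice these come from loading a representative sample
# (different periods, peak seasons, worst cases) into a test database.
random.seed(42)
population_row_bytes = [random.randint(180, 260) for _ in range(100_000)]


def estimate_table_bytes(row_sizes, sample_size, projected_rows, overhead=0.5):
    """Extrapolate table size from a simple random sample.

    overhead is an assumed allowance for indexes and summary tables
    (0.5 means +50%); it is a placeholder, not a standard ratio.
    """
    sample = random.sample(row_sizes, sample_size)
    avg_row = statistics.mean(sample)
    base = avg_row * projected_rows
    return base * (1 + overhead), avg_row


total, avg_row = estimate_table_bytes(population_row_bytes, 1_000, 50_000_000)
print(f"avg row ~{avg_row:.0f} bytes, estimated size ~{total / 1e9:.1f} GB")
```

Running the same estimate on samples from a normal period and from a seasonal peak shows how much the worst case drives the final sizing.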
Data Partitioning
Data partitioning is the breaking up of data into separate physical units that can be handled independently. Types of data partitioning:
Horizontal partitioning
Vertical partitioning
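The two types can be contrasted with a small in-memory sketch. Horizontal partitioning splits rows (here by a hypothetical `year` column); vertical partitioning splits columns into separate tables that each keep the primary key. Column names and data are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical sales rows.
sales = [
    {"sale_id": 1, "year": 2023, "amount": 10.0, "notes": "gift wrap"},
    {"sale_id": 2, "year": 2024, "amount": 25.0, "notes": ""},
    {"sale_id": 3, "year": 2024, "amount": 5.0, "notes": "rush"},
]


def horizontal_partition(rows, key):
    """Split rows into partitions by the value of one column (e.g. year)."""
    parts = defaultdict(list)
    for r in rows:
        parts[r[key]].append(r)
    return dict(parts)


def vertical_partition(rows, pk, column_groups):
    """Split columns into separate tables; each table keeps the primary key."""
    return [
        [{pk: r[pk], **{c: r[c] for c in cols}} for r in rows]
        for cols in column_groups
    ]


by_year = horizontal_partition(sales, "year")
hot, cold = vertical_partition(sales, "sale_id", [["year", "amount"], ["notes"]])
```

Each year's partition can then be loaded, backed up, or archived on its own, and queries that only need amounts never read the rarely used notes column.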
Indexing
Indexing is used for the following reasons:
It offers large cost savings, greatly improving performance and scalability.
It can replace a full table scan with a quick read of the index, followed by a read of only those disk blocks that contain the rows needed.
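The difference between a full scan and an index read can be sketched by counting rows examined. This is an illustration only: list positions stand in for disk blocks, and the dict is a simple hash index, whereas database engines typically use B-tree or bitmap indexes.

```python
from collections import defaultdict

# Hypothetical table: 10,000 rows spread over 100 customers.
rows = [
    {"row_id": i, "customer_id": i % 100, "amount": float(i)}
    for i in range(10_000)
]


def full_scan(table, customer_id):
    """Examine every row; cost grows with table size."""
    examined = 0
    hits = []
    for r in table:
        examined += 1
        if r["customer_id"] == customer_id:
            hits.append(r)
    return hits, examined


def build_index(table, column):
    """Map each column value to the positions of the matching rows."""
    index = defaultdict(list)
    for pos, r in enumerate(table):
        index[r[column]].append(pos)
    return index


def index_lookup(table, index, customer_id):
    """Read the index, then fetch only the rows it points to."""
    positions = index.get(customer_id, [])
    return [table[p] for p in positions], len(positions)


idx = build_index(rows, "customer_id")
scan_hits, scan_cost = full_scan(rows, 42)
idx_hits, idx_cost = index_lookup(rows, idx, 42)
print(scan_cost, idx_cost)
```

Both paths return the same 100 rows, but the scan examines all 10,000 while the index reads only the 100 it points to.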
Parallelism
[Figure: Sales table and Customers table, each split into partitions P1, P2, P3, processed by parallel execution servers]
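The idea can be sketched with one worker per partition: each worker aggregates its own partition and the partial results are combined at the end. This uses `concurrent.futures.ThreadPoolExecutor` only to show the fan-out/combine pattern; CPython threads do not give true CPU parallelism, and a database's parallel execution servers work differently internally. Partition contents are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical Sales table already split into three partitions P1..P3.
sales_partitions = {
    "P1": [10.0, 20.0, 30.0],
    "P2": [5.0, 5.0],
    "P3": [100.0],
}


def scan_partition(name_and_rows):
    """Work done by one execution server: aggregate one partition."""
    name, amounts = name_and_rows
    return name, sum(amounts), len(amounts)


def parallel_total(partitions, workers=3):
    """Fan out one task per partition, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(scan_partition, partitions.items()))
    total = sum(s for _, s, _ in partials)
    count = sum(n for _, _, n in partials)
    return total, count, partials


total, count, partials = parallel_total(sales_partitions)
print(total, count)
```

Partitioning is what makes this possible: because P1, P2, and P3 are independent units, no worker needs another worker's rows until the final combine step.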

Using Summary Data
Designing summary tables offers the following benefits:
Provides fast access to precomputed data
Reduces use of I/O, CPU, and memory
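A minimal sketch of a summary table, assuming detail fact rows of (month, product, amount) and a monthly per-product summary built once, for example during the load. Names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail fact rows: (month, product, amount).
detail = [
    ("2024-01", "A", 10.0),
    ("2024-01", "A", 15.0),
    ("2024-01", "B", 7.0),
    ("2024-02", "A", 3.0),
]


def build_summary(rows):
    """Precompute monthly totals per product, e.g. during the nightly load."""
    summary = defaultdict(float)
    for month, product, amount in rows:
        summary[(month, product)] += amount
    return dict(summary)


def query_detail(rows, month, product):
    """Answer from detail data: touches every row on each query."""
    return sum(a for m, p, a in rows if m == month and p == product)


summary = build_summary(detail)
# The same answer from the summary table is a single lookup.
assert summary[("2024-01", "A")] == query_detail(detail, "2024-01", "A") == 25.0
```

The cost moves from query time to load time, and the summary must be refreshed whenever the detail data changes.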
