
Data Warehouse Development Approaches


Fundamental Questions
Before deciding to build a data warehouse for your organization, you need to ask the following basic and fundamental questions and address the relevant issues:

Top-down or bottom-up approach?
Enterprise-wide or departmental?
Which first: data warehouse or data mart?
Build pilot or go with a full-fledged implementation?
Dependent or independent data marts?

Data Warehouse Development Approaches

Inmon Model: EDW approach
Kimball Model: Data mart approach

Which model is better?

There is no one-size-fits-all strategy to data warehousing
One alternative is the hosted warehouse

General Data Warehouse Development Approaches

Big bang approach

Incremental approach:
  Top-down incremental approach
  Bottom-up incremental approach

ISQS 6339, Data Mgmt & BI, Zhangxi Lin

Big Bang Approach


Analyze enterprise requirements

Build enterprise data warehouse

Report in subsets or store in data marts


Incremental Approach to Warehouse Development


Multiple iterations
Shorter implementations
Validation of each phase

[Figure: Increment 1 runs through the phases Strategy, Definition, Analysis, Design, Build, and Production; the process is labeled as iterative.]

Top-Down Approach
Analyze requirements at the enterprise level
Develop conceptual information model
Identify and prioritize subject areas
Complete a model of selected subject area
Map to available data
Perform a source system analysis
Implement base technical architecture
Establish metadata, extraction, and load processes for the initial subject area
Create and populate the initial subject area data mart within the overall warehouse framework

Top-Down
The advantages of this approach are:

A truly corporate effort, an enterprise view of data
Inherently architected, not a union of disparate data marts
Single, central storage of data about the content
Centralized rules and control
May see quick results if implemented with iterations

The disadvantages are:


Takes longer to build even with an iterative method
High exposure/risk to failure
Needs high level of cross-functional skills
High outlay without proof of concept

Bottom-Up Approach
Define the scope and coverage of the data warehouse and analyze the source systems within this scope
Define the initial increment based on political pressure, assumed business benefit, and data volume
Implement base technical architecture and establish metadata, extraction, and load processes as required by increment
Create and populate the initial subject areas within the overall warehouse framework

Bottom-Up
The advantages of this approach are:

Faster and easier implementation of manageable pieces
Favorable return on investment and proof of concept
Less risk of failure
Inherently incremental; can schedule important data marts first
Allows project team to learn and grow

The disadvantages are:


Each data mart has its own narrow view of data
Permeates redundant data in every data mart
Perpetuates inconsistent and irreconcilable data
Proliferates unmanageable interfaces


Dimensional Modeling Process

High level dimensional model design
  Choosing the business process
  Declaring the grain
  Choosing dimensions
  Identifying the facts
Detailed dimensional model development
Dimensional model review and validation
  IS
  Core users
  Business community
Final design iteration
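
The four high-level design steps can be illustrated with a small star schema. The sketch below uses Python's built-in sqlite3 module; the retail sales subject, the table and column names, and the sample rows are all hypothetical and only show how a declared grain, dimensions, and facts fit together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Step 1 (what to model): retail sales, a hypothetical choice.
    -- Step 2 (grain): one fact row per product per store per day.
    -- Step 3 (dimensions): date, product, store.
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, name TEXT, city TEXT);

    -- Step 4 (facts): additive measures recorded at the declared grain.
    CREATE TABLE fact_sales (
        date_key     INTEGER REFERENCES dim_date(date_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        store_key    INTEGER REFERENCES dim_store(store_key),
        units_sold   INTEGER,
        sales_amount REAL,
        PRIMARY KEY (date_key, product_key, store_key)
    );

    INSERT INTO dim_date VALUES (1, '2024-01-01', '2024-01'), (2, '2024-01-02', '2024-01');
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
    INSERT INTO dim_store VALUES (1, 'Store A', 'City A');
    INSERT INTO fact_sales VALUES (1, 1, 1, 3, 30.0), (1, 2, 1, 1, 25.0), (2, 1, 1, 2, 20.0);
    """
)

# A typical star query: join the fact table to one dimension and aggregate.
sales_by_product = conn.execute(
    """
    SELECT p.name, SUM(f.units_sold), SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.name
    ORDER BY p.name
    """
).fetchall()
print(sales_by_product)
```

The primary key of the fact table mirrors the declared grain, so a second row for the same product, store, and day would be rejected.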


Supplemental Slides: Data Warehouse Design Phases


Defining the Business Requirements

The concept of business dimensions is fundamental to the requirements definition for a data warehouse.


Information package
Your primary goal in the requirements definition phase is to compile information packages.

Once you have firmed up the information packages, you'll be able to proceed to the other phases. Essentially, information packages enable you to:
  Define the common subject areas
  Design key business metrics
  Decide how data must be presented
  Determine how users will aggregate or roll up
  Decide the data quantity for user analysis or query
  Decide how data will be accessed
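
One way to picture an information package is as a small data structure: a subject, its business dimensions with their roll-up hierarchies, and the metrics to be analyzed. The Python sketch below is only an illustration; the class names, the subject, the dimensions, the hierarchy levels, and the metrics are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dimension:
    name: str
    # Hierarchy levels ordered from most detailed to most aggregated.
    hierarchy: List[str]


@dataclass
class InformationPackage:
    subject: str
    dimensions: List[Dimension] = field(default_factory=list)
    metrics: List[str] = field(default_factory=list)

    def rollup_paths(self) -> Dict[str, str]:
        """Describe how users can aggregate along each dimension."""
        return {d.name: " -> ".join(d.hierarchy) for d in self.dimensions}


# Hypothetical example values, for illustration only.
package = InformationPackage(
    subject="Sales analysis",
    dimensions=[
        Dimension("Time", ["Date", "Month", "Quarter", "Year"]),
        Dimension("Product", ["Product", "Category", "Department"]),
        Dimension("Store", ["Store", "City", "State"]),
    ],
    metrics=["Units sold", "Sales amount"],
)

for name, path in package.rollup_paths().items():
    print(f"{name}: {path}")
```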


Supplemental Slides: The Others


Snowflake Schema Model



Direct use by some tools
More flexible to change
Provides for speedier data loading
Can become large and unmanageable
Degrades query performance
More complex metadata

[Figure: a normalized geography hierarchy with the tables Country, State, County, and City.]
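
The query cost of a snowflaked dimension can be seen in a small example. The sketch below, using Python's sqlite3 module, normalizes the geography hierarchy into separate Country, State, County, and City tables; the column names, the fact table, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE state   (state_id INTEGER PRIMARY KEY, name TEXT,
                          country_id INTEGER REFERENCES country(country_id));
    CREATE TABLE county  (county_id INTEGER PRIMARY KEY, name TEXT,
                          state_id INTEGER REFERENCES state(state_id));
    CREATE TABLE city    (city_id INTEGER PRIMARY KEY, name TEXT,
                          county_id INTEGER REFERENCES county(county_id));
    CREATE TABLE fact_sales (city_id INTEGER REFERENCES city(city_id), amount REAL);

    INSERT INTO country VALUES (1, 'Country A');
    INSERT INTO state   VALUES (1, 'State A', 1), (2, 'State B', 1);
    INSERT INTO county  VALUES (1, 'County A', 1), (2, 'County B', 2);
    INSERT INTO city    VALUES (1, 'City A', 1), (2, 'City B', 2);
    INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 70.0);
    """
)

# Rolling sales up to the state level needs three joins through the hierarchy.
sales_by_state = conn.execute(
    """
    SELECT s.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN city   AS ci ON ci.city_id = f.city_id
    JOIN county AS co ON co.county_id = ci.county_id
    JOIN state  AS s  ON s.state_id = co.state_id
    GROUP BY s.name
    ORDER BY s.name
    """
).fetchall()
print(sales_by_state)
```

In a star schema the same roll-up would need a single join to one denormalized geography dimension, which is the trade-off behind "Degrades query performance" in the list above.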


Degenerate Dimensions
order_number and order_line in the fact table

For example, you may be looking for the average number of products per order. To calculate it, you have to relate the products to the order number. Attributes such as order_number and order_line in this example are called degenerate dimensions; they are kept as attributes of the fact table and have no dimension table of their own.
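
The average-products-per-order calculation can be sketched as follows with Python's sqlite3 module. Only order_number and order_line come from the slide; the table name, the other columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- order_number and order_line are degenerate dimensions: they live in the
    -- fact table and have no dimension table of their own.
    CREATE TABLE fact_order_line (
        order_number INTEGER,
        order_line   INTEGER,
        product_key  INTEGER,
        quantity     INTEGER,
        PRIMARY KEY (order_number, order_line)
    );
    INSERT INTO fact_order_line VALUES
        (1001, 1, 10, 2),
        (1001, 2, 11, 1),
        (1001, 3, 12, 5),
        (1002, 1, 10, 1);
    """
)

# Count distinct products per order, then average those counts across orders.
(avg_products_per_order,) = conn.execute(
    """
    SELECT AVG(product_count)
    FROM (
        SELECT order_number, COUNT(DISTINCT product_key) AS product_count
        FROM fact_order_line
        GROUP BY order_number
    ) AS per_order
    """
).fetchone()
print(avg_products_per_order)
```

Without order_number in the fact table there would be no way to group the line items that belong to the same order.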


Storage and Performance Considerations


Database sizing
Data partitioning
Indexing
Star query optimization


Database Sizing - Test Load Sampling


Analyze a representative sample of the data chosen using proven statistical methods.
Ensure that the sample reflects:
  Test loads for different periods
  Day-to-day operations
  Seasonal data and worst-case scenarios
  Indexes and summaries
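
A rough way to turn a test-load sample into a size estimate is to extrapolate from the average sampled row size and then add allowances for indexes and summaries. The Python sketch below assumes this simple linear model; the function name, the overhead fractions, and the sample values are hypothetical, and a real sizing exercise would also account for storage-engine overhead.

```python
from statistics import mean


def estimate_table_bytes(sample_row_bytes, expected_rows,
                         index_overhead=0.0, summary_overhead=0.0):
    """Extrapolate storage from the average row size seen in a sample.

    index_overhead and summary_overhead are fractions of the base table size
    (assumed values, to be measured from the test load in practice).
    """
    base = mean(sample_row_bytes) * expected_rows
    return round(base * (1 + index_overhead + summary_overhead))


# Hypothetical sample: row sizes in bytes measured from a test load.
sample = [180, 220, 200, 210, 190]

estimate = estimate_table_bytes(
    sample,
    expected_rows=10_000_000,   # assumed worst-case row count
    index_overhead=0.30,        # assumed: indexes add 30%
    summary_overhead=0.10,      # assumed: summaries add 10%
)
print(f"Estimated size: {estimate / 1e9:.2f} GB")
```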


Data Partitioning
Breaking up of data into separate physical units that can be handled independently
Types of data partitioning:
  Horizontal partitioning
  Vertical partitioning
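
The difference between the two types can be shown on a handful of rows. In this Python sketch, horizontal partitioning splits the rows (here by month) while vertical partitioning splits the columns and repeats the key; the function names, the column names, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows.
rows = [
    {"sale_id": 1, "sale_date": "2024-01-15", "amount": 100.0, "notes": "gift wrap"},
    {"sale_id": 2, "sale_date": "2024-02-03", "amount": 40.0, "notes": ""},
    {"sale_id": 3, "sale_date": "2024-02-20", "amount": 75.0, "notes": "rush"},
]


def partition_horizontally(rows, key_func):
    """Split rows into groups; every partition keeps all columns."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[key_func(row)].append(row)
    return dict(partitions)


def partition_vertically(rows, key, column_groups):
    """Split columns into groups; every partition keeps all rows plus the key."""
    return [
        [{col: row[col] for col in [key, *cols]} for row in rows]
        for cols in column_groups
    ]


# Horizontal: one partition per month (the first 7 characters of the date).
by_month = partition_horizontally(rows, lambda r: r["sale_date"][:7])

# Vertical: frequently queried columns apart from rarely used ones.
hot, cold = partition_vertically(rows, "sale_id", [["sale_date", "amount"], ["notes"]])

print(sorted(by_month))
print(hot[0], cold[0])
```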


Indexing

Indexing is used for the following reasons:
  It is a huge cost saving, greatly improving performance and scalability.
  It can replace a full table scan by a quick read of the index followed by a read of only those disk blocks that contain the rows needed.
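
The switch from a full table scan to an index lookup can be observed with SQLite's EXPLAIN QUERY PLAN statement. The Python sketch below is a small demonstration on an in-memory database; the table, the index name, and the data are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)


def query_plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT SUM(amount) FROM sales WHERE customer_id = 42"

plan_before = query_plan(query)  # no usable index yet: a scan of the table
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
plan_after = query_plan(query)   # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```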


Parallelism
[Figure: the Sales table and the Customers table are each split into partitions P1, P2, and P3, which are processed by parallel execution servers.]
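
The idea of joining matching partitions in parallel can be sketched in Python. The example assumes both tables are partitioned on the same key, so partition i of Sales only needs partition i of Customers; the data, the partitioning rule, and the function names are hypothetical. Threads are used to keep the sketch short and show the structure of the work; they are not meant as a performance demonstration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: customers keyed by id, and sales as (customer_id, amount).
customers = {1: "Ann", 2: "Bob", 3: "Cho", 4: "Dee"}
sales = [(1, 10.0), (2, 5.0), (3, 7.5), (4, 2.5), (1, 20.0), (3, 2.5)]

N_PARTITIONS = 3


def partition_of(customer_id):
    """Assumed partitioning rule shared by both tables."""
    return customer_id % N_PARTITIONS


sales_parts = [
    [s for s in sales if partition_of(s[0]) == p] for p in range(N_PARTITIONS)
]
customer_parts = [
    {cid: name for cid, name in customers.items() if partition_of(cid) == p}
    for p in range(N_PARTITIONS)
]


def join_and_sum(p):
    """Join one sales partition to its matching customers partition."""
    totals = {}
    for customer_id, amount in sales_parts[p]:
        name = customer_parts[p][customer_id]
        totals[name] = totals.get(name, 0.0) + amount
    return totals


# Each worker plays the role of one parallel execution server.
with ThreadPoolExecutor(max_workers=N_PARTITIONS) as pool:
    partial_results = list(pool.map(join_and_sum, range(N_PARTITIONS)))

# Partitions hold disjoint customers, so merging is a plain union.
totals_by_customer = {}
for partial in partial_results:
    totals_by_customer.update(partial)
print(totals_by_customer)
```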



Using Summary Data

Designing summary tables offers the following benefits:
  Provides fast access to precomputed data
  Reduces use of I/O, CPU, and memory
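
A summary table is simply an aggregate that is computed once and stored, so that later queries read a few precomputed rows instead of re-aggregating the detail. The Python sketch below builds one with sqlite3; the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fact_sales (sale_date TEXT, product TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('2024-01-05', 'Widget', 10.0),
        ('2024-01-20', 'Widget', 15.0),
        ('2024-02-02', 'Widget', 5.0),
        ('2024-02-10', 'Gadget', 8.0);

    -- Precompute monthly totals once, for example at load time.
    CREATE TABLE summary_sales_by_month AS
        SELECT substr(sale_date, 1, 7) AS month,
               product,
               SUM(amount) AS total_amount
        FROM fact_sales
        GROUP BY substr(sale_date, 1, 7), product;
    """
)

# Reporting queries read the small summary table, not the detail rows.
monthly = conn.execute(
    "SELECT month, product, total_amount "
    "FROM summary_sales_by_month ORDER BY month, product"
).fetchall()
print(monthly)
```

The summary has to be refreshed whenever new detail rows are loaded, which is the maintenance cost paid for the faster reads.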
