
Data Warehouse: Methodology and Tools

Concepts, Architectures and Products

FORWISS - Bavarian Research Centre for Knowledge Based Systems


1999 FORWISS

Overview

The Process of Planning and Building a Warehouse


Data Warehouse Architecture (revisited)
Classification of Tools
Focus: OLAP Tools
  Multidimensional Data Modeling
  OLAP Architectures
  OLAP Query Languages
Tool Demonstration
Summary

DW: Tools and Projects

2001 FORWISS,Carsten Sapia - sapia@forwiss.de

OLAP Design Cycle


Design cycle (figure):
Requirement Analysis
-> Conceptual Design (implementation independent)
-> Logical + Physical Design (e.g. product specific)
-> Implementation
-> Using the Data Warehouse


Data Warehouse Architecture


Architecture layers (figure, top to bottom):
Data Analysis: Reporting, OLAP, Data Mining
Data Storage
Middleware (Population Tools): Data Migration
Operational Data Sources

Repository



Classification of Tools

Frontend Tools
Data Storage Tools (Databases)
ETL Tools
  Extraction
  Transformation
  Loading
Repository Systems
  Metadata Storage


Repository Systems

Manage Different Kinds of Metadata
  Business Metadata
    E.g. how is revenue computed?
  Technical Metadata
    When was data last loaded, and from which system?
    Data model for OLTP and OLAP databases
Functionality
  Communication hub for different tools
  Guides user exploration
  Guides development process
  Impact analysis
E.g. Viasoft Rochade, Softlab Enabler, ...



ETL Tools

Extraction: Range of Supported Data Sources
  Mainframe legacy databases
  COBOL files
  Relational databases
  File-based data storage (Excel, Word, XML, ...)
Transformation
  (Graphical) specification of transformation rules (expressive power)
Loading
  Ability to use database features (e.g. bulk loading)
Process Management
  Scheduling, monitoring, error handling
E.g. Informatica PowerMart, Hummingbird Genio, Acta, ...



Databases for DW

Special Indexing Techniques
  Multidimensional indexes
  Bitmap indexes
  Foreign column indexes
Support for Materialized Views (Preaggregation)
Special Analytical Capabilities (e.g. SQL Extensions)
  Top N, ranking
Bulk Loading Capabilities
  Offline, no concurrency control
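The bitmap indexes listed among the special indexing techniques can be illustrated with a minimal sketch. It assumes one bitmap per distinct column value, stored as a Python integer; the column names and rows are hypothetical, and real database products use compressed, disk-based variants.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i holds it."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


def matching_rows(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Hypothetical fact rows: one region and one product type per row.
regions = ["South", "North", "South", "West", "South", "North"]
product_types = ["Tent", "Tent", "Boot", "Tent", "Tent", "Boot"]

region_index = build_bitmap_index(regions)
type_index = build_bitmap_index(product_types)

# region = 'South' AND product_type = 'Tent' becomes a single bitwise AND.
south_tents = matching_rows(region_index["South"] & type_index["Tent"])

# region = 'North' OR region = 'West' becomes a bitwise OR.
north_or_west = matching_rows(region_index["North"] | region_index["West"])
```

Bitmaps of this kind suit low-cardinality columns such as dimension attributes, because each distinct value needs its own bitmap.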


Frontend Tools

[Figure: frontend tool categories plotted by Number of Users and Additional Benefit]
Tool categories: Reporting, Ad hoc Queries, Interactive OLAP, Data Mining
Questions answered: What happened? Why did it happen? What happened, why and how? What will happen?

The User's View (OLAP Tool)


Multidimensional OLAP (MOLAP)

Specialized database technology
  Multidimensional storage structures
  E.g. Hyperion Essbase, Oracle Express, Cognos PowerPlay (Server)
Advantages (+)
  Query performance
  Powerful MD model
  Write access
Disadvantages (-)
  Database features: multiuser access, backup and recovery
  Sparsity handling -> DB explosion
Architecture (figure): Frontend Tool -> Multidim. Database
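The "Sparsity Handling -> DB Explosion" point can be made concrete with a little arithmetic. The sketch below assumes a naive, fully materialized array with one cell per combination of dimension members; the cardinalities and the number of sales are hypothetical figures, not taken from any product.

```python
from math import prod

# Hypothetical dimension cardinalities at the finest level.
cardinalities = {"day": 365, "product": 2_000, "customer": 10_000}

# A fully materialized array reserves one cell per combination of members.
dense_cells = prod(cardinalities.values())

# Hypothetical number of sales that actually occurred.
populated_cells = 5_000_000

# Fraction of reserved cells that hold a value.
density = populated_cells / dense_cells

print(f"cells reserved: {dense_cells:,}")
print(f"cells populated: {populated_cells:,}")
print(f"density: {density:.4%}")
```

With these assumed numbers, fewer than one cell in a thousand is populated, which is why sparse cubes need dedicated storage techniques.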

Relational OLAP (ROLAP)


Idea: use relational data storage
  Star (snowflake) schema
  E.g. MicroStrategy, SAP BW
Advantages (+)
  Advantages of RDBMS: scalability, reliability, security etc.
  Sparsity handling
Disadvantages (-)
  Query performance
  Data model complexity
  No write access
Architecture (figure): Frontend Tool -> MD Interface -> ROLAP Engine (with Meta Data) -> SQL -> Relational DB

Client (Desktop) OLAP


Proprietary data structure on the client
  Data stored as a file
  Mostly RAM-based architectures
  E.g. Business Objects, Cognos PowerPlay
Advantages (+)
  Mobile user
  Ease of installation and use
Disadvantages (-)
  Data volume
  No multiuser capabilities

DW Integration
[Figure: MOLAP (Multidim. Database), ROLAP (ROLAP Engine) and Client OLAP on top of the DW-DB (mostly relational)]

Combining Architectures I
Drill through
  Multidim. Database: highly aggregated data (dense), 95% of the analysis requirements
  Relational Database: detailed data (sparse), 5% of the requirements

Combining Architectures II
Hybrid OLAP (HOLAP)
  Equal treatment of MD and relational data
  Storage type at the discretion of the administrator
  Cube partitioning
Architecture (figure): HOLAP System (with Meta Data) on top of Multidim. Storage and Relational Storage

OLAP Standards

Idea: define interface between client and server
  Benefit: component-oriented architectures
Proposal 1: OLAP Council
  Union of OLAP tool producers
  Not implemented so far (even by the council members)
Proposal 2: Microsoft - OLE DB for OLAP (short: ODBO)
  Standardizes a data model and an MD query language (MDX)
  Specification contains lots of optional functionality
  All major vendors committed themselves to the standard
  Will be the de facto standard


Practical Case Study


Building a Warehouse

artwork copyright Intersystems GmbH 1999 FORWISS


Conceptual Design
Design cycle (figure):
Requirement Analysis
-> Conceptual Design (implementation independent)
-> Logical + Physical Design (tool specific)
-> Implementation
-> Using the Data Warehouse


The Modeling Process


Which business process is being modeled?
What is the subject of analysis (fact) and what is being measured?
On what granularity level is active analysis being done?
Which properties (dimensions) determine the measures?
Which different levels of aggregation are meaningful?
What additional information is needed for the different levels?
What is the variability and the cardinality of the dimensions?


Facts

Fact = Subject of Analysis
  E.g. Sales
Measures = Attributes describing facts
  E.g. Quantity, Price
Derived Measures
  E.g. Profit
Additivity of Measures
  Globally additive: Quantity
  Additive for some dimensions: Items in stock (additive with respect to plants, not additive with respect to time)
  Not additive at all: Profit margin
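The three additivity classes can be checked on a toy data set. The following sketch uses hypothetical stock levels and sales figures; taking the last value for stock over time is one common substitute for summation, stated here as an assumption rather than something the slides prescribe.

```python
# Hypothetical stock levels: (plant, month) -> items in stock at month end.
stock = {
    ("Plant 1", "Jan"): 100, ("Plant 2", "Jan"): 50,
    ("Plant 1", "Feb"): 120, ("Plant 2", "Feb"): 40,
}

# Additive with respect to plants: total stock in January is a plain sum.
stock_jan = sum(v for (plant, month), v in stock.items() if month == "Jan")

# Not additive with respect to time: summing January and February for one
# plant gives 220, which is not a stock level. The closing value is used instead.
wrong_plant1 = sum(v for (plant, month), v in stock.items() if plant == "Plant 1")
closing_plant1 = stock[("Plant 1", "Feb")]

# Profit margin is not additive at all: recompute it from additive measures.
sales = [  # (revenue, cost)
    (1000.0, 800.0),  # margin 20 %
    (100.0, 50.0),    # margin 50 %
]
total_revenue = sum(revenue for revenue, _ in sales)
total_cost = sum(cost for _, cost in sales)
overall_margin = (total_revenue - total_cost) / total_revenue

print(stock_jan, wrong_plant1, closing_plant1, round(overall_margin, 4))
```

The overall margin is neither the sum nor the average of the two row margins, which is why such a measure has to be derived after aggregating revenue and cost.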


Dimensions

Dimensions = static structure of business information
  Used for navigating the data space
  Choosing the necessary granularity
Dimension Members = Instances of a dimension
  e.g. 8.12.1997 and July 1997 are members of dimension time
Structuring Dimensions
  using different dimension levels (hierarchies)
  using descriptive attributes


Simple Hierarchies
Dimension levels: Month -> Quarter -> Half-Year Period -> Year

Month: January 99, February 99, March 99, April 99, May 99, June 99, July 99, August 99, Sept. 99, ...
Quarter: 1st Quarter 99, 2nd Quarter 99, 3rd Quarter 99
Half-Year Period: 1st Half-Year 99, 2nd Half-Year 99
Year: 1999
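Rolling up along a simple hierarchy such as Month -> Quarter -> Half-Year -> Year amounts to mapping each member to its parent and aggregating. The sketch below assumes an additive measure and months of 1999 numbered 1 to 12; the `roll_up` helper, the level keys and the sales figures are hypothetical.

```python
from collections import defaultdict

# Parent lookup per target level, for months of 1999 numbered 1..12.
LEVEL_KEYS = {
    "quarter": lambda month: f"Q{(month - 1) // 3 + 1} 99",
    "half": lambda month: f"H{(month - 1) // 6 + 1} 99",
    "year": lambda month: "1999",
}


def roll_up(monthly, level):
    """Aggregate {month_number: value} to the given dimension level."""
    key_of = LEVEL_KEYS[level]
    totals = defaultdict(int)
    for month, value in monthly.items():
        totals[key_of(month)] += value
    return dict(totals)


# Hypothetical sales quantities for January to July 99.
monthly_sales = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 6: 60, 7: 70}

by_quarter = roll_up(monthly_sales, "quarter")
by_half = roll_up(monthly_sales, "half")
by_year = roll_up(monthly_sales, "year")
```

Because every month has exactly one parent at each level, the totals at all levels agree with each other.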

Unbalanced Hierarchies
Dimension levels: Plant/Site -> Business Unit -> Business Division -> Enterprise

Plant/Site: Plant 1, ..., Plant 0815
Business Unit: Bu 1, Bu 2, ...
Business Division: Div A, Div B, ...
Enterprise: Great Outdoors

Alternative Hierarchies
Hierarchy 1: Customer -> Geogr. Region -> Country
Hierarchy 2: Customer -> Customer Group

Customer: Customer 01, Customer 02, Customer 03, Customer 04, Customer 05, Customer 06
Geogr. Region: Bavaria, Hessen, Hamburg
Country: Germany
Customer Group: Partner, Retailer, Consumer

Alternative Paths
Dimension levels: City, Geogr. Region, Sales Region, Country

City: Munich 01, Munich 02, Munich 03, Würzburg 01, Würzburg 02, Frankfurt 01
Geogr. Region: Bayern, Hessen, Hamburg
Sales Region: Germany (South), Germany (West), Germany (North)
Country: Germany

Criteria for a good MD Design

Dimensions should be independent
Dimensionality of a cube should be max. 7-8 dimensions
  Interpretation of results is difficult for a large number of dimensions
Hierarchies should have a fan-out of max. 30
  Long drill-down times
  Large drill-down results
  Insert additional levels for structuring purposes (e.g. insert state between city and country)
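The fan-out criterion can be checked mechanically once a hierarchy step is available as a child-to-parent mapping. This is a minimal sketch under that assumption; the helper names and the city, state and country members are hypothetical.

```python
from collections import Counter

MAX_FAN_OUT = 30  # guideline stated on the slide


def fan_out(child_to_parent):
    """Count the children of every parent in one hierarchy step."""
    return Counter(child_to_parent.values())


def oversized_parents(child_to_parent, limit=MAX_FAN_OUT):
    """Return the parents whose number of children exceeds the limit."""
    counts = fan_out(child_to_parent)
    return sorted(parent for parent, n in counts.items() if n > limit)


# Hypothetical city -> country step: 40 cities hang directly under Germany.
city_to_country = {f"City {i:02d}": "Germany" for i in range(40)}

# After inserting a state level, each step stays below the limit.
city_to_state = {f"City {i:02d}": f"State {i % 4}" for i in range(40)}
state_to_country = {f"State {s}": "Germany" for s in range(4)}

before = oversized_parents(city_to_country)
after = oversized_parents(city_to_state) + oversized_parents(state_to_country)
```

In this made-up case the extra level turns one drill-down of 40 members into two drill-downs of 4 and 10 members.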


Graphical Notation (ME/R)


Fact node: Fact-Name with Measure 1 ... Measure n
  A fact and its measures ...
  ... is characterized by dimensions
Dimension level node: Level-Name with Attribute 1 ... Attribute n
  A dimension level with attributes ...
  ... can be classified according to ...


Example Data Model


[Figure: example ME/R model]
Fact: Sale Line
Measures: Revenue, Cost, Order Qty
Dimension levels shown: Day, Month, Year, Product, Prod. Type, Customer, Customer Type, Region, Country, Sales Rep (Name, Code), Branch, Margin Range

Cognos PowerPlay - Architecture

[Figure: client and server tiers]
Client: PowerPlay Client
Server: PowerPlay Server
Other components shown: PowerCube (proprietary, compressed), Transformer, Impromptu
OLE DB for OLAP: OLE DB provider, e.g. MS OLAP Services, SAP BW

Logical+Physical Design
Design cycle (figure):
Requirement Analysis
-> Conceptual Design (implementation independent)
-> Logical + Physical Design (tool specific)
-> Implementation
-> Using the Data Warehouse


Practical Demonstration


Summary and Conclusions

Multidimensional modeling is performed on different levels
  Conceptual model (tool-independent level) following requirement analysis
  Logical and physical design before implementation
Distinction between two types of data
  Quantifying data: measures, cells of the cube, fact table
  Qualifying data: properties, dimensions, dimension tables
Hierarchical structures of dimensions can be complex
ME/R notation can be used to document conceptual models
Several ways to map an MD model to a relational DB

Canonical Query (I)


[Figure: canonical query on a cube]
Restriction Elements: A, B
Result Measures: m1, m2
Query Result, shown at the Result Granularity


Canonical Query (II)


Canonical Query Definition
  Result Measures: m1, ..., mk
  Restriction Elements: r1, r2, ..., rn
  Result Granularity: g1, g2, ..., gn

SELECT g1,...,gn, aggr(m1),..., aggr(mk)
FROM FactName, Dim1,..., Dimn
WHERE Dim1.level(r1) = r1 AND ... AND Dimn.level(rn) = rn
  AND Dim1.d1 = FactName.d1 AND ... AND Dimn.dn = FactName.dn
GROUP BY g1,...,gn
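The canonical query template can be tried out against a tiny star schema. The sketch below uses Python's built-in sqlite3 module; the `canonical_sql` helper, the table and column names and the sample rows are all hypothetical, and the helper assumes at least one restriction and one granularity column. A dimension without a restriction is simply left out of the filter list.

```python
import sqlite3


def canonical_sql(fact, dims, measures, restrictions, granularity, aggr="SUM"):
    """Build the canonical query as a SQL string plus its parameters.

    dims:         {dimension table: join key shared with the fact table}
    measures:     fact columns m1..mk
    restrictions: {"dim.level_column": value}, the restriction elements
    granularity:  ["dim.level_column", ...], the result granularity
    """
    select = granularity + [f"{aggr}({fact}.{m}) AS {m}" for m in measures]
    joins = [f"{dim}.{key} = {fact}.{key}" for dim, key in dims.items()]
    filters = [f"{column} = ?" for column in restrictions]
    sql = (
        f"SELECT {', '.join(select)} "
        f"FROM {', '.join([fact, *dims])} "
        f"WHERE {' AND '.join(filters + joins)} "
        f"GROUP BY {', '.join(granularity)}"
    )
    return sql, list(restrictions.values())


con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product TEXT, prod_type TEXT);
    CREATE TABLE sales (time_id INTEGER, product_id INTEGER, revenue REAL, qty INTEGER);
    INSERT INTO time_dim VALUES (1, 'Jan', 1999), (2, 'Feb', 1999), (3, 'Jan', 2000);
    INSERT INTO product_dim VALUES (1, 'Dome tent', 'Tent'), (2, 'Hiking boot', 'Boot');
    INSERT INTO sales VALUES (1, 1, 100.0, 1), (1, 2, 50.0, 2), (2, 1, 200.0, 2), (3, 1, 999.0, 9);
""")

# Revenue and quantity per product type, restricted to the year 1999.
sql, params = canonical_sql(
    fact="sales",
    dims={"time_dim": "time_id", "product_dim": "product_id"},
    measures=["revenue", "qty"],
    restrictions={"time_dim.year": 1999},
    granularity=["product_dim.prod_type"],
)
rows = sorted(con.execute(sql, params).fetchall())
print(sql)
print(rows)
```

The restriction values are passed as bound parameters, while table and column names are interpolated directly, so the helper is only suitable for trusted, hand-written query definitions.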
