
An Overview of Data Warehousing and OLAP Technology

Surajit Chaudhuri
Microsoft Research, Redmond

Umeshwar Dayal
HP Labs, Palo Alto

Presented by: Krishma Dutta

Outline
Introduction
Need of Data Warehousing and OLAP
Architecture of Data Warehousing
Front-Back End Tools
Database Design Methodology
Conclusion

Data Warehousing- An Introduction


Defined in many different ways:
In the simplest terms, a data warehouse can be defined as a collection of data marts.
"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
Data warehousing is a collection of decision support technologies, aimed at enabling the knowledge worker to make better decisions.

Need of Data Warehousing and OLAP


Data Warehousing

Decision support requires historical data, which operational databases typically do not maintain

Decision support requires consolidation (aggregation, summarization) of data

OLAP

Executing complex OLAP queries against operational databases results in unacceptable performance

The multidimensional data model is not generally supported by DBMSs designed for OLTP

Tiered Architecture
Data Sources: External Sources, Operational Databases
Back End: Extract, Transform, Load, Refresh
Tier 1: Data Warehouse Server (Data Storage: Data Warehouse, Data Marts)
Tier 2: OLAP Server (OLAP Engine), serving the clients
Tier 3: Clients (Front-End Tools: Analysis, Query/Reports, Data Mining)
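
The flow from data sources through extract, transform and load into the warehouse can be sketched in a few lines of Python. This is only an illustration: the source rows, the field names (`city`, `item`, `amount`) and the three helper functions are hypothetical, and the "warehouse" is a plain in-memory list rather than a real server.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from two different sources.
operational_rows = [
    {"city": " toronto ", "item": "TV", "amount": "120.50"},
    {"city": "Montreal", "item": "PC", "amount": "300.00"},
]
external_rows = [
    {"city": "TORONTO", "item": "tv", "amount": "79.50"},
]


def extract():
    """Gather rows from every source into one stream."""
    yield from operational_rows
    yield from external_rows


def transform(row):
    """Normalize formats so rows from different sources agree."""
    return {
        "city": row["city"].strip().title(),
        "item": row["item"].strip().upper(),
        "amount": float(row["amount"]),
    }


def load(rows, warehouse):
    """Append cleaned rows to the warehouse store (a plain list here)."""
    warehouse.extend(rows)


warehouse = []
load((transform(r) for r in extract()), warehouse)

# A client-side summary over the integrated data: total sales per city.
totals = defaultdict(float)
for row in warehouse:
    totals[row["city"]] += row["amount"]
print(dict(totals))
```

Without the transform step, " toronto " and "TORONTO" would be counted as different cities, which is why integration happens before the data reaches the warehouse tier.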

Front-Back End Tools


Front End Tools
Rollup (Drill-up)
Drill-down (Roll-down)
Slice and dice
[Figure: example cube with cities (Toronto, Montreal) and item categories (Household, Automobile, Kitchen, Office)]
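
A minimal sketch of these front-end operations on a tiny fact list may help. The city and item names come from the slide; the quarter dimension, the unit counts, and the helper names `aggregate` and `select` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts: (city, item category, quarter, units sold).
facts = [
    ("Toronto", "Household", "Q1", 10),
    ("Toronto", "Automobile", "Q1", 4),
    ("Toronto", "Household", "Q2", 7),
    ("Montreal", "Kitchen", "Q1", 5),
    ("Montreal", "Office", "Q2", 8),
    ("Montreal", "Household", "Q2", 3),
]
DIMS = ("city", "item", "quarter")


def aggregate(rows, keep):
    """Sum units over every dimension that is not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[3]
    return dict(out)


def select(rows, **allowed):
    """Keep rows whose dimension values fall inside the allowed sets."""
    return [
        r for r in rows
        if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]


# Roll-up: aggregate away item and quarter, leaving totals per city.
by_city = aggregate(facts, ["city"])
# Drill-down: bring the item dimension back for finer detail.
by_city_item = aggregate(facts, ["city", "item"])
# Slice: fix a single value on one dimension.
q1_slice = select(facts, quarter={"Q1"})
# Dice: restrict two or more dimensions to chosen value sets.
diced = select(facts, city={"Toronto"}, item={"Household", "Automobile"})

print(by_city)
```

Roll-up and drill-down differ only in how many dimensions are kept, while slice and dice filter rows before any aggregation takes place.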

Back End Tools


Data Cleaning
Load
Refresh
[Figure labels: Bob, Jamie, Brit, Todd]

Conceptual Model
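[Figure: data cube of sales with three dimensions: Date, product (TV, PC, PVR) and Country (U.S.A., Canada, Mexico). Each dimension has a "sum" cell that aggregates over it, and the ALL cell aggregates over all three. Example cell: total annual sales of TV in U.S.A.]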
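
The "sum" and ALL cells of such a cube can be computed by letting each dimension value either stay as it is or collapse to ALL. The sketch below assumes a handful of made-up sales facts; the quarters, the amounts and the function name `build_cube` are hypothetical.

```python
from collections import defaultdict
from itertools import product as cartesian

ALL = "ALL"

# Hypothetical (quarter, product, country, sales) facts.
facts = [
    ("Q1", "TV", "U.S.A.", 100),
    ("Q2", "TV", "U.S.A.", 150),
    ("Q1", "PC", "Canada", 80),
    ("Q1", "TV", "Mexico", 40),
    ("Q2", "PVR", "Canada", 30),
]


def build_cube(rows):
    """Materialize every aggregate cell, including the ALL cells."""
    cube = defaultdict(int)
    for *dims, measure in rows:
        # Each dimension either keeps its value or collapses to ALL,
        # so one fact contributes to 2**3 = 8 cells.
        for mask in cartesian((False, True), repeat=len(dims)):
            cell = tuple(ALL if hide else v for v, hide in zip(dims, mask))
            cube[cell] += measure
    return dict(cube)


cube = build_cube(facts)

# Total annual sales of TV in U.S.A.: the date dimension collapses to ALL.
print(cube[(ALL, "TV", "U.S.A.")])   # 250
# Grand total over every dimension.
print(cube[(ALL, ALL, ALL)])         # 400
```

Materializing every cell like this is only practical for small examples, since the number of cells grows quickly with the number of dimensions and distinct values.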

Database Design Methodology


Database Designs

Star Schema
A fact table in the middle connected to a set of dimension tables

Snowflake Schema
A refinement of the star schema where the hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake

Star Schema
Sales Fact Table
Keys: Time_key, Item_key, Branch_key, Location_key
Measures: Units_sold, Dollars_sold, Avg_sales

Time
T_key, T_day, T_day_week, T_month, T_quarter, T_year

Item
I_key, I_name, I_brand, I_type, I_supplier_type

Branch
B_key, B_name, B_type

Location
location_key, street, city, province, country
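
As a rough sketch, the star schema above can be tried out with Python's built-in `sqlite3` module. The table and column names follow the slide (lowercased); the sample rows are invented, and the `Avg_sales` measure is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time     (t_key INTEGER PRIMARY KEY, t_day INTEGER, t_day_week TEXT,
                       t_month INTEGER, t_quarter TEXT, t_year INTEGER);
CREATE TABLE item     (i_key INTEGER PRIMARY KEY, i_name TEXT, i_brand TEXT,
                       i_type TEXT, i_supplier_type TEXT);
CREATE TABLE branch   (b_key INTEGER PRIMARY KEY, b_name TEXT, b_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                       province TEXT, country TEXT);
-- Fact table: one foreign key per dimension plus the numeric measures.
CREATE TABLE sales (
    time_key     INTEGER REFERENCES time(t_key),
    item_key     INTEGER REFERENCES item(i_key),
    branch_key   INTEGER REFERENCES branch(b_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
-- Hypothetical sample rows.
INSERT INTO time VALUES (1, 15, 'Mon', 1, 'Q1', 2024), (2, 20, 'Sat', 4, 'Q2', 2024);
INSERT INTO item VALUES (1, 'TV', 'BrandA', 'Electronics', 'Wholesale');
INSERT INTO branch VALUES (1, 'Downtown', 'Retail');
INSERT INTO location VALUES (1, '1 Main St', 'Toronto', 'Ontario', 'Canada'),
                            (2, '2 Rue A', 'Montreal', 'Quebec', 'Canada');
INSERT INTO sales VALUES (1, 1, 1, 1, 2, 1000.0), (2, 1, 1, 1, 1, 500.0),
                         (1, 1, 1, 2, 3, 1500.0);
""")

# Dollars sold per quarter and city: the fact table joins directly to each
# dimension it needs, one join per dimension.
rows = conn.execute("""
    SELECT t.t_quarter, l.city, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN time AS t ON s.time_key = t.t_key
    JOIN location AS l ON s.location_key = l.location_key
    GROUP BY t.t_quarter, l.city
    ORDER BY t.t_quarter, l.city
""").fetchall()
print(rows)
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.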

Snowflake Schema
Sales Fact Table
Keys: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales

Time
T_key, T_day, T_day_week, T_month, T_quarter, T_year

Item
I_key, I_name, I_brand, I_type, I_supplier_type

Supplier (normalized out of Item)
S_key, S_type

Branch
B_key, B_name, B_type

Location
location_key, street, city

City (normalized out of Location)
C_key, C_city, C_province, C_country
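
The effect of normalizing a dimension can be seen in a small `sqlite3` sketch limited to the Location and City tables. The slide does not show how Location refers to City, so the `city_key` column is an assumption, as are the sample rows; the other dimensions are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (c_key INTEGER PRIMARY KEY, c_city TEXT,
                   c_province TEXT, c_country TEXT);
-- Assumption: location.city_key references city.c_key.
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(c_key));
CREATE TABLE sales (location_key INTEGER REFERENCES location(location_key),
                    units_sold INTEGER, dollars_sold REAL);
-- Hypothetical sample rows: province and country are stored once per city
-- instead of once per street address.
INSERT INTO city VALUES (1, 'Toronto', 'Ontario', 'Canada'),
                        (2, 'Montreal', 'Quebec', 'Canada');
INSERT INTO location VALUES (1, '1 Main St', 1), (2, '9 King St', 1),
                            (3, '2 Rue A', 2);
INSERT INTO sales VALUES (1, 2, 1000.0), (2, 1, 500.0), (3, 3, 1500.0);
""")

# Reaching the province level now takes two joins instead of one.
by_province = conn.execute("""
    SELECT c.c_province, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN location AS l ON s.location_key = l.location_key
    JOIN city AS c ON l.city_key = c.c_key
    GROUP BY c.c_province
    ORDER BY c.c_province
""").fetchall()
print(by_province)
```

The trade-off is less redundancy in the dimension tables in exchange for an extra join whenever a query needs attributes from the normalized level.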

Summary
Data warehouse
A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process

Architecture of Data warehouse
Consisting of warehouse servers, front end and back end tools

OLAP operations: drilling, rolling, slicing, dicing and pivoting

Multidimensional model of Data warehouse
Data cube
Star Schema
Snowflake Schema

Thank You