Вы находитесь на странице: 1из 25

Data Warehousing

Course Introduction

11/12/2019 Adnan Waheed 1


Course Introduction
• Course title
• Data Warehousing
• Course code
• IT-444
• Prerequisite
• Database Management Systems

11/12/2019 Adnan Waheed 2


Course Books
• Text Book
1. Paulraj Ponniah, ―Data Warehousing Fundamentals: A Comprehensive
Guide for IT Professionals‖, ISBN: 0-471-41254-6, John Wiley & Sons Inc.,
New York.
• Reference Books
1. Vincent Rainardi, ―Building a Data Warehouse with Examples in SQL
Server‖, ISBN: 978-1-59059-931-0.
2. Ralph Kimball, ―The data warehousing toolkit‖, John Wiley & Sons Inc.

11/12/2019 Adnan Waheed 3


Assessment Criteria
Sessional 25% Mid 25% Final 50%

Quizzes 10%

Assignments 5%

Project/Presentation 10%

11/12/2019 Adnan Waheed 4


Course Contents
1. Course Introduction
2. Need for Data Warehousing
3. Data Warehouse Building Blocks
4. Design Approaches
5. Architectural Components
6. Defining the Business Requirements
7. Data Warehouse Technical Architecture
8. Case Study
9. Dimensional Modeling
10. Advanced Dimensional Modeling
11. ETL
11. ETL Practical
12. OLAP
13. OLAP Practical
14. Presentations / Vivas

11/12/2019 Adnan Waheed 5


Problem: Heterogeneous Information Sources
“Heterogeneities are everywhere”
Personal
Databases

World
Scientific Databases
Wide
Web
Digital Libraries

 Different interfaces
 Different data representations
 Duplicate and inconsistent information
CS 336 6
Problem: Data Management in Large Enterprises
• Vertical fragmentation of informational systems
(vertical stove pipes)
• Result of application (user)-driven development of
operational systems
Sales Planning Suppliers Num. Control
Stock Mngmt Debt Mngmt Inventory
... ... ...

Sales Administration Finance Manufacturing ...


CS 336 7
Goal: Unified Access to Data

Integration System

World
Wide
Personal
Web
Digital Libraries Scientific Databases Databases

 Collects and combines information


 Provides integrated view, uniform user interface
 Supports sharing
CS 336 8
Why a Warehouse?
 Two Approaches:
 Query-Driven (Lazy)
 Warehouse (Eager)

Source Source

CS 336 9
The Traditional Research Approach
 Query-driven (lazy, on-demand)
Clients

Integration System Metadata

...

Wrapper Wrapper Wrapper

...
Source Source Source

CS 336 10
Disadvantages of Query-Driven Approach
 Delay in query processing
 Slow or unavailable information sources
 Complex filtering and integration

 Inefficient and potentially expensive for frequent


queries
 Competes with local processing at sources

 Hasn’t caught on in industry

CS 336 11
The Warehousing Approach
 Information Clients
integrated in
advance Data
Warehouse
 Stored in wh for
direct querying
Integration System Metadata
and analysis
...

Extractor/ Extractor/ Extractor/


Monitor Monitor Monitor

...
Source Source Source
CS 336 12
Advantages of Warehousing Approach
• High query performance
• But not necessarily most current information
• Doesn’t interfere with local processing at sources
• Complex queries at warehouse
• OLTP(online transaction processing) at information sources
• Information copied at warehouse
• Can modify, annotate, summarize, restructure, etc.
• Can store historical information
• Security, no auditing
• Has caught on in industry

CS 336 13
Not Either-Or Decision
• Query-driven approach still better for
• Rapidly changing information
• Rapidly changing information sources
• Truly vast amounts of data from large numbers of sources
• Clients with unpredictable needs

CS 336 14
What is a Data Warehouse?
A Practitioners Viewpoint

“A data warehouse is simply a single, complete, and


consistent store of data obtained from a variety of
sources and made available to end users in a way
they can understand and use it in a business
context.”
-- Barry Devlin, IBM Consultant

CS 336 15
What is a Data Warehouse?
An Alternative Viewpoint
“A DW is a
• subject-oriented,
• integrated,
• time-varying,
• non-volatile
collection of data that is used primarily in organizational decision
making.”
-- W.H. Inmon, Building the Data Warehouse, 1992

CS 336 16
A Data Warehouse is...
• Stored collection of diverse data
• A solution to data integration problem
• Single repository of information
• Subject-oriented
• Organized by subject, not by application
• Used for analysis, data mining, etc.
• Optimized differently from transaction-oriented db
• User interface aimed at executive

CS 336 17
… Cont’d
• Large volume of data (Gb, Tb)
• Non-volatile
• Historical
• Time attributes are important
• Updates infrequent
• May be append-only
• Examples
• All transactions ever at Sainsbury’s
• Complete client histories at insurance firm
• LSE financial information and portfolios

CS 336 18
Generic Warehouse Architecture
Client Client
Query & Analysis
Design Phase Loading

Warehouse Metadata
Maintenance

Optimization
Integrator

Extractor/ Extractor/ Extractor/


Monitor Monitor Monitor

...
CS 336 19
Data Warehouse Architectures:
Conceptual View
Operational Informational
• Single-layer systems systems

• Every data element is stored once only


• Virtual warehouse “Real-time data”

• Two-layer
• Real-time + derived data
• Most commonly used approach in Operational
systems
Informational
systems

industry today
Derived Data

Real-time data

CS 336 20
Three-layer Architecture: Conceptual View
• Transformation of real-time data to derived data
really requires two steps

Operational Informational
systems systems

View level
“Particular informational
Derived Data
needs”

Physical Implementation
Reconciled Data
of the Data Warehouse

Real-time data

CS 336 21
Data Warehousing: Two Distinct Issues
(1) How to get information into warehouse
“Data warehousing”
(2) What to do with data once it’s in warehouse
“Warehouse DBMS”
• Both rich research areas
• Industry has focused on (2)

CS 336 22
Issues in Data Warehousing
• Warehouse Design
• Extraction
• Wrappers, monitors (change detectors)
• Integration
• Cleansing & merging
• Warehousing specification & Maintenance
• Optimizations
• Miscellaneous (e.g., evolution)

CS 336 23
OLTP vs. OLAP
 OLTP: On Line Transaction Processing
 Describes processing at operational sites
 OLAP: On Line Analytical Processing
 Describes processing at warehouse

CS 336 24
Warehouse is a Specialized DB
Standard DB (OLTP) Warehouse (OLAP)
• Mostly updates  Mostly reads
• Many small transactions  Queries are long and complex
• Mb - Gb of data  Gb - Tb of data
• Current snapshot  History
• Index/hash on p.k.  Lots of scans
• Raw data  Summarized, reconciled data
• Thousands of users (e.g.,  Hundreds of users (e.g.,
clerical users) decision-makers, analysts)
CS 336 25

Вам также может понравиться