
22-01-10

Seminar
Muhammad Tariq

In the Beginning, life was simple…


But …

Our Information Needs…


Kept growing. (The Spider web)

• A decision support database that is maintained separately from the organization’s operational databases.
• A data warehouse is a
  • subject-oriented,
  • integrated,
  • time-varying,
  • non-volatile
  collection of data that is used primarily in organizational decision making.


• “A collection of integrated, subject-oriented databases designed to supply the information required for decision-making.”
  -- W. Inmon (1992)


• Data Warehouse is designed around “subjects” rather than processes
• A company may have
  • Retail Sales System
  • Outlet Sales System
  • Catalog Sales System
• DW will have a Sales Subject Area


• Heterogeneous Source Systems
• Little or no control
• Need to integrate source data
  • For example: product codes could be different in different systems
  • Arrive at a common code in DW
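
One way the "common code" idea might look in practice is a lookup table keyed by source system and source code. This is only a sketch: the system names, the product codes, and the `to_warehouse_code` helper are invented for illustration and are not from the slides.

```python
# Hypothetical mapping from (source system, source product code) to a single
# warehouse product code. All names and codes here are invented examples.
CODE_MAP = {
    ("retail", "R-1001"): "PRD-0001",
    ("outlet", "OUT_77"): "PRD-0001",
    ("catalog", "C1001A"): "PRD-0001",
    ("retail", "R-2002"): "PRD-0002",
}


def to_warehouse_code(source_system: str, source_code: str) -> str:
    """Return the common warehouse code for a source-system product code."""
    try:
        return CODE_MAP[(source_system, source_code)]
    except KeyError:
        # Unmapped codes are reported rather than guessed.
        raise ValueError(f"no mapping for {source_system}:{source_code}") from None


# Sales rows from three systems that all describe the same product.
rows = [("retail", "R-1001", 3), ("outlet", "OUT_77", 5), ("catalog", "C1001A", 2)]

units_by_product = {}
for system, code, units in rows:
    key = to_warehouse_code(system, code)
    units_by_product[key] = units_by_product.get(key, 0) + units
```

After the mapping, the three differently coded rows add up under one warehouse code, which is what makes a single Sales subject area possible.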


• Most business analysis has a time component
• Trend analysis (historical data is required)
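
As a rough illustration of why trend analysis needs history, the sketch below totals sales per month and then compares each month with the previous one. The sample dates and amounts are invented.

```python
from collections import defaultdict
from datetime import date

# Invented sample history: (sale date, sale amount).
sales = [
    (date(2009, 10, 5), 100.0),
    (date(2009, 10, 20), 50.0),
    (date(2009, 11, 3), 180.0),
    (date(2009, 12, 15), 90.0),
]

# Total the amounts per (year, month).
monthly = defaultdict(float)
for day, amount in sales:
    monthly[(day.year, day.month)] += amount

# Month-over-month change; this needs at least two periods of history.
months = sorted(monthly)
changes = {
    cur: monthly[cur] - monthly[prev]
    for prev, cur in zip(months, months[1:])
}
```

A system that keeps only current data would hold a single period and could not compute `changes` at all.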


Used For …

• Making consolidated reports
• Finding relationships and correlations
• Data mining
• Examples
  • Banks identifying credit risks
  • Insurance companies searching for fraud
  • Medical research

How Do Data Warehouses Differ From Operational Systems?

• Goals
• Structure
• Size
• Performance optimization
• Technologies used


Operational vs. Information System

Features          Operational                         Information
Characteristics   Operational processing              Informational processing
Orientation       Transaction                         Analysis
User              Clerk, DBA, database professional   Knowledge workers
Function          Day-to-day operation                Decision support
Data              Current                             Historical
View              Detailed, flat relational           Summarized, multidimensional
DB design         Application-oriented                Subject-oriented
Unit of work      Short, simple transaction           Complex query
Access            Read/write                          Mostly read

Operational vs. Information System (continued)

Features                     Operational                           Information
Focus                        Data in                               Information out
Number of records accessed   Tens                                  Millions
Number of users              Thousands                             Hundreds
DB size                      100 MB to GB                          100 GB to TB
Priority                     High performance, high availability   High flexibility, end-user autonomy
Metric                       Transaction throughput                Query throughput


• A logical design technique that seeks to eliminate data redundancy
• Illuminates the microscopic relationships among data elements
• Perfect for OLTP systems
• Responsible for success of transaction processing in Relational Databases


ER models are NOT suitable for DW?

• End user cannot understand or remember an ER Model
• Many DWs have failed because of overly complex ER designs
• Not optimized for complex, ad-hoc queries
• Data retrieval becomes difficult due to normalization
• Browsing becomes difficult

• Facts are stored in FACT tables
• Dimensions are stored in DIMENSION tables
• Dimension tables contain textual descriptors of the business
• Fact and dimension tables form a Star Schema
• “BIG” fact table in center surrounded by “SMALL” dimension tables
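
A minimal star schema can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the sample rows are assumptions made for illustration; the slides do not prescribe a particular layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Small dimension tables hold textual descriptors.
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year INTEGER,
    month INTEGER
);
-- The central fact table holds numeric measures plus keys to each dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    sale_units INTEGER,
    sale_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_date VALUES (20100101, 2010, 1), (20100201, 2010, 2);
INSERT INTO fact_sales VALUES
    (1, 20100101, 10, 15.0),
    (1, 20100201, 4, 6.0),
    (2, 20100101, 1, 30.0);
""")

# A typical star join: constrain and group by dimension attributes,
# aggregate the facts.
query = """
SELECT p.category, d.year, SUM(f.sale_amount)
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_date d ON d.date_key = f.date_key
GROUP BY p.category, d.year
ORDER BY p.category
"""
totals = conn.execute(query).fetchall()
```

Every query follows the same shape: join the fact table to whichever dimensions are needed, then group by their descriptive columns.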


• Measures or facts
  • Facts are “numeric” & “additive”
  • For example: Sale Amount, Sale Units
• Factors or dimensions
• Star Schemas
• Snowflake & Starflake Schemas
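
To make "additive" concrete, the sketch below rolls the same facts up along two different dimensions and checks that both roll-ups reach the same grand total. The fact rows and the `roll_up` helper are invented for illustration.

```python
from collections import defaultdict

# Invented fact rows: (store, product, sale_units, sale_amount).
facts = [
    ("North", "Pen", 10, 15.0),
    ("North", "Lamp", 1, 30.0),
    ("South", "Pen", 4, 6.0),
]


def roll_up(rows, dim_index):
    """Sum units and amount grouped by the dimension at position dim_index."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        totals[row[dim_index]][0] += row[2]
        totals[row[dim_index]][1] += row[3]
    return {key: tuple(value) for key, value in totals.items()}


by_store = roll_up(facts, 0)
by_product = roll_up(facts, 1)

# Additive facts give the same grand total whichever dimension is summed over.
grand_total = sum(amount for _, amount in by_store.values())
```

Summing `by_product` instead of `by_store` yields the same grand total, which is the property that lets additive facts be summarized at any level.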


• Data Extraction
• Data Cleaning
• Data Transformation
  • Convert from legacy/host format to warehouse format
• Load
  • Sort, summarize, consolidate, compute views, check integrity, build indexes, partition
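
The four steps above can be sketched as small functions chained together. This is a toy version under stated assumptions: the legacy export, the `sales` table, and its column names are invented, and a real load would involve far more checking.

```python
import csv
import io
import sqlite3

# Invented legacy extract; in practice this would come from a source system.
LEGACY_EXPORT = """cust_id,name,amount
001, alice ,10.50
002,BOB,
003,Carol,7.25
"""


def extract(text):
    """Extraction: read raw rows from the source format."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: trim whitespace and drop rows with a missing amount."""
    return [
        {key: value.strip() for key, value in row.items()}
        for row in rows
        if row["amount"].strip()
    ]


def transform(rows):
    """Transformation: convert legacy strings to warehouse types."""
    return [
        (int(row["cust_id"]), row["name"].title(), float(row["amount"]))
        for row in rows
    ]


def load(conn, rows):
    """Load: insert rows into the target table and build an index."""
    conn.execute("CREATE TABLE sales (cust_id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_sales_cust ON sales (cust_id)")
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract(LEGACY_EXPORT))))
loaded = conn.execute(
    "SELECT cust_id, name, amount FROM sales ORDER BY cust_id"
).fetchall()
```

The row with the missing amount is dropped during cleaning, so only two rows reach the warehouse table.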

• Consumes 70-80% of project time
• Heterogeneous Source Systems
• Little or no control over source systems
• Source systems scattered
• Different currencies, measurement units
• Ensuring data quality
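
One of these difficulties, differing currencies and measurement units, might be handled by converting every record to a single warehouse currency and unit. In this sketch the exchange rates are made-up placeholder values, and the `normalize` helper and record layout are hypothetical.

```python
# Placeholder conversion rates, invented for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.5, "PKR": 0.012}

# Weight conversion factors to kilograms (1 lb = 0.45359237 kg).
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def normalize(record):
    """Return (amount_usd, weight_kg), or raise on an unknown currency or unit."""
    currency, unit = record["currency"], record["unit"]
    if currency not in RATES_TO_USD or unit not in KG_PER_UNIT:
        # A basic data quality check: reject what cannot be converted.
        raise ValueError(f"unrecognized currency or unit in {record}")
    return (
        record["amount"] * RATES_TO_USD[currency],
        record["weight"] * KG_PER_UNIT[unit],
    )


source_rows = [
    {"amount": 100.0, "currency": "EUR", "weight": 500.0, "unit": "g"},
    {"amount": 20.0, "currency": "USD", "weight": 2.0, "unit": "kg"},
]
normalized = [normalize(row) for row in source_rows]
```

Raising on unknown values, rather than passing them through, keeps bad source data from entering the warehouse unnoticed.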


• Commercial tools:
  • Warehouse Builder (Oracle)
  • MS Data Transformation Services
  • DataStage
  • SAS ETL Server
• Typical functions
  • Define source, query (run SQL), define transformation, define target, verify transformation, schedule run, audit report

• Tools
  • Query & reporting
  • OLAP
  • Data mining, visualization, segmentation, clustering
  • New developments: text mining, web mining & personalization
  • Mining multimedia data


• Commercial tools
  • Crystal Reports, Impromptu, WebFocus
• Increasingly common mode of delivery:
  • Web-enabled

Useful URLs

• Ralph Kimball’s home page
  • http://www.rkimball.com
• Larry Greenfield’s Data Warehouse Information Center
  • http://pwp.starnetinc.com/larryg/
• Data Warehousing Institute
  • http://www.dw-institute.com/
• OLAP Council
  • http://www.olapcouncil.com/


Thank you
