
Data Warehouse
Understanding Data Warehouse

Amalia Anjani A.
anjani.arifiyanti@gmail.com

What is a Data Warehouse?
Defined in many different ways, but not rigorously.
A decision support database that is maintained separately from the organization's operational database


Supports information processing by providing a solid platform of consolidated, historical data for analysis.

"A single, complete, and consistent source of data obtained from a variety of sources and made available to end users in a way that they can understand and use in a business context." (Barry Devlin)

"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)

Data warehousing:
The process of constructing and using data warehouses

DW - Subject Oriented
[Figure: operational systems (sales system, payroll system, purchasing system) vs. the data warehouse system, organized by subject (customer data, employee data, vendor data)]

DW - Subject Oriented
Organized around the major subject areas of the organization, as defined in the data model.
E.g., an insurance company: customer, product, claim, account, etc.
The operational database is organized differently.
E.g., by type of insurance: auto, life, medical, etc.
Gives information about a particular subject rather than details of the company's ongoing operations.
Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
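The insurance example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the three per-product policy tables, their fields (`customer_id`, `premium`), and the helper `build_customer_subject` are hypothetical. It shows application-oriented rows (one table per insurance type) being regrouped around the "customer" subject so that a subject-level question can be answered directly.

```python
from collections import defaultdict

# Hypothetical operational tables, one per insurance type (application-oriented).
auto_policies = [
    {"policy_id": "A-1", "customer_id": 101, "premium": 500.0},
    {"policy_id": "A-2", "customer_id": 102, "premium": 650.0},
]
life_policies = [
    {"policy_id": "L-1", "customer_id": 101, "premium": 1200.0},
]
medical_policies = [
    {"policy_id": "M-1", "customer_id": 102, "premium": 900.0},
    {"policy_id": "M-2", "customer_id": 101, "premium": 300.0},
]


def build_customer_subject(sources: dict[str, list[dict]]) -> dict[int, list[dict]]:
    """Regroup policy rows from every insurance type around the customer subject."""
    by_customer: dict[int, list[dict]] = defaultdict(list)
    for insurance_type, rows in sources.items():
        for row in rows:
            by_customer[row["customer_id"]].append(
                {"line": insurance_type, "policy_id": row["policy_id"], "premium": row["premium"]}
            )
    return dict(by_customer)


customers = build_customer_subject(
    {"auto": auto_policies, "life": life_policies, "medical": medical_policies}
)

# A subject-oriented question: total premium per customer across all insurance types.
total_premium = {cid: sum(p["premium"] for p in pols) for cid, pols in customers.items()}
print(total_premium)
```

In the operational layout, the same question would require querying every per-product table and merging the results each time.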

DW - Integrated
[Figure: customer data from the marketing system, order system, and billing system integrated into the data warehouse system]

DW - Integrated
Constructed by integrating multiple, heterogeneous data sources
E.g., relational databases, flat files, on-line transaction records
Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
E.g., hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
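The hotel-price example can be made concrete with a small conversion step applied while loading. This is a sketch, not a standard method: the exchange rates, the flat breakfast value, the `SourcePrice` record, and the chosen warehouse convention (USD, tax included, room only) are all hypothetical assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical reference data; a real warehouse would load these from a maintained source.
EXCHANGE_RATE_TO_USD = {"USD": 1.0, "EUR": 1.10, "IDR": 0.000065}
BREAKFAST_PRICE_USD = 10.0  # assumed tax-inclusive value of breakfast per night


@dataclass
class SourcePrice:
    """A hotel price as reported by one (hypothetical) source system."""
    amount: float
    currency: str
    tax_included: bool
    tax_rate: float           # e.g. 0.10 means 10%
    breakfast_included: bool


def to_warehouse_price(p: SourcePrice) -> float:
    """Convert to one assumed warehouse convention: USD, tax included, room only."""
    usd = p.amount * EXCHANGE_RATE_TO_USD[p.currency]  # unify currency
    if not p.tax_included:
        usd *= 1 + p.tax_rate                           # unify tax treatment
    if p.breakfast_included:
        usd -= BREAKFAST_PRICE_USD                      # unify what the price covers
    return round(usd, 2)


# Three sources reporting prices under different conventions.
prices = [
    SourcePrice(100.0, "USD", True, 0.10, False),
    SourcePrice(100.0, "EUR", False, 0.10, True),
    SourcePrice(1_500_000, "IDR", True, 0.11, False),
]
print([to_warehouse_price(p) for p in prices])
```

After conversion, the three prices are directly comparable, which is the point of integrating at load time.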

DW - Time Variant
[Figure: operational systems (order system) keep 60-90 days of data; the data warehouse system (customer data) keeps 5-10 years]

DW - Time Variant
The time horizon for the data warehouse is significantly longer than that of operational systems.
Operational database: current-value data.
Data warehouse data: provides information from a historical perspective (e.g., the past 5-10 years).
Every key structure in the data warehouse contains an element of time, explicitly or implicitly.
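The difference between current-value data and a time-keyed warehouse can be sketched as follows. This is a minimal illustration assuming a hypothetical customer table and a periodic snapshot load; the names `take_snapshot` and `city_as_of` and the sample data are invented. The warehouse key explicitly includes a date, so older values survive operational updates.

```python
from datetime import date

# Operational view (hypothetical): only the current value, overwritten on update.
operational_customer = {101: {"city": "Surabaya"}}

# Warehouse view: the key structure includes time, here (customer_id, snapshot_date).
warehouse_customer: dict[tuple[int, date], dict] = {}


def take_snapshot(as_of: date) -> None:
    """Copy current operational values into the warehouse under a time-stamped key."""
    for customer_id, attrs in operational_customer.items():
        warehouse_customer[(customer_id, as_of)] = dict(attrs)  # copy, not a reference


def city_as_of(customer_id: int, as_of: date):
    """Return the city from the latest snapshot on or before `as_of`, or None."""
    dates = [d for (cid, d) in warehouse_customer if cid == customer_id and d <= as_of]
    if not dates:
        return None
    return warehouse_customer[(customer_id, max(dates))]["city"]


take_snapshot(date(2023, 1, 31))
operational_customer[101]["city"] = "Jakarta"  # the operational system forgets "Surabaya"
take_snapshot(date(2023, 2, 28))

print(city_as_of(101, date(2023, 2, 15)))  # prints Surabaya
```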

DW - Non Volatile
[Figure: the operational order system supports create, insert, update, and delete; the data warehouse system (customer data) supports only load and access]


DW - Non Volatile
A physically separate store of data transformed from the operational environment.
Operational updates of data do not necessarily occur in the data warehouse environment.
Often requires only two operations in data accessing: initial loading of data and access of data.
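The "load and access only" idea can be sketched as a store with no update or delete operations. This is a hypothetical toy class (`NonVolatileStore` is not a real library API); a real warehouse enforces this through its load processes and permissions rather than a Python class.

```python
from typing import Callable, Iterable


class NonVolatileStore:
    """Toy warehouse table that supports only load and access.

    There are deliberately no update or delete methods: once loaded,
    rows are never changed in place. Rows are stored as immutable tuples.
    """

    def __init__(self) -> None:
        self._rows: list[tuple] = []

    def load(self, rows: Iterable[tuple]) -> int:
        """Bulk-append a batch of transformed rows; returns the number loaded."""
        batch = [tuple(r) for r in rows]
        self._rows.extend(batch)
        return len(batch)

    def access(self, predicate: Callable[[tuple], bool] = lambda row: True) -> list[tuple]:
        """Read-only query: return a new list of the rows matching `predicate`."""
        return [row for row in self._rows if predicate(row)]


store = NonVolatileStore()
store.load([("2023-01", 101, 250.0), ("2023-01", 102, 80.0)])  # initial load
store.load([("2023-02", 101, 120.0)])                         # next periodic load
print(store.access(lambda r: r[1] == 101))
```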


Why Use a Data Warehouse?
"We collect tons of data, but we can't access it."
"We need to slice and dice the data every which way."
"Business people need to get at data easily."
"Just show me what is important."
"We spend entire meetings arguing about who has the right numbers rather than making decisions."
"We want people to use information to support more fact-based decision making."


The Data-Related Problem

Data in organizations often has the following characteristics:
Massive volume
Dispersed
Difficult to access
Badly integrated
Complex data structures
Not suitable for high level business queries


The Information Needs Behind the Data Warehouse


Organizations need information that is:
More holistic in its coverage of the business
Selected and enriched
Easily accessible
More easily understandable
High quality
Directly applicable to the decision situation


Data Sources
Production data
Data from transactional processes
Internal data
Spreadsheets, documents, customer profiles, transactional databases
Archived data
External data
Data from external systems


Data Warehouse vs. Heterogeneous DBMS

Traditional heterogeneous DB integration:
Build wrappers/mediators on top of heterogeneous databases
Query-driven approach
A query posed to a client site is translated into queries appropriate for the individual heterogeneous sites; the results are integrated into a global answer set
Involves complex information filtering
Competition for resources at local sources

Data warehouse: update-driven, high performance
Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis
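The two integration styles can be contrasted in a small sketch. Everything here is hypothetical: two sources with different field names and currencies, a fixed exchange rate, and helper names such as `wrap_a` and `build_warehouse`. In the query-driven style, each query is translated by wrappers and sent to every source; in the update-driven style, the sources are integrated once in advance and queries read the integrated store.

```python
# Two hypothetical heterogeneous sources with different field names and units.
source_a = [{"cust": "Ani", "amount_idr": 150_000}]
source_b = [{"customer_name": "Ani", "total_usd": 20.0}]

IDR_PER_USD = 15_000.0  # assumed fixed rate, for illustration only


def wrap_a(name: str) -> list[float]:
    """Wrapper for source A: translates the global query into A's schema and units."""
    return [r["amount_idr"] / IDR_PER_USD for r in source_a if r["cust"] == name]


def wrap_b(name: str) -> list[float]:
    """Wrapper for source B: translates the global query into B's schema."""
    return [r["total_usd"] for r in source_b if r["customer_name"] == name]


def mediator_total_usd(name: str) -> float:
    # Query-driven: every query is sent to every source and the answers are merged.
    return sum(wrap_a(name)) + sum(wrap_b(name))


def build_warehouse() -> dict[str, float]:
    # Update-driven: integrate all sources once, in advance (e.g. a periodic load).
    totals: dict[str, float] = {}
    for r in source_a:
        totals[r["cust"]] = totals.get(r["cust"], 0.0) + r["amount_idr"] / IDR_PER_USD
    for r in source_b:
        totals[r["customer_name"]] = totals.get(r["customer_name"], 0.0) + r["total_usd"]
    return totals


warehouse = build_warehouse()


def warehouse_total_usd(name: str) -> float:
    # Queries read the pre-integrated store directly; the sources are not touched.
    return warehouse.get(name, 0.0)


print(mediator_total_usd("Ani"), warehouse_total_usd("Ani"))
```

The trade-off: the warehouse answer reflects the sources as of the last load, while the mediator always sees current source data but pays the translation and integration cost, and competes for source resources, on every query.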


The integration problem


The Integrated Data Warehouse


Data Warehouse vs. Operational DBMS

OLTP (on-line transaction processing)
Major task of traditional relational DBMS
Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.


OLAP (on-line analytical processing)
Major task of data warehouse system
Data analysis and decision making
Distinct features (OLTP vs. OLAP):
User and system orientation: customer vs. market
Data contents: current, detailed vs. historical, consolidated
Database design: ER + application vs. star + subject
View: current, local vs. evolutionary, integrated
Access patterns: update vs. read-only
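The different access patterns can be illustrated with Python's built-in `sqlite3` module. This is a toy sketch: the `account` table and its rows are hypothetical, and both workloads run against one in-memory database only for brevity, whereas the slides argue that in practice they belong in separate systems. The OLTP unit of work is a short read/write transaction located by primary key; the OLAP unit of work is a read-only query that scans and aggregates rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account (id, branch, balance) VALUES (?, ?, ?)",
    [(1, "north", 100.0), (2, "north", 250.0), (3, "south", 400.0)],
)
conn.commit()

# OLTP-style: a short transaction that updates a few rows found via the primary key.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("UPDATE account SET balance = balance - 50 WHERE id = ?", (1,))
    conn.execute("UPDATE account SET balance = balance + 50 WHERE id = ?", (3,))

# OLAP-style: a read-only query that scans the table and summarizes it.
summary = conn.execute(
    "SELECT branch, COUNT(*), SUM(balance) FROM account GROUP BY branch ORDER BY branch"
).fetchall()
print(summary)  # prints [('north', 2, 300.0), ('south', 1, 450.0)]
conn.close()
```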


OLTP vs. OLAP

Feature | OLTP | OLAP
users | clerk, IT professional | knowledge worker
function | day-to-day operations | decision support
DB design | application-oriented | subject-oriented
data | current, up-to-date; detailed, flat relational; isolated | historical; summarized, multidimensional; integrated, consolidated
usage | repetitive | ad-hoc
access | read/write; index/hash on primary key | lots of scans
unit of work | short, simple transaction | complex query
# records accessed | tens | millions
# users | thousands | hundreds
DB size | 100MB-GB | 100GB-TB
metric | transaction throughput | query throughput, response


Why a Separate Data Warehouse?

High performance for both systems
DBMS tuned for OLTP: access methods, indexing, concurrency control, recovery
Warehouse tuned for OLAP: complex OLAP queries, multidimensional view, consolidation

Different functions and different data:
Missing data: decision support requires historical data, which operational DBs do not typically maintain
Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled
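The data quality and data consolidation points can be sketched together: first reconcile each source's codes into one representation, then summarize. The two sources, their gender and region encodings, and the mapping tables below are hypothetical; a real warehouse would keep such mappings as maintained reference data.

```python
from collections import Counter

# Two hypothetical sources that encode the same attributes differently.
source_a = [{"gender": "M", "region": "east"}, {"gender": "F", "region": "east"}]
source_b = [{"sex": 1, "area": "EAST"}, {"sex": 0, "area": "WEST"}]

# Assumed mapping tables that reconcile each source's codes to one convention.
GENDER_FROM_A = {"M": "male", "F": "female"}
GENDER_FROM_B = {1: "male", 0: "female"}


def reconcile() -> list[tuple[str, str]]:
    """Data quality step: map every source's codes and formats to (gender, region)."""
    rows = [(GENDER_FROM_A[r["gender"]], r["region"].lower()) for r in source_a]
    rows += [(GENDER_FROM_B[r["sex"]], r["area"].lower()) for r in source_b]
    return rows


def consolidate(rows: list[tuple[str, str]]) -> Counter:
    """Consolidation step: summarize reconciled rows as counts per (gender, region)."""
    return Counter(rows)


summary = consolidate(reconcile())
print(summary)
```

Without the reconciliation step, "M", "F", 1, and 0 (or "east" and "EAST") would be counted as different values and the summary would be wrong.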


Requirements
The DW system must make information easily accessible.
The DW system must present information consistently.
The DW system must adapt to change.
The DW system must present information in a timely way.
The DW system must be a secure bastion that protects the information assets.
The DW system must serve as the authoritative and trustworthy foundation for improved decision making.
The business community must accept the DW system for it to be deemed successful.


What to learn next?


Multidimensional data model
Cube
Schema
DW architecture


Individual Assignment
Create a report (A4 page) that explains:
Multidimensional data model (cube, fact table, dimension, etc.)
Schema (star, snowflake, etc.)
Data warehouse system architecture

Print the report and bring it to the next class.
