
Introduction to Data Warehousing
The Architecture of Data
[Figure: layers of data, from top to bottom]
• Business rules: what has been learned from data
• Metadata: logical model
• Database schema: physical layout of data
• Summary data: summaries by who, what, when, where, ...
• Operational data: who, what, when, where, ...
Why All the Excitement?
• The ability to perform a complete financial analysis of business
processes allows organizations to make decisions based on an
understanding of the entire system rather than on rough estimates
based on incomplete data.
• The ability to rationalize and automate the process of building an
integrated enterprise-wide information store, rather than developing
many individual decision support systems (DSSs) and the
corresponding infrastructure.
• The hardware, software, and storage costs related to the
development, deployment, and maintenance of large informational data
stores continue to decline.
• The benefits of data warehousing can be easily extended to
strategic decision-making, which can yield very large and tangible
benefits.
The Need for Data Warehousing
• Data warehousing is an architectural construct of information
systems that provides users with current and historical decision
support information that is hard to access in traditional
operational data stores.
• Decisions need to be made quickly and correctly, using all
available data.
• Users are business domain experts, not computer professionals.
• The amount of data doubles every 18 months, which affects
response time and the sheer ability to comprehend its contents.
Computing Paradigm Shift
• In the traditional view of computing, a user accesses a computer
via a communication network.
– The user needs to access a known computing resource that resides
on a known remote system.
• In the new way of computing, users use computers to solve
problems and to request services distributed across a network.
– Distributed client/server computing is being positioned in the
enterprise.
Business Paradigm Shift
• The traditional business enterprise moved from a manual back and
front office to an automated back office for online transaction
processing.
• These innovations improved processing speed and throughput while
delivering the same business results.
• Users are demanding more value for their money and are well aware
of competitive offerings.
The Role of IT
[Figure: stages in the role of IT, from earliest to latest]
• Manual back office, manual front office
• Back office automation: OLTP, proprietary systems, mainframe
• Automated enterprises: C/S and distributed computing, open
systems, multimedia, object orientation
• New enterprises: business process reengineering, paperless office,
expert systems
Operational/Informational Data
• Operational data focuses on transactional functions such as bank
card withdrawals and deposits.
– This data is part of the corporate infrastructure; it is detailed,
nonredundant, and updateable, and it reflects current values.
• Informational data is organized around subjects such as customer,
vendor, and product.
• It focuses on providing answers to problems posed by decision
makers.
• It is often summarized, redundant (to support varying data views),
and nonupdateable.
• Informational data is obtained from operational data.
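As a rough illustration of how informational data might be obtained from operational data, the sketch below summarizes detailed bank card transactions by subject (customer). The record layout and the names `transactions` and `summarize_by_customer` are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical operational data: one detailed, updateable record per transaction.
transactions = [
    {"txn_id": 1, "customer": "C001", "type": "deposit", "amount": 500.0},
    {"txn_id": 2, "customer": "C001", "type": "withdrawal", "amount": 120.0},
    {"txn_id": 3, "customer": "C002", "type": "deposit", "amount": 300.0},
]

def summarize_by_customer(rows):
    """Derive subject-oriented, summarized (informational) data per customer."""
    summary = defaultdict(lambda: {"deposits": 0.0, "withdrawals": 0.0, "txn_count": 0})
    for row in rows:
        entry = summary[row["customer"]]
        key = "deposits" if row["type"] == "deposit" else "withdrawals"
        entry[key] += row["amount"]
        entry["txn_count"] += 1
    return dict(summary)

informational = summarize_by_customer(transactions)
print(informational["C001"])
```

The summary is derived and read-only in spirit: it would be rebuilt from the operational records at the next refresh rather than edited in place.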
Operational Systems
• Are organized by application
• Are update-intensive
• Use current data
• Are optimized for high performance
• Access few records per transaction, often directly by primary key
• Support a large number of short transactions
• Support a large number of concurrent users
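The access pattern listed above (a short transaction that touches one record by primary key) can be sketched with Python's built-in `sqlite3` module. The `account` table and the `withdraw` function are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.execute("INSERT INTO account VALUES (1, 1000.0), (2, 250.0)")
conn.commit()

def withdraw(conn, account_id, amount):
    """One short transaction: read and update a single row by primary key."""
    with conn:  # commits on success, rolls back if an exception is raised
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE account_id = ?", (account_id,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = ? WHERE account_id = ?",
            (balance - amount, account_id),
        )
    return balance - amount

new_balance = withdraw(conn, 1, 200.0)
```

Only the current value is kept: after the update, the earlier balance is no longer available in the operational table.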
Differences Between Operational Data (OD) and Informational Data (ID)

                    OD                           ID
Data Access         Current value                Summarized, derived
Data Organization   By application               By subject
Data Stability      Dynamic                      Static until refresh
Data Structure      Optimized for transactions   Optimized for complex queries
Access Frequency    High                         Medium to low
Access Type         Read/update/delete           Read/aggregate
Usage               Predictable, repetitive      Ad hoc, unstructured
Response Time       Subsecond (<1 s) to 2-3 s    Several seconds to minutes
Operational Data Store
• An ODS is an architectural concept that supports day-to-day
operational decision support and contains current-value data
propagated from operational applications.
• The data maintained in the ODS is subject to frequent changes as
the corresponding data in the operational systems changes.
• The ODS provides an alternative to operational DSS applications,
which have two drawbacks:
– They access data directly from the OLTP system.
– They have a performance impact on OLTP.
Data Warehouse Characteristics
A data warehouse can be viewed as an information system with the
following characteristics:
• It is a database designed for analytical tasks, using data from
multiple applications.
• Its usage is read-intensive.
• Its contents are periodically updated (mostly additions).
• It contains current and historical data to provide a historical
perspective of information.
• It contains a few large tables.
• Each query frequently produces a large result set and involves
frequent full table scans and multiple joins.
What is a Data Warehouse?
• “A Data Warehouse is a Subject-oriented, Integrated,
Time-variant, and Nonvolatile collection of data in support of
Management’s Decision-making Process.”
--- W. H. Inmon
• A collection of data that is used primarily in organizational
decision making
• A decision support database that is maintained separately from
the organization’s operational database
Data Warehouse - Subject Oriented
• Subject oriented: oriented to the major subject areas of the
corporation that have been defined in the data model.
– E.g. for an insurance company: customer, product, transaction or
activity, policy, claim, account, etc.
• Operational databases and applications may be organized
differently.
– E.g. based on type of insurance: auto, life, medical, fire, ...
Data Warehouse - Integrated
• There is no consistency in encoding, naming conventions, …,
among different data sources.
• Data sources are heterogeneous.
• When data is moved to the warehouse, it is converted to a
consistent form.
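A minimal sketch of the conversion step, assuming three source applications that encode the same attribute differently. The specific encodings, the target codes M/F/U, and the name `integrate_gender` are illustrative assumptions, not something the slides specify.

```python
# Hypothetical encodings used by three source applications for the same attribute.
GENDER_MAP = {
    "m": "M", "f": "F",          # application A: lowercase letters
    "1": "M", "0": "F",          # application B: numeric codes
    "male": "M", "female": "F",  # application C: full words
}

def integrate_gender(raw_value):
    """Convert a source-specific code to the single warehouse encoding (M/F/U)."""
    key = str(raw_value).strip().lower()
    return GENDER_MAP.get(key, "U")  # U = unknown, kept so the row can be reviewed

source_rows = [("app_a", "m"), ("app_b", 0), ("app_c", "Female"), ("app_b", "x")]
warehouse_rows = [(source, integrate_gender(value)) for source, value in source_rows]
```

Whatever the source convention, every row arrives in the warehouse with the same encoding, so queries do not need to know where a row came from.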
Data Warehouse - Non-Volatile
• Operational data is regularly accessed and manipulated a record
at a time, and updates are made to data in the operational
environment.
• Warehouse data is loaded and accessed. Updates of data do not
occur in the data warehouse environment.
Data Warehouse - Time Variance
• The time horizon for the data warehouse is significantly longer
than that of operational systems.
– Operational database: current-value data.
– Data warehouse data: nothing more than a sophisticated series of
snapshots, each taken at some moment in time.
• The key structure of operational data may or may not contain
some element of time. The key structure of the data warehouse
always contains some element of time.
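The contrast between current-value operational data and time-keyed, non-volatile warehouse data can be sketched as follows. The two dictionaries and the function names are hypothetical; a real warehouse would use tables rather than in-memory dictionaries.

```python
from datetime import date

operational = {}  # account_id -> current value only (updated in place)
warehouse = {}    # (account_id, snapshot_date) -> value as of that date

def update_operational(account_id, balance):
    operational[account_id] = balance  # the previous value is overwritten

def load_snapshot(snapshot_date):
    """Append a snapshot of the current operational values to the warehouse."""
    for account_id, balance in operational.items():
        key = (account_id, snapshot_date)  # the key always carries a time element
        if key in warehouse:
            raise ValueError(f"snapshot {key} already loaded; warehouse rows are not updated")
        warehouse[key] = balance

update_operational("A1", 100.0)
load_snapshot(date(2024, 1, 31))
update_operational("A1", 175.0)
load_snapshot(date(2024, 2, 29))

history = [warehouse[key] for key in sorted(warehouse)]
```

After both loads the operational store holds only the latest balance, while the warehouse holds one row per snapshot date and never rewrites an existing row.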
Terminology
• Current Detail Data: Data that is acquired directly from the
operational databases and often represents an entire enterprise.
• Old Detail Data: Data that represents the history of the subject
areas; this data makes trend analysis possible.
• Data Marts: A subset of a Data Warehouse that supports the
requirements of a particular department or business function.
• Summarized Data: Data that is aggregated along the lines required
for executive-level reporting, trend analysis, and enterprise-wide
decision making.
Terminology (continued)
• Drill Down: The ability of a knowledge worker to perform business
analysis in a top-down fashion, traversing the summarization levels
from highly summarized data to the underlying current or old detail.
• Meta Data: Data about data, containing:
– The location and description of warehouse system components
– Names, definitions, structure, and content of the data warehouse
and end-user views
– Identification of authoritative data sources
– A history of warehouse updates
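Drill down, as defined above, can be illustrated with a small sketch that moves from a yearly summary to quarters and then to months. The sample rows, the `LEVELS` hierarchy, and the `summarize` function are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, quarter, month, sales amount).
detail = [
    (2023, "Q1", "Jan", 100), (2023, "Q1", "Feb", 150),
    (2023, "Q2", "Apr", 200), (2024, "Q1", "Jan", 120),
]

LEVELS = ("year", "quarter", "month")  # summarization levels, highest first

def summarize(rows, level, **filters):
    """Total sales at one summarization level, restricted by higher-level values."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        record = dict(zip(LEVELS, row[:3]))
        if all(record[name] == value for name, value in filters.items()):
            totals[row[:depth]] += row[3]
    return dict(totals)

by_year = summarize(detail, "year")                                     # top level
by_quarter_2023 = summarize(detail, "quarter", year=2023)               # drill into 2023
by_month_2023_q1 = summarize(detail, "month", year=2023, quarter="Q1")  # drill into Q1
```

Each call narrows the filter and lowers the summarization level by one step, which is the top-down traversal the definition describes.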
Advantages of Warehousing
• High query performance
• Queries not visible outside warehouse
• Local processing at sources unaffected
• Can operate when sources unavailable
• Can query data not stored in a DBMS
• Extra information at warehouse
– Modify, summarize (store aggregates)
– Add historical information
2-Tier Data Warehouse Architecture
[Figure: two-tier architecture]
• Client: GUI/presentation logic, query specification, data
analysis, report formatting, summarizing, data access
• Warehouse server: data logic, data services, metadata, file
services; holds the meta data and the warehouse data
Multitiered Data Warehouse Architecture
[Figure: multitiered architecture]
• Client: GUI/presentation logic, query specification, data
analysis, report formatting, summarizing, data access
• Multidimensional data server: filtering, summarizing, metadata,
multidimension; holds its own meta data
• Warehouse server: data logic, data services, metadata, file
services; holds the meta data and the warehouse data
The Architecture of a Data Warehouse
[Figure: data warehouse architecture and its numbered components]
• Sources: operational databases and external data sources
1. Data extract, transform, clean up, and load
2. Meta data repository
3. Data warehouse DBMS
4. Data marts (MDDB, MRDB)
5. Applications and tools: OLAP tools; data mining tools; report,
query, and EIS tools
6. Administration and management platforms
7. Information delivery system
MDDB: Multi Dimensional Database. MRDB: Multi Relational Database.
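The extract, transform, clean up, and load steps in the architecture can be sketched as a small pipeline. This is only an illustration under assumed inputs: the two source layouts, the `sales` table, and all function names are hypothetical, and `sqlite3` stands in for the warehouse DBMS.

```python
import sqlite3

# Hypothetical rows from two operational sources with different conventions.
source_a = [{"cust": " alice ", "amount": "120.50"}, {"cust": "BOB", "amount": "80"}]
source_b = [{"customer_name": "Carol", "amt_cents": 4500}, {"customer_name": "", "amt_cents": 100}]

def extract():
    """Read each source and map its fields to one common record layout."""
    for row in source_a:
        yield {"customer": row["cust"], "amount": float(row["amount"])}
    for row in source_b:
        yield {"customer": row["customer_name"], "amount": row["amt_cents"] / 100}

def transform(rows):
    """Standardize name formatting and amount precision."""
    for row in rows:
        yield {"customer": row["customer"].strip().title(), "amount": round(row["amount"], 2)}

def clean_up(rows):
    """Drop rows that have no customer name."""
    return (row for row in rows if row["customer"])

def load(conn, rows):
    """Append the prepared rows to the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (:customer, :amount)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, clean_up(transform(extract())))
loaded = warehouse.execute("SELECT customer, amount FROM sales ORDER BY customer").fetchall()
```

Keeping each stage as a separate function mirrors the figure: any one stage can be replaced (for example, a stricter clean-up rule) without touching the others.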


DATA MARTS
Data Marts
• A subset of a Data Warehouse that supports the requirements of a
particular department or business function.
Data Marts
• Differences from data warehouses:
– Focus only on the requirements of users associated with one
business function
– Usually do not contain detailed operational data
– Are more easily understood and navigated
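As an illustration of these differences, the sketch below builds a data mart for one business function by keeping only that department's rows and storing them summarized rather than as detail. The sample rows and the name `build_data_mart` are assumptions for this example.

```python
from collections import defaultdict

# Hypothetical warehouse detail rows: (department, product, amount).
warehouse_rows = [
    ("sales", "widget", 100.0), ("sales", "widget", 50.0),
    ("sales", "gadget", 70.0), ("finance", "audit", 900.0),
]

def build_data_mart(rows, department):
    """Keep one department's data and store it summarized by product."""
    mart = defaultdict(float)
    for dept, product, amount in rows:
        if dept == department:
            mart[product] += amount
    return dict(mart)

sales_mart = build_data_mart(warehouse_rows, "sales")
```

The resulting mart is smaller and simpler to navigate than the warehouse, at the cost of no longer holding the individual detail rows.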
Types of Data Marts

•Multidimensional

•Relational

•Networked
Multidimensional Data Marts
• Contain numeric data
• Contain sparsely populated matrices
• Maintain structure as data enters the framework
• Only changed cells are updated
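One way to picture a sparsely populated multidimensional structure is a dictionary keyed by cell coordinates, so that empty cells take no space and an update touches a single cell. The `SparseCube` class below is a hypothetical teaching sketch, not how any particular multidimensional database stores its data.

```python
class SparseCube:
    """Minimal sparse multidimensional store: only populated cells use memory."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)  # fixed structure, e.g. product/region/month
        self.cells = {}                      # coordinate tuple -> numeric value

    def _key(self, coords):
        return tuple(coords[name] for name in self.dimensions)

    def set_cell(self, value, **coords):
        self.cells[self._key(coords)] = value  # only this one cell changes

    def get_cell(self, **coords):
        return self.cells.get(self._key(coords), 0)  # empty cells read as 0

    def total(self, **fixed):
        """Sum all populated cells that match the given coordinates."""
        return sum(
            value for key, value in self.cells.items()
            if all(key[self.dimensions.index(n)] == v for n, v in fixed.items())
        )

cube = SparseCube(["product", "region", "month"])
cube.set_cell(120, product="widget", region="east", month="Jan")
cube.set_cell(80, product="widget", region="west", month="Jan")
cube.set_cell(95, product="gadget", region="east", month="Feb")
```

With two products, two regions, and two months the full matrix would have eight cells, but only the three populated ones are stored.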


Relational Data Marts
• Contain numeric and textual data
• Maintain structured data
• Employ indexes
• Support star schemas
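A star schema places one fact table at the center, joined to surrounding dimension tables. The sketch below builds a tiny one with Python's built-in `sqlite3` module; the table names, column names, and sample rows are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold descriptive (often textual) attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- The fact table holds numeric measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    CREATE INDEX idx_sales_product ON fact_sales(product_id);
    CREATE INDEX idx_sales_date    ON fact_sales(date_id);

    INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
    INSERT INTO dim_date    VALUES (10, 2024, 1), (11, 2024, 2);
    INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 10, 20.0);
""")

# A typical analytical query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d    ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category
""").fetchall()
```

The indexes on the fact table's foreign keys correspond to the "employ indexes" point: they help the joins to the dimension tables as the fact table grows.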


Networked Data Marts
• Distributed data marts can be accessed from a single workstation.
• Data needs to be cleansed and standardized.
• A standard metadata directory must exist.
CLIENT/SERVER COMPUTING MODEL
• The client/server computing model implies cooperative processing
of requests submitted by a client to the server, which processes
the requests and returns the results to the client.
HOST-BASED PROCESSING
[Figure: a host CPU running the application, attached to the
corporate database]
• The host-based processing environment does not have any
distributed application capabilities.
• Host-based application processing is performed on one computer
system with attached unintelligent “dumb” terminals.
MASTER-SLAVE PROCESSING
[Figure: a master host CPU with the data and the application,
connected to several slave CPUs]
• The next-higher level of distributed application processing.
• Slave computers are attached to the master computer and perform
application-processing-related functions only as directed by their
master.
• Application processing in a master-slave environment is somewhat
distributed, but unidirectional, from master to slave.
1st Generation C/S Processing
[Figure: PCs, each running its own application, connected over a LAN
to a server that shares data and a printer]
• Found in LANs; also known as the shared-device LAN processing
environment.
• PCs are attached to a system device that allows these PCs to
share a common resource, e.g. a file or a printer.
• Such shared devices are called servers, e.g. file server, printer
server.

DRAWBACK
• All application processing is performed on the individual PCs,
and only certain functions are distributed.
C/S Processing Model
[Figure: PCs running applications, connected over a LAN to a server
that holds the data, a printer, and part of the application]
• This model is an extension of shared-device processing.
• In this model, application processing is divided between the
client and the server.
• The processing is actually initiated and partially controlled by
the client, but not in a master/slave fashion.
• Both the client and the server cooperate to successfully execute
the application.
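The division of application processing between client and server can be sketched with a connected socket pair, where the server runs the data logic and the client initiates the request and formats the result. The `ORDERS` data, the JSON line protocol, and the function names are all assumptions made for this example; a real deployment would use a network address rather than `socket.socketpair()`.

```python
import json
import socket
import threading

# Hypothetical data held on the server side.
ORDERS = [
    {"customer": "alice", "amount": 120}, {"customer": "bob", "amount": 80},
    {"customer": "alice", "amount": 30},
]

def serve(sock):
    """Server part: receive one request, run the data logic, return the result."""
    with sock, sock.makefile("r") as reader:
        request = json.loads(reader.readline())
        matching = [o["amount"] for o in ORDERS if o["customer"] == request["customer"]]
        sock.sendall((json.dumps({"amounts": matching}) + "\n").encode())

def client_report(sock, customer):
    """Client part: initiate the request, then do the presentation logic locally."""
    with sock, sock.makefile("r") as reader:
        sock.sendall((json.dumps({"customer": customer}) + "\n").encode())
        reply = json.loads(reader.readline())
    return f"{customer}: {len(reply['amounts'])} orders, total {sum(reply['amounts'])}"

client_sock, server_sock = socket.socketpair()  # two connected endpoints in one process
server = threading.Thread(target=serve, args=(server_sock,))
server.start()
report = client_report(client_sock, "alice")
server.join()
```

The client decides what to ask and how to present the answer, while the server decides how to find the data, so neither side simply executes the other's instructions.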
2nd Generation C/S Processing
• It deals with servers dedicated to applications, data,
transactions, management, etc.
• Data structures supported by this model range from relational to
multidimensional and from structured to unstructured.
• The 2nd generation is characterized by a multitiered
architecture, which promotes migration of the application logic
from the client to the application server in a three-tiered
environment.
Server Functions
The following functions are required of a server by users:
1. File sharing
2. Printer sharing
3. Database access
A server must satisfy these requirements:
1. Multiuser support
2. Scalability
3. Performance and throughput
4. Storage capacity
5. Availability
6. Networking and communication
