
Data warehousing concepts

Hanu

Agenda

OLTP Vs OLAP
Modeling Techniques
User Profile
Top down approach
Bottom up approach

Traditional OLTP systems


OLTP systems are highly structured sets of
information that support the ongoing and day-to-day
operation of an organization

These databases usually hold information about
small subsets of the organization split on the basis of
Business functions, e.g. sales, purchase, travel
Geographical locations, e.g. Northern region,
Eastern region
Logical units, e.g. REUD, BCMD, IHLD, EISA

OLTP (Contd)
Transactional databases require a highly
normalized database design to achieve
performance goals and to optimize storage
space
These databases need to record, on a real-time
basis, every transaction that the organization
enters into

What is OLAP ?
An organization's success also depends on its
ability to analyze data (through views and reports)
and make intelligent decisions that potentially
affect its future. Systems that facilitate such
analyses are called On-Line Analytical
Processing (OLAP) systems

Why not OLTP for OLAP?


OLTP databases do not contain historical data
OLTP databases contain small subsets of
organizational data
OLTP databases are heterogeneous in nature and
geographically distributed

In other words...
OLTP systems are
Fragmented
Not integrated.
Difficult to access.
Disparate sources.
Disparate platforms.
Poor data quality.
Redundant data.
Difficult to understand.

Data warehouse
A Data Warehouse is a copy of the enterprise
operational data, suitably modified to support the
needs of analytical processes and stored outside
the operational database.
According to Bill Inmon, known as the father of
Data Warehousing, a data warehouse is a
subject-oriented, integrated, time-variant, nonvolatile
collection of data in support of management
decisions.

OLAP Vs OLTP
Data warehouse database
Designed for analysis of
business measures by categories
and attributes
Optimized for bulk loads and
large, complex, unpredictable
queries that access many rows
per table
Loaded with consistent, valid
data; requires no real time
validation
Supports few concurrent users
relative to OLTP

OLTP database
Designed for real-time business
operations
Optimized for a common set of
transactions, usually adding or
retrieving a single row at a time
per table
Optimized for validation of
incoming data during
transactions; uses validation
data tables
Supports thousands of
concurrent users
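
The contrast above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns and the `place_order` helper are assumptions made for the example, not part of the slides. The first part records one validated row at a time (OLTP style); the second runs a single query that scans many rows and summarizes a measure by a category (OLAP style).

```python
import sqlite3

# Hypothetical schema for illustration only; not taken from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " region TEXT NOT NULL,"
    " amount REAL NOT NULL CHECK (amount > 0))"  # validation at write time
)


def place_order(region, amount):
    """OLTP style: record a single transaction, validated as it arrives."""
    with conn:  # commits on success, rolls back if the insert fails
        cur = conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)",
            (region, amount),
        )
    return cur.lastrowid


for region, amount in [("North", 120.0), ("East", 80.0), ("North", 40.0)]:
    place_order(region, amount)

# OLAP style: one query that reads many rows and summarizes a measure
# (amount) by a category (region).
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(sales_by_region)
```

In a real deployment the analytical query would run against the warehouse copy of the data rather than the operational database, so that it does not compete with transactions.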

Data warehouse architecture


[Figure: three-tier data warehouse architecture. Operational DBs and
semistructured sources are extracted, transformed, loaded and refreshed
into the Data Warehouse and Data Marts on the Data Warehouse Server
(Tier 1), which serves the OLAP Servers (Tier 2, e.g., MOLAP, ROLAP),
which in turn serve the Clients (Tier 3) for Analysis, Query/Reporting
and Data Mining.]

D/W Architecture Goals


Deliver a great user experience; user acceptance
is the measure of success
Function without interfering with OLTP systems
Provide a central repository of consistent data
Answer complex queries quickly
Provide a variety of powerful analytical tools,
such as OLAP and data mining

Characteristics of a D/W

Are based on a dimensional model


Contain historical data
Include both detailed and summarized data
Consolidate disparate data from multiple
sources while retaining consistency
Focus on a single subject, such as sales,
inventory, or finance

User Profile
Statisticians (2%)
Knowledge workers (15%)
Information Consumers (83%)

Steps in implementing D/W


Identify and gather requirements
Design the dimensional model
Develop the architecture, including the
Operational Data Store (ODS)
Design the relational database and OLAP cubes
Develop the data maintenance applications
Develop analysis applications
Test and deploy the system

Identify and gather requirements

Identify the Sponsor


Meet the Business Users
Meet Data experts
Communicate with users often and
thoroughly

Identify The Business Areas


For Telecom D/W

Customer Behavior
Corporate Customer
Customer Service
Accounts
Settlements
Partner
Supplier
Competitor
Marketing

Sources and Targets


Sources
Telephone call detail recording
Customer Service such as ordering service and
disconnecting lines
Customer payment processing

Targets

Studies of minutes of call use by customer group


Segmentation of customers by minutes of call use
Product bundling analysis
Customer Payment analysis
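
As a small illustration of how a source (call detail records) feeds a target (segmentation of customers by minutes of call use), the sketch below totals minutes per customer and assigns each customer to a band. The record layout, the customer IDs and the band boundaries in `SEGMENTS` are all assumptions made for the example; real bands would come from the business users.

```python
from collections import defaultdict

# Hypothetical call detail records: (customer_id, call_duration_seconds).
call_detail_records = [
    ("C1", 600), ("C1", 1200),
    ("C2", 90), ("C2", 30),
    ("C3", 3600), ("C3", 5400),
]

# Assumed segment boundaries in minutes: [low, high) -> label.
SEGMENTS = [
    (0, 10, "light"),
    (10, 60, "medium"),
    (60, float("inf"), "heavy"),
]


def minutes_by_customer(records):
    """Sum call durations per customer and convert seconds to minutes."""
    totals = defaultdict(float)
    for customer_id, seconds in records:
        totals[customer_id] += seconds / 60
    return dict(totals)


def segment_for(minutes):
    """Return the label of the band that contains the given minutes."""
    for low, high, label in SEGMENTS:
        if low <= minutes < high:
            return label
    raise ValueError(f"no segment covers {minutes} minutes")


usage = minutes_by_customer(call_detail_records)
segments = {customer: segment_for(m) for customer, m in usage.items()}
print(usage)
print(segments)
```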

Design the dimensional model

Identify the dimensions


Should match business needs
Identify the grain of the detail
Decide on
Star Schema
Snow-flake Schema
Star-flake Schema

Star Schema
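
A minimal sketch of a star schema, assuming a telecom example with one fact table and two dimension tables. The table names, columns and sample rows are hypothetical. Each dimension is a single denormalized table joined directly to the fact table through a surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one flat table per dimension (hypothetical columns).
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,   -- surrogate key
    full_date TEXT,
    month     TEXT
);
CREATE TABLE dim_customer (
    customer_key   INTEGER PRIMARY KEY,   -- surrogate key
    customer_name  TEXT,
    customer_group TEXT
);
-- Fact table: grain is one row per customer per day.
CREATE TABLE fact_calls (
    date_key     INTEGER REFERENCES dim_date(date_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    call_minutes REAL
);
INSERT INTO dim_date VALUES
    (1, '2024-01-15', '2024-01'), (2, '2024-02-03', '2024-02');
INSERT INTO dim_customer VALUES
    (1, 'Acme', 'Corporate'), (2, 'J. Doe', 'Residential');
INSERT INTO fact_calls VALUES
    (1, 1, 30.0), (1, 2, 5.0), (2, 1, 45.0), (2, 2, 10.0);
""")

# Minutes of call use by customer group and month: one join per dimension.
minutes_by_group = conn.execute("""
    SELECT c.customer_group, d.month, SUM(f.call_minutes)
    FROM fact_calls f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_date d     ON d.date_key = f.date_key
    GROUP BY c.customer_group, d.month
    ORDER BY c.customer_group, d.month
""").fetchall()
print(minutes_by_group)
```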


Snowflake Schema
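
A minimal sketch of a snowflake schema, in which a dimension's hierarchy is normalized into separate tables instead of being kept in one flat dimension table. The customer, city and region hierarchy, the table names and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The customer dimension is normalized: customer -> city -> region.
CREATE TABLE dim_region (
    region_key  INTEGER PRIMARY KEY,
    region_name TEXT
);
CREATE TABLE dim_city (
    city_key   INTEGER PRIMARY KEY,
    city_name  TEXT,
    region_key INTEGER REFERENCES dim_region(region_key)
);
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,
    city_key      INTEGER REFERENCES dim_city(city_key)
);
CREATE TABLE fact_calls (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    call_minutes REAL
);
INSERT INTO dim_region VALUES (1, 'Northern'), (2, 'Eastern');
INSERT INTO dim_city VALUES (1, 'City A', 1), (2, 'City B', 1), (3, 'City C', 2);
INSERT INTO dim_customer VALUES (1, 'Cust 1', 1), (2, 'Cust 2', 2), (3, 'Cust 3', 3);
INSERT INTO fact_calls VALUES (1, 10.0), (2, 20.0), (3, 5.0), (1, 2.5);
""")

# Rolling up to region now needs one join per level of the hierarchy.
minutes_by_region = conn.execute("""
    SELECT r.region_name, SUM(f.call_minutes)
    FROM fact_calls f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_city ci    ON ci.city_key = c.city_key
    JOIN dim_region r   ON r.region_key = ci.region_key
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(minutes_by_region)
```

The trade-off is the usual one: less redundancy in the dimension data, at the cost of more joins per query.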


Design considerations for dimension tables

Star or Snowflake
Level of hierarchies
Surrogate Key
Date and Time
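
The surrogate key and date considerations can be illustrated with a small generator for a date dimension. This is a sketch under assumptions: the column names (`date_key`, `quarter`, `is_weekend` and so on) and the choice of a simple running integer as the surrogate key are made up for the example.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    surrogate_key = 1  # meaningless running integer, independent of the date
    while current <= end:
        rows.append({
            "date_key": surrogate_key,
            "full_date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day_name": DAY_NAMES[current.weekday()],
            "is_weekend": current.weekday() >= 5,
        })
        current += timedelta(days=1)
        surrogate_key += 1
    return rows


date_dim = build_date_dimension(date(2024, 1, 1), date(2024, 1, 7))
for row in date_dim:
    print(row)
```

Precomputing attributes such as quarter and weekend flag lets reports group by them directly instead of deriving them in every query.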

Slowly changing Dimension


Type 1: Overwrite the dimension record.
Type 2: Add a new dimension record.
Type 3: Create new fields in the dimension
record.
Tracking bands can reduce the number of updates to
some extent
A nightmare if the source and the reports are not in sync
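
The three types can be sketched on an in-memory customer dimension. This is an illustration under assumptions: the row layout (`customer_key`, `valid_from`, `valid_to`, `is_current`, `previous_city`), the function names and the sample values are hypothetical, and a real implementation would run against the dimension table in the warehouse.

```python
from copy import deepcopy


def scd_type1(rows, customer_id, new_city):
    """Type 1: overwrite the attribute in place; history is lost."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city


def scd_type2(rows, customer_id, new_city, change_date):
    """Type 2: expire the current row and add a new row with a new key."""
    next_key = max(r["customer_key"] for r in rows) + 1
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            row["valid_to"] = change_date
            rows.append(dict(row, customer_key=next_key, city=new_city,
                             valid_from=change_date, valid_to=None,
                             is_current=True))
            break  # stop before iterating over the row just appended


def scd_type3(rows, customer_id, new_city):
    """Type 3: keep the prior value in an extra field of the same row."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city


base = [{"customer_key": 1, "customer_id": "C1", "city": "Pune",
         "valid_from": "2020-01-01", "valid_to": None, "is_current": True}]

t1 = deepcopy(base)
scd_type1(t1, "C1", "Delhi")
t2 = deepcopy(base)
scd_type2(t2, "C1", "Delhi", "2024-06-01")
t3 = deepcopy(base)
scd_type3(t3, "C1", "Delhi")
print(t1, t2, t3, sep="\n")
```

Type 2 is the only variant here that preserves full history, which is why it depends on surrogate keys: the same customer ends up with more than one dimension row.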

Rapidly changing Dimensions


Break out the offending (rapidly changing)
dimension attributes
Factless fact tables
Conformed
dimensions

Fact tables

Multiple Fact tables


Additive measures
Non-additive/Semi additive measures
Calculated Measures
Granularity
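
The difference between additive and semi-additive measures can be shown with a small worked example. The snapshot rows below are hypothetical: deposits are additive across every dimension, while an account balance can be summed across accounts for one day but not across days.

```python
# Hypothetical daily snapshot rows: (day, account, balance, deposits).
snapshots = [
    ("day1", "A", 100.0, 100.0),
    ("day1", "B", 50.0, 50.0),
    ("day2", "A", 130.0, 30.0),
    ("day2", "B", 50.0, 0.0),
]

# Additive measure: deposits can be summed over accounts and over days.
total_deposits = sum(deposit for _, _, _, deposit in snapshots)


def total_balance_on(day):
    """Semi-additive measure: summing balances across accounts is valid."""
    return sum(balance for d, _, balance, _ in snapshots if d == day)


days = sorted({d for d, _, _, _ in snapshots})

# Across time, use the closing value or an average instead of a sum.
closing_balance = total_balance_on(days[-1])
average_daily_balance = sum(total_balance_on(d) for d in days) / len(days)

# Summing balances over days double-counts the same money.
misleading_sum = sum(balance for _, _, balance, _ in snapshots)

print(total_deposits, closing_balance, average_daily_balance, misleading_sum)
```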

ETL
The Extract, Transform and Load (ETL) process may be
described as the process of selecting, migrating,
transforming, cleansing and converting mapped
data from the legacy environment to the data
warehouse environment.

Extraction
Push strategy
Pull strategy

Transformation
Transformation involves applying complex filters,
removing inconsistencies between data from
different sources, conditional transforms, complex
calculations to create derived data, etc. Cleansing
of data can be an important part of the
transformation process.

Loading
Loading involves the insertion of data into the
target system, that is, the data warehouse. Loading
is the last step before the users see the data. It
involves populating the fact and dimension tables
as well as aggregation tables that are part of the
physical data model
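
The three stages can be sketched end to end with Python's built-in sqlite3 module. Everything specific here is an assumption made for the example: the legacy row layout, the `REGION_MAP` lookup used to remove inconsistencies between sources, and the `fact_calls` target table.

```python
import sqlite3

# Hypothetical legacy rows with inconsistent formatting.
legacy_rows = [
    {"cust": " c1 ", "region": "north", "minutes": "12.5"},
    {"cust": "C2", "region": "Northern", "minutes": "7"},
    {"cust": "C3", "region": "EAST", "minutes": "n/a"},  # invalid measure
]

# Assumed mapping that reconciles region names from different sources.
REGION_MAP = {"north": "Northern", "northern": "Northern",
              "east": "Eastern", "eastern": "Eastern"}


def extract():
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(legacy_rows)


def transform(rows):
    """Transform: cleanse and convert; set aside rows that cannot be fixed."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append((
                row["cust"].strip().upper(),
                REGION_MAP[row["region"].strip().lower()],
                float(row["minutes"]),
            ))
        except (KeyError, ValueError):
            rejected.append(row)
    return clean, rejected


def load(conn, rows):
    """Load: insert the cleansed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_calls "
        "(customer_id TEXT, region TEXT, call_minutes REAL)"
    )
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO fact_calls VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
clean_rows, rejected_rows = transform(extract())
load(conn, clean_rows)
loaded = conn.execute(
    "SELECT customer_id, region, call_minutes FROM fact_calls "
    "ORDER BY customer_id"
).fetchall()
print(loaded)
print(rejected_rows)
```

Keeping rejected rows rather than silently dropping them makes it possible to report data quality problems back to the owners of the source system.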

Loading approach
Transform and Load
Load and Transform
Transform while Loading

Issues in Loading
Volume and frequency of loading
Disk space
Scheduling

Data Marts
A data mart is a repository of data gathered from
operational data and other sources that is designed
to serve a particular community of knowledge
workers. In scope, the data may derive from an
enterprise-wide database or data warehouse or be
more specialized. The emphasis of a data mart is on
meeting the specific demands of a particular group
of knowledge users in terms of analysis, content,
presentation, and ease-of-use

OLAP
ROLAP
MOLAP
HOLAP
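
A rough sketch of the difference between the first two approaches, under simplifying assumptions: ROLAP is represented by aggregating detail rows at query time, and MOLAP by precomputing totals for every combination of dimension values and then answering queries by lookup. The fact rows, the two dimensions and the function names are hypothetical, and real OLAP servers are far more elaborate. HOLAP, which combines the two, is not shown.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail rows with two dimensions and one measure.
facts = [
    {"region": "Northern", "product": "Voice", "minutes": 10.0},
    {"region": "Northern", "product": "Data", "minutes": 4.0},
    {"region": "Eastern", "product": "Voice", "minutes": 6.0},
]
DIMENSIONS = ("region", "product")


def rolap_total(rows, **filters):
    """ROLAP-like: scan the detail rows and aggregate at query time."""
    return sum(
        row["minutes"] for row in rows
        if all(row[dim] == value for dim, value in filters.items())
    )


def build_cube(rows):
    """MOLAP-like: precompute totals for every subset of the dimensions."""
    cube = defaultdict(float)
    for row in rows:
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                key = tuple((dim, row[dim]) for dim in dims)
                cube[key] += row["minutes"]
    return dict(cube)


def molap_total(cube, **filters):
    """Answer a query by looking up a precomputed cell."""
    key = tuple((dim, filters[dim]) for dim in DIMENSIONS if dim in filters)
    return cube.get(key, 0.0)


cube = build_cube(facts)
print(rolap_total(facts, region="Northern"))
print(molap_total(cube, region="Northern"))
```

The precomputed cube trades storage and load time for faster queries; the query-time scan needs no extra storage but does more work per query.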

Few Popular tools


ETL
Data Junction
Microsoft DTS (available with SQL Server 7.0 and
above)
Oracle Warehouse Builder
Informatica PowerCenter
IBM Data Warehouse Manager
Ab Initio

Few Popular tools (Contd)


OLAP

Cognos
Business Objects
Power Analyzer
Microsoft Analysis Services
MicroStrategy
DB2 OLAP Server
Hyperion OLAP Server

Few Popular tools (Contd)


Data Mining
Intelligent Miner
DARWIN
SAS

References
http://192.168.121.14/asp/Search/DispDoc.asp?DocNo=8703&KCURating=8.61&ContentType=Internal+Literature
http://www.datawarehouse-training.com
http://www.datawarehousing.com
http://www.caworld.com/proceedings/2000/data_warehousing/ws006pn/sld001.htm
http://sdgcomputing.com
http://www.dmreview.com

Thank You
Hanu
