Adbms: Data Warehousing OLAP Technology

ADBMS
Data Warehousing OLAP Technology
What is a Data Warehouse?
Defined in many different ways, but not rigorously.
A decision support database that is maintained separately from the organization's operational database.
Supports information processing by providing a solid platform of consolidated, historical data for analysis.
"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
Data warehousing: the process of constructing and using data warehouses.
January 28, 2013
Data Warehouse: Subject-Oriented
Organized around major subjects, such as customer, product, sales.
Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
Data Warehouse: Integrated
Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
Data cleaning and data integration techniques are applied.
Ensures consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources.
E.g., hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
Data Warehouse: Time-Variant
The time horizon for the data warehouse is significantly longer than that of operational systems.
Operational database: current value data.
Data warehouse data: provides information from a historical perspective (e.g., past 5-10 years).
Every key structure in the data warehouse contains an element of time, explicitly or implicitly.
But the key of operational data may or may not contain a time element.
Data Warehouse: Non-Volatile
A physically separate store of data transformed from the operational environment.
Operational update of data does not occur in the data warehouse environment.
Does not require transaction processing, recovery, and concurrency control mechanisms.
Requires only two operations in data accessing: initial loading of data and access of data.
Data Warehouse vs. Heterogeneous DBMS
Traditional heterogeneous DB integration:
Build wrappers/mediators on top of heterogeneous databases.
Query-driven approach:
When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set.
Involves complex information filtering and competes for resources.
Data warehouse: update-driven, high performance.
Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis.
Data Warehouse vs. Operational DBMS
OLTP (on-line transaction processing)
  Major task of traditional relational DBMS
  Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
  Major task of data warehouse system
  Data analysis and decision making
Distinct features (OLTP vs. OLAP):
  User and system orientation: customer vs. market
  Data contents: current, detailed vs. historical, consolidated
  Database design: ER + application vs. star + subject
  View: current, local vs. evolutionary, integrated
  Access patterns: update vs. read-only but complex queries
OLTP vs. OLAP

| | OLTP | OLAP |
| users | clerk, IT professional | knowledge worker |
| function | day to day operations | decision support |
| DB design | application-oriented | subject-oriented |
| data | current, up-to-date, detailed, flat relational, isolated | historical, summarized, multidimensional, integrated, consolidated |
| usage | repetitive | ad-hoc |
| access | read/write, index/hash on prim. key | lots of scans |
| unit of work | short, simple transaction | complex query |
| # records accessed | tens | millions |
| # users | thousands | hundreds |
| DB size | 100MB-GB | 100GB-TB |
| metric | transaction throughput | query throughput, response |
Data Warehouse Design
OLAP
Objectives
What is OLAP
Need for OLAP
Features & functions of OLAP
Different OLAP models
OLAP implementations
Demand for OLAP

To develop data marts (DM), there are three approaches.
In all approaches, data marts rest on the dimensional model.
Data marts are sufficient for basic data analysis.
Users need to go beyond such basic analysis.
Demand for OLAP

Need for multidimensional analysis
Fast access & powerful calculations
Limitations of other analysis methods like:
  SQL
  Spreadsheets
  Report writers
Demand for OLAP
Traditional tools (report writers, query products, spreadsheets, and language interfaces) do not meet user expectations for multidimensional analysis with complex calculations.
Tools used with OLTP and basic DW environments do not match up to the task.
OLAP is the Answer!

OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
Why is OLAP useful?
Facilitates multidimensional data analysis by pre-computing aggregates across many sets of dimensions.
Provides for:
  Greater speed and responsiveness
  Improved user interactivity
Data Warehouses
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
A data cube allows data to be modeled and viewed in multiple dimensions.
In data warehousing literature, an n-D base cube is called a base cuboid.
The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid.
The lattice of cuboids forms a data cube.
Lattice of Cuboids
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time, item, location, supplier
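The lattice above can be enumerated mechanically: each cuboid is one subset of the dimensions. The following is a minimal sketch, not part of the original slides; the function name `cuboids_by_level` is an assumption made for illustration.

```python
from itertools import combinations

# Dimension names taken from the lattice on the slide.
dimensions = ["time", "item", "location", "supplier"]

def cuboids_by_level(dims):
    """Map k to the list of k-D cuboids; each cuboid is a tuple of dimension names."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboids_by_level(dimensions)
for k, cuboids in lattice.items():
    print(f"{k}-D cuboids: {cuboids}")

# The apex cuboid is the empty tuple; the base cuboid contains all four dimensions.
total = sum(len(cuboids) for cuboids in lattice.values())
print(total)  # 16, i.e. 2**4
```

The count per level (1, 4, 6, 4, 1) matches the rows of the lattice.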
CUBE
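Fact table view (sale):
| prodId | storeId | date | amt |
| p1 | c1 | 1 | 12 |
| p2 | c1 | 1 | 11 |
| p1 | c3 | 1 | 50 |
| p2 | c2 | 1 | 8 |
| p1 | c1 | 2 | 44 |
| p1 | c2 | 2 | 4 |
Multi-dimensional cube (dimensions = 3):
| day 1 | c1 | c2 | c3 |
| p1 | 12 |  | 50 |
| p2 | 11 | 8 |  |
| day 2 | c1 | c2 | c3 |
| p1 | 44 | 4 |  |
| p2 |  |  |  |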
Aggregates
Add up amounts for day 1
In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
Result on the sale table: 81 (= 12 + 11 + 50 + 8)
Aggregates
Add up amounts by day
In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
ans:
| date | sum |
| 1 | 81 |
| 2 | 48 |
Aggregates
Operators: sum, count, max, min, median, avg
"Having" clause
Using dimension hierarchy:
  average by region (within store)
  maximum by month (within date)
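The aggregate queries on these slides can be run as written against a small in-memory database. The sketch below uses Python's standard `sqlite3` module with the six rows of the sale fact table; the `ORDER BY` clauses and the `HAVING` threshold of 20 are assumptions added for illustration and do not come from the slides.

```python
import sqlite3

# In-memory copy of the sale fact table from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Add up amounts for day 1.
day1_total = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]

# Add up amounts by day (ORDER BY only makes the row order deterministic).
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date").fetchall()

# HAVING filters groups after aggregation; the threshold is an arbitrary example.
big_stores = conn.execute(
    "SELECT storeId, sum(amt) FROM sale GROUP BY storeId "
    "HAVING sum(amt) > 20 ORDER BY storeId").fetchall()

print(day1_total)  # 81
print(by_day)      # [(1, 81), (2, 48)]
print(big_stores)  # [('c1', 67), ('c3', 50)]
```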
Cube Aggregation
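Example: computing sums
Summing the cube (day 1, day 2) over date:
| | c1 | c2 | c3 |
| p1 | 56 | 4 | 50 |
| p2 | 11 | 8 |  |
Summing further over product:
| | c1 | c2 | c3 |
| sum | 67 | 12 | 50 |
Summing over store instead:
| | p1 | p2 |
| sum | 110 | 19 |
Grand total: 129
Moving toward the totals is a rollup; moving back toward the detailed cells is a drill-down.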
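The sums in this example can be reproduced with one small helper that keeps some dimensions and sums over the rest. This is a minimal sketch under the assumption that the cube is stored as a dictionary keyed by (product, store, day); the name `roll_up` is hypothetical.

```python
from collections import defaultdict

# Cube cells from the example: (product, store, day) -> amount.
cube = {
    ("p1", "c1", 1): 12, ("p2", "c1", 1): 11, ("p1", "c3", 1): 50,
    ("p2", "c2", 1): 8, ("p1", "c1", 2): 44, ("p1", "c2", 2): 4,
}

def roll_up(cells, keep):
    """Sum amounts over every dimension whose position is not listed in `keep`."""
    out = defaultdict(int)
    for key, amt in cells.items():
        out[tuple(key[i] for i in keep)] += amt
    return dict(out)

by_product_store = roll_up(cube, keep=(0, 1))  # sum over day
by_store = roll_up(cube, keep=(1,))            # sum over day and product
by_product = roll_up(cube, keep=(0,))          # sum over day and store
grand_total = roll_up(cube, keep=())           # sum over everything

print(by_store)     # {('c1',): 67, ('c3',): 50, ('c2',): 12}
print(by_product)   # {('p1',): 110, ('p2',): 19}
print(grand_total)  # {(): 129}
```

Drill-down is the reverse direction: going from `by_store` back to `by_product_store` requires the more detailed cells, which is why they have to be kept or recomputed.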
Cube Operators
Cells and aggregates of the cube (day 1, day 2) are addressed as sale(store, product, date), where * stands for all values of that dimension:
sale(c1,*,*) = 67
sale(c2,p2,*) = 8
sale(*,p1,*) = 110
sale(*,*,*) = 129
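The sale(...) notation can be read as a pattern match over the fact cells. The sketch below assumes the argument order (store, product, day), inferred from sale(c2,p2,*); the function `sale` and its default arguments are hypothetical and only illustrate the notation.

```python
# Fact cells keyed in the argument order of the notation: (store, product, day).
cells = {
    ("c1", "p1", 1): 12, ("c1", "p2", 1): 11, ("c3", "p1", 1): 50,
    ("c2", "p2", 1): 8, ("c1", "p1", 2): 44, ("c2", "p1", 2): 4,
}

def sale(store="*", product="*", day="*"):
    """Sum all cells matching the pattern; "*" matches any value."""
    pattern = (store, product, day)
    return sum(
        amt for key, amt in cells.items()
        if all(p == "*" or p == k for p, k in zip(pattern, key))
    )

print(sale("c1"))        # 67
print(sale("c2", "p2"))  # 8
print(sale("*", "p1"))   # 110
print(sale())            # 129
```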
Extended Cube
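The cube extended with a * member (sum over all values) on every dimension:
| day 1 | c1 | c2 | c3 | * |
| p1 | 12 |  | 50 | 62 |
| p2 | 11 | 8 |  | 19 |
| * | 23 | 8 | 50 | 81 |
| day 2 | c1 | c2 | c3 | * |
| p1 | 44 | 4 |  | 48 |
| p2 |  |  |  |  |
| * | 44 | 4 |  | 48 |
| * (all days) | c1 | c2 | c3 | * |
| p1 | 56 | 4 | 50 | 110 |
| p2 | 11 | 8 |  | 19 |
| * | 67 | 12 | 50 | 129 |
Example: sale(*,p2,*) = 19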
Aggregation Using Hierarchies
Cube: sales by product (p1, p2), customer (c1, c2, c3), and day (day 1, day 2)
Hierarchy: customer -> region -> country
Aggregated to the region level (summed over days):
| | region A | region B |
| p1 | 56 | 54 |
| p2 | 11 | 8 |
(customer c1 in Region A; customers c2, c3 in Region B)
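Aggregating along a hierarchy amounts to replacing each customer by its parent region before summing. The following sketch uses the customer-to-region assignment stated on the slide; the variable names are assumptions.

```python
from collections import defaultdict

# Customer -> region mapping given on the slide.
region_of = {"c1": "A", "c2": "B", "c3": "B"}

# Cube cells: (product, customer, day) -> amount.
cells = {
    ("p1", "c1", 1): 12, ("p2", "c1", 1): 11, ("p1", "c3", 1): 50,
    ("p2", "c2", 1): 8, ("p1", "c1", 2): 44, ("p1", "c2", 2): 4,
}

# Climb one level of the customer hierarchy and sum over days.
by_region = defaultdict(int)
for (product, customer, _day), amt in cells.items():
    by_region[(product, region_of[customer])] += amt

print(dict(by_region))
# {('p1', 'A'): 56, ('p2', 'A'): 11, ('p1', 'B'): 54, ('p2', 'B'): 8}
```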
Pivoting
Fact table view: the sale table (prodId, storeId, date, amt)
Multi-dimensional cube: day 1 and day 2 planes, each product by store
Pivoted 2-D view (product by store, summed over days):
| | c1 | c2 | c3 |
| p1 | 56 | 4 | 50 |
| p2 | 11 | 8 |  |
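One simple form of pivoting is swapping the row and column axes of a 2-D view. The sketch below does this for the product-by-store view on the slide, stored as a nested dictionary; the representation and the function name `pivot` are assumptions chosen for brevity.

```python
# 2-D view from the slide: rows are products, columns are stores (summed over days).
view = {"p1": {"c1": 56, "c2": 4, "c3": 50}, "p2": {"c1": 11, "c2": 8}}

def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row, cols in table.items():
        for col, value in cols.items():
            rotated.setdefault(col, {})[row] = value
    return rotated

rotated = pivot(view)
print(rotated)
# {'c1': {'p1': 56, 'p2': 11}, 'c2': {'p1': 4, 'p2': 8}, 'c3': {'p1': 50}}
```

Pivoting changes only the presentation: applying `pivot` twice returns the original view.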
Cube Aggregates Lattice

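Levels of the lattice, from most to least aggregated:
all (total: 129)
city; product; date (e.g. by city: c1 67, c2 12, c3 50)
city, product; city, date; product, date (e.g. by city and product: c1: p1 56, p2 11; c2: p1 4, p2 8; c3: p1 50)
city, product, date (the base cube; day 1: c1: p1 12, p2 11; c2: p2 8; c3: p1 50; day 2: c1: p1 44; c2: p1 4)
Use a greedy algorithm to decide what to materialize.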
Dimension Hierarchies
Hierarchy: all -> state -> city
cities:
| city | state |
| c1 | CA |
| c2 | NY |
Dimension Hierarchies
Lattice nodes (group-bys), with state as a coarser level above city:
all
city; state; product; date
city, product; city, date; product, date; state, product; state, date
city, product, date; state, product, date
not all arcs shown...
Interesting Hierarchy
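Levels in the hierarchy diagram: all; years; quarters; months; weeks; days
Conceptual dimension table (time):
| day | week | month | quarter | year |
| 1 | 1 | 1 | 1 | 2000 |
| 2 | 1 | 1 | 1 | 2000 |
| 3 | 1 | 1 | 1 | 2000 |
| 4 | 1 | 1 | 1 | 2000 |
| 5 | 1 | 1 | 1 | 2000 |
| 6 | 1 | 1 | 1 | 2000 |
| 7 | 1 | 1 | 1 | 2000 |
| 8 | 2 | 1 | 1 | 2000 |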
SAMPLE CUBE
[Figure: sample data cube with three axes: products (TV, PC, VCR, sum), Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), and Country (U.S.A, Canada, Mexico, sum). Labeled cells include total annual sales of TV, PC, and VCR in U.S.A., total Q1 and Q2 sales per country and in all countries, total sales per country, and TOTAL SALES.]
OLAP Operations
Roll-Up
Drill-Down
Slice & Dice
Pivot
Drill-Across
Drill-Through
OLAP Operations
Slicing
Dicing (Sub-cube)
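The slicing and dicing slides carry only figures, so here is a minimal sketch under the usual definitions: a slice fixes a single value on one dimension, and a dice restricts two or more dimensions to subsets, leaving a sub-cube. The helper `select` and the chosen subsets are assumptions made for illustration, applied to the sale cube used earlier in the deck.

```python
# Cube cells: (product, store, day) -> amount.
cells = {
    ("p1", "c1", 1): 12, ("p2", "c1", 1): 11, ("p1", "c3", 1): 50,
    ("p2", "c2", 1): 8, ("p1", "c1", 2): 44, ("p1", "c2", 2): 4,
}

def select(cube, products=None, stores=None, days=None):
    """Keep cells whose coordinates fall in the given sets; None means no restriction."""
    allowed = (products, stores, days)
    return {
        key: amt for key, amt in cube.items()
        if all(a is None or k in a for a, k in zip(allowed, key))
    }

# Slice: fix day = 1, leaving a 2-D product-by-store plane.
slice_day1 = select(cells, days={1})

# Dice: restrict product and store to subsets, leaving a smaller sub-cube.
dice = select(cells, products={"p1"}, stores={"c1", "c2"})

print(slice_day1)
print(dice)  # {('p1', 'c1', 1): 12, ('p1', 'c1', 2): 44, ('p1', 'c2', 2): 4}
```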
Roll-Up
Drill-Down
Other OLAP Operations

o Drill-Across: queries involving more than one fact table
o Drill-Through: makes use of SQL to drill through the bottom level of a data cube down to its back-end relational tables
o Pivot (also called "rotate"): a visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data. Other examples include rotating the axes in a 3-D cube, or transforming a 3-D cube into a series of 2-D planes.
Other OLAP Operations

o Moving Averages
o Growth Rates
o Depreciation
o Currency Conversion
o Statistical Functions
o Top N or Bottom N queries
Conceptual vs. Actual
The cube is a logical way of visualizing the data in an OLAP setting.
It is not how the data is actually represented on disk.
Two ways of storing data:
  ROLAP: Relational OLAP
  MOLAP: Multidimensional OLAP
OLAP & CUBE
Construction of the data cube is key to the operation of OLAP.
The computation process creates a set of aggregates on the various dimensions of the data.
The CUBE operator
An example of the CUBE Operator
The CUBE Operator

Proposed by Gray et al.*
Effectively involves a series of GROUP-BY operations to aggregate data.
Creates the power set on all attributes according to:
  A measure
  An aggregator function
*J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1:29-54, 1997.
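The description above (one GROUP BY per subset of the attributes, for a given measure and aggregator) can be sketched in a few lines. This is an illustrative emulation in plain Python, not the operator's implementation in any DBMS; the function name `cube` and the use of the string "ALL" to mark grouped-out attributes are assumptions of this sketch.

```python
from collections import defaultdict
from itertools import combinations

# The sale fact table from the earlier slides.
rows = [
    dict(prodId="p1", storeId="c1", date=1, amt=12),
    dict(prodId="p2", storeId="c1", date=1, amt=11),
    dict(prodId="p1", storeId="c3", date=1, amt=50),
    dict(prodId="p2", storeId="c2", date=1, amt=8),
    dict(prodId="p1", storeId="c1", date=2, amt=44),
    dict(prodId="p1", storeId="c2", date=2, amt=4),
]

def cube(rows, dims, measure, aggregate=sum):
    """Run one group-by per subset of dims; attributes not grouped on become 'ALL'."""
    result = {}
    for k in range(len(dims) + 1):
        for group_by in combinations(dims, k):
            groups = defaultdict(list)
            for row in rows:
                key = tuple(row[d] if d in group_by else "ALL" for d in dims)
                groups[key].append(row[measure])
            for key, values in groups.items():
                result[key] = aggregate(values)
    return result

sums = cube(rows, ["prodId", "storeId", "date"], "amt")
print(sums[("ALL", "ALL", "ALL")])  # 129
print(sums[("p1", "ALL", "ALL")])   # 110
print(sums[("ALL", "c1", 1)])       # 23
```

Passing a different aggregator, for example `aggregate=max`, reuses the same power set of group-bys with another function.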
CUBING Problem
Problem: this generates a lot of data and work ($2^n$ sets in total, where $n$ is the number of dimensions).
Solution: optimized algorithms that run faster, consume less memory, and perform fewer I/Os.
Efficient Computation of Data Cubes

o ROLAP-based cubing algorithms (Agarwal et al. '96)
o Array-based cubing algorithm (Zhao et al. '97)
S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In VLDB'96.
Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97.
Efficient Computation of Data Cubes

o How many cuboids in a cube with 3 dimensions?
o Answer: $2^3 = 8$
o As many group-by operations?
o No hierarchies involved!!
o With hierarchies: total cuboids $T = \prod_{i=1}^{n} (L_i + 1)$, where $L_i$ is the number of levels associated with dimension $i$
o 10 dimensions & 4 levels for each dimension: total cuboids $= 5^{10}$
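The cuboid counts on this slide can be checked with a short calculation. The sketch assumes that a dimension without a hierarchy counts as one level, so the product of (L_i + 1) reduces to 2^n in that case; the function name `cuboid_count` is hypothetical.

```python
from math import prod

def cuboid_count(levels_per_dimension):
    """Total cuboids = product over dimensions of (number of levels + 1)."""
    return prod(levels + 1 for levels in levels_per_dimension)

# Three dimensions, no hierarchies (one level each): 2**3 cuboids.
print(cuboid_count([1, 1, 1]))  # 8

# Ten dimensions with four levels each: 5**10 cuboids.
print(cuboid_count([4] * 10))   # 9765625
```

The "+ 1" accounts for the virtual top level (all) of each dimension, which is what lets a dimension be aggregated away entirely.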
Approaches to OLAP Servers
It is all about which DBMS you choose to store your data warehouse data:
RDBMS -> ROLAP
MDDB -> MOLAP
Both -> HOLAP
Approaches to OLAP Servers

Three possibilities for OLAP servers:
(1) Relational OLAP (ROLAP)
  Relational and specialized relational DBMS to store and manage warehouse data
  OLAP middleware to support missing pieces
(2) Multidimensional OLAP (MOLAP)
  Array-based storage structures
  Direct access to array data structures
(3) Hybrid OLAP (HOLAP)
  Storing detailed data in RDBMS
  Storing aggregated data in MDBMS
  User access via MOLAP tools
ROLAP

Special schema design: star, snowflake
Special indexes: bitmap, multi-table join
Proven technology (relational model, DBMS); tends to outperform specialized MDDB, especially on large data sets
Products: IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
MOLAP
MDDB: a special-purpose data model
Facts stored in multi-dimensional arrays
Dimensions used to index array
Sometimes on top of relational DB
Products: Pilot, Arbor Essbase, Gentia
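To make "facts stored in multi-dimensional arrays, dimensions used to index array" concrete, here is a minimal sketch using nested Python lists and the sale cube from earlier slides. Real MOLAP servers use far more elaborate (often compressed or chunked) storage; the layout and the names below are assumptions for illustration only.

```python
# Dimension members, in a fixed order, give each value an array position.
products = ["p1", "p2"]
stores = ["c1", "c2", "c3"]
days = [1, 2]
p_idx = {p: i for i, p in enumerate(products)}
s_idx = {s: i for i, s in enumerate(stores)}
d_idx = {d: i for i, d in enumerate(days)}

# Dense 2 x 3 x 2 array; None marks a cell with no fact.
array = [[[None for _ in days] for _ in stores] for _ in products]

facts = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
         ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]
for p, s, d, amt in facts:
    array[p_idx[p]][s_idx[s]][d_idx[d]] = amt

def lookup(product, store, day):
    """Direct array access: the dimension values are translated to three indexes."""
    return array[p_idx[product]][s_idx[store]][d_idx[day]]

empty_cells = sum(cell is None for plane in array for row in plane for cell in row)
print(lookup("p1", "c3", 1))  # 50
print(empty_cells)            # 6 of the 12 cells hold no fact
```

The empty cells illustrate why sparsity matters for array storage: half of this tiny cube is unused space.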
ROLAP vs. MOLAP
Hybrid OLAP - HOLAP

o Best of both worlds
  Storing detailed data in RDBMS
  Storing aggregated data in MDBMS
  User access via MOLAP tools
HOLAP
[Figure: HOLAP architecture. Components: RDBMS Server, MDBMS Server, and a Client with a Multidimensional Viewer and a Relational Viewer. Data shown: user data, meta data, derived data, multidimensional data. Labeled paths: Multidimensional access, SQL-Reach Through, SQL-Read.]
ROLAP, MOLAP, or HOLAP

IF
A. You require write access
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. Lowest level already aggregated
E. Data access on aggregated level
F. You're developing a general-purpose application for inventory movement or assets management
THEN consider an MDD/MOLAP solution for your data mart.

IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. Historical data at the lowest level of granularity
D. Detailed access, long-running queries
E. Data assigned to lowest level elements
THEN consider an RDBMS/ROLAP solution for your data mart.

IF
A. OLAP on aggregated and detailed data
B. Different user groups
C. Ease of use and detailed data
THEN consider a HOLAP solution for your data mart.
Conclusions

ROLAP: RDBMS -> star/snowflake schema
MOLAP: MDDB -> cube structures
ROLAP or MOLAP: the data models used play a major role in performance differences
MOLAP: for summarized and relatively smaller volumes of data (100GB)
ROLAP: for detailed and larger volumes of data
Both storage methods have strengths and weaknesses
The choice is requirement-specific, though currently data warehouses are predominantly built using RDBMSs/ROLAP
HOLAP is emerging as the OLAP server of choice
