Outline
A multi-dimensional data model
Data warehouse architecture
Data warehouse implementation
Further development of data cube technology
From data warehousing to data mining
Limitations of SQL
A Freshman in Business needs a Ph.D. in SQL -- Ralph Kimball
Chapter 2 Online Analytical Processing Tuesday, October 29, 2013 3
Major task of traditional relational DBMS: day-to-day operations (purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.)
Major task of data warehouse system: data analysis and decision making
OLTP vs OLAP
Figure labels: Online Transaction Processing; Complex Queries; Data Warehouse; aggregated & historical data, static and low volume
OLAP Pros
It is a powerful visualization paradigm
It provides fast, interactive response times
It is good for analyzing time series
It can be useful to find some clusters and outliers
Many vendors offer OLAP tools
Aggregation: total sales, percent-to-total
Comparison: budget vs. expenses
Ranking: top 10, quartile analysis
Access to detailed and aggregate data
Complex criteria specification
Visualization
DBMS tuned for OLTP: access methods, indexing, concurrency control, recovery
Warehouse tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
Missing data: decision support requires historical data, which operational DBs do not typically maintain
Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
Data quality: different sources typically use inconsistent data representations, codes and formats, which have to be reconciled
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
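The relationship between a fact table and its dimension tables can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names item_dim, time_dim and sales_fact, the sample rows and the key columns are hypothetical, while the attribute names follow the item and time examples above.

```python
import sqlite3

# In-memory database holding a tiny, hypothetical star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY,
                           item_name TEXT, brand TEXT, type TEXT);
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY,
                           day INTEGER, week INTEGER, month INTEGER,
                           quarter TEXT, year INTEGER);
    -- The fact table stores the measure plus a key to each dimension table.
    CREATE TABLE sales_fact (time_key INTEGER REFERENCES time_dim(time_key),
                             item_key INTEGER REFERENCES item_dim(item_key),
                             dollars_sold REAL);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?)", [
    (1, "TV-A", "Acme", "TV"),
    (2, "PC-B", "Acme", "PC"),
    (3, "TV-C", "Zeta", "TV"),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, "Q1", 2013),
    (2, 15, 16, 4, "Q2", 2013),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
    (1, 1, 100.0), (1, 2, 250.0), (1, 3, 25.0),
    (2, 1, 50.0), (2, 3, 75.0),
])

# Join the fact table to both dimensions and aggregate the measure.
rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    JOIN item_dim AS i ON f.item_key = i.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()

for quarter, item_type, dollars in rows:
    print(quarter, item_type, dollars)
```

Each row of the result is one cell of a 2-D view (quarter by item type) over the dollars_sold measure.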
Definitions
An n-dimensional base cube is called a base cuboid
The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid
The lattice of cuboids forms a data cube
Lattice of cuboids for the dimensions time, item, location, supplier:
0-D (apex) cuboid
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time, item, location, supplier
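The lattice for the four dimensions time, item, location and supplier can be enumerated mechanically: every subset of the dimensions is one cuboid. The sketch below assumes no concept hierarchies within a dimension, so an n-dimensional cube has 2^n cuboids; the function name cuboid_lattice is made up for this example.

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")

def cuboid_lattice(dims):
    """Group every subset of dims by size: 0-D apex up to the n-D base cuboid."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboid_lattice(dimensions)
total = sum(len(level) for level in lattice.values())

for k, cuboids in lattice.items():
    print(f"{k}-D cuboids:", [",".join(c) for c in cuboids])
print("total cuboids:", total)
```

With hierarchies on the dimensions the count grows, since each level of each hierarchy adds further group-by choices.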
Star schema
Snowflake schema
A refinement of the star schema in which some dimensional hierarchies are normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
Fact constellations
Multiple fact tables share dimension tables; viewed as a collection of stars, this is therefore called a galaxy schema or fact constellation
Figure (star schema example): Item dimension; Measures
Figure (snowflake schema example): Time; Item (item_key, item_name, brand, type, supplier_key); Supplier (supplier_key, supplier_type); City (city_key, city, province_or_state, country); Measures: Euros_sold, Avg_sales
Figure (fact constellation example): Time (time_key, day, day_of_the_week, month, quarter, year); Item; shipper; Measures
Shared dimension tables: defined the first time as part of a cube definition, then reused with
define dimension <dimension_name> as <dimension_name_first_time> in cube <cube_name_first_time>
define dimension time as (time_key, day, day_of_week, month, quarter, year)
define dimension item as (item_key, item_name, brand, type, supplier_type)
define dimension branch as (branch_key, branch_name, branch_type)
define dimension location as (location_key, street, city, province_or_state, country)
define cube shipping [time, item, shipper, from_location, to_location]:
    dollar_cost = sum(cost_in_dollars), unit_shipped = count(*)
define dimension time as time in cube sales
define dimension item as item in cube sales
define dimension shipper as (shipper_key, shipper_name, location as location in cube sales, shipper_type)
define dimension from_location as location in cube sales
define dimension to_location as location in cube sales
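What the shipping cube definition computes can be sketched in Python: records are grouped by the chosen dimensions, and each cell holds dollar_cost = sum(cost_in_dollars) and unit_shipped = count(*). The sample shipment records and the helper name build_cube are hypothetical; only the dimension and measure names come from the definition above.

```python
from collections import defaultdict

# Hypothetical shipment records, one dict per shipped unit.
shipments = [
    {"time": "Q1", "item": "TV", "shipper": "S1",
     "from_location": "Toronto", "to_location": "Belfast", "cost_in_dollars": 40.0},
    {"time": "Q1", "item": "PC", "shipper": "S1",
     "from_location": "Toronto", "to_location": "Belfast", "cost_in_dollars": 60.0},
    {"time": "Q2", "item": "TV", "shipper": "S2",
     "from_location": "Belfast", "to_location": "Toronto", "cost_in_dollars": 30.0},
]

def build_cube(records, dims):
    """Group records by dims and compute the two shipping measures per cell."""
    cells = defaultdict(lambda: {"dollar_cost": 0.0, "unit_shipped": 0})
    for record in records:
        key = tuple(record[d] for d in dims)
        cells[key]["dollar_cost"] += record["cost_in_dollars"]  # sum(cost_in_dollars)
        cells[key]["unit_shipped"] += 1                         # count(*)
    return dict(cells)

# Base cuboid over all five dimensions, and a coarser 2-D cuboid.
base = build_cube(shipments, ["time", "item", "shipper", "from_location", "to_location"])
by_time_shipper = build_cube(shipments, ["time", "shipper"])
print(by_time_shipper)
```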
Distributive
if the result derived by applying the function to n aggregate values is the same as that derived by applying the function on all the data without partitioning.
Algebraic
if it can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function.
Holistic
if there is no constant bound on the storage size needed to describe a subaggregate.
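A small numeric sketch may make the three measure categories concrete. It uses sum as a distributive function, average as an algebraic one (built from sum and count), and median as a commonly cited holistic one. The partitioned sample values are made up for illustration.

```python
from statistics import median

# Hypothetical data, already split into three partitions.
partitions = [[4, 8, 15], [16, 23], [42]]
all_values = [v for part in partitions for v in part]

# Distributive: summing the partial sums gives the sum over all the data.
partial_sums = [sum(part) for part in partitions]
total = sum(partial_sums)

# Algebraic: the average needs two distributive results, sum and count.
partial_counts = [len(part) for part in partitions]
average = sum(partial_sums) / sum(partial_counts)

# Holistic: combining partial medians does not, in general, give the
# true median, so no fixed-size summary per partition is enough.
median_of_medians = median([median(part) for part in partitions])
true_median = median(all_values)

print(total, average, median_of_medians, true_median)
```

The practical consequence is that distributive and algebraic measures can be computed for a higher-level cuboid from lower-level cuboids, while a holistic measure generally has to go back to the underlying data.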
Figure (concept hierarchy): levels country, city, office
country: Canada, ..., Mexico, Ireland, ..., France
city, office: Toronto, ..., ..., Belfast, Blackrock
Multidimensional Data
Product
MDDM
Sales by product line over the past six months
Sales by store between 1990 and 1995
Dimension tables: Store Info, Product Info, Time Info, ...
Star Schema
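Figure (sample data cube): Date axis: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum; Product axis: TV, PC, VCR, sum; Country axis: France, Germany, Ireland, sum
Figure (cuboids of this cube): 1-D cuboids: product; date; country. 2-D cuboids: product,date; product,country; date,country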
Drill down (roll down): from a higher-level summary to a lower-level summary or detailed data, or introducing new dimensions
Pivot (rotate)
Other operations
drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
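Some of these cube operations can be sketched on a tiny cube held as a Python dict. The sketch covers roll-up by dimension reduction (the reverse of drill down), slicing on one dimension value, and pivoting a 2-D view. The sales numbers are made up, the axis and member names are borrowed from the sample cube, and the function names are hypothetical.

```python
from collections import defaultdict

AXES = ("date", "product", "country")

# Hypothetical sales values keyed by (date, product, country).
cube = {
    ("1Qtr", "TV", "France"): 10,
    ("1Qtr", "PC", "France"): 20,
    ("2Qtr", "TV", "France"): 30,
    ("1Qtr", "TV", "Germany"): 40,
    ("2Qtr", "PC", "Ireland"): 50,
}

def roll_up(cells, axes, keep):
    """Aggregate away every axis not listed in keep (dimension reduction)."""
    positions = [axes.index(a) for a in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in positions)] += value
    return dict(out)

def slice_cube(cells, axes, axis, value):
    """Keep only the cells where one axis equals a single value."""
    i = axes.index(axis)
    return {k: v for k, v in cells.items() if k[i] == value}

def pivot(cells):
    """Rotate a 2-D view by swapping its two axes."""
    return {(b, a): v for (a, b), v in cells.items()}

by_country = roll_up(cube, AXES, ("country",))            # roll-up to 1-D
france = slice_cube(cube, AXES, "country", "France")      # slice: country = France
date_by_product = roll_up(france, AXES, ("date", "product"))
product_by_date = pivot(date_by_product)                  # pivot the 2-D view

print(by_country)
print(product_by_date)
```

Drilling down corresponds to going back from by_country to the finer cells of cube.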
Figure (dimension hierarchies): axes Shipping Method, Time, Product, Location, Organization. Level labels: CONTRACTS, ORDER, AIR-EXPRESS, TRUCK, PRODUCT LINE, PRODUCT ITEM, PRODUCT GROUP, ANNUALLY, QTRLY, DAILY, CITY, COUNTRY, REGION, SALES PERSON, DISTRICT, DIVISION
Relational OLAP (ROLAP)
Use a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support missing pieces
Include optimization of the DBMS backend, implementation of aggregation navigation logic, and additional tools and services
Greater scalability
Multidimensional OLAP (MOLAP)
Array-based multidimensional storage engine (sparse matrix techniques)
Fast indexing to pre-computed summarized data
Hybrid OLAP (HOLAP)
User flexibility, e.g., low level: relational, high level: array
Specialized SQL servers
Specialized support for SQL queries over star/snowflake schemas
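The array-based storage idea can be sketched in a few lines of Python: dimension members are mapped to array positions, only non-empty cells are stored (a simple stand-in for sparse matrix techniques), and summaries are computed once so later lookups are direct. All member names, values and helper names are hypothetical, and real multidimensional engines are far more elaborate.

```python
# Dimension members map to integer array positions (hypothetical members).
members = {
    "product": ["TV", "PC", "VCR"],
    "country": ["France", "Germany", "Ireland"],
}
index = {dim: {m: i for i, m in enumerate(vals)} for dim, vals in members.items()}

# Sparse storage: only non-empty cells, keyed by (product_pos, country_pos).
cells = {}

def put(product, country, amount):
    """Add amount to the cell addressed by the two member names."""
    key = (index["product"][product], index["country"][country])
    cells[key] = cells.get(key, 0) + amount

put("TV", "France", 10)
put("PC", "France", 20)
put("TV", "Ireland", 5)

def precompute(stored):
    """Compute per-product and per-country totals once, ahead of queries."""
    by_product, by_country = {}, {}
    for (p, c), value in stored.items():
        by_product[p] = by_product.get(p, 0) + value
        by_country[c] = by_country.get(c, 0) + value
    return by_product, by_country

by_product, by_country = precompute(cells)

# A summary query is now a direct lookup instead of a scan.
tv_total = by_product[index["product"]["TV"]]
france_total = by_country[index["country"]["France"]]
density = len(cells) / (len(members["product"]) * len(members["country"]))
print(tv_total, france_total, density)
```

A relational approach would instead keep these rows in tables and compute such totals with SQL, trading lookup speed for scalability.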
Client/Server Architecture
Framework for the new systems to be designed, developed and implemented
Divide the OLAP system into several components that define its architecture
Same
C/S
Which Technology?
1) Performance: How fast will the system appear to the end-user? MDD server vendors believe this is a key point in their favor.
2) Data volume and scalability: While MDD servers can handle up to 50 GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.
What if Analysis
IF
A. You require write access
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. Lowest level already aggregated
E. Data access on aggregated level
F. You're developing a general-purpose application for inventory movement or assets management
THEN Consider an MDD/MOLAP solution for your data mart
IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. Historical data at the lowest level of granularity
D. Detailed access, long-running queries
E. Data assigned to lowest level elements
THEN Consider an RDBMS/ROLAP solution for your data mart
IF
A. OLAP on aggregated and detailed data
B. Different user groups
C. Ease of use and detailed data
THEN Consider a HOLAP solution for your data mart
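These rules of thumb can be turned into a rough checklist helper. The sketch below is only one possible reading: the criterion keys are invented shorthand for the conditions above, and scoring each option by the share of its criteria that are met is an assumption of this example, since the rules themselves do not say how to weigh partial matches.

```python
# Criterion keys are hypothetical shorthand for the listed conditions.
RULES = {
    "MOLAP": {"write_access", "data_under_50gb", "deadline_60_90_days",
              "lowest_level_aggregated", "aggregated_access",
              "general_purpose_app"},
    "ROLAP": {"data_over_100gb", "read_only", "lowest_granularity_history",
              "detailed_long_queries", "data_at_lowest_level"},
    "HOLAP": {"aggregated_and_detailed", "different_user_groups",
              "ease_of_use_and_detail"},
}

def suggest(requirements):
    """Return (best option, scores); scoring by matched share is an assumption."""
    scores = {name: len(criteria & requirements) / len(criteria)
              for name, criteria in RULES.items()}
    best = max(scores, key=scores.get)  # ties go to the first option listed
    return best, scores

choice_a, scores_a = suggest({"data_over_100gb", "read_only"})
choice_b, scores_b = suggest({"write_access", "data_under_50gb", "aggregated_access"})
print(choice_a, scores_a)
print(choice_b, scores_b)
```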
Examples
ROLAP:
Telecommunication startup: call data records (CDRs)
E-commerce site
Credit card company
MOLAP:
Analysis and budgeting in a financial department
Sales analysis
HOLAP:
Sales department of a multi-national company
Banks and financial service providers
Tools
ROLAP:
ORACLE 8i; ORACLE Reports; ORACLE Discoverer; ORACLE Warehouse Builder; MicroStrategy's DSS Server; Platinum Technologies' Platinum InfoBeacon
MOLAP:
Arbor Software's Essbase; ORACLE Express Server; ORACLE Express Clients (C/S and Web)
HOLAP:
ORACLE 8i; ORACLE Express Server; ORACLE Relational Access Manager; ORACLE Express Clients (C/S and Web)
Conclusion