DATA CUBES
Online Analytic Processing (OLAP)

OLAP
• OLAP: Online Analytic Processing
• OLAP queries are complex queries that
– Touch large amounts of data
– Discover patterns and trends in the data
– Are typically expensive and take a long time
• Also called decision-support queries
• In contrast to OLAP:
– OLTP: Online Transaction Processing
– OLTP queries are simple queries, e.g., over banking or airline systems
– OLTP queries touch small amounts of data for fast transactions
– Example: SELECT salary FROM Emp WHERE ID = 100;
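The contrast between the two query styles can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `Emp` table and its `ID` and `salary` columns come from the slide, while the `dept` column and all row values are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The dept column is hypothetical; the slide only shows ID and salary.
conn.execute("CREATE TABLE Emp (ID INTEGER PRIMARY KEY, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [(100, "sales", 5000.0), (101, "sales", 7000.0), (102, "it", 6000.0)],
)

# OLTP-style: touches one row, located through the primary key.
oltp_row = conn.execute("SELECT salary FROM Emp WHERE ID = 100").fetchone()

# OLAP-style: scans the whole table and summarizes it per department.
olap_rows = conn.execute(
    "SELECT dept, AVG(salary) FROM Emp GROUP BY dept ORDER BY dept"
).fetchall()

print(oltp_row)
print(olap_rows)
```

On a table this small both queries are instant; the difference in cost only shows up when the aggregate has to scan millions of rows.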
OLTP vs. OLAP
• On-Line Transaction Processing (OLTP):
– technology used to perform updates on operational or transactional systems (e.g., point-of-sale systems)
• On-Line Analytical Processing (OLAP):
– technology used to perform complex analysis of the data in a data warehouse
OLAP is a category of software technology that enables
analysts, managers, and executives to gain insight into
data through fast, consistent, interactive access to a
wide variety of possible views of information that has
been transformed from raw data to reflect the
dimensionality of the enterprise
as understood by the user.
[source: OLAP Council: www.olapcouncil.org]
OLAP AND DATA WAREHOUSE

[Figure: data warehouse architecture. Recoverable labels: Internal Sources, Operational DBs, External Sources, Data Integration Component, Data Warehouse, Meta data, OLAP Server, Query and Analysis Component, Client Tools, Reports, OLAP, Data Mining]
OLAP AND DATA WAREHOUSE
• Typically, OLAP queries are executed over a separate copy of the working data
– Over a data warehouse
• The data warehouse is periodically updated, e.g., overnight
– OLAP queries tolerate such out-of-date gaps
• Why run OLAP queries over a data warehouse?
– The warehouse collects and combines data from multiple sources
– The warehouse may organize the data in certain formats to support OLAP queries
– OLAP queries are complex and touch large amounts of data
– Run on the operational database, they may lock it for long periods of time
– This negatively affects all other OLTP transactions
OLAP ARCHITECTURE

EXAMPLE OLAP APPLICATIONS

• Market Analysis
– Find which items are frequently sold over the summer but not over the winter
• Credit Card Companies
– Given a new applicant, is (s)he credit-worthy?
– Need to check other similar applicants (age, gender, income, etc.) and observe how they perform, then make a prediction for the new applicant
OLAP queries are also called "decision-support" queries
MULTI-DIMENSIONAL VIEW
• Data is typically viewed as points in a multi-dimensional space
• Raw data cubes (raw level, without aggregation)
• Typical OLAP applications have many dimensions
[Figure: a three-dimensional cube with axes Items (bread, orange juice, milk 2% fat, milk 1% fat), Location (NY, MA, CA), and Time (3/1, 3/2, 3/3, 3/4); sample cell values shown are 10, 47, 30, and 12]
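One way to picture "points in a multi-dimensional space" is a mapping from coordinates to a value. The sketch below assumes a three-axis cube (item, location, day) like the one in the figure; the exact cell each number sits in is not recoverable from the slide, so the coordinates here are invented for illustration.

```python
# Each fact is a point in (item, location, day) space; the value is the measure.
# The placement of the numbers is made up for illustration.
cube = {
    ("bread", "NY", "3/1"): 10,
    ("orange juice", "MA", "3/2"): 47,
    ("milk 2% fat", "CA", "3/3"): 30,
    ("milk 1% fat", "NY", "3/4"): 12,
}

def cell(item, location, day):
    """Look up one cell; cells with no recorded fact default to 0."""
    return cube.get((item, location, day), 0)

# The distinct values along each axis are recovered from the stored points.
items = sorted({key[0] for key in cube})
locations = sorted({key[1] for key in cube})
days = sorted({key[2] for key in cube})
```

Storing only the non-empty points keeps the structure small even when most combinations of item, location, and day never occur.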
ANOTHER EXAMPLE

[Figure: a cube with axes gender, age, and accidents]
DATA CUBES
• A data cube is a structure that enables OLAP to achieve multidimensional functionality.
• The data cube is used to represent data along some measure of interest.
• Data cubes are an easy way to look at the data (they allow us to look at complex data in a simple format).
• Although called a "cube", it can be 2-dimensional, 3-dimensional, or higher-dimensional.
DATA CUBES
• Database design is for OLTP and efficiency in data storage.
• Data cube design is for efficiency in data retrieval (ensures report optimization).
• The cube is comparable to a table in a relational database.
Dimensions, Measures, and Hierarchies
• Data cubes have categories of data called dimensions and measures.
• Measure
– represents some fact (or number) such as cost or units of service.
• Dimension
– represents descriptive categories of data such as time or location.
Hierarchy

Some dimensions can have multiple levels forming a hierarchy.
For example, dates have year, month, and day;
geography has country, region, and city;
a product might have category, subcategory, and the product itself.
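A hierarchy makes it possible to aggregate a measure at any of its levels. The following is a minimal sketch, assuming a date hierarchy of (year, month, day) and made-up daily values; `roll_up` is a hypothetical helper name, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical daily sales keyed by (year, month, day).
daily_sales = {
    (2024, 3, 1): 10,
    (2024, 3, 2): 47,
    (2024, 4, 1): 30,
    (2025, 1, 5): 12,
}

def roll_up(facts, depth):
    """Aggregate to the first `depth` levels of the hierarchy.

    depth=3 keeps days, depth=2 gives months, depth=1 gives years.
    """
    totals = defaultdict(int)
    for key, value in facts.items():
        totals[key[:depth]] += value
    return dict(totals)

by_month = roll_up(daily_sales, 2)
by_year = roll_up(daily_sales, 1)
```

Because each key lists its levels from coarsest to finest, truncating the key is enough to move up the hierarchy.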
[Figure: Dimensions and Measures]
Data Cube Concepts
• Three important concepts associated with data cubes:
1. Slicing.
2. Dicing.
3. Rotating.
Slicing
• The term slice most often refers to a two-dimensional page selected from the cube.
• A slice is a subset of a multidimensional array corresponding to a single value for one or more members of the dimensions not in the subset.
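As a rough sketch of the definition above, a slice can be taken by fixing one dimension to a single value and dropping that dimension from the result. The cube layout (product, location, quarter), the revenue numbers, and the helper name `slice_cube` are all assumptions for illustration.

```python
# Hypothetical cube: (product, location, quarter) -> revenue.
cube = {
    ("Wireless Mouse", "Asia", "Q1"): 100,
    ("Wireless Mouse", "Europe", "Q1"): 80,
    ("Wireless Mouse", "Asia", "Q2"): 120,
    ("Keyboard", "Asia", "Q1"): 60,
    ("Keyboard", "Europe", "Q2"): 90,
}

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the keys."""
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }

# A two-dimensional page: (location, quarter) -> revenue for one product.
mouse_page = slice_cube(cube, 0, "Wireless Mouse")
```

The result has one dimension fewer than the input, which is what turns a three-dimensional cube into a two-dimensional page.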
Slicing
[Figure: Slicing, Wireless Mouse]
[Figure: Slicing, Asia]
Dicing
• An operation related to slicing.
• In the case of dicing, we define a subcube of the original space.
• Dicing provides you the smallest available slice.
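Defining a subcube can be sketched as keeping only the cells whose coordinates fall inside a chosen set of values on each restricted dimension. The data and the helper name `dice` below are hypothetical; unlike a slice, the result keeps all of its dimensions.

```python
# Hypothetical cube: (product, location, quarter) -> revenue.
cube = {
    ("Mouse", "Asia", "Q1"): 100,
    ("Mouse", "Europe", "Q1"): 80,
    ("Mouse", "Asia", "Q2"): 120,
    ("Keyboard", "Asia", "Q1"): 60,
    ("Keyboard", "Europe", "Q2"): 90,
}

def dice(cube, allowed):
    """Keep cells whose coordinate on every listed axis is in the allowed set.

    `allowed` maps an axis index to the set of values kept on that axis.
    """
    return {
        key: measure
        for key, measure in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }

# Subcube: one product, two locations, one quarter.
subcube = dice(cube, {0: {"Mouse"}, 1: {"Asia", "Europe"}, 2: {"Q1"}})
```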
Dicing
EXAMPLE:
-- Slicing: fix a single value on one dimension
SELECT PRODUCT, SUM(REVENUE) FROM SALES
WHERE PRODUCT = 'OPV' GROUP BY PRODUCT;
-- Dicing: restrict two dimensions to obtain a subcube
SELECT PRODUCT, SUM(REVENUE) FROM SALES
WHERE PRODUCT = 'EL' AND LOCATION = 'EUROPE' GROUP BY PRODUCT;
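The two queries can be tried out with Python's built-in sqlite3 module. This sketch assumes a `SALES` table with a single product column named `PRODUCT` plus `LOCATION` and `REVENUE`; the rows are invented so that the sums are easy to check by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES (PRODUCT TEXT, LOCATION TEXT, REVENUE REAL)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?, ?)",
    [
        ("OPV", "EUROPE", 10.0),
        ("OPV", "ASIA", 20.0),
        ("EL", "EUROPE", 5.0),
        ("EL", "EUROPE", 7.0),
        ("EL", "ASIA", 9.0),
    ],
)

# Slice: one dimension (product) is fixed to a single value.
slice_rows = conn.execute(
    "SELECT PRODUCT, SUM(REVENUE) FROM SALES "
    "WHERE PRODUCT = 'OPV' GROUP BY PRODUCT"
).fetchall()

# Dice: two dimensions (product and location) are restricted at once.
dice_rows = conn.execute(
    "SELECT PRODUCT, SUM(REVENUE) FROM SALES "
    "WHERE PRODUCT = 'EL' AND LOCATION = 'EUROPE' GROUP BY PRODUCT"
).fetchall()

print(slice_rows)
print(dice_rows)
```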
Usage

Slice is used to select one particular dimension from a given cube and to provide a new subcube.

Dice is used to select two or more dimensions from a given cube and to provide a new subcube.
Rotating
• Sometimes called pivoting.
• Rotating changes the dimensional orientation of the report from the cube data.
• For example …
– rotating may consist of swapping the rows and columns, or moving one of the row dimensions into the column dimension
– or swapping an off-spreadsheet dimension with one of the dimensions in the page display
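The simplest case, swapping rows and columns, can be sketched as a transpose of a two-dimensional report. The labels and numbers are made up, and `rotate` is a hypothetical helper name.

```python
# Hypothetical 2-D report: rows are products, columns are quarters.
row_labels = ["Mouse", "Keyboard"]
col_labels = ["Q1", "Q2", "Q3"]
report = [
    [100, 120, 90],  # Mouse
    [60, 75, 80],    # Keyboard
]

def rotate(rows, cols, table):
    """Swap the row and column dimensions; the cell values are unchanged."""
    rotated = [list(column) for column in zip(*table)]
    return cols, rows, rotated

new_rows, new_cols, rotated = rotate(row_labels, col_labels, report)
```

Rotating twice returns the original report, which reflects that pivoting changes only the orientation and not the data.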
Dimensions
• A dimension represents descriptive categories of data such as time or location.
• Each dimension includes different levels of categories.
Categories
• A category is an item that matches a specific description or classification, such as years in a time dimension.
• Categories can be at different levels of information within a dimension.
Categories
• Parent category
– is the next higher level of another category in a drill-up path.
• Child category
– is the next lower level category in a drill-down path.
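Parent and child categories can be sketched as a mapping from each category to the category one level above it. The time categories below and the helper names `drill_up` and `drill_down` are assumptions chosen for illustration.

```python
# Hypothetical time dimension: each category maps to its parent category.
parent = {
    "2024-Q1": "2024",
    "2024-Q2": "2024",
    "2024-01": "2024-Q1",
    "2024-02": "2024-Q1",
    "2024-04": "2024-Q2",
}

def drill_up(category):
    """Return the parent category, or None at the top level."""
    return parent.get(category)

def drill_down(category):
    """Return the child categories one level below, in sorted order."""
    return sorted(child for child, p in parent.items() if p == category)
```

For example, `drill_down("2024")` lists the quarters of that year, and `drill_up("2024-01")` returns the quarter that contains January.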
Measures
• The measures are the actual data values
that occupy the cells as defined by the
dimensions selected.
• Measures include facts or variables
typically stored as numerical fields.
Computed versus Stored Data Cubes
• The goal is to retrieve the information
from the data cube in the most efficient
way possible.
• Three possible solutions are:
– Pre-compute all cells in the cube.
– Pre-compute no cells.
– Pre-compute some of the cells.
Computed versus Stored Data Cubes
• If the whole cube is pre-computed
– Advantage
• the queries run on the cube will be very fast.
– Disadvantage
• the pre-computed cube requires a lot of memory.
Computed versus Stored Data Cubes
• To minimize memory requirements, we can pre-compute none of the cells in the cube.
• But the queries on the cube will run more slowly.
• As a compromise between these two, we can pre-compute only those cells in the cube which will most likely be used for decision support queries.
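The three strategies can be compared in a small sketch. It assumes the cube is built from raw fact rows and that "cells" are grouped by which dimensions they aggregate over; the data, the choice of which group-bys to store, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical fact rows: (product, location, quarter, revenue).
DIMS = ("product", "location", "quarter")
facts = [
    ("Mouse", "Asia", "Q1", 100),
    ("Mouse", "Europe", "Q1", 80),
    ("Keyboard", "Asia", "Q2", 60),
]

def aggregate(group_by):
    """Compute one group-by from the raw facts (the 'pre-compute nothing' path)."""
    idx = [DIMS.index(d) for d in group_by]
    out = {}
    for row in facts:
        key = tuple(row[i] for i in idx)
        out[key] = out.get(key, 0) + row[-1]
    return out

def all_group_bys():
    """Every subset of the dimensions: 2**n group-bys for n dimensions."""
    return [c for r in range(len(DIMS) + 1) for c in combinations(DIMS, r)]

# Pre-compute everything: fast lookups, largest storage.
full = {g: aggregate(g) for g in all_group_bys()}

# Pre-compute some: store only the group-bys expected to be queried often
# (an arbitrary choice here) and aggregate the rest on demand.
materialized = {g: aggregate(g) for g in [("product",), ("location",)]}

def query(group_by):
    """Answer from the stored results when possible, else from the raw facts."""
    group_by = tuple(group_by)
    if group_by in materialized:
        return materialized[group_by]
    return aggregate(group_by)
```

With three dimensions there are already 8 group-bys to store in the fully pre-computed case, and the count doubles with every added dimension, which is why partial pre-computation is attractive.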
Representation of Totals
• A simple data cube does not contain totals.
• The storage of totals increases the size of the data cube but can also decrease the time to make total-based queries.
• A simple way to represent totals is to add an additional layer on n sides of the n-dimensional data cube.
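The additional layer can be sketched by giving every dimension one extra coordinate that stands for "all values". The marker name `ALL`, the two-dimensional example cube, and the helper `with_totals` are assumptions for illustration.

```python
from itertools import product

ALL = "ALL"  # marker for the totals layer on each dimension (a naming assumption)

# Hypothetical 2-D cube: (product, location) -> revenue.
cube = {
    ("Mouse", "Asia"): 100,
    ("Mouse", "Europe"): 80,
    ("Keyboard", "Asia"): 60,
}

def with_totals(cube):
    """Return a copy of the cube extended with a totals layer on every dimension."""
    extended = {}
    for key, value in cube.items():
        # Each cell contributes to every combination of its own coordinate
        # or the ALL marker along each axis.
        for target in product(*[(coord, ALL) for coord in key]):
            extended[target] = extended.get(target, 0) + value
    return extended

totals = with_totals(cube)
```

Here the three original cells grow to eight stored cells, which illustrates the size increase, while a query such as "total revenue for Mouse" becomes a single lookup of `totals[("Mouse", ALL)]`.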