Processing
ROLAP servers vs. MOLAP servers
Index Structures
Cube computation
What to Materialize?
Algorithms
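ROLAP Server
Figure: clients query a ROLAP server, which runs on top of a relational DBMS (with utilities) loaded from several sources. The cube is stored as ordinary relations, e.g.:
sale
prodId  date  sum
p1      1     62
p2      1     19
p1      2     48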
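MOLAP Server
Figure: clients use multidimensional (M.D.) tools to query a multidimensional server (with utilities) loaded from the sources. The data is stored as an array indexed by dimensions such as Product and Date.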
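MOLAP
Figure: a 3-D cube with dimensions Product (TV, PC, VCR, plus a sum slice), Date (1Qtr to 4Qtr, plus a sum slice), and Country; the sum slices hold the aggregates along each dimension.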
MOLAP
Figure: a 4 × 4 × 4 array over dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3), partitioned into 64 chunks numbered 1 to 64.
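The chunk numbers visible in the figure (13 to 16 along b3 at c0, 61 to 64 along b3 at c3) appear to vary A fastest, then B, then C. A minimal sketch of that numbering, assuming this reading of the figure; `chunk_id` and `chunk_coords` are hypothetical helper names:

```python
from typing import Tuple

# Hypothetical helpers, assuming the figure numbers chunks with A varying
# fastest, then B, then C, starting at 1.

def chunk_id(a: int, b: int, c: int, n: int = 4) -> int:
    """Return the 1-based chunk number of chunk (a, b, c) in an n*n*n array."""
    return 1 + a + n * b + n * n * c


def chunk_coords(k: int, n: int = 4) -> Tuple[int, int, int]:
    """Inverse of chunk_id: recover (a, b, c) from a 1-based chunk number."""
    k -= 1
    return (k % n, (k // n) % n, k // (n * n))


# Spot checks against chunk numbers visible in the figure.
print(chunk_id(0, 3, 0))  # 13  (a0, b3, c0)
print(chunk_id(0, 3, 3))  # 61  (a0, b3, c3)
print(chunk_coords(28))   # (3, 2, 1), i.e. a3, b2, c1
```

A linear order like this is what lets a MOLAP server store the chunks contiguously and compute a cell's location by arithmetic instead of a lookup.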
Challenges in MOLAP
Creating array data from data in tables
Efficient techniques for cube computation
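The first challenge, creating array data from tables, amounts to assigning each dimension value an array coordinate and copying the measure into the matching cell. A minimal sketch using the ROLAP sale relation (prodId, date, sum); filling empty cells with 0 is an assumption, since a real MOLAP server may use a dedicated "no data" marker:

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int, int]  # (prodId, date, sum)


def to_array(rows: List[Row]):
    """Convert relational rows into a dense products x dates array."""
    products = sorted({p for p, _, _ in rows})
    dates = sorted({d for _, d, _ in rows})
    # Map each dimension value to its array coordinate.
    p_pos: Dict[str, int] = {p: i for i, p in enumerate(products)}
    d_pos: Dict[int, int] = {d: j for j, d in enumerate(dates)}
    # Empty cells default to 0 (assumption, see above).
    array = [[0] * len(dates) for _ in products]
    for p, d, s in rows:
        array[p_pos[p]][d_pos[d]] += s
    return products, dates, array


sale = [("p1", 1, 62), ("p2", 1, 19), ("p1", 2, 48)]
products, dates, array = to_array(sale)
print(products, dates, array)  # ['p1', 'p2'] [1, 2] [[62, 48], [19, 0]]
```

Even this tiny example leaves one of four cells empty; in a sparse cube most cells are empty, which is why MOLAP systems chunk and compress arrays.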
Index Structures
Index structures popular in warehouses include inverted lists, bit maps, and join indexes.
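Inverted Lists
Figure: an age index (keys 18, 19, 20, 21, 22, 23, 25, 26, ...) points to inverted lists of record ids, which in turn point to the data records:
age 20: r4, r18, r34, r35
age 21: r5, r19, r37, r40
Data records (name, age): joe 20, fred 20, sally 21, nancy 20, tom 20, pat 25, dave 21, jeff 26, ...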
Query:
Get people with age = 20 and name = "fred"
List for age = 20: r4, r18, r34, r35
List for name = fred: r18, r52
Answer is the intersection: r18
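Intersecting two inverted lists that are kept sorted by record id takes one linear merge pass. A minimal sketch, assuming record ids r4, r18, ... are stored as the integers 4, 18, ...; the lists are in numeric id order, so ids must be compared as numbers (as strings, "r18" would sort before "r4"):

```python
from typing import List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Merge-intersect two ascending lists of record ids in O(len(a) + len(b))."""
    i = j = 0
    result: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # advance the list holding the smaller id
        else:
            j += 1
    return result


age_20 = [4, 18, 34, 35]
name_fred = [18, 52]
print(intersect_sorted(age_20, name_fred))  # [18]
```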
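Bit Maps
Figure: the same age index, but each key points to a bit map over the records (ids 1, 2, ..., 8, ...) instead of a list of record ids:
age 20: 1 1 0 1 1 0 0 0 0 ...
age 21: 0 0 1 0 0 0 1 0 1 1 ...
Data records (id, name, age): 1 joe 20, 2 fred 20, 3 sally 21, 4 nancy 20, 5 tom 20, 6 pat 25, 7 dave 21, 8 jeff 26, ...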
Bitmap Index
Index on a particular column.
Each value in the column has a bit vector, so bit operations are fast.
The length of each bit vector is the number of records in the base table.
The i-th bit is set if the i-th row of the base table has that value in the indexed column.
Not suitable for high-cardinality domains, since every distinct value needs its own bit vector.

Base table
Cust  Region   Type
C1    Asia     Retail
C2    Europe   Dealer
C3    Asia     Dealer
C4    America  Retail
C5    Europe   Dealer

Index on Region
RecID  Asia  Europe  America
1      1     0       0
2      0     1       0
3      1     0       0
4      0     0       1
5      0     1       0

Index on Type
RecID  Retail  Dealer
1      1       0
2      0       1
3      0       1
4      1       0
5      0       1
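A bitmap index can be sketched with one Python int per distinct value, where bit i stands for row i of the base table; a conjunctive query then becomes a single bitwise AND. This is a minimal illustration on the base table above; `build_bitmap_index` and `to_bits` are hypothetical names:

```python
from typing import Dict, List, Tuple

BASE: List[Tuple[str, str, str]] = [  # (Cust, Region, Type)
    ("C1", "Asia", "Retail"),
    ("C2", "Europe", "Dealer"),
    ("C3", "Asia", "Dealer"),
    ("C4", "America", "Retail"),
    ("C5", "Europe", "Dealer"),
]


def build_bitmap_index(rows, col: int) -> Dict[str, int]:
    """One bit vector per distinct value of column col; bit i <-> row i (0-based)."""
    index: Dict[str, int] = {}
    for i, row in enumerate(rows):
        index[row[col]] = index.get(row[col], 0) | (1 << i)
    return index


def to_bits(vector: int, n: int) -> str:
    """Render a bit vector in row order (row 1 first), as in the index tables."""
    return "".join("1" if (vector >> i) & 1 else "0" for i in range(n))


region = build_bitmap_index(BASE, 1)
ctype = build_bitmap_index(BASE, 2)

# Region = Asia AND Type = Dealer is one bitwise AND of two vectors.
hits = region["Asia"] & ctype["Dealer"]
print(to_bits(region["Asia"], 5), to_bits(ctype["Dealer"], 5), to_bits(hits, 5))
# 10100 01101 00100
print([BASE[i][0] for i in range(len(BASE)) if (hits >> i) & 1])  # ['C3']
```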
Query:
Get people with age = 20 and name = "fred"
List for age = 20: 1101100000
List for name = fred: 0100000001
Answer is the intersection (bitwise AND): 0100000000
Join
Combine the SALE and PRODUCT relations.
In SQL: SELECT * FROM SALE, PRODUCT WHERE ...

sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

product
id  name  price
p1  bolt  10
p2  nut   5

joinTb
prodId  price  storeId  date  amt
p1      10     c1       1     12
p2      5      c1       1     11
p1      10     c3       1     50
p2      5      c2       1     8
p1      10     c1       2     44
p1      10     c2       2     4
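One way to compute joinTb is a hash join: build a hash table on the smaller relation (product), then probe it with each sale row. A minimal sketch, assuming the elided WHERE condition is the equality sale.prodId = product.id; the choice of hash join is ours, since the slide does not name an algorithm:

```python
from typing import Dict, List

SALE: List[dict] = [
    {"prodId": "p1", "storeId": "c1", "date": 1, "amt": 12},
    {"prodId": "p2", "storeId": "c1", "date": 1, "amt": 11},
    {"prodId": "p1", "storeId": "c3", "date": 1, "amt": 50},
    {"prodId": "p2", "storeId": "c2", "date": 1, "amt": 8},
    {"prodId": "p1", "storeId": "c1", "date": 2, "amt": 44},
    {"prodId": "p1", "storeId": "c2", "date": 2, "amt": 4},
]
PRODUCT: List[dict] = [
    {"id": "p1", "name": "bolt", "price": 10},
    {"id": "p2", "name": "nut", "price": 5},
]


def hash_join(sale: List[dict], product: List[dict]) -> List[dict]:
    """Equi-join sale.prodId = product.id (assumed join condition)."""
    by_id: Dict[str, dict] = {p["id"]: p for p in product}  # build phase
    joined = []
    for s in sale:  # probe phase
        p = by_id.get(s["prodId"])
        if p is not None:
            joined.append({**s, "name": p["name"], "price": p["price"]})
    return joined


join_tb = hash_join(SALE, PRODUCT)
print([r["price"] for r in join_tb])  # [10, 5, 10, 5, 10, 10]
```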
Join Indexes
The join index (jIndex) stores, for each product, the ids of the sale records that join with it.

product
id  name  price  jIndex
p1  bolt  10     r1, r3, r5, r6
p2  nut   5      r2, r4

sale
rId  prodId  storeId  date  amt
r1   p1      c1       1     12
r2   p2      c1       1     11
r3   p1      c3       1     50
r4   p2      c2       1     8
r5   p1      c1       2     44
r6   p1      c2       2     4
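A join index can be built in one pass over sale and then used to reach the matching sale records directly, without evaluating the join again. A minimal sketch on the tables above; `build_join_index` is a hypothetical name, and the `by_rid` dictionary stands in for direct record access by rId:

```python
from typing import Dict, List

SALE: List[dict] = [
    {"rId": "r1", "prodId": "p1", "storeId": "c1", "date": 1, "amt": 12},
    {"rId": "r2", "prodId": "p2", "storeId": "c1", "date": 1, "amt": 11},
    {"rId": "r3", "prodId": "p1", "storeId": "c3", "date": 1, "amt": 50},
    {"rId": "r4", "prodId": "p2", "storeId": "c2", "date": 1, "amt": 8},
    {"rId": "r5", "prodId": "p1", "storeId": "c1", "date": 2, "amt": 44},
    {"rId": "r6", "prodId": "p1", "storeId": "c2", "date": 2, "amt": 4},
]
PRODUCT_IDS = ["p1", "p2"]


def build_join_index(product_ids: List[str], sale: List[dict]) -> Dict[str, List[str]]:
    """Map each product id to the rIds of the sale rows it joins with."""
    jindex: Dict[str, List[str]] = {pid: [] for pid in product_ids}
    for s in sale:
        if s["prodId"] in jindex:
            jindex[s["prodId"]].append(s["rId"])
    return jindex


jindex = build_join_index(PRODUCT_IDS, SALE)
print(jindex)  # {'p1': ['r1', 'r3', 'r5', 'r6'], 'p2': ['r2', 'r4']}

# Using the index: total amount sold for p1 (bolt) by following stored rIds.
by_rid = {s["rId"]: s for s in SALE}
print(sum(by_rid[r]["amt"] for r in jindex["p1"]))  # 110
```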
Counting Exercise
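Lattice of Cuboids
Figure: the lattice of cuboids over city, product, and date, from all (top) to the base cuboid (bottom):
all
city; product; date
city, product; city, date; product, date
city, product, date
Sample cells (total sales): all = 129; city: c1 67, c2 12, c3 50; (city, product): (c1, p1) 56, (c1, p2) 11, (c2, p1) 4, (c2, p2) 8, (c3, p1) 50; (city, product, date): day 1 (c1, p1) 12, (c3, p1) 50, (c1, p2) 11, (c2, p2) 8; day 2 (c1, p1) 44, (c2, p1) 4.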
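Each node of the lattice is a GROUP BY over one subset of the dimensions, so n dimensions without hierarchies give 2^n cuboids. A naive sketch that computes every cuboid directly from the base rows, treating storeId as the city dimension as the figure does; real cube algorithms instead derive coarser cuboids from finer ones to share work:

```python
from collections import defaultdict
from itertools import combinations

# sale rows, with storeId renamed to city as in the lattice figure.
SALE = [
    {"city": "c1", "product": "p1", "date": 1, "amt": 12},
    {"city": "c1", "product": "p2", "date": 1, "amt": 11},
    {"city": "c3", "product": "p1", "date": 1, "amt": 50},
    {"city": "c2", "product": "p2", "date": 1, "amt": 8},
    {"city": "c1", "product": "p1", "date": 2, "amt": 44},
    {"city": "c2", "product": "p1", "date": 2, "amt": 4},
]
DIMS = ("city", "product", "date")


def cuboid(rows, dims):
    """Equivalent of SELECT dims, SUM(amt) ... GROUP BY dims."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["amt"]
    return dict(totals)


def lattice(rows, dims):
    """One cuboid per subset of dims: 2 ** len(dims) cuboids in total."""
    return {
        subset: cuboid(rows, subset)
        for k in range(len(dims) + 1)
        for subset in combinations(dims, k)
    }


cube = lattice(SALE, DIMS)
print(len(cube))        # 8
print(cube[()])         # {(): 129}
print(cube[("city",)])  # {('c1',): 67, ('c3',): 50, ('c2',): 12}
```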
Dimension Hierarchies
Hierarchy on the store dimension: city < state < all
cities
city  state
c1    CA
c2    NY
Dimension Hierarchies
Figure: the lattice of cuboids when city rolls up to state:
all
city; product; date; state
city, product; city, date; product, date; state, date; state, product
city, product, date; state, product, date
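With hierarchies, each dimension contributes one choice per level plus "all", so the number of cuboids is the product of (levels + 1) over the dimensions. Here location has two levels (city, state) and product and date have one each, giving 3 × 2 × 2 = 12 cuboids instead of 2^3 = 8. A minimal sketch that enumerates them; the dimension name "location" is our assumption:

```python
import itertools
import math

# Hierarchy levels per dimension, finest first; "all" is added implicitly.
# The name "location" for the city/state dimension is an assumption.
HIERARCHIES = {
    "location": ["city", "state"],
    "product": ["product"],
    "date": ["date"],
}


def enumerate_cuboids(hierarchies):
    """Pick one level per dimension, or None meaning aggregated to 'all'."""
    choices = [levels + [None] for levels in hierarchies.values()]
    return [
        tuple(level for level in pick if level is not None)
        for pick in itertools.product(*choices)
    ]


cuboids = enumerate_cuboids(HIERARCHIES)
print(len(cuboids))  # 12
print(math.prod(len(levels) + 1 for levels in HIERARCHIES.values()))  # 12
```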
How much to materialize? Materialize every cuboid (full materialization), none (no materialization), or some (partial materialization).
Partial materialization requires selecting which cuboids to materialize.
Derived Data
Derived data such as joinTb, computed from sale and product, can be stored in the warehouse:
joinTb
prodId  price  storeId  date  amt
p1      10     c1       1     12
p2      5      c1       1     11
p1      10     c3       1     50
p2      5      c2       1     8
p1      10     c1       2     44
p1      10     c2       2     4
Transform drill-down, roll-up, etc. into the corresponding SQL and/or OLAP operations, e.g., dice = selection + projection.
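The dice = selection + projection translation can be illustrated directly: select the rows inside the chosen sub-cube, then project onto the attributes of interest. A minimal sketch on the sale table; the particular dice (stores c1 or c2, date 1) is a hypothetical example, and projection keeps duplicates as SQL does without DISTINCT:

```python
SALE = [
    {"prodId": "p1", "storeId": "c1", "date": 1, "amt": 12},
    {"prodId": "p2", "storeId": "c1", "date": 1, "amt": 11},
    {"prodId": "p1", "storeId": "c3", "date": 1, "amt": 50},
    {"prodId": "p2", "storeId": "c2", "date": 1, "amt": 8},
    {"prodId": "p1", "storeId": "c1", "date": 2, "amt": 44},
    {"prodId": "p1", "storeId": "c2", "date": 2, "amt": 4},
]


def select(rows, predicate):
    """Relational selection: keep rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def project(rows, attrs):
    """Relational projection with bag semantics (duplicates kept)."""
    return [tuple(r[a] for a in attrs) for r in rows]


# Hypothetical dice: stores c1 or c2 on date 1, keeping (prodId, storeId, amt).
dice = project(
    select(SALE, lambda r: r["storeId"] in {"c1", "c2"} and r["date"] == 1),
    ("prodId", "storeId", "amt"),
)
print(dice)  # [('p1', 'c1', 12), ('p2', 'c1', 11), ('p2', 'c2', 8)]
```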
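What to Materialize?
Store in the warehouse the results that are useful for common queries.
Example: total sales
Figure: total sales at decreasing levels of detail, with one step marked "materialize":
city, product, date: day 1 (c1, p1) 12, (c3, p1) 50, (c1, p2) 11, (c2, p2) 8; day 2 (c1, p1) 44, (c2, p1) 4
city, product: (c1, p1) 56, (c1, p2) 11, (c2, p1) 4, (c2, p2) 8, (c3, p1) 50
city: c1 67, c2 12, c3 50
product: p1 110, p2 19
all: 129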
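Once a finer cuboid is materialized, coarser totals can be answered from it instead of from the base data. A minimal sketch that rolls the materialized (city, product) cuboid up to product totals and to the grand total; `roll_up` is a hypothetical helper:

```python
from collections import defaultdict

# Materialized (city, product) cuboid from the figure: total sales per cell.
CITY_PRODUCT = {
    ("c1", "p1"): 56, ("c1", "p2"): 11,
    ("c2", "p1"): 4, ("c2", "p2"): 8,
    ("c3", "p1"): 50,
}


def roll_up(cells, keep):
    """Sum a materialized cuboid down to the key positions listed in keep."""
    out = defaultdict(int)
    for key, total in cells.items():
        out[tuple(key[i] for i in keep)] += total
    return dict(out)


print(roll_up(CITY_PRODUCT, keep=(1,)))  # {('p1',): 110, ('p2',): 19}
print(roll_up(CITY_PRODUCT, keep=()))    # {(): 129}
```

This reuse works for distributive aggregates such as SUM and COUNT. A holistic aggregate such as MEDIAN cannot be rolled up from the stored totals alone.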
Materialization Factors
Type/frequency of queries
Query response time
Storage cost
Update cost
Iceberg Cube
Compute only the cuboid cells whose count (or other aggregate) satisfies a condition such as
HAVING COUNT(*) >= minsup
Motivation
Only a small portion of cube cells may be "above the water" in a sparse cube.
Only calculate the interesting cells: those with data above a certain threshold.
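Because COUNT is anti-monotone (a cell can never have a larger count than a coarser cell that contains it), a partition below minsup can be pruned along with every finer cell inside it. A minimal sketch in the style of the BUC (Bottom-Up Cube) algorithm, which is our choice since the slide names no algorithm, run on the (product, city, date) values of the sale table with minsup = 2:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DIMS = ("product", "city", "date")
ROWS: List[Tuple[str, str, int]] = [  # (product, city, date) from sale
    ("p1", "c1", 1), ("p2", "c1", 1), ("p1", "c3", 1),
    ("p2", "c2", 1), ("p1", "c1", 2), ("p1", "c2", 2),
]

Cell = Tuple[Tuple[str, object], ...]  # e.g. (("product", "p1"), ("city", "c1"))


def buc(rows, start: int, cell: Cell, minsup: int, out: Dict[Cell, int]) -> None:
    """Record COUNT(*) for cell, then refine it on dimensions start, start+1, ..."""
    out[cell] = len(rows)
    for d in range(start, len(DIMS)):
        partitions = defaultdict(list)
        for r in rows:
            partitions[r[d]].append(r)
        for value, part in partitions.items():
            # Prune: if this partition is below minsup, so is every finer cell in it.
            if len(part) >= minsup:
                buc(part, d + 1, cell + ((DIMS[d], value),), minsup, out)


def iceberg_cube(rows, minsup: int) -> Dict[Cell, int]:
    out: Dict[Cell, int] = {}
    if len(rows) >= minsup:
        buc(rows, 0, (), minsup, out)
    return out


cells = iceberg_cube(ROWS, minsup=2)
print(len(cells))                                  # 12
print(cells[(("product", "p1"), ("city", "c1"))])  # 2
```

With minsup = 2, no cell of the base cuboid survives, because every (product, city, date) combination occurs only once.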