Вы находитесь на странице: 1из 24

Data Warehousing and

Data Mining
Data Warehouse Design
Physical Design
Bitmap Indexing

Contain a large number of data


organised in a table structure, i.e. in
the form of rows and columns

Read Write Tables (High volume use


of Insert, Update, Delete statements)
Read Only Tables Static Data
Aggregate Tables
Dimension Tables
Fact Tables

Indexes help to quickly locate pieces of


information about a topic which are
scattered around a number of
resources. For e.g, the index of a book
tells what pages of the book refer to a
concept listed in the indexes.
Indexes are created in Databases/
Datawarehouses for efficient
identification of the requested rows

the act of classifying and providing an index


in order to make items easier to retrieve

B Tree (Binary Tree) Indexing


Hash Table
Bitmap Indexing

Normally used for OLTP applications


(many read-write operations)

B-trees are always height balanced, with all leaf


nodes at the same level
Update and search operations affect only a few disk
pages, so performance is good.
B-trees keep related records on the same disk page,
which takes advantage of locality of reference.
B-trees guarantee that every node in the tree will be
full at least to a certain minimum percentage. This
improves space efficiency while reducing the typical
number of disk fetches necessary during a search or
update operation over many thousands of records.

Contains a fact table, plus a number


of dimension tables whose keys are
foreign keys of the fact table.

How would you efficiently answer the


query: Find the total sales of
raincoats for each location last year.
You may assume suitable indexes.

Number of relations can be large


join size estimation is difficult.
Dimension tables are often small,
especially when considering only the
tuples matching a select operation.
Often data can be assumed to be
static (a snapshot of the operational
data). This means that we have the
option to precompute.

Suppose there are only 4 different


locations in our previous example.
Then we may represent the locations of N
fact table tuples using only 2N bits.
However, a (clustered) index on location
seems to require at least N log N bits.

For each possible value


(of location) and each
tuple, store a bit that is 1
if the tuple contains the
given value.
Store these bits ordered
by column.
In the context of this
example, the bitmap
index is a join index that
can be used when
computing joins.

Suitable for columns having a very small


number of unique values (very low
cardinality)
Normally used where the cardinality is <
0.05%

create bitmap index gender_state on


person (sex)

Create a set of bitmap indexes from


data
Quantize data into buckets
Table with no insert, update , delete
or very few DML statements is a
good choice for bitmap indexes.

For example

Bitmaps can be combined using the


logical operations AND, OR , NOT.

A AND B

A OR B

NOT A

Note: Some RDBMSs like Oracle also implements a MINUS

operation internally
A MINUS B is equivalent to A AND NOT B

Implement one or two bitmap


compression techniques like WordAligned Hybrid (WAH) and Byte-Aligned
Bitmap Code (BBC)
Compare storage requirements with and
without compression for various data
sets (focus on varying cardinality, i.e.
number of buckets per attribute)

Develop Query Engine to execute selection queries


(i.e. Attribute 1 between 3 and 5 AND Attribute 2 =
4)
Identify correct bitmaps to use, appropriate bitmap
AND, OR bitmap operations
Try query execution over both non-compressed
bitmaps and whichever compression you used. If
using BBC then try decompressing appropriate
bitmaps and executing. If using WAH, then
implement query execution over compressed
bitmaps.
Compare query execution times to sequential scan

Bitmap indexes
Must be non-unique
Include NULL values (B*Tree indexes do not)
Maximum 30 columns (B*Tree indexes max 32
columns)
Can be locally partitioned
Cannot be globally partitioned
Can be function-based indexes (built-in or userdefined)
Can be created on global temporary tables
Cannot be reversed

Compact size.
Efficient hardware support for bitmap
operations (AND, OR, XOR, NOT).
Fast search.
Multiple differentiate bitmap indexes
for different kind of queries.

If 3 or more sessions simultaneously


working to INSERT into a Bitmap Indexed
Table, a deadlock situation can arise.
This may result in corruption of the Index.
The solution for this is to REBUILD the
Bitmap Index
(Note: Unlike B Tree Indexes, Bitmap Indexes
cannot be rebuilt online.
<BMI> rebuild)

We need to use Alter

CPU overhead - Require more CPU


cycles
Modification requires more effort than
B Tree and hence not suitable for
large concurrent DML operations
Deadlock

Вам также может понравиться