Bitmap Indexes

Bitmap indexes were added to Oracle in version 7.3 of the database. They are currently
available with the Oracle Enterprise and Personal Editions, but not the Standard Edition.
Bitmap indexes are designed for data warehousing/ad hoc query environments where the full
set of queries that may be asked of the data is not totally known at system implementation
time. They are specifically not designed for OLTP systems or systems where data is
frequently updated by many concurrent sessions.
Bitmap indexes are structures that store pointers to many rows with a single index key
entry, as compared to a B*Tree structure where there is parity between the index keys and the
rows in a table. In a bitmap index, there will be a very small number of index entries, each of
which points to many rows. In a conventional B*Tree, one index entry points to a single row.
Let's say we are creating a bitmap index on the JOB column in the EMP table as follows:
Ops$tkyte@ORA10G> create BITMAP index job_idx on emp(job);
Index created.
Oracle will store something like what is shown in Table 11-6 in the index.
Table 11-6. A representation of how Oracle would store the JOB_IDX bitmap index.
Value/Row   1  2  3  4  5  6  7  8  9 10 11 12 13 14
ANALYST     0  0  0  0  0  0  0  1  0  1  0  0  1  0
CLERK       1  0  0  0  0  0  0  0  0  0  1  1  0  1
MANAGER     0  0  0  1  0  1  1  0  0  0  0  0  0  0
PRESIDENT   0  0  0  0  0  0  0  0  1  0  0  0  0  0
SALESMAN    0  1  1  0  1  0  0  0  0  0  0  0  0  0
Table 11-6 shows that rows 8, 10, and 13 have the value ANALYST, whereas rows 4, 6,
and 7 have the value MANAGER. It also shows us that no rows are null (bitmap indexes store
null entries; the lack of a null entry in the index implies there are no null rows). If we wanted
to count the rows that have the value MANAGER, the bitmap index would do this very
rapidly. If we wanted to find all the rows such that the JOB was CLERK or MANAGER, we
could simply combine their bitmaps from the index, as shown in Table 11-7.
Table 11-7. Representation of a bitwise OR.
Value/Row          1  2  3  4  5  6  7  8  9 10 11 12 13 14
CLERK              1  0  0  0  0  0  0  0  0  0  1  1  0  1
MANAGER            0  0  0  1  0  1  1  0  0  0  0  0  0  0
CLERK or MANAGER   1  0  0  1  0  1  1  0  0  0  1  1  0  1
Table 11-7 rapidly shows us that rows 1, 4, 6, 7, 11, 12, and 14 satisfy our criteria. The
bitmap Oracle stores with each key value is set up so that each position represents a rowid in
the underlying table, if we need to actually retrieve the row for further processing. Queries
such as the following:
select count(*) from emp where job = 'CLERK' or job = 'MANAGER'
will be answered directly from the bitmap index. A query such as this:
select * from emp where job = 'CLERK' or job = 'MANAGER'
on the other hand, will need to get to the table. Here, Oracle will apply a function that turns
the fact that the ith bit is on in a bitmap into a rowid that can be used to access the table.
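The bitmaps in Tables 11-6 and 11-7 can be modelled in a few lines of Python. This is only a sketch of the idea, not Oracle's storage format: each bitmap is a plain integer, and the names build_bitmap_index and set_rows are made up for the illustration. The JOB values per row are read off Table 11-6.

```python
from collections import defaultdict

# JOB value for EMP rows 1..14, read off Table 11-6.
JOBS = [
    "CLERK", "SALESMAN", "SALESMAN", "MANAGER", "SALESMAN", "MANAGER", "MANAGER",
    "ANALYST", "PRESIDENT", "ANALYST", "CLERK", "CLERK", "ANALYST", "CLERK",
]


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i+1 holds it."""
    index = defaultdict(int)
    for pos, value in enumerate(values):
        index[value] |= 1 << pos
    return dict(index)


def set_rows(bitmap):
    """Return the 1-based row numbers whose bit is on."""
    return [pos + 1 for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]


job_idx = build_bitmap_index(JOBS)

# Bitwise OR of two key entries, as in Table 11-7.
clerk_or_manager = job_idx["CLERK"] | job_idx["MANAGER"]

# count(*) only needs the number of 1 bits; select * needs the row locations.
match_count = bin(clerk_or_manager).count("1")
match_rows = set_rows(clerk_or_manager)
```

Here match_rows holds the same row numbers that Table 11-7 shows, and match_count is obtained without looking at any row data.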
When Should You Use a Bitmap Index?
Bitmap indexes are most appropriate on low distinct cardinality data (i.e., data with relatively
few discrete values when compared to the cardinality of the entire set). It is not really
possible to put a value on this; in other words, it is difficult to define what low distinct
cardinality truly is. In a set of a couple thousand records, 2 would be low distinct cardinality,
but 2 would not be low distinct cardinality in a two-row table. In a table of tens or hundreds
of millions of records, 100,000 could be low distinct cardinality. So, low distinct cardinality is
relative to the size of the result set. This is data where the number of distinct items in the set
of rows divided by the number of rows is a small number (near zero). For example, a
GENDER column might take on the values M, F, and NULL. If you have a table with 20,000
employee records in it, then you would find that 3/20000 = 0.00015. Likewise, 100,000
unique values out of 10,000,000 results in a ratio of 0.01, which is again very small. These
columns would be candidates for bitmap indexes. They probably would not be candidates for
B*Tree indexes, as each of the values would tend to retrieve an extremely large percentage of
the table. B*Tree indexes should be selective in general, as outlined earlier. Bitmap indexes
should not be selective; on the contrary, they should be very unselective in general.
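The ratio described above is simple arithmetic, and a small helper makes the comparison explicit. The function name distinct_ratio is hypothetical, and the text deliberately gives no fixed cutoff for "small", so none is assumed here.

```python
def distinct_ratio(distinct_values, total_rows):
    """Distinct values divided by rows; a result near zero means low distinct cardinality."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return distinct_values / total_rows


gender_ratio = distinct_ratio(3, 20_000)          # M, F, NULL in 20,000 employee rows
large_ratio = distinct_ratio(100_000, 10_000_000)  # 100,000 values in 10,000,000 rows
two_row_ratio = distinct_ratio(2, 2)               # 2 values in a two-row table
```

The first two ratios are close to zero, while the two-row table yields 1.0, which is why 2 distinct values is not low distinct cardinality there.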
Bitmap indexes are extremely useful in environments where you have lots of ad hoc
queries, especially queries that reference many columns in an ad hoc fashion or produce
aggregations such as COUNT. For example, suppose you have a large table with three
columns: GENDER, LOCATION, and AGE_GROUP. In this table, GENDER has a value of M or
F, LOCATION can take on the values 1 through 50, and AGE_GROUP is a code representing
18 and under, 19-25, 26-30, 31-40, and 41 and over. You have to support a large number of ad
hoc queries that take the following form:
Select count(*)
from T
where gender = 'M'
and location in ( 1, 10, 30 )
and age_group = '41 and over';

select *
from t
where ( ( gender = 'M' and location = 20 )
or ( gender = 'F' and location = 22 ))
and age_group = '18 and under';

select count(*) from t where location in (11,20,30);

select count(*) from t where age_group = '41 and over' and gender = 'F';
You would find that a conventional B*Tree indexing scheme would fail you. If you
wanted to use an index to get the answer, you would need at least three and up to six
combinations of possible B*Tree indexes to access the data via the index. Since any of the
three columns or any subset of the three columns may appear, you would need large
concatenated B*Tree indexes on
* GENDER, LOCATION, AGE_GROUP: For queries that used all three, or GENDER with
LOCATION, or GENDER alone
* LOCATION, AGE_GROUP: For queries that used LOCATION and AGE_GROUP or
LOCATION alone
* AGE_GROUP, GENDER: For queries that used AGE_GROUP with GENDER or
AGE_GROUP alone
To reduce the amount of data being searched, other permutations might be reasonable as
well, to decrease the size of the index structure being scanned. This is ignoring the fact that a
B*Tree index on such low cardinality data is not a good idea.
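To see why those three concatenated indexes are the minimum, the coverage can be checked mechanically. The sketch below uses a simplifying assumption that is not from the text: a query is "covered" only when the set of columns it references equals the first k columns of some index. The helper names are hypothetical.

```python
from itertools import combinations

COLUMNS = ("GENDER", "LOCATION", "AGE_GROUP")

# The three concatenated B*Tree indexes listed above.
INDEXES = [
    ("GENDER", "LOCATION", "AGE_GROUP"),
    ("LOCATION", "AGE_GROUP"),
    ("AGE_GROUP", "GENDER"),
]


def leading_edges(index_columns):
    """All column sets formed by the first k columns of a concatenated index."""
    return {frozenset(index_columns[:k]) for k in range(1, len(index_columns) + 1)}


def uncovered_subsets(columns, indexes):
    """Non-empty column subsets that are not the leading edge of any index."""
    covered = set()
    for index_columns in indexes:
        covered |= leading_edges(index_columns)
    subsets = [
        frozenset(combo)
        for size in range(1, len(columns) + 1)
        for combo in combinations(columns, size)
    ]
    return [subset for subset in subsets if subset not in covered]


# With all three indexes, every one of the 7 column subsets has a matching index.
missing = uncovered_subsets(COLUMNS, INDEXES)

# Dropping the third index leaves AGE_GROUP alone and AGE_GROUP with GENDER uncovered.
missing_without_third = uncovered_subsets(COLUMNS, INDEXES[:2])
```

Adding a fourth column to COLUMNS and rerunning the check shows how quickly the set of required concatenated indexes grows.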
Here the bitmap index comes into play. With three small bitmap indexes, one on each of
the individual columns, you will be able to satisfy all of the previous predicates efficiently.
Oracle will simply use the functions AND, OR, and NOT, with the bitmaps of the three
indexes together, to find the solution set for any predicate that references any set of these
three columns. It will take the resulting merged bitmap, convert the 1s into rowids if
necessary, and access the data (if you are just counting rows that match the criteria, Oracle
will just count the 1 bits). Let's take a look at an example. First, we'll generate test data that
matches our specified distinct cardinalities, index it, and gather statistics. We'll make use of
the DBMS_RANDOM package to generate random data fitting our distribution:
ops$tkyte@ORA10G> create table t
  2  ( gender not null,
  3    location not null,
  4    age_group not null,
  5    data
  6  )
  7  as
  8  select decode( trunc(dbms_random.value(1,3)),  -- value(lo,hi) returns lo <= x < hi
  9                 1, 'M',
 10                 2, 'F' ) gender,
 11         trunc(dbms_random.value(1,51)) location,  -- 1 through 50
 12         decode( trunc(dbms_random.value(1,6)),    -- 1 through 5
 13                 1,'18 and under',
 14                 2,'19-25',
 15                 3,'26-30',
 16                 4,'31-40',
 17                 5,'41 and over'),
 18         rpad( '*', 20, '*')
 19    from big_table.big_table
 20   where rownum <= 100000;
Table created.

ops$tkyte@ORA10G> create bitmap index gender_idx on t(gender);
Index created.

ops$tkyte@ORA10G> create bitmap index location_idx on t(location);
Index created.

ops$tkyte@ORA10G> create bitmap index age_group_idx on t(age_group);
Index created.

ops$tkyte@ORA10G> exec dbms_stats.gather_table_stats( user, 'T', cascade=>true );
PL/SQL procedure successfully completed.
Now we'll take a look at the plans for our various ad hoc queries from earlier:
ops$tkyte@ORA10G> Select count(*)
2 from T
3 where gender = 'M'
4 and location in ( 1, 10, 30 )
5 and age_group = '41 and over';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     BITMAP CONVERSION (COUNT) (Cost=5 Card=1 Bytes=13)
   3    2       BITMAP AND
   4    3         BITMAP INDEX (SINGLE VALUE) OF 'GENDER_IDX' (INDEX (BITMAP))
   5    3         BITMAP OR
   6    5           BITMAP INDEX (SINGLE VALUE) OF 'LOCATION_IDX' (INDEX (BITMAP))
   7    5           BITMAP INDEX (SINGLE VALUE) OF 'LOCATION_IDX' (INDEX (BITMAP))
   8    5           BITMAP INDEX (SINGLE VALUE) OF 'LOCATION_IDX' (INDEX (BITMAP))
   9    3         BITMAP INDEX (SINGLE VALUE) OF 'AGE_GROUP_IDX' (INDEX (BITMAP))
This example shows the power of the bitmap indexes. Oracle is able to see the location in
(1,10,30) and knows to read the index on location for these three values and logically OR
together the bits in the bitmap. It then takes that resulting bitmap and logically ANDs that
with the bitmaps for AGE_GROUP='41 AND OVER' and GENDER='M'. Then a simple count of
1s and the answer is ready.
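The OR-then-AND evaluation just described can be imitated in Python as a rough sketch. Everything here is an assumption made for illustration: the rows are generated with Python's random module rather than DBMS_RANDOM, the bitmaps are plain integers, and bitmap_index is a hypothetical helper. The final scan is only there to cross-check the bitmap result.

```python
import random
from functools import reduce

random.seed(42)  # arbitrary seed so the run is repeatable
AGE_GROUPS = ["18 and under", "19-25", "26-30", "31-40", "41 and over"]

# Hypothetical stand-in for table T: (gender, location, age_group).
rows = [
    (random.choice("MF"), random.randint(1, 50), random.choice(AGE_GROUPS))
    for _ in range(10_000)
]


def bitmap_index(rows, col):
    """One bitmap (Python int) per distinct value of column number col."""
    index = {}
    for pos, row in enumerate(rows):
        index[row[col]] = index.get(row[col], 0) | (1 << pos)
    return index


gender_idx = bitmap_index(rows, 0)
location_idx = bitmap_index(rows, 1)
age_group_idx = bitmap_index(rows, 2)

# BITMAP OR over the three location values.
location_bits = reduce(
    lambda a, b: a | b, (location_idx.get(v, 0) for v in (1, 10, 30))
)

# BITMAP AND with the GENDER and AGE_GROUP bitmaps.
result = gender_idx.get("M", 0) & location_bits & age_group_idx.get("41 and over", 0)

# BITMAP CONVERSION (COUNT): just count the 1 bits.
bitmap_count = bin(result).count("1")

# Cross-check against a full scan of the rows.
scan_count = sum(
    1
    for gender, location, age_group in rows
    if gender == "M" and location in (1, 10, 30) and age_group == "41 and over"
)
```

The two counts agree, but the bitmap path never touches the row data once the three indexes exist.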
ops$tkyte@ORA10G> select *
2 from t
3 where ( ( gender = 'M' and location = 20 )
4 or ( gender = 'F' and location = 22 ))
5 and age_group = '18 and under';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=77 Card=507 Bytes=16731)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (TABLE) (Cost=77 Card=507
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP AND
   4    3         BITMAP INDEX (SINGLE VALUE) OF 'AGE_GROUP_IDX' (INDEX (BITMAP))
   5    3         BITMAP OR
   6    5           BITMAP AND
   7    6             BITMAP INDEX (SINGLE VALUE) OF 'LOCATION_IDX' (INDEX (BITMAP))
   8    6             BITMAP INDEX (SINGLE VALUE) OF 'GENDER_IDX' (INDEX (BITMAP))
   9    5           BITMAP AND
  10    9             BITMAP INDEX (SINGLE VALUE) OF 'GENDER_IDX' (INDEX (BITMAP))
  11    9             BITMAP INDEX (SINGLE VALUE) OF 'LOCATION_IDX' (INDEX (BITMAP))
This shows similar logic: the plan shows the OR'd conditions are each evaluated by AND-ing
together the appropriate bitmaps and then OR-ing together those results. Throw in another
AND to satisfy the AGE_GROUP='18 AND UNDER' and we have it all. Since we asked for the
actual rows this time, Oracle will convert each 1 bit in the resulting bitmap into a rowid to
retrieve the source data.
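The nesting in that plan maps naturally onto a small recursive evaluator. The following is a sketch under stated assumptions: the eight rows are invented, the tuple-based plan representation and the names evaluate and PLAN are hypothetical, and list positions stand in for rowids.

```python
def bitmap_index(values):
    """One bitmap (Python int) per distinct value; bit i stands for row i."""
    index = {}
    for pos, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << pos)
    return index


# Eight hypothetical rows of T: (gender, location, age_group).
ROWS = [
    ("M", 20, "18 and under"),
    ("F", 22, "18 and under"),
    ("M", 22, "18 and under"),
    ("F", 20, "41 and over"),
    ("M", 20, "26-30"),
    ("F", 22, "18 and under"),
    ("M", 20, "18 and under"),
    ("F", 22, "31-40"),
]
INDEXES = {
    "GENDER": bitmap_index([r[0] for r in ROWS]),
    "LOCATION": bitmap_index([r[1] for r in ROWS]),
    "AGE_GROUP": bitmap_index([r[2] for r in ROWS]),
}


def evaluate(node):
    """Evaluate a plan node: ("IDX", column, value), ("AND", *kids) or ("OR", *kids)."""
    op = node[0]
    if op == "IDX":
        _, column, value = node
        return INDEXES[column].get(value, 0)
    if op not in ("AND", "OR"):
        raise ValueError(f"unknown plan operation: {op}")
    bitmaps = [evaluate(child) for child in node[1:]]
    result = bitmaps[0]
    for bitmap in bitmaps[1:]:
        result = (result & bitmap) if op == "AND" else (result | bitmap)
    return result


# Same shape as the execution plan: AND( age_group, OR( AND(...), AND(...) ) ).
PLAN = (
    "AND",
    ("IDX", "AGE_GROUP", "18 and under"),
    (
        "OR",
        ("AND", ("IDX", "LOCATION", 20), ("IDX", "GENDER", "M")),
        ("AND", ("IDX", "GENDER", "F"), ("IDX", "LOCATION", 22)),
    ),
)

merged = evaluate(PLAN)

# BITMAP CONVERSION (TO ROWIDS): only the 1 bits become row locations.
row_positions = [pos for pos in range(len(ROWS)) if (merged >> pos) & 1]
matching_rows = [ROWS[pos] for pos in row_positions]
```

Only after the single merged bitmap is produced does the sketch touch ROWS, which mirrors the TABLE ACCESS (BY INDEX ROWID) step at the top of the plan.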
In a data warehouse or a large reporting system supporting many ad hoc SQL queries,
this ability to use as many indexes as make sense simultaneously comes in very handy
indeed. Using conventional B*Tree indexes here would not be nearly as useful or usable, and
as the number of columns that are to be searched by the ad hoc queries increases, the number
of combinations of B*Tree indexes you would need increases as well.
However, there are times when bitmaps are not appropriate. They work well in a
read-intensive environment, but they are extremely ill suited for a write-intensive environment.
The reason is that a single bitmap index key entry points to many rows. If a session modifies
the indexed data, then all of the rows that index entry points to are effectively locked in most
cases. Oracle cannot lock an individual bit in a bitmap index entry; it locks the entire bitmap
index entry. Any other modifications that need to update that same bitmap index entry will be
locked out. This will seriously inhibit concurrency, as each update will appear to lock
potentially hundreds of rows, preventing their bitmap columns from being concurrently
updated. It will not lock every row as you might think, just many of them. Bitmaps are
stored in chunks, so using the earlier EMP example we might find that the index key
ANALYST appears in the index many times, each time pointing to hundreds of rows. An
update to a row that modifies the JOB column will need to get exclusive access to two of
these index key entries: the index key entry for the old value and the index key entry for the
new value. The hundreds of rows these two entries point to will be unavailable for
modification by other sessions until that UPDATE commits.
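The concurrency effect can be illustrated with a toy lock table. This is not how Oracle implements locking; it is a deliberately simplified model in which each index key entry covers a fixed chunk of rows (CHUNK_ROWS is an invented number, far smaller than the hundreds of rows mentioned above) and an update must own the entries for both the old and the new value.

```python
CHUNK_ROWS = 4  # hypothetical: number of rows covered by one bitmap index key entry


class BitmapEntryLocks:
    """Toy lock table keyed by (key value, chunk number); not Oracle's real structure."""

    def __init__(self, jobs):
        self.jobs = list(jobs)  # JOB value for each row position
        self.owner = {}         # (value, chunk) -> session holding the lock

    def update_job(self, session, row, new_job):
        """Try to change one row's JOB; return False if another session blocks it."""
        chunk = row // CHUNK_ROWS
        # The update needs the entry for the old value and the entry for the new value.
        needed = {(self.jobs[row], chunk), (new_job, chunk)}
        if any(self.owner.get(entry, session) != session for entry in needed):
            return False  # a real session would wait here until the holder commits
        for entry in needed:
            self.owner[entry] = session
        self.jobs[row] = new_job
        return True

    def commit(self, session):
        """Release every entry lock held by the session."""
        self.owner = {e: s for e, s in self.owner.items() if s != session}


locks = BitmapEntryLocks(
    ["CLERK", "CLERK", "ANALYST", "CLERK", "CLERK", "MANAGER", "CLERK", "ANALYST"]
)

first = locks.update_job("A", 0, "MANAGER")        # A now holds (CLERK, 0) and (MANAGER, 0)
blocked = locks.update_job("B", 1, "ANALYST")      # different row, same CLERK entry: blocked
other_chunk = locks.update_job("B", 4, "ANALYST")  # entries for chunk 1 are free
locks.commit("A")
retry = locks.update_job("B", 1, "ANALYST")        # succeeds once A has committed
```

Session B never touches the row that session A changed, yet its first update is blocked because both rows share one index key entry.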
Bitmap Join Indexes
Oracle9i introduced a new index type: the bitmap join index. Normally an index is created on
a single table, using only columns from that table. A bitmap join index breaks that rule and
allows you to index a given table using columns from some other table. In effect, this allows
you to denormalize data in an index structure instead of in the tables themselves.
Consider the simple EMP and DEPT tables. EMP has a foreign key to DEPT (the DEPTNO
column). The DEPT table has the DNAME attribute (the name of the department). The end
users will frequently ask questions such as "How many people work in sales?", "Who works
in sales?", and "Can you show me the top N performing people in sales?" Note that they do not
ask, "How many people work in DEPTNO 30?" They don't use those key values; rather, they
use the human-readable department name. Therefore, they end up running queries such as the
following:
select count(*)
from emp, dept
where emp.deptno = dept.deptno
and dept.dname = 'SALES'
/
select emp.*
from emp, dept
where emp.deptno = dept.deptno
and dept.dname = 'SALES'
/
Those queries almost necessarily have to access the DEPT table and the EMP table using
conventional indexes. We might use an index on DEPT.DNAME to find the SALES row(s) and
retrieve the DEPTNO value for SALES, and then use an index on EMP.DEPTNO to find the
matching rows, but by using a bitmap join index we can avoid all of that. The bitmap join
index allows us to index the DEPT.DNAME column, but have that index point not at the DEPT
table, but at the EMP table. This is a pretty radical concept (being able to index attributes
from other tables), and it might change the way you implement your data model in a reporting
system. You can, in effect, have your cake and eat it, too. You can keep your normalized data
structures intact, yet get the benefits of denormalization at the same time.
Here's the index we would create for this example:
ops$tkyte@ORA10G> create bitmap index emp_bm_idx
2 on emp( d.dname )
3 from emp e, dept d
4 where e.deptno = d.deptno
5 /
Index created.
Note how the beginning of the CREATE INDEX looks normal and creates the index
INDEX_NAME on the table. But from there on, it deviates from normal. We see a reference
to a column in the DEPT table: D.DNAME. We see a FROM clause, making this CREATE
INDEX statement resemble a query. We have a join condition between multiple tables. This
CREATE INDEX statement indexes the DEPT.DNAME column, but in the context of the EMP
table. If we ask those questions mentioned earlier, we would find the database never accesses
the DEPT table at all, and it need not do so because the DNAME column now exists in the index
pointing to rows in the EMP table. For purposes of illustration, we will make the EMP and
DEPT tables appear large (to avoid having the CBO think they are small and full scanning
them instead of using indexes):
ops$tkyte@ORA10G> begin
2 dbms_stats.set_table_stats( user, 'EMP',
3 numrows => 1000000, numblks => 300000 );
4 dbms_stats.set_table_stats( user, 'DEPT',
5 numrows => 100000, numblks => 30000 );
6 end;
7 /
PL/SQL procedure successfully completed.
and then we'll perform our queries:
ops$tkyte@ORA10G> set autotrace traceonly explain
ops$tkyte@ORA10G> select count(*)
2 from emp, dept
3 where emp.deptno = dept.deptno
4 and dept.dname = 'SALES'
5 /
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     BITMAP CONVERSION (COUNT) (Cost=1 Card=10000 Bytes=130000)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'EMP_BM_IDX' (INDEX (BITMAP))
As you can see, to answer this particular question, we did not have to actually access
either the EMP or DEPT table; the entire answer came from the index itself. All the
information needed to answer the question was available in the index structure.
Further, we were able to skip accessing the DEPT table and, using the index on EMP that
incorporated the data we needed from DEPT, gain direct access to the required rows:
ops$tkyte@ORA10G> select emp.*
2 from emp, dept
3 where emp.deptno = dept.deptno
4 and dept.dname = 'SALES'
5 /
Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=6145 Card=10000 Bytes=870000)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'EMP' (TABLE) (Cost=6145 Card=10000
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'EMP_BM_IDX' (INDEX (BITMAP))
Bitmap join indexes do have a prerequisite. The join condition must join to a primary or
unique key in the other table. In the preceding example, DEPT.DEPTNO is the primary key of
the DEPT table, and the primary key must be in place, otherwise an error will occur:
ops$tkyte@ORA10G> create bitmap index emp_bm_idx
2 on emp( d.dname )
3 from emp e, dept d
4 where e.deptno = d.deptno
5 /
from emp e, dept d
*
ERROR at line 3:
ORA-25954: missing primary key or unique constraint on dimension
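The bitmap join index idea, including the unique-key prerequisite, can be sketched in Python. The sample rows and the function build_bitmap_join_index are hypothetical; only the table and column names come from the text. The uniqueness check reflects one plausible reading of the prerequisite: if a DEPTNO could match two DEPT rows, a single EMP row would no longer map to exactly one DNAME.

```python
# Hypothetical sample rows; only the table and column names come from the text.
DEPT = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")]  # (deptno, dname)
EMP = [
    ("ALLEN", 30), ("CLARK", 10), ("FORD", 20), ("WARD", 30), ("KING", 10),
]  # (ename, deptno)


def build_bitmap_join_index(emp, dept):
    """Index DEPT.DNAME, but with one bit per EMP row rather than per DEPT row."""
    deptnos = [deptno for deptno, _ in dept]
    if len(set(deptnos)) != len(deptnos):
        # Mirrors the prerequisite: the join must reach a primary or unique key.
        raise ValueError("missing primary key or unique constraint on dimension")
    dname_of = dict(dept)
    index = {}
    for pos, (_, deptno) in enumerate(emp):
        dname = dname_of.get(deptno)
        if dname is not None:  # EMP rows with no matching DEPT row do not join
            index[dname] = index.get(dname, 0) | (1 << pos)
    return index


emp_bm_idx = build_bitmap_join_index(EMP, DEPT)

# "How many people work in sales?" is answered from the index alone.
sales_bits = emp_bm_idx.get("SALES", 0)
sales_count = bin(sales_bits).count("1")

# "Who works in sales?" turns the 1 bits into EMP row positions; DEPT is never read.
sales_emps = [EMP[pos][0] for pos in range(len(EMP)) if (sales_bits >> pos) & 1]
```

Once the index is built, neither question needs the DEPT list again, which is the same effect the two execution plans above show.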
Bitmap Indexes Wrap-up
When in doubt, try it out. It is trivial to add a bitmap index to a table (or a bunch of them) and
see what it does for you. Also, you can usually create bitmap indexes much faster than
B*Tree indexes. Experimentation is the best way to see if they are suited for your
environment. I am frequently asked, "What defines low cardinality?" There is no cut-and-dried
answer for this. Sometimes it is 3 values out of 100,000. Sometimes it is 10,000 values
out of 1,000,000. Low cardinality doesn't imply single-digit counts of distinct values.
Experimentation is the way to discover if a bitmap is a good idea for your application. In
general, if you have a large, mostly read-only environment with lots of ad hoc queries, a set
of bitmap indexes may be exactly what you need.
