Вы находитесь на странице: 1из 29

Bitmap Index vs. B-tree Index: Which and When?

by Vivek Sharma

Understanding the proper application of each index can have a big impact on performance.

Published 2005

Conventional wisdom holds that bitmap indexes are most appropriate for columns having low distinct values--such as GENDER,
MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable
for systems in which data is not frequently updated by many concurrent systems. In fact, as I'll demonstrate here, a bitmap index on a
column with 100-percent unique values (a column candidate for primary key) is as efficient as a B-tree index.

In this article I'll provide some examples, along with optimizer decisions, that are common for both types of indexes on a low-cardinality
column as well as a high-cardinality one. These examples will help DBAs understand that the usage of bitmap indexes is not in fact
cardinality dependent but rather application dependent.

Comparing the Indexes


There are several disadvantages to using a bitmap index on a unique column--one being the need for sufficient space (and Oracle does not
recommend it). However, the size of the bitmap index depends on the cardinality of the column on which it is created as well as the data
distribution. Consequently, a bitmap index on the GENDER column will be smaller than a B-tree index on the same column. In contrast, a
bitmap index on EMPNO (a candidate for primary key) will be much larger than a B-tree index on this column. But because fewer users
access decision-support systems (DSS) systems than would access transaction-processing (OLTP) ones, resources are not a problem for
these applications.

To illustrate this point, I created two tables, TEST_NORMAL and TEST_RANDOM. I inserted one million rows into the TEST_NORMAL
table using a PL/SQL block, and then inserted these rows into the TEST_RANDOM table in random order:

Create table test_normal (empno number(10), ename varchar2(30), sal


number(10));

Begin
For i in 1..1000000
Loop
Insert into test_normal
values(i, dbms_random.string('U',30), dbms_random.value(1000,7000));
If mod(i, 10000) = 0 then
Commit;
End if;
End loop;
End;
/

Create table test_random


as
select /*+ append */ * from test_normal order by dbms_random.random;

SQL> select count(*) "Total Rows" from test_normal;

Total Rows
----------
1000000

Elapsed: 00:00:01.09

SQL> select count(distinct empno) "Distinct Values" from test_normal;

Distinct Values
---------------
1000000
Elapsed: 00:00:06.09
SQL> select count(*) "Total Rows" from test_random;

Total Rows
----------
1000000

Elapsed: 00:00:03.05
SQL> select count(distinct empno) "Distinct Values" from test_random;

Distinct Values
---------------
1000000

Elapsed: 00:00:12.07

Note that the TEST_NORMAL table is organized and that the TEST_RANDOM table is randomly created and hence has disorganized data. In the
above table, column EMPNO has 100-percent distinct values and is a good candidate to become a primary key. If you define this column as a primary
key, you will create a B-tree index and not a bitmap index because Oracle does not support bitmap primary key indexes.
To analyze the behavior of these indexes, we will perform the following steps:

1. On TEST_NORMAL:

a. Create a bitmap index on the EMPNO column and execute some queries with equality predicates.

b. Create a B-tree index on the EMPNO column, execute some queries with equality predicates, and compare the
logical and physical I/Os done by the queries to fetch the results for different sets of values.

2. On TEST_RANDOM:

a. Same as Step 1A.

b. Same as Step 1B.

2. On TEST_NORMAL:

a. Same as Step 1A, except that the queries are executed within a range of predicates.

b. Same as Step 1B, except that the queries are executed within a range of predicates. Now compare the statistics.

2. On TEST_RANDOM:

a. Same as Step 3A.

b. Same as Step 3B.

2. On TEST_NORMAL:

a. Create a bitmap index on the SAL column, and then execute some queries with equality predicates and some with
range predicates.

b. Create a B-tree index on the SAL column, and then execute some queries with equality predicates and some with
range predicates (same set of values as in Step 5A). Compare the I/Os done by the queries to fetch the results.

2. Add a GENDER column to both of the tables, and update the column with three possible values: M for male, F for female, and
null for N/A. This column is updated with these values based on some condition.

3. Create a bitmap index on this column, and then execute some queries with equality predicates.

4. Create a B-tree index on the GENDER column, and then execute some queries with equality predicates. Compare to results
from Step 7.

Steps 1 to 4 involve a high-cardinality (100-percent distinct) column, Step 5 a normal-cardinality column, and Steps 7 and 8 a low-
cardinality column.

Step 1A (on TEST_NORMAL)


In this step, we will create a bitmap index on the TEST_NORMAL table and then check for the size of this index, its clustering factor, and
the size of the table. Then we will run some queries with equality predicates and note the I/Os of these queries using this bitmap index.
SQL> create bitmap index normal_empno_bmx on test_normal(empno);

Index created.

Elapsed: 00:00:29.06
SQL> analyze table test_normal compute statistics for table for all
indexes for all indexed columns;

Table analyzed.

Elapsed: 00:00:19.01
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024
"Size in MB"
2 from user_segments
3* where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_BMX');

SEGMENT_NAME Size in MB
------------------------------------ ---------------
TEST_NORMAL 50
NORMAL_EMPNO_BMX 28

Elapsed: 00:00:02.00
SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
------------------------------ ---------------------------------
NORMAL_EMPNO_BMX 1000000

Elapsed: 00:00:00.00

You can see in the preceding table that the size of the index is 28MB and that the clustering factor is equal to the number of rows in the
table. Now let's execute the queries with equality predicates for different sets of values:

SQL> set autotrace only


SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old 1: select * from test_normal where empno=&empno
new 1: select * from test_normal where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
d=1 Bytes=34)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_EMPNO_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Step 1B (on TEST_NORMAL)


Now we will drop this bitmap index and create a B-tree index on the EMPNO column. As before, we will check for the size of the index and
its clustering factor and execute the same queries for the same set of values, to compare the I/Os.

SQL> drop index NORMAL_EMPNO_BMX;

Index dropped.

SQL> create index normal_empno_idx on test_normal(empno);

Index created.

SQL> analyze table test_normal compute statistics for table for all
indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024


"Size in MB"
2 from user_segments
3 where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_IDX');

SEGMENT_NAME Size in MB
---------------------------------- ---------------
TEST_NORMAL 50
NORMAL_EMPNO_IDX 18

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
---------------------------------- ----------------------------------
NORMAL_EMPNO_IDX 6210

It is clear in this table that the B-tree index is smaller than the bitmap index on the EMPNO column. The clustering factor of the B-tree index
is much nearer to the number of blocks in a table; for that reason, the B-tree index is efficient for range predicate queries.

Now we'll run the same queries for the same set of values, using our B-tree index.

SQL> set autot trace


SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old 1: select * from test_normal where empno=&empno
new 1: select * from test_normal where empno=1000

Elapsed: 00:00:00.01
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
d=1 Bytes=34)
2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (C
ost=3 Card=1)

Statistics
----------------------------------------------------------
29 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

As you can see, when the queries are executed for different set of values, the number of consistent gets and physical reads are identical for
bitmap and B-tree indexes on a 100-percent unique column.

BITMAP B-TREE
EMPNO
Consistent Reads Physical Reads Consistent Reads Physical Reads
5 0 1000 5 0
5 2 2398 5 2
5 2 8545 5 2
5 2 98008 5 2
5 2 85342 5 2
5 2 128444 5 2
5 2 858 5 2

Step 2A (on TEST_RANDOM)


Now we'll perform the same experiment on TEST_RANDOM:

SQL> create bitmap index random_empno_bmx on test_random(empno);

Index created.

SQL> analyze table test_random compute statistics for table for all
indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024


"Size in MB"
2 from user_segments
3* where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_BMX');
SEGMENT_NAME Size in MB
------------------------------------ ---------------
TEST_RANDOM 50
RANDOM_EMPNO_BMX 28

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
------------------------------ ---------------------------------
RANDOM_EMPNO_BMX 1000000

Again, the statistics (size and clustering factor) are identical to those of the index on the TEST_NORMAL table:

SQL> select * from test_random where empno=&empno;


Enter value for empno: 1000
old 1: select * from test_random where empno=&empno
new 1: select * from test_random where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4
Card=1 Bytes=34)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Step 2B (on TEST_RANDOM)


Now, as in Step 1B, we will drop the bitmap index and create a B-tree index on the EMPNO column.

SQL> drop index RANDOM_EMPNO_BMX;

Index dropped.

SQL> create index random_empno_idx on test_random(empno);

Index created.

SQL> analyze table test_random compute statistics for table for all
indexes for all indexed columns;
Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024


"Size in MB"
2 from user_segments
3 where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_IDX');

SEGMENT_NAME Size in MB
---------------------------------- ---------------
TEST_RANDOM 50
RANDOM_EMPNO_IDX 18

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
---------------------------------- ----------------------------------
RANDOM_EMPNO_IDX 999830

This table shows that the size of the index is equal to the size of this index on TEST_NORMAL table but the clustering factor is much
nearer to the number of rows, which makes this index inefficient for range predicate queries (which we'll see in Step 4). This clustering
factor will not affect the equality predicate queries because the rows have 100-percent distinct values and the number of rows per key is 1.

Now let's run the queries with equality predicates and the same set of values.

SQL> select * from test_random where empno=&empno;


Enter value for empno: 1000
old 1: select * from test_random where empno=&empno
new 1: select * from test_random where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4
Card=1 Bytes=34)
2 1 INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX' (NON-UNIQUE)
(Cost=3 Card=1)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
5 consistent gets
0 physical reads
0 redo size
515 bytes sent via SQL*Net to client
499 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Again, the results are almost identical to those in Steps 1A and 1B. The data distribution did not affect the amount of consistent gets and
physical reads for a unique column.
Step 3A (on TEST_NORMAL)
In this step, we will create the bitmap index (similar to Step 1A). We know the size and the clustering factor of the index, which equals the
number of rows in the table. Now let's run some queries with range predicates.

SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old 1: select * from test_normal where empno between &range1 and
&range2
new 1: select * from test_normal where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=451 Card=2299
Bytes=78166)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=451
Card=2299 Bytes=78166)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
331 consistent gets
0 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed

Step 3B (on TEST_NORMAL)


In this step, we'll execute the queries against the TEST_NORMAL table with a B-tree index on it.

SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old 1: select * from test_normal where empno between &range1 and
&range2
new 1: select * from test_normal where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=2299
Bytes=78166)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=23
Card=2299 Bytes=78166)
2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE)
(Cost=8 Card=2299)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
329 consistent gets
15 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed

When these queries are executed for different sets of ranges, the results below show:

BITMAP B-TREE
EMPNO (Range)
Consistent Reads Physical Reads Consistent Reads Physical Reads
331 0 1-2300 329 0
285 0 8-1980 283 0
346 19 1850-4250 344 16
427 31 28888-31850 424 28
371 27 82900-85478 367 23
2157 149 984888-1000000 2139 35
As you can see, the number of consistent gets and physical reads with both indexes is again nearly identical. The last range (984888-
1000000) returned almost 15,000 rows, which was the maximum number of rows fetched for all the ranges given above. So when we asked
for a full table scan (by giving the hint /*+ full(test_normal) */ ), the consistent read and physical read counts were 7,239 and
5,663, respectively.

Step 4A (on TEST_RANDOM)


In this step, we will run the queries with range predicates on the TEST_RANDOM table with bitmap index and check for consistent gets and
physical reads. Here you'll see the impact of the clustering factor.

SQL>select * from test_random where empno between &range1 and &range2;


Enter value for range1: 1
Enter value for range2: 2300
old 1: select * from test_random where empno between &range1 and
&range2
new 1: select * from test_random where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:08.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=453 Card=2299
Bytes=78166)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=453
Card=2299 Bytes=78166)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2463 consistent gets
1200 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed

Step 4B (on TEST_RANDOM)


In this step, we will execute the range predicate queries on TEST_RANDOM with a B-tree index on it. Recall that the clustering factor of
this index was very close to the number of rows in a table (and thus inefficient). Here's what the optimizer has to say about that:

SQL> select * from test_random where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old 1: select * from test_random where empno between &range1 and
&range2
new 1: select * from test_random where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:03.04

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=613 Card=2299
Bytes=78166)
1 0 TABLE ACCESS (FULL) OF 'TEST_RANDOM' (Cost=613 Card=2299
Bytes=78166)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6415 consistent gets
4910 physical reads
0 redo size
111416 bytes sent via SQL*Net to client
2182 bytes received via SQL*Net from client
155 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2300 rows processed
The optimizer opted for a full table scan rather than using the index because of the clustering factor:

BITMAP B-TREE
EMPNO (Range)
Consistent Reads Physical Reads Consistent Reads Physical Reads
2463 1200 1-2300 6415 4910
2114 31 8-1980 6389 4910
2572 1135 1850-4250 6418 4909
3173 1620 28888-31850 6456 4909
2762 1358 82900-85478 6431 4909
7254 3329 984888-1000000 7254 4909
For the last range (984888-1000000) only, the optimizer opted for a full table scan for the bitmap index, whereas for all ranges, it opted for
a full table scan for the B-tree index. This disparity is due to the clustering factor: The optimizer does not consider the value of the clustering
factor when generating execution plans using a bitmap index, whereas for a B-tree index, it does. In this scenario, the bitmap index
performs more efficiently than the B-tree index.

The following steps reveal more interesting facts about these indexes.

Step 5A (on TEST_NORMAL)


Create a bitmap index on the SAL column of the TEST_NORMAL table. This column has normal cardinality.

SQL> create bitmap index normal_sal_bmx on test_normal(sal);

Index created.

SQL> analyze table test_normal compute statistics for table for all
indexes for all indexed columns;

Table analyzed.

Now let's get the size of the index and the clustering factor.

SQL>select substr(segment_name,1,30) segment_name, bytes/1024/1024


"Size in MB"
2* from user_segments
3* where segment_name in ('TEST_NORMAL','NORMAL_SAL_BMX');

SEGMENT_NAME Size in MB
------------------------------ --------------
TEST_NORMAL 50
NORMAL_SAL_BMX 4

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
------------------------------
----------------------------------
NORMAL_SAL_BMX 6001

Now for the queries. First run them with equality predicates:

SQL> set autot trace


SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old 1: select * from test_normal where sal=&sal
new 1: select * from test_normal where sal=1869

164 rows selected.

Elapsed: 00:00:00.08

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=168
Bytes=4032)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=39
Card=168 Bytes=4032)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
165 consistent gets
0 physical reads
0 redo size
8461 bytes sent via SQL*Net to client
609 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
164 rows processed

and then with range predicates:

SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000
old 1: select * from test_normal where sal between &sal1 and &sal2
new 1: select * from test_normal where sal between 1500 and 2000

83743 rows selected.

Elapsed: 00:00:05.00

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
=2001024)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
Bytes=2001024)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11778 consistent gets
5850 physical reads
0 redo size
4123553 bytes sent via SQL*Net to client
61901 bytes received via SQL*Net from client
5584 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
83743 rows processed

Now drop the bitmap index and create a B-tree index on TEST_NORMAL.

SQL> create index normal_sal_idx on test_normal(sal);

Index created.

SQL> analyze table test_normal compute statistics for table for all
indexes for all indexed columns;

Table analyzed.

Take a look at the size of the index and the clustering factor.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024


"Size in MB"
2 from user_segments
3 where segment_name in ('TEST_NORMAL','NORMAL_SAL_IDX');

SEGMENT_NAME Size in MB
------------------------------ ---------------
TEST_NORMAL 50
NORMAL_SAL_IDX 17

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME CLUSTERING_FACTOR
------------------------------ ----------------------------------
NORMAL_SAL_IDX 986778

In the above table, you can see that this index is larger than the bitmap index on the same column. The clustering factor is also near the number of
rows in this table.
Now for the tests; equality predicates first:

SQL> set autot trace


SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old 1: select * from test_normal where sal=&sal
new 1: select * from test_normal where sal=1869

164 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=169 Card=168
Bytes=4032)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=169
Card=168 Bytes=4032)
2 1 INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX' (NON-UNIQUE)
(Cost=3 Card=168)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
177 consistent gets
0 physical reads
0 redo size
8461 bytes sent via SQL*Net to client
609 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
164 rows processed

...and then, range predicates:

SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000
old 1: select * from test_normal where sal between &sal1 and &sal2
new 1: select * from test_normal where sal between 1500 and 2000

83743 rows selected.

Elapsed: 00:00:04.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
=2001024)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
Bytes=2001024)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11778 consistent gets
3891 physical reads
0 redo size
4123553 bytes sent via SQL*Net to client
61901 bytes received via SQL*Net from client
5584 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
83743 rows processed

When the queries were executed for different set of values, the resulting output, as shown in the tables below, reveals that the numbers of consistent
gets and physical reads are identical.

BITMAP B-TREE
SAL (Equality) Rows Fetched
Consistent Reads Physical Reads Consistent Reads Physical Reads
165 0 1869 177 164
169 163 3548 181 167
174 166 6500 187 172
75 69 7000 81 73
177 163 2500 190 175

BITMAP B-TREE
SAL (Range) Rows Fetched
Consistent Reads Physical Reads Consistent Reads Physical Reads
11778 5850 1500-2000 11778 3891 83743
11765 5468 2000-2500 11765 3879 83328
11753 5471 2500-3000 11753 3884 83318
17309 5472 3000-4000 17309 3892 166999
39398 5454 4000-7000 39398 3973 500520
For range predicates the optimizer opted for a full table scan for all the different set of values—it didn't use the indexes at all—whereas for
equality predicates, the optimizer used the indexes. Again, the consistent gets and physical reads are identical.

Consequently, you can conclude that for a normal-cardinality column, the optimizer decisions for the two types of indexes were the same
and there were no significant differences between the I/Os.

Step 6 (add a GENDER column)


Before performing the test on a low-cardinality column, let's add a GENDER column to this table and update it with M, F, and null values.

SQL> alter table test_normal add GENDER varchar2(1);

Table altered.

SQL> select GENDER, count(*) from test_normal group by GENDER;

S COUNT(*)
- ----------
F 333769
M 499921
166310

3 rows selected.

The size of the bitmap index on this column is around 570KB, as indicated in the table below:

SQL> create bitmap index normal_GENDER_bmx on test_normal(GENDER);

Index created.

Elapsed: 00:00:02.08

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024


"Size in MB"
2 from user_segments
3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_BMX');

SEGMENT_NAME Size in MB
------------------------------ ---------------
TEST_NORMAL 50
NORMAL_GENDER_BMX .5625
2 rows selected.

In contrast, the B-tree index on this column is 13MB in size, which is much bigger than the bitmap index on this column.

SQL> create index normal_GENDER_idx on test_normal(GENDER);

Index created.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024


"Size in MB"
2 from user_segments
3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_IDX');

SEGMENT_NAME Size in MB
------------------------------ ---------------
TEST_NORMAL 50
NORMAL_GENDER_IDX 13

2 rows selected.

Now, if we execute a query with equality predicates, the optimizer will not make use of this index, be it a bitmap or a B-tree. Rather, it will prefer a full
table scan.

SQL> select * from test_normal where GENDER is null;

166310 rows selected.

Elapsed: 00:00:06.08

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=166310
Bytes=4157750)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=166310
Bytes=4157750)

SQL> select * from test_normal where GENDER='M';

499921 rows selected.

Elapsed: 00:00:16.07

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=499921
Bytes=12498025)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601
Card=499921Bytes=12498025)

SQL>select * from test_normal where GENDER='F'


/

333769 rows selected.

Elapsed: 00:00:12.02

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=333769 Byte
s=8344225)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=333769
Bytes=8344225)

Conclusions
Now that we understood how the optimizer reacts to these techniques, let's examine a scenario that clearly demonstrates the best
respective applications of bitmap indexes and B-tree indexes.

With a bitmap index on the GENDER column in place, create another bitmap index on the SAL column and then execute some queries.
The queries will be re-executed with B-tree indexes on these columns.

From the TEST_NORMAL table, you need the employee number of all the male employees whose monthly salaries equal any of the
following values:

1000
1500
2000
2500
3000
3500
4000
4500
Thus:

SQL>select * from test_normal


where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and
GENDER='M';

This is a typical data warehouse query, which, of course, you should never execute on an OLTP system. Here are the results with the bitmap index in
place on both columns:

SQL>select * from test_normal


where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and
GENDER='M';

1453 rows selected.

Elapsed: 00:00:02.03

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=198 Card=754
Bytes=18850)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=198
Card=754 Bytes=18850)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP AND
4 3 BITMAP OR
5 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
6 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
7 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
8 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
9 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
10 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
11 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
12 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
13 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
14 3 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_GENDER_BMX'

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1353 consistent gets
920 physical reads
0 redo size
75604 bytes sent via SQL*Net to client
1555 bytes received via SQL*Net from client
98 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1453 rows processed

And with the B-tree index in place:

SQL>select * from test_normal


where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and
GENDER='M';

1453 rows selected.

Elapsed: 00:00:03.01

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=754
Bytes=18850)
1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=754
Bytes=18850)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6333 consistent gets
4412 physical reads
0 redo size
75604 bytes sent via SQL*Net to client
1555 bytes received via SQL*Net from client
98 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1453 rows processed

As you can see here, with the B-tree index, the optimizer opted for a full table scan, whereas in the case of the bitmap index, it used the
index to answer the query. You can deduce performance by the number of I/Os required to fetch the result.

In summary, bitmap indexes are best suited for DSS regardless of cardinality for these reasons:

• With bitmap indexes, the optimizer can efficiently answer queries that include AND, OR, or XOR. (Oracle supports dynamic B-
tree-to-bitmap conversion, but it can be inefficient.)
• With bitmaps, the optimizer can answer queries when searching or counting for nulls. Null values are also indexed in bitmap
indexes (unlike B-tree indexes).

• Most important, bitmap indexes in DSS systems support ad hoc queries, whereas B-tree indexes do not. More specifically, if
you have a table with 50 columns and users frequently query on 10 of them—either the combination of all 10 columns or
sometimes a single column—creating a B-tree index will be very difficult. If you create 10 bitmap indexes on all these columns,
all the queries can be answered by these indexes, whether they are queries on all 10 columns, on 4 or 6 columns out of the 10,
or on a single column. The AND_EQUAL hint provides this functionality for B-tree indexes, but no more than five indexes can be
used by a query. This limit is not imposed with bitmap indexes.

In contrast, B-tree indexes are well suited for OLTP applications in which users' queries are relatively routine (and well tuned before
deployment in production), as opposed to ad hoc queries, which are much less frequent and executed during nonpeak business hours.
Because data is frequently updated in and deleted from OLTP applications, bitmap indexes can cause a serious locking problem in these
situations.

The data here is fairly clear. Both indexes have a similar purpose: to return results as fast as possible. But your choice of which one to use
should depend purely on the type of application, not on the level of cardinality.

Bit Map Index

Oracle bitmap indexes are very different from standard b-tree indexes. In bitmap
structures, a two-dimensional array is created with one column for every row in the table
being indexed. Each column represents a distinct value within the bitmapped index. This
two-dimensional array represents each value within the index multiplied by the number
of rows in the table.

At row retrieval time, Oracle decompresses the bitmap into the RAM data buffers so it
can be rapidly scanned for matching values. These matching values are delivered to
Oracle in the form of a Row-ID list, and these Row-ID values may directly access the
required information.

The real benefit of bitmapped indexing occurs when one table includes multiple
bitmapped indexes. Each individual column may have low cardinality. The creation of
multiple bitmapped indexes provides a very powerful method for rapidly answering
difficult SQL queries.

A bitmap merge operation build ROWID lists


Using this bitmap merge methodology, Oracle can provide sub-second response time
when working against multiple low-cardinality columns.

Also see these important notes on Oracle bitmap index maximum distinct values

For example, assume there is a motor vehicle database with numerous low-cardinality
columns such as car_color, car_make, car_model, and car_year. Each column contains
less than 100 distinct values by themselves, and a b-tree index would be fairly useless in
a database of 20 million vehicles.

However, combining these indexes together in a query can provide blistering response
times a lot faster than the traditional method of reading each one of the 20 million rows in
the base table. For example, assume we wanted to find old blue Toyota Corollas
manufactured in 1981:

select
license_plat_nbr
from
vehicle
where
color = ‘blue’
and
make = ‘toyota’
and
year = 1981;

Oracle uses a specialized optimizer method called a bitmapped index merge to service
this query. In a bitmapped index merge, each Row-ID, or RID, list is built independently
by using the bitmaps, and a special merge routine is used in order to compare the RID
lists and find the intersecting values.

As the number if distinct values increases, the size of the bitmap increases exponentially,
such that an index with 100 values may perform thousands of times faster than a bitmap
index on 1,000 distinct column values.

Also, remember that bitmap indexes are only suitable for static tables and materialized
views which are updated at nigh and rebuilt after batch row loading. If your tables are
not read-only during query time, DO NOT consider using bitmap indexes!

• 1 - 7 distinct key values - Queries against bitmap indexes with a low cardinality
are very fast.
• 8-100 distinct key values - As the number if distinct values increases,
performance decreases proportionally.
• 100 - 10,000 distinct key values - Over 100 distinct values, the bitmap indexes
become huge and SQL performance drops off rapidly.
• Over 10,000 distinct key values - At this point, performance is ten times slower
than an index with only 100 distinct values.
Oracle Bitmap indexes are a very powerful Oracle feature, but they can be tricky!

You will want a bitmap index when:

1 - Table column is low cardinality - As a ROUGH guide, consider a bitmap for any
index with less than 100 distinct values

select region, count(*) from sales group by region;

2 - The table has LOW DML - You must have low insert./update/delete activity.
Updating bitmapped indexes take a lot of resources, and bitmapped indexes are best for
largely read-only tables and tables that are batch updated nightly.

3 - Multiple columns - Your SQL queries reference multiple, low cardinality values in
there where clause. Oracle cost-based SQL optimizer (CBO) will scream when you have
bitmap indexes on .

Troubleshooting Oracle bitmap indexes:

Some of the most common problems when implementing bitmap indexes include:

1. Small table - The CBO may force a full-table scan if your table is small!

2. Bad stats - Make sure you always analyze the bitmap with dbms_stats right after
creation:

CREATE BITMAP INDEX


emp_bitmap_idx
ON index_demo (gender);

exec dbms_stats.gather_index_stats(OWNNAME=>'SCOTT',
INDNAME=>'EMP_BITMAP_IDX');

3. Test with a hint - To force the use of your new bitmap index, just use a Oracle
INDEX hint:

select /*+ index(emp emp_bitmap_idx) */


count(*)
from
emp, dept
where
emp.deptno = dept.deptno;
Question: I know that bitmap indexes are for columns with a small number of distinct
values, but what is the rule for cardinality. Is 500 distinct values too many for a bitmap
index? Does the length of the index key play a role in the maximum number of distinct
values? How do I tell when m y bitmap index has too many values?

Answer: You are correct that the maximum cardinality for bitmap indexes is a function
of the number of distinct values and the size of the key, but far and away it is the number
of distinct values that is the largest contributing factor to the maximum number of distinct
values you can have in a bitmap index.

As the number if distinct values increases, the size of the bitmap increases exponentially,
such that an index with 100 values may perform thousands of times faster than a bitmap
index on 1,000 distinct column values. The answer, as with all Oracle tuning questions is
"It depends".

Also, remember that bitmap indexes are only suitable for static tables and materialized
views which are updated at nigh and rebuilt after batch row loading. If your tables are
not read-only during query time, DO NOT consider using bitmap indexes!

So, which is the right maximum values for your bitmap index? Who knows? You will
need to run performance benchmarks on your unique database to see!

Benchmarks have shown these trends in SQL performance and the number of distinct
bitmap index values (as a percentage of total table rows), and the clustering factor for the
index column. All else being equal, cardinality is one of the most important factors in
deciding to use a bitmap index, but there are rare cases where a bitmap on a column with
10,000 key values might be appropriate.

So, what is the maximum threshold for unique values in a bitmap index? The answer (as
with all Oracle tuning questions) is "It depends". To understand how a bitmap works,
remember that a bitmap is a two dimensional matrix with two axis. The number of rows
in the table is one axis, and the number of distinct key values is the other axis

Conceptualize a bitmap index as a two-dimensional array

Hence, a bitmap index on a million row table with ten distinct values in the index column
will have ten million cells in the bitmap. Conversely, a bitmap on a million rows table
with an index column with 1,000 distinct values will be much larger, with a billion cells.

At this point, let’s note that Oracle performs bitmap index compression, and that is why
the clustering_factor is important. If a bitmap index column has a “good” clustering
factor (e.g. close to the number of blocks in the table) then the index values are adjacent,
and Oracle will be able to compress the index far better than a bitmap on an un-clustered
column.

Bitmap compression, row clustering, and column cardinality

It should be clear that it’s not exclusively the number of distinct values in the index that
governs our choice of using a bitmap, it’s also a function of the percentage the distinct
values as a percentage of rows in the table and the clustering of the column values (as per
the clustering_factor column in the dba_indexes view). Hence, a decision for a bitmap is
a function of these criteria:

• The ratio of rows to distinct values: A more accurate estimate of the suitability
of a bitmap index is the ratio of distinct rows to the number of total rows in a
table.
• Clustering factor: The clustering of the index keys to the physical data blocks
has an influence on bitmap indexes.

In a table where the bitmap columns appear in the same physical order as the rows,
related column values will be adjacent on each data block. The clustering_factor will be
close to the value of the blocks column in dba_data_files, and Oracle will be able to
compress the bitmap index.

When a bitmap index sees repeating column values, the matching rows can be omitted
from the bitmap. As a bitmap is being created or updated, Oracle looks to the previous
index entry. If it is the same, Oracle replaces the bitmap entry with a “same as prior”
flag, and continues until a new key value is detected.

At an extreme case, consider a million rows table where the column has 100,000 distinct
values and clustering_factor ~= blocks.

Because all adjacent columns values are grouped together, the bitmap axis goes from
1,000,000 rows down to 100,000, making for a highly compressed bitmap.

In the compressed example below, all “East:” rows appear adjacent in rows 1-5, all
“North” values resides on rows 6-13, all “South values” are in rows 14-15, and all
“West” rows reside on rows 16-18:
Again, in most cases, there will not be a lot of adjacent index values, so it quite rare to
see extensive compression.

Exceptions to the rule! High cardinality bitmap indexes

Oracle is not a science and for every rule we see exceptions! At first blush, the Oracle
11g documentation can appear contradictory. In some places we see excellent rules-of-
thumb for bitmap index cardinality.

“In a data warehouse, B-tree indexes should only be used for unique columns or
other columns with very high cardinalities (that is, columns that are almost unique).”

“You typically want to use bitmap indexes on low degree of cardinality columns and
B-tree indexes on high degree of cardinality columns.

As a general rule, a cardinality of under 1% makes a good candidate for a bitmap


index.”

However, there are rare cases where high cardinality columns can be used in bitmap
indexes, and Oracle suggests that these conditions are acceptable for a bitmap index:

• If the number of distinct values of a column is less than 1% of the number of rows
in the table.
• If the values in a column are repeated more than 100 times, then the column is a
candidate for a bitmap index.

Now that we understand bitmap index compression, it should be clear that there are rare
cases where high cardinality columns might be candidates for bitmap indexes.
So, which are the right maximum values for your bitmap index? Who knows? You will
need to run performance benchmarks on your unique database to see!

The only way to know for sure is to perform real-world benchmark testing your real SQL
workloads. You can use the dbms_workload_capture procedure to grab a representative
SQL workload from your production database and replay the workload in a test instance
using your new bitmap index.

Bitmap index usage

Remember how bitmap indexes are generally used. Because they are low cardinality,
they have very little value by themselves, and the power comes when Oracle combines
multiple bitmaps in a bitmap merge operation:

We also have to deal with the volatility of low cardinality columns. For example, some
hippie states have adopted a five gender system, male, female, male becoming female,
female becoming male, and eunuch, changes that cost billions of dollars to implement.

B-Tree Index

B-Tree Index

By default, the Oracle creates a b_tree index. In a b-tree, you walk the branches until you
get to the node that has the data you want to use. In the classic b-tree structure, there are
branches from the top that lead to leaf nodes that contain the data. If I wanted to find the
rowid for the number 28 in the b-tree defined in Figure 5.3, I would start at the top or
header block.

Is my number greater or less that 50? Well 28 is less than 50, so I move to the branch
marked 25. Is 28 greater or less that 25? Since 28 is greater than 25, I move to the leaf
node marked 26-49. I scan this node for the rowid of the number 28. The key to the b-
tree in Figure 5.3 is that I can find any number from one to 100 by reading no more than
three nodes.

The Oracle database implements the b-tree index in a little different manner. An Oracle
b-tree starts with only two nodes, one header and one leaf. The header contains a pointer
to the leaf block and the values stored in the leaf block. As the index grows leaf bocks
are added to the index (Figure 5.4).

To find a specific row, we look at the header to find the range of values in each leaf and
then go directly to the leaf node that contains the value we are looking for. In the index
in Figure 5.4, any row can be found by reading two nodes. Since the header contains
only pointers to leaf blocks, a single header node can support a very large number
(hundreds) of leaf nodes.

If the header block fills, then a new header block is established, and the former header
node becomes a branch node. This is called a three level b-tree (Figure 5.5).

In Figure 5.5, you can find any value in any leaf node by reading no more than three
blocks. I can also create a multicolumn index, also called a concatenated or complex
index.

SQL> create index sales_keys


2 on sales (book_key, store_key, order_number);

Index created.

Here, we created an index called sales_keys on three columns of the SALES table. A
multicolumn index can be used by the database but only from the first or lead column.
Our sales_keys index can be used in the following query.

select
order_number,
quantity
from
sales
where
book_key = 'B103';

Note that the lead column of the index is the book_key, so the database can use the index
in the query above. I can also use the sales_keys index in the queries below.
select
order_number,
quantity
from
sales
where
book_key = 'B103'
and
store_key = 'S105'
and
order_number = 'O168';

However, the database cannot use that index in the following query because the WHERE
clause does not contain the index lead column.

select
order_number,
quantity
from
sales
where
store_key = 'S105'
and
order_number = 'O168';

Also, note that in the query below, the database can answer the query from the index and
so will not access the table at all.

select
order_number
from
sales
where
store_key = 'S105'
and
book_key = 'B108';

As you can see, b-tree indexes are very powerful. You must remember that a
multicolumn index cannot skip over columns, so the lead index column must be in the
WHERE clause filters. Oracle has used b-tree indexes for many years, and they are
appropriate from most of your indexing needs. However, the Oracle database provides
specialized indexes that can provide additional capabilities; the bit-mapped index and the
function-based index.

Creating Oracle Indexes

Once you have decided you need to create an index you use the create index command.
The command is pretty straightforward as seen in this example:
CREATE INDEX
ix_emp_01
ON
emp (deptno)
TABLESPACE
index_tbs;

This statement creates an index called IX_EMP_01. This index is built on the DEPTNO
column of the EMP table. We also defined the tablespace that the index should be created
in using the tablespace keyword, in this case we created it in the INDEX_TBS tablespace.

You can create indexes on multiple columns in a table. Say, for example, we wanted an
index on the EMP table columns EMPNO and DEPTNO. This is known as a
concatenated index, and it’s created this way:
CREATE INDEX ix_emp_01 ON emp (empno, deptno) TABLESPACE index_tbs;

Altering Oracle Indexes

If we create indexes, there may be times that we will want to change some attribute of
that index, such as where it is stored. Also, sometimes an index needs to be rebuilt to help
with performance. For cases like these, the alter index command is what we want.

Let’s look at an example of the use of the alter index command:


ALTER INDEX ix_emp_01 REBUILD TABLESPACE new_index;

In this example we use the alter index command to rebuild an index. The rebuild keyword
is what tells Oracle to rebuild the index. When we use the tablespace keyword, followed
by a tablespace name, we are telling Oracle which tablespace to recreate the rebuilt index
in. By default Oracle will create the rebuilt index in the same tablespace.

The alter index command allows you to rename a tablespace using the rename to
keyword as seen in this example:
ALTER INDEX ix_emp_01 RENAME TO ix_emp_01_old;

In this case we have renamed the ix_emp_01 index to ix_emp_01_old. All of the data
dictionary entries will be changed by Oracle to reflect the new name (we will discuss
indexes and the data dictionary in a moment)

Dropping Oracle Indexes

Sometimes what we create we must destroy. When it’s time to remove an index, the drop
index command is what is needed. The drop index command is pretty straight forward as
seen in this example:
DROP INDEX ix_emp_01_old;

Generate create index syntax


You can punch create index DDL syntax is using the dbms_metadata package, something
like this:
select
dbms_metadata.get_ddl('INDEX',index_name)
from
user_indexes
where
xxxx;

Вам также может понравиться