, r belongs here.
Else, r could belong to bucket $h_{Level}(r)$ or bucket $h_{Level}(r) + N_R$; must apply $h_{Level+1}(r)$ to find out.
Silberschatz, Korth and Sudarshan 12.83 Database System Concepts
Overview of LH File
In the middle of a round (regions of the file, top to bottom):
Buckets split in this round: if $h_{Level}$(search key value) is in this range, must use $h_{Level+1}$(search key value) to decide if entry is in `split image' bucket.
Next: bucket to be split.
Buckets that existed at the beginning of this round: this is the range of $h_{Level}$.
`Split image' buckets: created (through splitting of other buckets) in this round.
Linear Hashing (Contd.)
Insert: Find bucket by applying $h_{Level}$ / $h_{Level+1}$:
If bucket to insert into is full:
Add overflow page and insert data entry.
(Maybe) Split Next bucket and increment Next.
Can choose any criterion to `trigger' a split, usually but not necessarily
when an overflow occurs.
Since buckets are split round-robin, long overflow chains don't develop!
Doubling of directory in Extendible Hashing is similar; switching of hash
functions is implicit in how the # of bits examined is increased.
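The insert and split rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `LinearHashFile`, the hash family `key mod (2^i * N)`, a capacity of 4 entries per primary page, and splitting on every overflow are all choices made here for the example. Overflow pages are modeled simply as extra entries in the same list.

```python
class LinearHashFile:
    """Toy linear hashed file with h_i(key) = key mod (2**i * n0) (assumed hash family)."""

    def __init__(self, n0=4, capacity=4):
        self.n0 = n0              # N: number of buckets at Level 0
        self.capacity = capacity  # entries per primary page (assumed)
        self.level = 0
        self.next = 0
        # One list per bucket; entries beyond `capacity` stand in for overflow pages.
        self.buckets = [[] for _ in range(n0)]

    def _h(self, level, key):
        return key % (self.n0 * 2 ** level)

    def bucket_of(self, key):
        b = self._h(self.level, key)
        if b < self.next:         # bucket was already split in this round
            b = self._h(self.level + 1, key)
        return b

    def search(self, key):
        return key in self.buckets[self.bucket_of(key)]

    def insert(self, key):
        bucket = self.buckets[self.bucket_of(key)]
        overflow = len(bucket) >= self.capacity
        bucket.append(key)
        if overflow:              # assumed trigger: split whenever an overflow occurs
            self._split()

    def _split(self):
        old = self.buckets[self.next]
        self.buckets[self.next] = []
        self.buckets.append([])   # split image lands at index Next + N * 2**Level
        for k in old:
            self.buckets[self._h(self.level + 1, k)].append(k)
        self.next += 1
        if self.next == self.n0 * 2 ** self.level:  # end of round
            self.level += 1
            self.next = 0


f = LinearHashFile()
for key in [32, 44, 36, 9, 25, 5, 14, 18, 10, 30, 31, 35, 7, 11]:
    f.insert(key)                 # fills the four initial buckets without overflow
f.insert(43)                      # bucket 11 overflows, so bucket Next=0 is split
```

Note that the bucket that overflows (11) is not the bucket that gets split (00); the split pointer simply advances round-robin.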
Example of Linear Hashing
On split, $h_{Level+1}$ is used to redistribute entries.

Level=0, N=4, Next=0:

h1    h0    PRIMARY PAGES
000   00    32* 44* 36*
001   01    25* 9* 5*
010   10    14* 18* 10* 30*
011   11    31* 35* 11* 7*

Level=0, Next=1:

h1    h0    PRIMARY PAGES        OVERFLOW PAGES
000   00    32*
001   01    25* 9* 5*
010   10    14* 18* 10* 30*
011   11    31* 35* 11* 7*       43*
100   00    44* 36*

Each row is a primary bucket page; 5* denotes a data entry r with h(r)=5.
The h1 and h0 columns are for illustration only; the pages are the actual contents of the linear hashed file.
Example: End of a Round
Level=0, Next=3:

h1    h0    PRIMARY PAGES        OVERFLOW PAGES
000   00    32*
001   01    9* 25*
010   10    66* 10* 18* 34*
011   11    35* 31* 7* 11*       43*
100   00    44* 36*
101   01    5* 37* 29*
110   10    14* 30* 22*

Level=1, Next=0:

h1    h0    PRIMARY PAGES        OVERFLOW PAGES
000   00    32*
001   01    9* 25*
010   10    66* 18* 10* 34*      50*
011   11    43* 35* 11*
100   00    44* 36*
101   01    5* 37* 29*
110   10    14* 30* 22*
111   11    31* 7*
page = 2K
Comparison of Ordered Indexing and Hashing
Cost of periodic re-organization
Relative frequency of insertions and deletions
Is it desirable to optimize average access time at the expense of
worst-case access time?
Expected type of queries:
Hashing is generally better at retrieving records having a specified
value of the key.
If range queries are common, ordered indices are to be preferred
Index Definition in SQL
Create an index
create index <index-name> on <relation-name>
(<attribute-list>)
E.g.: create index b-index on branch(branch-name)
Use create unique index to indirectly specify and enforce the
condition that the search key is a candidate key.
Not really required if SQL unique integrity constraint is supported
To drop an index
drop index <index-name>
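The three statements can be tried out with Python's built-in `sqlite3` module. This is a small sketch, not the slide's own example: the table layout, the sample row, and the index names `b_index` and `b_unique` are made up here, and underscores replace the hyphens in `b-index` and `branch-name` because unquoted SQL identifiers cannot contain hyphens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relation for illustration only.
conn.execute("create table branch (branch_name text, assets integer)")
conn.execute("insert into branch values ('Perryridge', 1700000)")

# Ordinary index on the search key.
conn.execute("create index b_index on branch (branch_name)")

# Unique index: declares and enforces that branch_name is a candidate key.
conn.execute("create unique index b_unique on branch (branch_name)")

try:
    conn.execute("insert into branch values ('Perryridge', 900000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True   # the unique index refuses a second 'Perryridge'

# Dropping an index removes it from the catalog.
conn.execute("drop index b_index")
index_names = [row[0] for row in conn.execute(
    "select name from sqlite_master where type = 'index'")]
```

After the drop, only the unique index remains in the catalog, and the duplicate insert has been rejected.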
Multiple-Key Access
Use multiple indices for certain types of queries.
Example:
select account-number
from account
where branch-name = "Perryridge" and balance = 1000
Possible strategies for processing query using indices on single
attributes:
1. Use index on branch-name to find accounts at the Perryridge branch; test
balance = 1000.
2. Use index on balance to find accounts with balances of $1000; test
branch-name = "Perryridge".
3. Use branch-name index to find pointers to all records pertaining to the
Perryridge branch. Similarly use index on balance. Take intersection of
both sets of pointers obtained.
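Strategy 3 can be illustrated with two single-attribute indices modeled as dictionaries from attribute value to a set of record pointers. The sample records and the helper `build_index` are hypothetical, and list positions stand in for record pointers.

```python
from collections import defaultdict

# Hypothetical account records: (account_number, branch_name, balance).
accounts = [
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 1000),
    ("A-215", "Mianus", 1000),
    ("A-217", "Perryridge", 1000),
]


def build_index(records, field):
    """Map each value of one attribute to the set of record pointers (list positions)."""
    index = defaultdict(set)
    for pointer, record in enumerate(records):
        index[record[field]].add(pointer)
    return index


branch_index = build_index(accounts, 1)
balance_index = build_index(accounts, 2)

# Strategy 3: fetch both pointer sets, intersect them, and only then read records.
pointers = branch_index["Perryridge"] & balance_index[1000]
result = sorted(accounts[p][0] for p in pointers)
```

Each index alone returns three pointers here, but only the two in the intersection require a record fetch.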
Indices on Multiple Attributes
Suppose we have an index on combined search-key
(branch-name, balance).
With the where clause
where branch-name = "Perryridge" and balance = 1000
the index on the combined search-key will fetch only records
that satisfy both conditions.
Using separate indices is less efficient: we may fetch many
records (or pointers) that satisfy only one of the conditions.
Can also efficiently handle
where branch-name = "Perryridge" and balance < 1000
But cannot efficiently handle
where branch-name < "Perryridge" and balance = 1000
May fetch many records that satisfy the first but not the
second condition.
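Why the order of attributes in a combined search-key matters can be seen by modeling the index as a lexicographically sorted list of tuples. This is a rough sketch with made-up entries and hypothetical function names; a real index would be a B+-tree, but the contiguity argument is the same.

```python
import bisect

# Hypothetical (branch_name, balance, account_number) entries, sorted
# lexicographically as an index on (branch-name, balance) would order them.
entries = sorted([
    ("Brighton", 750, "A-217"),
    ("Downtown", 1000, "A-101"),
    ("Mianus", 1000, "A-215"),
    ("Perryridge", 400, "A-102"),
    ("Perryridge", 900, "A-201"),
    ("Perryridge", 1000, "A-218"),
    ("Redwood", 700, "A-222"),
])


def equal_branch_balance_below(branch, limit):
    """branch-name = branch and balance < limit: one contiguous slice of the index."""
    lo = bisect.bisect_left(entries, (branch,))
    hi = bisect.bisect_left(entries, (branch, limit))
    return entries[lo:hi], hi - lo          # (matches, entries examined)


def branch_below_balance_equal(branch, balance):
    """branch-name < branch and balance = balance: scan the whole prefix, then filter."""
    hi = bisect.bisect_left(entries, (branch,))
    scanned = entries[:hi]
    return [e for e in scanned if e[1] == balance], len(scanned)


matches_a, examined_a = equal_branch_balance_below("Perryridge", 1000)
matches_b, examined_b = branch_below_balance_equal("Perryridge", 1000)
```

In the first query every examined entry is a match; in the second, entries that satisfy only the branch-name condition are examined and then discarded.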
Indices on Multiple Attributes
How to handle the symmetric case, i.e., the case of several
attributes with ranges?
Example: find all objects within the following boundaries:
0 <= X <= 10
0 <= Y <= 20
0 <= Z <= 30
Using an index?
Solution: multi-dimensional indexes, e.g., R-trees
Why Sort?
A classic problem in computer science!
Data requested in sorted order
e.g., find students in increasing gpa order
Sorting is first step in bulk loading B+ tree index.
Sorting useful for eliminating duplicate copies in a collection of records
(Why?)
Sort-merge join algorithm involves sorting.
Problem: sort 1 GB of data with 1 MB of RAM.
Why not virtual memory?
2-Way Sort: Requires 3 Buffers
Pass 1: Read a page, sort it, write it.
only one buffer page is used
Pass 2, 3, ..., etc.:
three buffer pages used.
[Figure: pages flow from disk into two main memory input buffers (INPUT 1, INPUT 2), pass through one OUTPUT buffer, and are written back to disk.]
Two-Way External Merge Sort
Idea: Divide and conquer: sort subfiles and merge
Each pass we read + write each page in file.
N pages in the file => the number of passes
$= \lceil \log_2 N \rceil + 1$
So total cost is:
$2N \left( \lceil \log_2 N \rceil + 1 \right)$
(Each comma-separated pair is one page; | separates runs.)
Input file:            3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0, 1-page runs:   3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1, 2-page runs:   2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
PASS 2, 4-page runs:   2,3 4,4 6,7 8,9 | 1,2 3,5 6
PASS 3, 8-page runs:   1,2 2,3 3,4 4,5 6,6 7,8 9
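The pass structure on this slide can be simulated in memory. The sketch below is only an illustration: the function name `two_way_external_sort` is made up, and whole runs are held in Python lists, whereas a real implementation would stream pages through three buffers. It uses the slide's seven-page input file.

```python
import heapq
import math


def two_way_external_sort(pages):
    """Simulate two-way external merge sort on a list of pages (lists of keys).

    Returns (sorted_keys, number_of_passes).
    """
    # Pass 0: sort each page on its own, giving 1-page runs.
    runs = [sorted(page) for page in pages]
    passes = 1
    # Pass 1, 2, ...: merge runs pairwise until a single run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + 2]))
                for i in range(0, len(runs), 2)]
        passes += 1
    return (runs[0] if runs else []), passes


pages = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
sorted_keys, passes = two_way_external_sort(pages)
n = len(pages)
total_cost = 2 * n * passes   # every pass reads and writes every page
```

With N = 7 pages the simulation takes 4 passes, which agrees with $\lceil \log_2 7 \rceil + 1 = 4$, for a total of 56 page I/Os.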
General External Merge Sort
More than 3 buffer pages. How can we utilize them?
To sort a file with N pages using B buffer pages:
Pass 0: use B buffer pages. Produce $\lceil N/B \rceil$ sorted runs of B pages each.
Pass 1, 2, ..., etc.: merge B-1 runs.
[Figure: runs flow from disk into B-1 input buffers (INPUT 1, INPUT 2, ..., INPUT B-1) among the B main memory buffers, pass through one OUTPUT buffer, and are written back to disk.]
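A compact way to see how the extra buffers are used is to simulate the algorithm and record how many runs exist after each pass. The sketch below is an in-memory illustration only: the function name `external_merge_sort` and the test file of 108 one-key pages are assumptions made for the example, and runs are plain Python lists rather than disk files.

```python
import heapq


def external_merge_sort(pages, B):
    """Simulate external merge sort with B buffer pages.

    Returns (sorted_keys, run_counts), where run_counts[i] is the number
    of runs that exist after pass i.
    """
    if B < 3:
        raise ValueError("need at least 3 buffer pages")
    # Pass 0: read B pages at a time, sort them in memory, write one run.
    runs = [sorted(k for page in pages[i:i + B] for k in page)
            for i in range(0, len(pages), B)]
    run_counts = [len(runs)]
    # Later passes: (B-1)-way merge, with one buffer reserved for output.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + B - 1]))
                for i in range(0, len(runs), B - 1)]
        run_counts.append(len(runs))
    return (runs[0] if runs else []), run_counts


# Hypothetical file: 108 pages holding one key each, in descending order.
pages = [[k] for k in range(108, 0, -1)]
sorted_keys, run_counts = external_merge_sort(pages, B=5)
```

With 5 buffers, pass 0 leaves 22 runs and each later pass cuts the run count by a factor of up to 4, so the file is sorted after four passes instead of the eight a two-way sort would need.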
Cost of External Merge Sort
Number of passes: $1 + \lceil \log_{B-1} \lceil N/B \rceil \rceil$
Cost = 2N * (# of passes)
E.g., with 5 buffer pages, to sort 108 page file:
Pass 0: $\lceil 108/5 \rceil$ = 22 sorted runs of 5 pages each (last run is only
3 pages)
Pass 1: $\lceil 22/4 \rceil$ = 6 sorted runs of 20 pages each (last run is only
8 pages)
Pass 2: 2 sorted runs, 80 pages and 28 pages
Pass 3: Sorted file of 108 pages
For example: 0 passes (main memory) = 1N (writing output not
considered)
1 pass = 3N
2 passes = 5N
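The pass count and the 2N-per-pass cost can be checked with a few lines of Python. The helper name `external_sort_cost` is hypothetical, and the loop uses integer ceilings rather than a floating-point logarithm so that the result is exact.

```python
import math


def external_sort_cost(n_pages, n_buffers):
    """Return (passes, page_ios) for sorting n_pages with n_buffers buffer pages."""
    runs = math.ceil(n_pages / n_buffers)         # runs left after pass 0
    passes = 1
    while runs > 1:                               # each merge pass is (B-1)-way
        runs = math.ceil(runs / (n_buffers - 1))
        passes += 1
    return passes, 2 * n_pages * passes           # read + write every page per pass


passes, ios = external_sort_cost(108, 5)
```

For N = 108 and B = 5 this gives 4 passes (Pass 0 through Pass 3) and 2 * 108 * 4 = 864 page I/Os.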