Outline
• Storing data: disks and files - Chapters 9.5-9.7
• Types of Indexes - Chapter 8.3
• B-trees - Chapter 10
Managing Free Blocks
• By the OS
Files of Records
Types of files:
• Heap file - unordered
• Sorted file
• Clustered file - sorted, plus a B-tree
[Figure: heap file layouts: a header page linked to full pages, and a directory (starting at a header page) that points to data pages.]
Page Formats
Issues to consider
• 1 page = fixed size (e.g. 8KB)
• Records:
– Fixed length
– Variable length
• Record id = RID
– Typically RID = (PageID, SlotNumber)
Why do we need RIDs in a relational DBMS?
Page Formats
Fixed-length records: packed representation
Problems?
Page Formats
Free space
Slot directory
Variable-length records
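A minimal sketch of a page with a slot directory, to make the RID = (PageID, SlotNumber) idea concrete. It assumes records are packed from the front of the page and keeps the slot directory as a Python list instead of serializing it at the end of the page. The names SlottedPage and SLOT_ENTRY_BYTES, and the 8-byte cost per slot entry, are illustrative assumptions and do not come from the slides.

```python
SLOT_ENTRY_BYTES = 8  # assumed cost of one (offset, length) directory entry


class SlottedPage:
    def __init__(self, page_id, size=8192):
        self.page_id = page_id
        self.size = size
        self.data = bytearray(size)
        self.slots = []      # slot directory: (offset, length), or None if deleted
        self.free_start = 0  # start of the free space

    def free_space(self):
        return self.size - self.free_start - SLOT_ENTRY_BYTES * len(self.slots)

    def insert(self, record):
        """Store a variable-length record; return its RID or None if it does not fit."""
        if len(record) + SLOT_ENTRY_BYTES > self.free_space():
            return None
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots) - 1)

    def get(self, rid):
        page_id, slot_no = rid
        if page_id != self.page_id or self.slots[slot_no] is None:
            return None
        offset, length = self.slots[slot_no]
        return bytes(self.data[offset:offset + length])

    def delete(self, rid):
        # Only the slot is cleared, so the RIDs of other records stay valid.
        self.slots[rid[1]] = None


page = SlottedPage(page_id=7)
rid_a = page.insert(b"alice")
rid_b = page.insert(b"bob, a longer record")
page.delete(rid_a)
```

Because a deleted record only clears its slot, rid_b still resolves to the same record after rid_a is gone. This sketch never reclaims the bytes of deleted records; a real page would compact them.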
Record Formats
Fixed-length records --> all fields have fixed length
Record Formats
Variable length records
Record header
[Figure: two blocks, each with a block header; the first holds R1 and part of R2, the second holds the rest of R2 and R3.]
Modifications: Insertion
• File is unsorted (= heap file)
– add it to the end (easy)
• File is sorted:
– Is there space in the right block?
• Yes: we are lucky, store it there
– Is there space in a neighboring block?
• Look 1-2 blocks to the left/right, shift records
– If everything else fails, create an overflow block
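The insertion steps for a sorted file can be sketched as follows. This is an illustration under simplifying assumptions: blocks are Python lists of keys, every block is non-empty, only the immediate left and right neighbors are examined (the slide allows 1-2 blocks), and overflow blocks are modeled as a dictionary keyed by block number. The function name insert_sorted and the return strings are hypothetical.

```python
import bisect


def insert_sorted(blocks, overflow, key, capacity):
    """Insert key into a sorted file made of blocks with a fixed capacity."""
    firsts = [block[0] for block in blocks]
    # The "right" block: the last block whose first key is <= key.
    i = max(bisect.bisect_right(firsts, key) - 1, 0)
    if len(blocks[i]) < capacity:
        bisect.insort(blocks[i], key)
        return "in place"
    if i + 1 < len(blocks) and len(blocks[i + 1]) < capacity:
        # Shift the largest record of the target block into the right neighbor.
        bisect.insort(blocks[i], key)
        blocks[i + 1].insert(0, blocks[i].pop())
        return "shifted right"
    if i > 0 and len(blocks[i - 1]) < capacity:
        # Shift the smallest record of the target block into the left neighbor.
        bisect.insort(blocks[i], key)
        blocks[i - 1].append(blocks[i].pop(0))
        return "shifted left"
    overflow.setdefault(i, []).append(key)
    return "overflow"


blocks = [[10, 20], [30, 40], [50]]
overflow = {}
first = insert_sorted(blocks, overflow, 35, capacity=2)
second = insert_sorted(blocks, overflow, 15, capacity=2)
```

In this run, 35 fits only after 40 is shifted into the neighboring block, while 15 finds no room nearby and ends up in an overflow block attached to block 0.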
Overflow Blocks
[Figure: a data block with an attached overflow block.]
Modifications: Updates
• If the new record is shorter than the previous one, easy
• If it is longer, need to shift records, create overflow blocks
Record Formats: Fixed Length
[Figure: record with fields F1 F2 F3 F4 of fixed lengths L1 L2 L3 L4.]
Primary Index
• File is sorted on the index attribute
• Dense index: sequence of (key,pointer) pairs
[Figure: dense index with one entry for each key 10, 20, ..., 80, pointing into the sorted data file 10 20 | 30 40 | 50 60 | 70 80.]
Primary Index
• Sparse index
[Figure: sparse index with entries 10, 30, 50, 70, 90, 110, 130, 150, one per data block; the data file shown holds 10 20 | 30 40 | 50 60 | 70 80.]
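A short sketch of how a lookup through a sparse index might work, assuming the index stores the first key of each block and the data file is sorted on that key. The data values follow the figure; the names blocks, sparse_index, and sparse_lookup are illustrative assumptions.

```python
import bisect

# Sorted data file, two records per block.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]

# Sparse index: one (first key, block number) entry per block.
sparse_index = [(block[0], block_no) for block_no, block in enumerate(blocks)]


def sparse_lookup(key):
    """Return (block number, slot) of key, or None if it is absent."""
    index_keys = [k for k, _ in sparse_index]
    # Find the last index entry whose key is <= the search key.
    pos = bisect.bisect_right(index_keys, key) - 1
    if pos < 0:
        return None
    block_no = sparse_index[pos][1]
    # The index only narrows the search to one block; scan inside it.
    if key in blocks[block_no]:
        return (block_no, blocks[block_no].index(key))
    return None
```

A dense index would hold an entry for every key, so it could answer "is the key present?" without reading the data block. The sparse index is smaller but always needs the extra scan inside the block.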
Primary Index with Duplicate Keys
• Dense index:
[Figure: dense index with entries 10, 20, ..., 80 over a sorted data file with duplicate keys: 10 10 10 20 20 20 30 40.]
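Primary Index with Duplicate Keys
• Sparse index: pointer to lowest search key in each block
• Search for 20
[Figure: sparse index over a sorted data file with duplicate keys. The index entry 20 leads to the block whose lowest key is 20 ("20 is here..."), but an earlier block also holds key 20 ("...but need to search here too").]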
Primary Index with Duplicate Keys
• Better: pointer to lowest new search key in each block
• Search for 20
• Search for 15? 35?
[Figure: sparse index with entries 10, 20, 30, ... over a sorted data file with duplicate keys. Annotations: "20 is here...", "...ok to search from here".]
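The "lowest new search key" variant can be sketched as below. The sample data is a hypothetical file in the spirit of the figure, not a transcription of it, and the helper names build_index and search are assumptions. A block that introduces no new key gets no index entry.

```python
import bisect

# Hypothetical sorted data file with duplicate keys, two records per block.
blocks = [[10, 10], [10, 20], [20, 30], [30, 30], [40, 50]]


def build_index(blocks):
    """One entry per block that introduces a key not seen in earlier blocks."""
    index, previous_max = [], None
    for block_no, block in enumerate(blocks):
        for key in block:
            if previous_max is None or key > previous_max:
                index.append((key, block_no))  # lowest new key in this block
                break
        previous_max = block[-1]
    return index


def search(blocks, index, key):
    """Return every (block number, slot) that holds key."""
    index_keys = [k for k, _ in index]
    pos = bisect.bisect_right(index_keys, key) - 1
    if pos < 0:
        return []
    hits = []
    # Scan forward from the block named by the index entry.
    for block_no in range(index[pos][1], len(blocks)):
        if blocks[block_no][0] > key:
            break
        hits += [(block_no, slot)
                 for slot, k in enumerate(blocks[block_no]) if k == key]
    return hits


index = build_index(blocks)
```

With this index a search for 20 starts at the block where 20 first appears, so no earlier block has to be checked. Searches for 15 or 35 start from the entry with the largest key not exceeding the search key and stop as soon as a block begins with a larger key.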
Secondary Indexes
• To index attributes other than the primary key
• Always dense (why?)
[Figure: dense secondary index; the index entries are sorted on the key (10, 20, 30, with duplicates), while the data file is not sorted on it.]
Clustered/Unclustered
• Primary indexes = usually clustered
• Secondary indexes = usually unclustered
Clustered vs. Unclustered Index
[Figure: data entries in the index file pointing into the data file, shown for the CLUSTERED and the UNCLUSTERED case.]
Secondary Indexes
• index attributes other than the primary key
• index unsorted files (heap files)
• index clustered data
Applications of Secondary Indexes
• Secondary indexes needed for heap files
• Also for Clustered data:
Company(name, city), Product(pid, maker)
Select city
From Company, Product
Where name=maker
and pid="p045"

Select pid
From Company, Product
Where name=maker
and city="Seattle"

[Figure: clustered file: products of company 1, products of company 2, products of company 3.]
B+ Trees Basics
• Parameter d = the degree
• Each node has >= d and <= 2d keys (except root)
[Figure: internal node with keys 30 120 240; its four subtrees hold keys k < 30, 30 <= k < 120, 120 <= k < 240, and 240 <= k. A leaf holds keys 40 50 60.]
B+ Tree Example
d=2. Find the key 40.
[Figure: B+ tree with root key 80; leaf keys 10 15 18 20 30 40 50 60 65 80 85 90. Search path annotations: 40 <= 80, 20 < 40 <= 60, 30 < 40 <= 40.]
Searching a B+ Tree
• Exact key values:
– Start at the root
– Proceed down, to the leaf
Select name
From people
Where age = 25
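The root-to-leaf descent can be sketched in a few lines. The Node class and the two-level tree below are hypothetical; the tree reuses the leaf keys from the earlier example but arranges them under a single root for brevity. The routing rule follows the convention that the subtree between keys a and b holds keys with a <= k < b.

```python
import bisect


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf


def find(node, key):
    """Walk from the root to a leaf and report whether key is stored there."""
    while node.children is not None:
        # Number of separator keys <= key gives the child to follow.
        node = node.children[bisect.bisect_right(node.keys, key)]
    return key in node.keys


# Hypothetical tree with d=2: the root is exempt from the minimum of d keys.
root = Node([20, 60, 80], [
    Node([10, 15, 18]),
    Node([20, 30, 40, 50]),
    Node([60, 65]),
    Node([80, 85, 90]),
])
```

A range query would descend the same way to the leaf for the lower bound and then scan leaves sequentially, which this sketch leaves out because the leaves carry no sibling pointers.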
B+ Tree Design
• How large is d?
• Example:
– Key size = 4 bytes
– Pointer size = 8 bytes
– Block size = 4096 bytes
• A node holds up to 2d keys and 2d+1 pointers: 2d x 4 + (2d+1) x 8 <= 4096
• d = 170
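The arithmetic can be checked with a small helper. Solving 2d x key + (2d+1) x pointer <= block for d gives d <= (block - pointer) / (2 x (key + pointer)). The function name max_degree is an illustrative assumption.

```python
def max_degree(key_size, pointer_size, block_size):
    """Largest d such that 2d keys and 2d+1 pointers fit in one block."""
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))


d = max_degree(key_size=4, pointer_size=8, block_size=4096)
print(d)  # 170
```

With d = 170 a full node uses 340 x 4 + 341 x 8 = 4088 bytes, which leaves 8 bytes of the block unused.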
B+ Trees in Practice
[Figure: an overfull node with keys K1 K2 K3 K4 K5 and pointers P0 P1 P2 P3 P4 P5 splits into a node with K1 K2 (P0 P1 P2) and a node with K4 K5 (P3 P4 P5).]
• If leaf, keep K3 too in right node
• When root splits, new root has 1 key only
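A sketch of the split step, assuming the overfull node holds 2d+1 keys and that the middle key K3 is the one passed up to the parent. For a leaf the sketch assumes one record pointer per key and ignores sibling pointers. The function name split_node and the return shape are hypothetical.

```python
def split_node(keys, pointers, is_leaf):
    """Split an overfull node; return (left, separator for the parent, right)."""
    d = len(keys) // 2
    if is_leaf:
        # Leaf: the middle key goes up and is also kept in the right node.
        left = (keys[:d], pointers[:d])
        right = (keys[d:], pointers[d:])
    else:
        # Internal node: the middle key moves up and appears in neither half.
        left = (keys[:d], pointers[:d + 1])
        right = (keys[d + 1:], pointers[d + 1:])
    return left, keys[d], right


# Leaf 20 30 40 50 after inserting 25 (d=2, so five keys is one too many).
leaf_left, leaf_sep, leaf_right = split_node(
    [20, 25, 30, 40, 50], ["r20", "r25", "r30", "r40", "r50"], is_leaf=True)

inner_left, inner_sep, inner_right = split_node(
    ["K1", "K2", "K3", "K4", "K5"],
    ["P0", "P1", "P2", "P3", "P4", "P5"], is_leaf=False)
```

The caller would then insert the separator into the parent, which may overflow in turn. If the node being split is the root, a new root containing only the separator is created.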
Insertion in a B+ Tree
Insert K=19
[Figure: B+ tree with root key 80; leaf keys 10 15 18 20 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
After insertion
[Figure: B+ tree with root key 80; leaf keys 10 15 18 19 20 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
Now insert 25
[Figure: B+ tree with root key 80; leaf keys 10 15 18 19 20 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
After insertion
[Figure: B+ tree with root key 80; leaf keys 10 15 18 19 20 25 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
But now have to split!
[Figure: B+ tree with root key 80; leaf keys 10 15 18 19 20 25 30 40 50 60 65 80 85 90.]
Insertion in a B+ Tree
After the split
[Figure: B+ tree with root key 80 after the leaf split; leaf keys 10 15 18 19 20 25 30 40 50 60 65 80 85 90.]
Deletion from a B+ Tree
Delete 30
[Figure: B+ tree with root key 80; leaf keys 10 15 18 19 20 25 30 40 50 60 65 80 85 90.]
Deletion from a B+ Tree
After deleting 30
May change to 40, or not
[Figure: B+ tree with root key 80; leaf keys 10 15 18 19 20 25 40 50 60 65 80 85 90.]
Deletion from a B+ Tree
Now delete 25
[Figure: B+ tree with root key 80; leaf keys 10 15 18 19 20 25 40 50 60 65 80 85 90.]
Deletion from a B+ Tree
After deleting 25
Need to rebalance
Rotate
[Figure: B+ tree with root key 80; leaf keys 10 15 18 19 20 40 50 60 65 80 85 90.]
Deletion from a B+ Tree
Now delete 40
[Figure: B+ tree with root key 80 after the rotation; leaf keys 10 15 18 19 20 40 50 60 65 80 85 90.]
Deletion from a B+ Tree
After deleting 40
Rotation not possible
Need to merge nodes
[Figure: B+ tree with root key 80; leaf keys 10 15 18 19 20 50 60 65 80 85 90.]
Deletion from a B+ Tree
Final tree
[Figure: B+ tree with root key 80 after the merge; leaf keys 10 15 18 19 20 50 60 65 80 85 90.]
In Class
• Suppose the B+ tree has depth 4 and degree d=200