
CSE 544: Lecture 11

Storing Data, Indexes


Monday, 5/1/2006

1
Outline
• Storing data: disks and files - 9.5-9.7
• Types of Indexes - Chapter 8.3
• B-trees - Chapter 10

2
Managing Free Blocks
• By the OS

• By the RDBMS (typical: why ?)


– Linked list of free blocks
– Bit map
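
A minimal sketch of the bit map option, assuming one bit per block (1 = free, 0 = in use). The class and method names are illustrative only and do not come from any particular system.

```python
class FreeBlockBitmap:
    """Tracks free blocks with one bit per block (1 = free, 0 = in use)."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        # All blocks start out free.
        self.bits = bytearray([0xFF] * ((num_blocks + 7) // 8))

    def is_free(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self) -> int:
        """Return the lowest-numbered free block and mark it as used."""
        for block in range(self.num_blocks):
            if self.is_free(block):
                self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
                return block
        raise RuntimeError("no free blocks")

    def release(self, block: int) -> None:
        """Mark a block as free again."""
        self.bits[block // 8] |= 1 << (block % 8)
```

The linear scan in allocate() is the simplest choice; a real system would likely remember where the last free block was found.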

3
Files of Records
Types of files:
• Heap file - unordered
• Sorted file
• Clustered file - sorted, plus a B-tree

Will discuss heap files only; the others are similar, only sorted by the key
4
Heap Files
Linked list of pages:
[Figure: a header page heads two linked lists of data pages: full pages, and pages with some free space.]

5


Heap Files
Better: directory of pages

[Figure: a directory, starting at the header page, whose entries point to the data pages.]

6
Page Formats
Issues to consider
• 1 page = fixed size (e.g. 8KB)
• Records:
– Fixed length
– Variable length
• Record id = RID
– Typically RID = (PageID, SlotNumber)
Why do we need RIDs in a relational DBMS ?
7
Page Formats
Fixed-length records: packed representation

Slot 1 Slot 2 Slot N


Free space N

Problems ?
8
Page Formats

Free space

Slot directory

Variable-length records
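
A minimal sketch of a page with a slot directory, assuming records are packed from the front of the page and the directory is kept as a Python list rather than serialized at the end of the page. The names, and the 8 bytes of directory space charged per slot, are hypothetical.

```python
class SlottedPage:
    """Variable-length records plus a slot directory of (offset, length) pairs."""

    def __init__(self, page_id: int, size: int = 8192) -> None:
        self.page_id = page_id
        self.data = bytearray(size)
        self.free_start = 0   # first unused byte of the record area
        self.slots = []       # slot number -> (offset, length), or None if deleted

    def insert(self, record: bytes):
        """Store a record and return its RID = (PageID, SlotNumber)."""
        # Assumption: each directory entry costs 8 bytes of page space.
        directory_bytes = 8 * (len(self.slots) + 1)
        if self.free_start + len(record) + directory_bytes > len(self.data):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots) - 1)

    def read(self, slot: int):
        entry = self.slots[slot]
        if entry is None:
            return None       # the record was deleted
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        # Keep the slot itself so that the RIDs of later records stay valid.
        self.slots[slot] = None
```

Because other structures refer to records by RID, delete() empties the slot instead of renumbering the remaining slots.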

9
Record Formats
Fixed-length records --> all fields have fixed length

Field 1 Field 2 ... ... Field K

10
Record Formats
Variable length records

Field 1 Field 2 ... ... Field K

Record header

Remark: NULLS require no space at all (why ?)
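
One possible record header, sketched under assumptions: the header holds K+1 two-byte offsets into the field area, and a field whose start and end offsets coincide is read back as NULL (so this sketch cannot distinguish NULL from an empty string). The function names are illustrative.

```python
import struct


def encode_record(fields):
    """Header of K+1 two-byte offsets, followed by the bytes of the fields."""
    offsets = [0]
    body = b""
    for value in fields:
        if value is not None:
            body += value
        offsets.append(len(body))   # a NULL field repeats the previous offset
    header = struct.pack(f"<{len(offsets)}H", *offsets)
    return header + body


def decode_field(record: bytes, num_fields: int, i: int):
    """Return field i (0-based) without scanning the fields before it."""
    header_size = 2 * (num_fields + 1)
    start, end = struct.unpack_from("<2H", record, 2 * i)
    if start == end:
        return None                 # zero length is treated as NULL here
    return record[header_size + start:header_size + end]
```

With this layout a NULL field contributes no bytes to the field area; only its header offset is stored.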


11
Spanning Records Across Blocks

block block
header header

R1 R2 R2 R3

• When records are very large


• Or even medium size: saves space in blocks
• Commercial RDBMS avoid this
12
LOB
• Large objects
– Binary large object: BLOB
– Character large object: CLOB
• Supported by modern database systems
• E.g. images, sounds, texts, etc.
• Storage: attempt to cluster blocks together

13
Modifications: Insertion
• File is unsorted (= heap file)
– add it to the end (easy)
• File is sorted:
– Is there space in the right block ?
• Yes: we are lucky, store it there
– Is there space in a neighboring block ?
• Look 1-2 blocks to the left/right, shift records
– If anything else fails, create overflow block
14
Overflow Blocks

Block n-1   Block n   Block n+1

Overflow

• After a while the file starts being dominated by overflow blocks: time to reorganize
15
Modifications: Deletions
• Free space in block, shift records
• May be able to eliminate an overflow block
• Can never really eliminate the record,
because others may point to it
– Place a tombstone instead (a NULL record)

16
Modifications: Updates
• If new record is shorter than previous, easy
• If it is longer, need to shift records, create
overflow blocks

17
Record Formats: Fixed Length
F1 F2 F3 F4

L1 L2 L3 L4

Base address (B) Address = B+L1+L2

• Information about field types is the same for all records in a file; it is stored in the system catalogs.
• Finding the i’th field does not require a scan of the record: its address is B + L1 + ... + L(i-1).
• Note the importance of schema information!
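
The address computation from the figure (Address = B+L1+L2 for the third field) can be sketched as follows. The field lengths would come from the system catalog; the function name and the example lengths are made up for illustration.

```python
def field_address(base: int, lengths, i: int) -> int:
    """Address of field i (1-based) in a fixed-length record starting at base."""
    # Skip over the i-1 fields that precede field i.
    return base + sum(lengths[: i - 1])


# Hypothetical schema: four fields of 4, 20, 8 and 2 bytes.
lengths = [4, 20, 8, 2]
third = field_address(1000, lengths, 3)   # B + L1 + L2
```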
Indexes
• Search key = can be any set of fields
– not the same as the primary key, nor a key

• Index = collection of data entries

• Data entry for key k can be:


– The actual record with key k
– (k, RID)
– (k, list-of-RIDs)
Index Classification
• Primary/secondary
– Primary = may reorder data according to index
– Secondary = cannot reorder data
• Clustered/unclustered
– Clustered = records close in the index are close in the data
– Unclustered = records close in the index may be far in the data
• Dense/sparse
– Dense = every key in the data appears in the index
– Sparse = the index contains only some keys
• B+ tree / Hash table / …

20
Primary Index
• File is sorted on the index attribute
• Dense index: sequence of (key,pointer) pairs

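[Figure: dense index with one (key, pointer) entry per record: 10, 20, 30, 40, 50, 60, 70, 80. The data file is sorted, two records per block: (10, 20), (30, 40), (50, 60), (70, 80).]

21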
Primary Index
• Sparse index

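[Figure: sparse index with one entry per data block: 10, 30, 50, 70, 90, 110, 130, 150. The data file is sorted, two records per block: (10, 20), (30, 40), (50, 60), (70, 80), ...]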
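
A lookup through a sparse index can be sketched as below, assuming the index holds the first key of each block of a sorted data file with two records per block (blocks (10, 20), (30, 40), (50, 60), (70, 80)). The function name is illustrative.

```python
from bisect import bisect_right

# Sorted data file, two records per block.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
# Sparse index: one entry per block, holding the block's first key.
index_keys = [block[0] for block in blocks]


def sparse_lookup(key):
    """Return (block number, found) for a key, reading at most one data block."""
    # Last index entry whose key is <= the search key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None, False          # smaller than every key in the file
    return pos, key in blocks[pos]
```

A key that is absent from the index (such as 40) can still be found, because the index only narrows the search down to one block.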

22
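Primary Index with Duplicate Keys
• Dense index:

[Figure: index entries 10, 20, ..., 80, each key appearing once, over a sorted data file with duplicate keys, two records per block: (10, 10), (10, 20), (20, 20), (30, 40).]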
23
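Primary Index with Duplicate Keys
• Sparse index: pointer to lowest search key in each block:

[Figure: index entries 10, 10, 20, 30 over data blocks (10, 10), (10, 20), (20, 20), (30, 40).]

• Search for 20: 20 is here (the block the index entry 20 points to)... but need to search here too (the preceding block)

24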
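Primary Index with Duplicate Keys
• Better: pointer to lowest new search key in each block:

[Figure: a sparse index over data blocks (10, 10), (10, 20), (30, 30), (30, 30), (40, 50); each index entry points to the lowest new search key in its block. Annotations: "20 is here..." and "...ok to search from here".]

• Search for 20
• Search for 15 ? 35 ?

25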
Secondary Indexes
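• To index attributes other than the primary key
• Always dense (why ?)

[Figure: dense secondary index with sorted entries 10, 10, 20, 20, 20, 30, 30, 30 pointing into an unsorted data file with blocks (20, 30), (30, 20), (10, 20), (10, 30).]

26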
Clustered/Unclustered
• Primary indexes = usually clustered
• Secondary indexes = usually unclustered

27
Clustered vs. Unclustered Index

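[Figure: two index files over a data file. CLUSTERED: the data entries in the index file point to data records stored in the same order. UNCLUSTERED: the data entries point to data records scattered across the data file.]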
Secondary Indexes
• index attributes other than the primary key
• index unsorted files (heap files)
• index clustered data

29
Applications of Secondary Indexes
• Secondary indexes needed for heap files
• Also for Clustered data:
Company(name, city), Product(pid, maker)
Select city
From Company, Product
Where name=maker
and pid="p045"

Select pid
From Company, Product
Where name=maker
and city="Seattle"
Products of company 1 Products of company 2 Products of company 3

Company 1 Company 2 Company 3


30
Composite Search Keys
• Composite Search Keys: Search on a combination of fields.
– Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index:
• age=20 and sal =75
– Range query: Some field value is not a constant. E.g.:
• age =20; or age=20 and sal > 10

Examples of composite key indexes using lexicographic order.

Data records sorted by name:
name  age  sal
bob   12   10
cal   11   80
joe   12   20
sue   13   75

Data entries in index sorted by <age, sal>: 11,80  12,10  12,20  13,75
Data entries sorted by <age>: 11  12  12  13
Data entries in index sorted by <sal, age>: 10,12  20,12  75,13  80,11
Data entries sorted by <sal>: 10  20  75  80
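
Lexicographic order on composite keys can be illustrated with Python tuples, which compare field by field. The sketch below uses the four records bob, cal, joe, sue; the range query (age = 12 and sal > 10) is chosen to match that data and is not one of the queries named on the slide.

```python
from bisect import bisect_left, bisect_right

# (name, age, sal) records from the example.
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

# Data entries for the two composite indexes, in lexicographic order.
age_sal = sorted((age, sal) for _, age, sal in records)
sal_age = sorted((sal, age) for _, age, sal in records)

# Range query on <age, sal>: age = 12 and sal > 10.
lo = bisect_right(age_sal, (12, 10))   # first entry after (12, 10)
hi = bisect_left(age_sal, (13,))       # first entry with age >= 13
matches = age_sal[lo:hi]
```

The matching entries are contiguous in the <age, sal> index because age is its leading field; the same query on <sal, age> would have to scan every entry with sal > 10.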
B+ Trees
• Search trees
• Idea in B Trees:
– make 1 node = 1 block
• Idea in B+ Trees:
– Make leaves into a linked list (range queries are
easier)

32
B+ Trees Basics
• Parameter d = the degree
• Each node has >= d and <= 2d keys (except root)
30 120 240

Keys k < 30
Keys 30<=k<120 Keys 120<=k<240 Keys 240<=k

• Each leaf has >=d and <= 2d keys:


40 50 60
Next leaf

40 50 60
33
B+ Tree Example
d=2 Find the key 40

80
40 <= 80

20 60 100 120 140

20 < 40 <= 60

10 15 18 20 30 40 50 60 65 80 85 90

30 < 40 <= 40

10 15 18 20 30 40 50 60 65 80 85 90
34
Searching a B+ Tree
• Exact key values:
– Start at the root
– Proceed down, to the leaf

Select name
From people
Where age = 25

• Range queries:
– As above
– Then sequential traversal

Select name
From people
Where 20 <= age
and age <= 30
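
Both kinds of search can be sketched on a small in-memory tree. The classes and function names below are illustrative; the tree is a two-level tree with separator keys 20 and 60 and leaves (10 15 18), (20 30 40 50), (60 65), chosen as an assumption for the example.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None       # link to the next leaf, used by range queries


class Inner:
    def __init__(self, keys, children):
        self.keys = keys       # children[i] holds keys k with keys[i-1] <= k < keys[i]
        self.children = children


def find_leaf(node, key):
    """Start at the root and proceed down to the leaf that may hold key."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    return key in find_leaf(root, key).keys


def range_scan(root, low, high):
    """Find the leaf for low, then traverse the leaf list sequentially."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


leaves = [Leaf([10, 15, 18]), Leaf([20, 30, 40, 50]), Leaf([60, 65])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Inner([20, 60], leaves)
```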

35
B+ Tree Design
• How large d ?
• Example:
– Key size = 4 bytes
– Pointer size = 8 bytes
– Block size = 4096 bytes
• 2d x 4 + (2d+1) x 8 <= 4096
• d = 170
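
The inequality can be solved for d directly: 2d x 4 + (2d+1) x 8 <= 4096 gives 24d + 8 <= 4096, so d <= 170.33. A small helper (the name is made up) that does the same for any sizes:

```python
def max_degree(block_size: int, key_size: int, pointer_size: int) -> int:
    """Largest d with 2d*key_size + (2d+1)*pointer_size <= block_size."""
    # Rearranged: d <= (block_size - pointer_size) / (2 * (key_size + pointer_size))
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))


d = max_degree(block_size=4096, key_size=4, pointer_size=8)
```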
36
B+ Trees in Practice

• Typical order: 100. Typical fill-factor: 67%.


– average fanout = 133
• Typical capacities:
– Height 4: 133^4 = 312,900,721 records
– Height 3: 133^3 = 2,352,637 records
• Can often hold top levels in buffer pool:
– Level 1 = 1 page = 8 Kbytes
– Level 2 = 133 pages = 1 Mbyte
– Level 3 = 17,689 pages = 133 MBytes
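
These figures follow from the average fanout alone. A quick check, taking the fanout of 133 and the 8 Kbyte page as given (the function names are illustrative):

```python
FANOUT = 133      # average fanout from the slide
PAGE_KB = 8       # page size in Kbytes


def records_at_height(height: int) -> int:
    """Number of records reachable through a tree of the given height."""
    return FANOUT ** height


def pages_at_level(level: int) -> int:
    """Number of pages on a level, with the root as level 1."""
    return FANOUT ** (level - 1)


level2_kb = pages_at_level(2) * PAGE_KB   # roughly 1 Mbyte
```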
Insertion in a B+ Tree
Insert (K, P)
• Find leaf where K belongs, insert
• If no overflow (2d keys or less), halt
• If overflow (2d+1 keys), split node, insert in parent:
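Before: node with keys K1 K2 K3 K4 K5 and pointers P0 P1 P2 P3 P4 P5
After: K3 moves to the parent; left node K1 K2 with pointers P0 P1 P2; right node K4 K5 with pointers P3 P4 P5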
• If leaf, keep K3 too in right node
• When root splits, new root has 1 key only
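
The split rule can be sketched for d = 2 as follows. This covers only the split of a single node, not the recursive insertion into the parent; the function names are illustrative, and pointers are left out to keep the sketch short.

```python
from bisect import insort

D = 2  # the degree d


def insert_into_leaf(keys, key):
    """Insert key into a leaf; return (left, separator, right).

    separator and right are None when the leaf does not overflow.
    """
    keys = list(keys)
    insort(keys, key)
    if len(keys) <= 2 * D:
        return keys, None, None
    # Overflow (2d+1 keys): the middle key is copied up and also kept in the right leaf.
    left, right = keys[:D], keys[D:]
    return left, right[0], right


def split_inner(keys):
    """Split an overflowing inner node of 2d+1 keys; the middle key moves up."""
    return keys[:D], keys[D], keys[D + 1:]
```

For example, inserting 25 into the full leaf (20 30 40 50) yields leaves (20 25) and (30 40 50), with 30 as the new key in the parent.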
38
Insertion in a B+ Tree
Insert K=19
80

20 60 100 120 140

10 15 18 20 30 40 50 60 65 80 85 90

10 15 18 20 30 40 50 60 65 80 85 90
39
Insertion in a B+ Tree
After insertion
80

20 60 100 120 140

10 15 18 19 20 30 40 50 60 65 80 85 90

10 15 18 19 20 30 40 50 60 65 80 85 90
40
Insertion in a B+ Tree
Now insert 25
80

20 60 100 120 140

10 15 18 19 20 30 40 50 60 65 80 85 90

10 15 18 19 20 30 40 50 60 65 80 85 90
41
Insertion in a B+ Tree
After insertion
80

20 60 100 120 140

10 15 18 19 20 25 30 40 50 60 65 80 85 90

10 15 18 19 20 25 30 40 50 60 65 80 85 90
42
Insertion in a B+ Tree
But now have to split !
80

20 60 100 120 140

10 15 18 19 20 25 30 40 50 60 65 80 85 90

10 15 18 19 20 25 30 40 50 60 65 80 85 90
43
Insertion in a B+ Tree
After the split
80

20 30 60 100 120 140

10 15 18 19 20 25 30 40 50 60 65 80 85 90

10 15 18 19 20 25 30 40 50 60 65 80 85 90
44
Deletion from a B+ Tree
Delete 30
80

20 30 60 100 120 140

10 15 18 19 20 25 30 40 50 60 65 80 85 90

10 15 18 19 20 25 30 40 50 60 65 80 85 90
45
Deletion from a B+ Tree
After deleting 30
May change to 40, or not
80

20 30 60 100 120 140

10 15 18 19 20 25 40 50 60 65 80 85 90

10 15 18 19 20 25 40 50 60 65 80 85 90
46
Deletion from a B+ Tree
Now delete 25
80

20 30 60 100 120 140

10 15 18 19 20 25 40 50 60 65 80 85 90

10 15 18 19 20 25 40 50 60 65 80 85 90
47
Deletion from a B+ Tree
After deleting 25
Need to rebalance: rotate
80

20 30 60 100 120 140

10 15 18 19 20 40 50 60 65 80 85 90

10 15 18 19 20 40 50 60 65 80 85 90
48
Deletion from a B+ Tree
Now delete 40
80

19 30 60 100 120 140

10 15 18 19 20 40 50 60 65 80 85 90

10 15 18 19 20 40 50 60 65 80 85 90
49
Deletion from a B+ Tree
After deleting 40
Rotation not possible: need to merge nodes
80

19 30 60 100 120 140

10 15 18 19 20 50 60 65 80 85 90

10 15 18 19 20 50 60 65 80 85 90
50
Deletion from a B+ Tree
Final tree
80

19 60 100 120 140

10 15 18 19 20 50 60 65 80 85 90

10 15 18 19 20 50 60 65 80 85 90
51
In Class
• Suppose the B+ tree has depth 4 and degree d=200

• How many records does the relation have (maximum) ?

• How many index blocks do we need to read and/or write during:
– A key lookup
– An insertion
– A deletion

52
