CSE 544: Lecture 11 Storing Data, Indexes: Monday, 5/1/2006

Outline
• Storing data: disks and files - 9.5-9.7
• Types of Indexes - Chapter 8.3
• B-trees - Chapter 10
Managing Free Blocks
• By the OS
• By the RDBMS (typical: why?)
– Linked list of free blocks
– Bit map
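The bit map option can be sketched in a few lines. This is only an illustration under simple assumptions (one bit per block, bit set = free); the class name `FreeBlockBitmap` and its methods are hypothetical and not taken from any particular RDBMS.

```python
class FreeBlockBitmap:
    """Hypothetical free-space map: bit i is 1 when block i is free."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        # One bit per block; every block starts out free.
        self.bits = bytearray([0xFF] * ((num_blocks + 7) // 8))

    def is_free(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Return the lowest-numbered free block and mark it used."""
        for block in range(self.num_blocks):
            if self.is_free(block):
                self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
                return block
        raise RuntimeError("no free blocks")

    def release(self, block):
        """Mark a block as free again."""
        self.bits[block // 8] |= 1 << (block % 8)


bitmap = FreeBlockBitmap(16)
first = bitmap.allocate()
second = bitmap.allocate()
bitmap.release(first)
third = bitmap.allocate()  # reuses the block that was just released
```

A linked list of free blocks would make `allocate` constant time, while the bit map makes it easy to look for runs of adjacent free blocks.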
Files of Records
Types of files:
• Heap file - unordered
• Sorted file
• Clustered file - sorted, plus a B-tree
Will discuss heap files only; the others are similar, only sorted by the key
Heap Files
Linked list of pages:
[Figure: the header page heads two linked lists of data pages, one labeled "Full pages" and one labeled "Pages with some free space"]

Heap Files
Better: directory of pages
[Figure: a directory, reached from the header page, whose entries point to the individual data pages]
Page Formats
Issues to consider
• 1 page = fixed size (e.g. 8KB)
• Records:
– Fixed length
– Variable length
• Record id = RID
– Typically RID = (PageID, SlotNumber)
Why do we need RIDs in a relational DBMS?
Page Formats
Fixed-length records: packed representation
[Figure: Slot 1, Slot 2, ..., Slot N stored back to back, followed by free space; the page also stores the slot count N]
Problems?
Page Formats
Variable-length records
[Figure: a page holding variable-length records, free space, and a slot directory]
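A slot directory can be sketched as follows. This is a simplified illustration, not a real page layout: the directory is kept as a Python list rather than inside the page bytes, and the names `SlottedPage`, `insert`, `read`, and `delete` are hypothetical.

```python
class SlottedPage:
    """Hypothetical slotted page for variable-length records."""

    def __init__(self, page_id, size=8192):
        self.page_id = page_id
        self.data = bytearray(size)
        self.free_start = 0      # first unused byte in the page
        self.slots = []          # slot number -> (offset, length) or None

    def insert(self, record: bytes):
        """Store a record and return its RID = (PageID, SlotNumber)."""
        if self.free_start + len(record) > len(self.data):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots) - 1)

    def read(self, slot):
        entry = self.slots[slot]
        if entry is None:
            return None          # the record was deleted
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        # Keep the slot itself so the RIDs of other records stay valid.
        self.slots[slot] = None


page = SlottedPage(page_id=7)
rid_a = page.insert(b"alice")
rid_b = page.insert(b"bob, a longer record")
page.delete(rid_a[1])
```

Because a RID names a slot rather than a byte offset, records can be moved inside the page by updating only the slot directory.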
Record Formats
Fixed-length records --> all fields have fixed length
[Figure: a record laid out as Field 1, Field 2, ..., Field K]
Record Formats
Variable length records
[Figure: a record header together with Field 1, Field 2, ..., Field K]
Remark: NULLs require no space at all (why?)
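One possible way to see the remark about NULLs is to give the record header a bitmap and a list of field end offsets. The layout below is an assumption made for illustration (2-byte field count, 4-byte NULL bitmap, one 2-byte end offset per field); `encode_record` and `decode_record` are hypothetical names.

```python
import struct


def encode_record(fields):
    """Encode a list of bytes-or-None fields (at most 32 fields)."""
    null_bitmap = 0
    ends = []
    body = b""
    for i, value in enumerate(fields):
        if value is None:
            null_bitmap |= 1 << i   # NULL: one header bit, no body bytes
        else:
            body += value
        ends.append(len(body))      # end offset of field i within the body
    header = struct.pack("<HI", len(fields), null_bitmap)
    header += struct.pack(f"<{len(fields)}H", *ends)
    return header + body


def decode_record(record):
    count, null_bitmap = struct.unpack_from("<HI", record, 0)
    ends = struct.unpack_from(f"<{count}H", record, 6)
    body_start = 6 + 2 * count
    fields, start = [], 0
    for i, end in enumerate(ends):
        if null_bitmap & (1 << i):
            fields.append(None)
        else:
            fields.append(record[body_start + start:body_start + end])
        start = end
    return fields


with_null = encode_record([b"bob", None, b"Seattle"])
with_empty = encode_record([b"bob", b"", b"Seattle"])
```

In this sketch a NULL field and an empty field produce records of the same length; only the header bit tells them apart.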
Spanning Records Across Blocks
[Figure: two blocks, each with a block header; the first holds R1 and the start of R2, the second holds the rest of R2 and R3]
• When records are very large
• Or even medium size: saves space in blocks
• Commercial RDBMSs avoid this
LOB
• Large objects
– Binary large object: BLOB
– Character large object: CLOB
• Supported by modern database systems
• E.g. images, sounds, texts, etc.
• Storage: attempt to cluster blocks together
Modifications: Insertion
• File is unsorted (= heap file)
– add it to the end (easy)
• File is sorted:
– Is there space in the right block?
• Yes: we are lucky, store it there
– Is there space in a neighboring block?
• Look 1-2 blocks to the left/right, shift records
– If everything else fails, create overflow block
Overflow Blocks
[Figure: Block n-1, Block n, Block n+1 in sequence, with an overflow block attached]
• After a while the file starts being dominated by overflow blocks: time to reorganize
Modifications: Deletions
• Free space in block, shift records
• May be able to eliminate an overflow block
• Can never really eliminate the record, because others may point to it
– Place a tombstone instead (a NULL record)
Modifications: Updates
• If new record is shorter than previous, easy
• If it is longer, need to shift records, create overflow blocks
Record Formats: Fixed Length
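[Figure: fields F1, F2, F3, F4 with lengths L1, L2, L3, L4, starting at base address B; the address of F3 is B+L1+L2]
• Information about field types is the same for all records in a file; stored in system catalogs.
• Finding the i'th field does not require a scan of the record: its address follows from the field lengths.
• Note the importance of schema information!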
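The address computation Address = B+L1+L2 generalizes to any field. The sketch below assumes the field lengths come from the catalog; the concrete lengths and the function names are made up for illustration.

```python
from itertools import accumulate


def field_offsets(lengths):
    """Offset of each field from the base address, given the field lengths."""
    return [0] + list(accumulate(lengths))[:-1]


def field_address(base, lengths, i):
    """Address of field i (0-based): base plus the lengths of earlier fields."""
    return base + sum(lengths[:i])


# Hypothetical catalog entry: four fields of 4, 8, 2, and 16 bytes.
lengths = [4, 8, 2, 16]
base = 1000
addr_f3 = field_address(base, lengths, 2)  # B + L1 + L2
```

The record itself is never inspected; everything needed is schema information.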
Indexes
• Search key = can be any set of fields
– not the same as the primary key, nor a key
• Index = collection of data entries
• Data entry for key k can be:

– The actual record with key k
– (k, RID)
– (k, list-of-RIDs)
Index Classification
• Primary/secondary
– Primary = may reorder data according to index
– Secondary = cannot reorder data
• Clustered/unclustered
– Clustered = records close in the index are close in the data
– Unclustered = records close in the index may be far in the data
• Dense/sparse
– Dense = every key in the data appears in the index
– Sparse = the index contains only some keys
• B+ tree / Hash table / …
Primary Index
• File is sorted on the index attribute
• Dense index: sequence of (key,pointer) pairs
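[Figure: dense index with entries 10, 20, 30, 40, 50, 60, 70, 80, each pointing to its record in the sorted data file, whose blocks hold (10, 20), (30, 40), (50, 60), (70, 80)]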
Primary Index
• Sparse index
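[Figure: sparse index with entries 10, 30, 50, 70, 90, 110, 130, 150, one per data block; the data blocks shown hold (10, 20), (30, 40), (50, 60), (70, 80)]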
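A lookup through a sparse index can be sketched as a binary search over the index followed by a scan of one block. The example uses only the four data blocks visible in the figure; `sparse_lookup` is a hypothetical name, and unique keys are assumed.

```python
from bisect import bisect_right

# Sorted data file: each inner list is one block.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]

# Sparse index: the first key of each block, in block order.
sparse_index = [block[0] for block in blocks]


def sparse_lookup(key):
    """Return (block_number, position) of key, or None if it is absent."""
    # Last index entry whose key is <= the search key.
    block_no = bisect_right(sparse_index, key) - 1
    if block_no < 0:
        return None                  # smaller than every key in the file
    block = blocks[block_no]
    if key in block:                 # scan inside that single block
        return (block_no, block.index(key))
    return None
```

A dense index would answer "is key 35 present?" from the index alone; the sparse index has to read the data block to find out.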
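Primary Index with Duplicate Keys
• Dense index:
[Figure: dense index with one entry per distinct key (10, 20, 30, 40, 50, 60, 70, 80); the data blocks shown hold (10, 10), (10, 20), (20, 20), (30, 40)]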
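Primary Index with Duplicate Keys
• Sparse index: pointer to lowest search key in each block:
[Figure: sparse index with entries 10, 10, 20, 30 pointing to data blocks (10, 10), (10, 20), (20, 20), (30, 40)]
• Search for 20: the index says 20 is in the third block, but need to search the preceding block too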
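The need to look one block back can be made concrete. The sketch below uses the data blocks from the figure and a hypothetical function `lookup_all`; it is one way to handle duplicates with a "lowest key per block" sparse index, not necessarily how a given system does it.

```python
from bisect import bisect_left

blocks = [[10, 10], [10, 20], [20, 20], [30, 40]]
sparse_index = [block[0] for block in blocks]   # [10, 10, 20, 30]


def lookup_all(key):
    """Return every (block_number, position) that holds key."""
    # Find the first index entry >= key, then step back one block,
    # because the preceding block may end with copies of the same key.
    start = max(bisect_left(sparse_index, key) - 1, 0)
    hits = []
    for block_no in range(start, len(blocks)):
        if blocks[block_no][0] > key:
            break                    # sorted file: no further matches
        for pos, value in enumerate(blocks[block_no]):
            if value == key:
                hits.append((block_no, pos))
    return hits
```

Searching for 20 starts in the second block even though the index entry for 20 points at the third.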
Primary Index with Duplicate Keys
• Better: pointer to lowest new search key in each block:
[Figure: sparse index over data blocks with duplicate keys; annotations: "20 is here...", "...ok to search from here"]
• Search for 20
• Search for 15? 35?
Secondary Indexes
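• To index attributes other than the primary key
• Always dense (why?)
[Figure: dense secondary index with sorted entries 10, 10, 20, 20, 20, 30, 30, 30, pointing to records stored in a different order in the data file]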
Clustered/Unclustered
• Primary indexes = usually clustered
• Secondary indexes = usually unclustered
Clustered vs. Unclustered Index
[Figure: two panels, CLUSTERED and UNCLUSTERED; in each, data entries (index file) point to data records (data file)]
Secondary Indexes
• index other attributes than primary key
• index unsorted files (heap files)
• index clustered data
Applications of Secondary Indexes
• Secondary indexes needed for heap files
• Also for Clustered data:
Company(name, city), Product(pid, maker)
Select city
From Company, Product
Where name=maker
and pid="p045"

Select pid
From Company, Product
Where name=maker
and city="Seattle"

[Figure: clustered data file: Company 1 with the products of company 1, Company 2 with the products of company 2, Company 3 with the products of company 3]
Composite Search Keys
• Composite Search Keys: Search on a combination of fields.
– Equality query: Every field value is equal to a constant value. E.g. wrt <sal,age> index:
• age=20 and sal=75
– Range query: Some field value is not a constant. E.g.:
• age=20; or age=20 and sal > 10
Examples of composite key indexes using lexicographic order.
Data records sorted by name:
name  age  sal
bob   12   10
cal   11   80
joe   12   20
sue   13   75
Data entries in index sorted by <age,sal>: 11,80  12,10  12,20  13,75
Data entries sorted by <age>: 11  12  12  13
Data entries in index sorted by <sal,age>: 10,12  20,12  75,13  80,11
Data entries sorted by <sal>: 10  20  75  80
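Lexicographic order on composite keys is exactly how tuples compare in Python, so the example can be replayed directly. The sketch below uses the four data records from the slide; the record name stands in for a RID, the function names are hypothetical, and the query constants are chosen to match records in the table rather than the constants in the bullets.

```python
from bisect import bisect_left, bisect_right

# Data records from the slide: (name, age, sal).
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]

# Data entries of two composite indexes: (search key, name as a stand-in RID).
age_sal = sorted(((age, sal), name) for name, age, sal in records)
sal_age = sorted(((sal, age), name) for name, age, sal in records)


def equality(index, key):
    """Names whose composite key equals key in every field."""
    keys = [k for k, _ in index]
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [name for _, name in index[lo:hi]]


def prefix_range(index, first, low):
    """Names with first field == first and second field > low."""
    keys = [k for k, _ in index]
    lo = bisect_right(keys, (first, low))
    hi = bisect_right(keys, (first, float("inf")))
    return [name for _, name in index[lo:hi]]
```

A query such as age=12 and sal > 10 maps to one contiguous run of entries in the <age,sal> index, which is why it can be answered with a single range scan there; in the <sal,age> index the matching entries are not adjacent.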
B+ Trees
• Search trees
• Idea in B Trees:
– make 1 node = 1 block
• Idea in B+ Trees:
– Make leaves into a linked list (range queries are easier)
B+ Trees Basics
• Parameter d = the degree
• Each node has >= d and <= 2d keys (except root)
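[Figure: internal node with keys 30, 120, 240 and four child pointers: keys k < 30; keys 30<=k<120; keys 120<=k<240; keys 240<=k]
• Each leaf has >= d and <= 2d keys:
[Figure: leaf with keys 40, 50, 60, each pointing to the record with that key, plus a pointer to the next leaf]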
B+ Tree Example
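d=2. Find the key 40.
[Figure: root [80]; internal nodes [20, 60] and [100, 120, 140]; leaves shown: [10, 15, 18], [20, 30, 40, 50], [60, 65], [80, 85, 90]; each leaf key points to its record]
Search path: 40 <= 80 at the root; 20 < 40 <= 60 at the internal node; 30 < 40 <= 40 in the leaf.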
Searching a B+ Tree
• Exact key values:
– Start at the root
– Proceed down, to the leaf

Select name
From people
Where age = 25

• Range queries:
– As above
– Then sequential traversal

Select name
From people
Where 20 <= age
and age <= 30
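Both kinds of search can be sketched on a small in-memory tree. The tree below follows the d=2 example used in these slides, with three hypothetical leaves added so that the right-hand internal node has its four children; the class and function names are illustrative, and nodes are plain Python objects rather than disk blocks.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None            # link to the next leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys
        self.children = children    # len(children) == len(keys) + 1


def build_example():
    # The last three leaves are hypothetical; the slides do not show them.
    leaves = [Leaf(k) for k in (
        [10, 15, 18], [20, 30, 40, 50], [60, 65],
        [80, 85, 90], [100, 110], [120, 130], [140, 150])]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return Internal([80], [
        Internal([20, 60], leaves[0:3]),
        Internal([100, 120, 140], leaves[3:7]),
    ])


def find_leaf(node, key):
    """Descend from the root to the leaf that may contain key."""
    while isinstance(node, Internal):
        # Child i holds the keys k with keys[i-1] <= k < keys[i].
        node = node.children[bisect_right(node.keys, key)]
    return node


def lookup(root, key):
    return key in find_leaf(root, key).keys


def range_query(root, low, high):
    """All keys k with low <= k <= high, following the leaf links."""
    leaf, result = find_leaf(root, low), []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


root = build_example()
```

An exact lookup touches one node per level; a range query does the same descent once and then walks the linked leaves.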
B+ Tree Design
• How large d?
• Example:
– Key size = 4 bytes
– Pointer size = 8 bytes
– Block size = 4096 bytes
• 2d x 4 + (2d+1) x 8 <= 4096
• d = 170
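The inequality can be solved for d in general: a node with 2d keys and 2d+1 pointers must fit in one block. The helper name `max_degree` is hypothetical.

```python
def max_degree(key_size, pointer_size, block_size):
    """Largest d with 2d*key_size + (2d+1)*pointer_size <= block_size."""
    # Rearranged: 2d * (key_size + pointer_size) <= block_size - pointer_size
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))


d = max_degree(key_size=4, pointer_size=8, block_size=4096)
```

With the sizes from the example this reduces to 24d + 8 <= 4096, which gives d = 170.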
B+ Trees in Practice
• Typical order: 100. Typical fill-factor: 67%.
– average fanout = 133
• Typical capacities:
– Height 4: 133^4 = 312,900,721 records
– Height 3: 133^3 = 2,352,637 records
• Can often hold top levels in buffer pool:
– Level 1 = 1 page = 8 Kbytes
– Level 2 = 133 pages = 1 Mbyte
– Level 3 = 17,689 pages = 133 MBytes
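These capacities are powers of the average fanout, which is roughly 67% of the 200 entries that an order-100 node can hold. A quick check, assuming 8 KB pages as in the Level 1 line (the byte totals on the slide are rounded):

```python
PAGE_BYTES = 8 * 1024     # 8 KB pages
fanout = 133              # average fanout from the slide

records_height_3 = fanout ** 3
records_height_4 = fanout ** 4

# Pages at the top three levels, and the bytes needed to cache each level.
pages_per_level = [fanout ** level for level in range(3)]
bytes_per_level = [pages * PAGE_BYTES for pages in pages_per_level]
```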
Insertion in a B+ Tree
Insert (K, P)
• Find leaf where K belongs, insert
• If no overflow (2d keys or less), halt
• If overflow (2d+1 keys), split node, insert in parent:
[Figure: a node with keys K1 K2 K3 K4 K5 and pointers P0 P1 P2 P3 P4 P5 splits into a left node with keys K1 K2 (pointers P0 P1 P2) and a right node with keys K4 K5 (pointers P3 P4 P5); K3 goes to the parent]
• If leaf, keep K3 too in right node
• When root splits, new root has 1 key only
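The leaf case of the split rule can be sketched on plain sorted lists. This covers only a single leaf (no parent update, no record pointers), and `insert_into_leaf` is a hypothetical helper.

```python
from bisect import insort


def insert_into_leaf(keys, key, d):
    """Insert key into a sorted leaf of a B+ tree with degree d.

    Returns (keys, None, None) if the leaf still fits, otherwise
    (left_keys, separator, right_keys) after splitting the 2d+1 keys.
    """
    keys = list(keys)
    insort(keys, key)
    if len(keys) <= 2 * d:
        return keys, None, None
    # Overflow: the first d keys stay left, the remaining d+1 go right.
    # In a leaf, the middle key is copied to the parent and kept on the right.
    left, right = keys[:d], keys[d:]
    return left, right[0], right


no_split = insert_into_leaf([10, 15, 18], 19, d=2)
split = insert_into_leaf([20, 30, 40, 50], 25, d=2)
```

For an internal node the middle key would move up instead of being copied, so the right node would keep only the keys after it.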
Insert K=19
[Figure: root [80]; internal nodes [20, 60] and [100, 120, 140]; leaves [10, 15, 18], [20, 30, 40, 50], [60, 65], [80, 85, 90]]
After insertion
[Figure: root [80]; internal nodes [20, 60] and [100, 120, 140]; leaves [10, 15, 18, 19], [20, 30, 40, 50], [60, 65], [80, 85, 90]]
Now insert 25
[Figure: same tree as after inserting 19]
After insertion
[Figure: root [80]; internal nodes [20, 60] and [100, 120, 140]; leaves [10, 15, 18, 19], [20, 25, 30, 40, 50], [60, 65], [80, 85, 90]]
But now have to split!
[Figure: same tree; the leaf [20, 25, 30, 40, 50] holds 2d+1 = 5 keys]
After the split
[Figure: root [80]; internal nodes [20, 30, 60] and [100, 120, 140]; leaves [10, 15, 18, 19], [20, 25], [30, 40, 50], [60, 65], [80, 85, 90]]
Deletion from a B+ Tree
Delete 30
[Figure: root [80]; internal nodes [20, 30, 60] and [100, 120, 140]; leaves [10, 15, 18, 19], [20, 25], [30, 40, 50], [60, 65], [80, 85, 90]]
After deleting 30
[Figure: root [80]; internal nodes [20, 30, 60] and [100, 120, 140]; leaves [10, 15, 18, 19], [20, 25], [40, 50], [60, 65], [80, 85, 90]; the key 30 in the internal node may change to 40, or not]
Now delete 25
[Figure: same tree as after deleting 30]
After deleting 25
Need to rebalance: rotate
[Figure: root [80]; internal nodes [20, 30, 60] and [100, 120, 140]; leaves [10, 15, 18, 19], [20], [40, 50], [60, 65], [80, 85, 90]; the leaf [20] has fewer than d = 2 keys]
Now delete 40
[Figure: root [80]; internal nodes [19, 30, 60] and [100, 120, 140]; leaves [10, 15, 18], [19, 20], [40, 50], [60, 65], [80, 85, 90]]
After deleting 40
Rotation not possible
Need to merge nodes
[Figure: root [80]; internal nodes [19, 30, 60] and [100, 120, 140]; leaves [10, 15, 18], [19, 20], [50], [60, 65], [80, 85, 90]]
Final tree
[Figure: root [80]; internal nodes [19, 60] and [100, 120, 140]; leaves [10, 15, 18], [19, 20, 50], [60, 65], [80, 85, 90]]
In Class
• Suppose the B+ tree has depth 4 and degree d=200
• How many records does the relation have (maximum)?
• How many index blocks do we need to read and/or write during:
– A key lookup
– An insertion
– A deletion
