Advanced Database Organization
Advanced Database Organization?
= Database Implementation
= How to implement a database system
and have fun doing it ;-)
01: Introduction
Boris Glavic
Slides: adapted from a course taught by
Hector Garcia-Molina, Stanford InfoLab
CS 525
Notes 1 - Introduction
Isn't Implementing a
Database System Simple?
Relations
Statements
Results
Introducing the
Database Management System
Megatron 3000
Implementation Details
Relations stored in files (ASCII)
e.g., relation R is in /usr/db/R
Smith # 123 # CS
Jones # 522 # EE
...
Megatron 3000
Implementation Details
R1 # A # INT # B # STR
R2 # C # STR # A # INT
...
Megatron 3000
Sample Sessions
& quit
%
& select *
from R #
Relation R
A      B    C
SMITH  123  CS
&
A    B
123  CAR
522  CAT
&
Megatron 3000
Sample Sessions
& select *
from R | LPR #
&
& select *
from R
where R.A < 100 | T #
&
Megatron 3000
To execute select * from R where condition:
(1) Read dictionary to get R attributes
(2) Read R file, for each line:
(a) Check condition
(b) If OK, display
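The two steps above can be sketched as follows. This is a minimal illustration, not Megatron 3000's actual code: the `#`-separated layout follows the sample lines shown earlier, while the function names `parse_dictionary` and `select_all`, the use of in-memory line lists instead of files, and passing the condition as a Python callable are all assumptions. Matching rows are collected in a list rather than displayed.

```python
def parse_dictionary(lines):
    """Map relation name -> [(attribute, type), ...] from 'R # A # STR # B # INT' lines."""
    schema = {}
    for line in lines:
        parts = [p.strip() for p in line.split("#")]
        name, rest = parts[0], parts[1:]
        schema[name] = list(zip(rest[0::2], rest[1::2]))
    return schema


def select_all(dictionary_lines, relation, relation_lines, condition):
    attrs = parse_dictionary(dictionary_lines)[relation]      # (1) read dictionary
    results = []
    for line in relation_lines:                                # (2) scan every line of R
        values = [v.strip() for v in line.split("#")]
        row = {}
        for (attr, typ), value in zip(attrs, values):
            row[attr] = int(value) if typ == "INT" else value
        if condition(row):                                     # (a) check condition
            results.append(row)                                # (b) keep ("display") it
    return results


# Hypothetical dictionary entry and relation contents for illustration.
dictionary = ["R # A # STR # B # INT # C # STR"]
r_file = ["Smith # 123 # CS", "Jones # 522 # EE"]
rows = select_all(dictionary, "R", r_file, lambda row: row["B"] < 200)
```

Every query scans the whole file, whatever the condition is.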
Megatron 3000
To execute select * from R
where condition | T:
(1) Process select as before
(2) Write results to new file T
(3) Append new line to dictionary
Megatron 3000
To execute select A,B from R,S where condition:
(1) Read dictionary to get R,S attributes
(2) Read R file, for each line:
(a) Read S file, for each line:
(i) Create join tuple
(ii) Check condition
(iii) Display if OK
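The join procedure above is a plain nested loop. A minimal sketch, assuming rows are already parsed into dictionaries; the function name `nested_loop_join`, the `R.`/`S.` attribute prefixes, and the sample rows are made up for illustration.

```python
def nested_loop_join(r_rows, s_rows, condition, project):
    out = []
    for r in r_rows:                      # (2) for each line of R
        for s in s_rows:                  # (a) re-read all of S
            joined = {f"R.{k}": v for k, v in r.items()}
            joined.update({f"S.{k}": v for k, v in s.items()})   # (i) create join tuple
            if condition(joined):         # (ii) check condition
                out.append(tuple(joined[c] for c in project))    # (iii) keep if OK
    return out


# Hypothetical relation contents.
r_rows = [{"A": 123, "B": "CAR"}, {"A": 522, "B": "CAT"}]
s_rows = [{"A": 123, "C": 150}, {"A": 522, "C": 90}, {"A": 7, "C": 500}]
result = nested_loop_join(
    r_rows,
    s_rows,
    lambda t: t["R.A"] == t["S.A"] and t["S.C"] > 100,
    ["R.A", "R.B"],
)
```

The inner loop runs once per line of R, so the work grows with |R| x |S| regardless of how selective the condition is.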
No buffer manager
e.g.,
e.g.,
select *
from R,S
where R.A = S.A and S.B > 1000
- Do select first?
- More efficient join?
Need caching
No security
Course Overview
File & System Structure
B-Trees, hashing, ...
Query Processing
Crash Recovery
Concurrency Control: Correctness, locks, ...
Transaction Processing: Logs, deadlocks, ...
Authorization, encryption, ...
Advanced Topics
System Structure
[Figure: User, Query Parser, Strategy Selector, User Transaction, Transaction Manager, Concurrency Control, Lock Table, Buffer Manager, M.M. Buffer, Recovery Manager, Log, File Manager, Statistical Data, Indexes, User Data, System Data]
Some Terms
Database system
Transaction processing system
File access system
Information retrieval system
CS 525
Course Information
Webpage: http://www.cs.iit.edu/~cs525/
Instructor: Boris Glavic
http://www.cs.iit.edu/~glavic/
DBGroup: http://www.cs.iit.edu/~dbgroup/
Office Hours: Thursdays, 1pm-2pm
Office: Stuart Building, Room 226 C
Google Group
https://groups.google.com/forum/#!forum/cs525-2014-springgroup
Mailing-list for announcements
Discussion forum
Student - Instructor/TA
Student - Student
-> please join the group to keep up to date
Textbooks
Elmasri and Navathe, Fundamentals of Database Systems, 6th Edition, Addison-Wesley, 2003
Garcia-Molina, Ullman, and Widom, Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2008
Ramakrishnan and Gehrke, Database Management Systems, 3rd Edition, McGraw-Hill, 2002
Silberschatz, Korth, and Sudarshan, Database System Concepts, 6th Edition, McGraw Hill, 2010
Quizzes (10%)
Mid Term (20%) and Final Exam (20%)
Programming Assignments
Next:
Hardware
CS 525: Advanced Database Organization
02: Hardware
Boris Glavic
Outline
Hardware: Disks
Access Times
Example - Megatron 747
Optimizations
Other Topics:
Storage costs
Using secondary storage
Disk failures
Notes 2 - Hardware
CS 525
Notes 2 - Hardware
Hardware
DBMS
...
Typical
Computer
...
Data Storage
CS 525
Notes 2 - Hardware
Secondary
Storage
Processor
Fast, slow, reduced instruction set,
with cache, pipelined
Speed: 100 -> 500 -> 1000 MIPS
Memory
Fast, slow, non-volatile, read-only, ...
Access time: 10^-6 -> 10^-9 sec.
1 µs -> 1 ns
Secondary storage
Many flavors:
- Disk: Floppy (hard, soft)
Removable Packs
Winchester
Ram disks
Optical, CD-ROM
Arrays
- Tape Reel, cartridge
Robots
CS 525
Notes 2 - Hardware
Top View
Focus on: Typical Disk
Typical Numbers
Diameter: 1 inch -> 15 inches
Cylinders: 100 -> 2000
Surfaces (Tracks/cyl): 1 (CDs) -> 2 (floppies) -> 30
Sector Size: 512 B -> 50 KB
Capacity: 360 KB (old floppy) -> 1 TB (I use)
I want
block X
Seek Time
[Figure: seek time (Time axis, marked x and 3 or 5x) vs. Cylinders Traveled (starting at 1)]
$$S = \frac{\sum_{i=1}^{N} \sum_{j=1,\, j \neq i}^{N} \mathrm{SEEKTIME}(i \rightarrow j)}{N(N-1)}$$
Typical S: 10 ms -> 40 ms
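The average S over all ordered pairs of distinct cylinders can be checked numerically. The sketch below assumes a purely hypothetical linear seek model (a startup cost plus a per-cylinder cost); the model, its constants, and the function names are illustration only and are not taken from the slides.

```python
def average_seek_time(n_cylinders, seek_time):
    """Average of seek_time(i, j) over all ordered pairs i != j."""
    total = 0.0
    for i in range(1, n_cylinders + 1):
        for j in range(1, n_cylinders + 1):
            if j != i:
                total += seek_time(i, j)
    return total / (n_cylinders * (n_cylinders - 1))


def linear_seek(i, j, startup_ms=1.0, per_cylinder_ms=0.1):
    # Assumed model: fixed startup cost plus a cost proportional to distance.
    return startup_ms + per_cylinder_ms * abs(i - j)


N = 128
S = average_seek_time(N, linear_seek)
# For this model the mean distance |i - j| over pairs i != j is (N + 1) / 3.
closed_form = 1.0 + 0.1 * (N + 1) / 3
```

Under this model a random seek travels about a third of the cylinders on average.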
Rotational Delay
Head Here
Block I Want
Transfer Rate: t
Other Delays
Typical Value: 0
If we do things right (... Blocks)
Rule of Thumb
Ex: 1 KB Block
Random I/O: 20 ms.
Sequential I/O: 1 ms.
To Modify a Block?
To Modify Block:
(a) Read Block
(b) Modify in Memory
(c) Write Block
[(d) Verify?]
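The read-modify-write cycle above can be sketched as follows. An in-memory `io.BytesIO` object stands in for the disk, and the 1 KB block size, the function name `modify_block`, and its parameters are assumptions for illustration.

```python
import io

BLOCK_SIZE = 1024  # assumed block size in bytes


def modify_block(device, block_no, offset, new_bytes):
    """Change bytes inside one block; the caller keeps offset + len(new_bytes) <= BLOCK_SIZE."""
    start = block_no * BLOCK_SIZE
    device.seek(start)
    block = bytearray(device.read(BLOCK_SIZE))            # (a) read block
    block[offset:offset + len(new_bytes)] = new_bytes     # (b) modify in memory
    device.seek(start)
    device.write(block)                                   # (c) write block
    device.seek(start)
    return device.read(BLOCK_SIZE) == bytes(block)        # (d) verify by reading back


disk = io.BytesIO(bytes(4 * BLOCK_SIZE))   # four zero-filled blocks
ok = modify_block(disk, 2, 10, b"Smith")
```

Even a five-byte change costs one full block read and one full block write, plus a second read if the write is verified.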
Block Address:
Physical Device
Cylinder #
Surface #
Sector
Messy to handle
May map via software to integer sequence
1, 2, ..., m -> Map -> Actual Block Addresses
1 KB blocks = sectors
10% overhead between blocks
capacity = 16 MB = (2^20) x 16 = 2^24 bytes
# cylinders = 128 = 2^7
bytes/cyl = 2^24 / 2^7 = 2^17 = 128 KB
blocks/cyl = 128 KB / 1 KB = 128
3600 RPM
60 revolutions / sec
1 rev. = 16.66 msec.
One track:
Burst Bandwidth
1 KB in 0.117 ms.
BB = 1/0.117 = 8.54 KB/ms.
Sustained bandwidth (over track)
...
1 block
T4 = 25 + (16.66/2) + (0.117) x 1 + (0.130) x 3 = 33.83 ms
[Compare to T1 = 33.45 ms]
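The two access times can be recomputed from the numbers on the slides: 25 ms seek, 16.66 ms per revolution, 0.117 ms to transfer one 1 KB sector, and 0.130 ms for each further sector including its gap. The function name `read_time_ms` is an assumption; the formula follows the T4 expression above.

```python
SEEK_MS = 25.0              # seek time used in the T4 expression
ROTATION_MS = 16.66         # one revolution at 3600 RPM
SECTOR_MS = 0.117           # one 1 KB sector without the gap
SECTOR_WITH_GAP_MS = 0.130  # one sector plus gap (16.66 ms / 128 sectors)


def read_time_ms(n_sectors):
    """Seek + half a rotation on average + transfer of n consecutive sectors."""
    rotational_delay = ROTATION_MS / 2
    transfer = SECTOR_MS + SECTOR_WITH_GAP_MS * (n_sectors - 1)
    return SEEK_MS + rotational_delay + transfer


T1 = read_time_ms(1)
T4 = read_time_ms(4)
```

Reading four times as much data costs well under 1 ms extra, because seek and rotational delay dominate.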
8 GB Disk
If all tracks have 256 sectors
Outermost density: 100,000 bits/inch
Inner density: 250,000 bits/inch
(Ex 2.3)
Notes 2 - Hardware
40
Outline
Hardware: Disks
Access Times
Example: Megatron 747
Optimizations
Other Topics
here
Storage Costs
Using Secondary Storage
Disk Failures
Optimizations
Double Buffering
Problem: Have a File
...
Have a Program
Process B1
Process B2
Process B3
...
Read B1 -> Buffer
Process Data in Buffer
Read B2 -> Buffer
Process Data in Buffer ...
Double Buffering
[Figure: Memory and Disk with blocks A B C D E F G; blocks alternate between the memory buffers and are marked process, then done]
Say P >= R
P = Processing time/block
R = IO time/block
n = # blocks
Single buffering time = n(R+P)
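The effect of overlapping I/O with processing can be checked with a small timing simulation. This is a sketch under simplifying assumptions: exactly two buffers, reads issued as early as a buffer is free, and no other delays. The function names are made up; P, R, and n are as defined above.

```python
def single_buffer_time(n, R, P):
    # Read and process strictly alternate.
    return n * (R + P)


def double_buffer_time(n, R, P):
    """Two buffers: block k+1 is read while block k is being processed."""
    read_finish = []
    proc_finish = []
    for k in range(n):
        prev_read = read_finish[k - 1] if k >= 1 else 0.0
        # The buffer for block k is the one block k-2 used; it must be processed first.
        buffer_free = proc_finish[k - 2] if k >= 2 else 0.0
        read_finish.append(max(prev_read, buffer_free) + R)
        prev_proc = proc_finish[k - 1] if k >= 1 else 0.0
        proc_finish.append(max(read_finish[k], prev_proc) + P)
    return proc_finish[-1]


n, R, P = 10, 1.0, 2.0          # hypothetical values with P >= R
single = single_buffer_time(n, R, P)
double = double_buffer_time(n, R, P)
```

With P >= R only the first read is not hidden behind processing, so the simulated total equals R + nP.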
Disk Arrays
On Disk Cache
[Figure: processor P and disks, each with a cache]
Trend
Unfortunately...
Storage Cost
[Figures: storage levels plotted against access time (sec), from 10^-9 to 10^3; one panel shows cost in dollars/MB. Levels shown: cache, electronic main, electronic secondary, magnetic and optical disks, online tape, nearline tape & optical disks, offline tape]
$M = cost of 1 MB of RAM
P = number of pages in 1 MB RAM
So C_M = $M / P
Using '97 Numbers
Disk Failures
Partial Total
Intermittent Permanent
Detection
e.g. Checksum
Correction
Redundancy
Single Disk
e.g., Error Correcting Codes
Disk Array
[Figure: Logical vs. Physical]
Operating System
[Figure: Logical Block stored as Copy A and Copy B]
Database System
e.g., Log, Current DB, Last week's DB
Summary
Outline
Hardware: Disks
Access Times
Example: Megatron 747
Optimizations
Other Topics
Storage Costs
Using Secondary Storage
Disk Failures
Outlook - Hardware
Memory Hierarchy
CPU Register (< 1 KB, 1 cycle)
L1 Cache (10 KBs, few cycles)
L2 Cache (e.g., 512 KB, 2-10 x L1)
L3 Cache (MB)
Memory Hierarchy
Compare: Disk vs. Main Memory
Reduce accesses to main memory
Cache conscious algorithms
Increasing Amount of
Parallelism
Contention on, e.g., Memory
NUMA
Algorithmic Challenges
How to parallelize algorithms?
Sometimes: completely different approach required
-> Rewrite large parts of DBMS
New Trend:
Software/Hardware Co-design
Actually, revived trend: database
machines (80s)
New goals: power consumption
Design specific hardware and write
special software for it
E.g., Oracle Exadata, Oracle Labs
Boris Glavic
Notes 3
salary
name
date
picture
To represent:
Integers
e.g., 35 is
00000000 00100011
Characters
various coding schemes suggested
Example:
A: 1000001
a: 1100001
5: 0110101
LF: 0001010
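The bit patterns above can be reproduced directly. A minimal sketch, assuming a 16-bit width for the integer and 7-bit ASCII for the characters; the helper name `to_bits` is made up.

```python
def to_bits(value, width):
    """Render a non-negative integer as a zero-padded binary string."""
    return format(value, f"0{width}b")


short_35 = to_bits(35, 16)          # the integer 35 in 2 bytes
ascii_codes = {
    name: to_bits(code, 7)          # 7-bit ASCII codes
    for name, code in [("A", ord("A")), ("a", ord("a")), ("5", ord("5")), ("LF", ord("\n"))]
}
```

The character `5` (code 53) and the integer 5 have different bit patterns, which is why the type of a field has to be known before its bits can be interpreted.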
To represent:
Boolean
e.g., TRUE  1111 1111
      FALSE 0000 0000
Application specific
e.g., RED 1, GREEN 3, BLUE 2, YELLOW 4
Can we use less than 1 byte/code?
Yes, but only if desperate...
To represent:
Dates
e.g.: - Integer, # days since Jan 1, 1900
- 8 characters, YYYYMMDD
- 7 characters, YYYYDDD
(not YYMMDD! Why?)
Time
e.g. - Integer, seconds since midnight
- characters, HHMMSSFF
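The date and time encodings listed above map onto Python's standard `datetime` module as follows. The function names and the sample values are assumptions for illustration. A two-digit year (YYMMDD) cannot tell 1914 from 2014, which is one reason to avoid it.

```python
from datetime import date, time

EPOCH = date(1900, 1, 1)


def days_since_1900(d):
    return (d - EPOCH).days            # integer, # days since Jan 1, 1900


def yyyymmdd(d):
    return d.strftime("%Y%m%d")        # 8 characters


def yyyyddd(d):
    return d.strftime("%Y%j")          # 7 characters: year + day of year


def seconds_since_midnight(t):
    return t.hour * 3600 + t.minute * 60 + t.second


d = date(2014, 1, 15)                  # hypothetical sample date
t = time(13, 30, 5)                    # hypothetical sample time
days = days_since_1900(d)
```

The integer form makes date arithmetic a plain subtraction, while the character forms sort correctly as strings.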
String of characters
- Null terminated, e.g., c a t \0
- Length given, e.g., 3 c a t
- Fixed length
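The three string layouts can be sketched as byte encodings. The one-byte length prefix, the space padding, and the function names are assumptions chosen for illustration.

```python
def null_terminated(s):
    return s.encode("ascii") + b"\x00"


def length_prefixed(s):
    data = s.encode("ascii")
    if len(data) > 255:
        raise ValueError("a 1-byte length prefix holds at most 255 characters")
    return bytes([len(data)]) + data


def fixed_length(s, width):
    data = s.encode("ascii")
    if len(data) > width:
        raise ValueError("string longer than the fixed field width")
    return data.ljust(width, b" ")      # pad with spaces up to the field width


encodings = (null_terminated("cat"), length_prefixed("cat"), fixed_length("cat", 6))
```

Fixed length wastes space on short values but lets a field be located by offset alone; the other two save space but must be scanned or measured.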
To represent:
Key Point
Bag of bits
Length
Bits
Overview
Also
Data Items
Records
Blocks
Files
Memory
CS 525
Notes 3
13
Notes 3
14
Types of records:
Main choices:
Fixed format
Employee record
(1) E#, 2 byte integer
(2) E.name, 10 char.
(3) Dept, 2 byte code
Schema
Records
55 s m i t h 02
83 j o n e s 01
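A fixed-format record like the employee example can be sketched with Python's `struct` module. The big-endian byte order, the space padding of the name, and treating the department code as an integer are assumptions; the field widths follow the schema above.

```python
import struct

# E#: 2-byte integer, E.name: 10 characters, Dept: 2-byte code (no padding bytes).
EMPLOYEE = struct.Struct(">H10sH")


def pack_employee(eno, name, dept):
    return EMPLOYEE.pack(eno, name.encode("ascii").ljust(10), dept)


def unpack_employee(record):
    eno, name, dept = EMPLOYEE.unpack(record)
    return eno, name.decode("ascii").rstrip(), dept


r1 = pack_employee(55, "smith", 2)
r2 = pack_employee(83, "jones", 1)
```

Because every record has the same 14-byte layout, the schema is stored once and record i in a block starts at offset i x 14.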
18
Variable format
# Fields
Code identifying
field as E#
Integer type
46 4 S 4
F O RD
2 5 I
CS 525
Notes 3
19
CS 525
Notes 3
20
sparse records
repeating fields
evolving formats
E_name: Fred
CS 525
Notes 3
21
Sailing
Chess
--
CS 525
Notes 3
22
Sailing
Chess
--
CS 525
Notes 3
23
CS 525
Notes 3
24
27
....
record type
record length
tells me what
to expect
(i.e. points to schema)
CS 525
Notes 3
25
26
Easier solution
Encryption
Splitting of large records
E.g., image field, store pointer
CS 525
Notes 3
Notes 3
27
Notes 3
28
Rationale
Large fields mostly not used in search
conditions
Benefit from smaller records
CS 525
Notes 3
29
blocks
...
a file
CS 525
Notes 3
30
assume fixed
length blocks
blocks
a file
separating records
spanned vs. unspanned
sequencing
indirection
Notes 3
CS 525
R1
R2
CS 525
R3
33
Notes 3
R3
(a)
R3
R4
(b)
block 2
R2
R3
R4 R5
...
Spanned
block 1
R1
R2
...
block 2
R3
(a)
R3
R4
(b)
Notes 3
R5
R6
R7
(a)
34
R5
R6 (a)
need indication
need indication
of partial record
pointer to rest
of continuation
(+ from where?)
Notes 3
block 1
R1
CS 525
32
Notes 3
CS 525
(1)
(2)
(3)
(4)
...
CS 525
R1
35
CS 525
Notes 3
36
(3) Sequencing
Why sequencing?
CS 525
37
Notes 3
CS 525
38
Notes 3
Sequencing Options
Sequencing Options
(c)
Overflow area
Next (R1)
Records
in sequence
...
(b) Linked
R1
Notes 3
39
CS 525
Overflow area
Records
in sequence
header
R1
R2
Rx
R2.1
R1.3
R3
R4
R5
CS 525
40
Notes 3
(4) Indirection
Sequencing Options
(c)
R2
R3
R4
R5
Next (R1)
CS 525
R1
R4.7
Notes 3
41
CS 525
Notes 3
42
(4) Indirection
Purely Physical
Rx
Many options:
Physical
CS 525
Indirect
Notes 3
43
Fully Indirect
CS 525
44
Notes 3
Tradeoff
Flexibility
Cost
to move records
map
rec ID
r
Device ID
Cylinder #
Block ID
Track #
Block #
Offset in block
of indirection
CS 525
Notes 3
address
a
45
CS 525
Notes 3
46
Indirect
May contain:
- File ID (or RELATION or DB ID)
- This block ID
- Record directory
- Pointer to free space
- Type of block (e.g. contains recs type 4;
is overflow, )
- Pointer to other blocks like it
- Timestamp ...
Many options
in between
CS 525
Notes 3
47
CS 525
Notes 3
48
TID is
Free space
A block:
R3
R4
R1
R2
CS 525
49
Notes 3
Page identifier
Slot number
CS 525
50
Notes 3
TID Operations
Insertion
Set TID to record location (page, slot)
Moving record
e.g., update variable-size or reorganization
Case 1: TID points to record
Replace record with pointer (new TID)
[Figure: Block 1 and Block 2; the record's original slot in Block 1 holds a pointer to Block 2, Slot 3, and later to Block 2, Slot 2]
TID Properties
TID of record never changes
Can be used safely as pointer to record
(e.g., in index)
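The TID idea can be sketched with pages modelled as lists of slots. This is an illustration under assumptions, not the course's implementation: the class name `Store`, the tuple-based slot encoding, and the handling of a second move (update the pointer in the home slot so the chain never grows beyond one hop) are choices made here.

```python
class Store:
    """A slot holds ('rec', data), ('fwd', tid) or ('free', None)."""

    def __init__(self, num_pages):
        self.pages = [[] for _ in range(num_pages)]

    def insert(self, page_no, data):
        self.pages[page_no].append(("rec", data))
        return (page_no, len(self.pages[page_no]) - 1)     # TID = (page, slot)

    def read(self, tid):
        page_no, slot_no = tid
        kind, value = self.pages[page_no][slot_no]
        if kind == "fwd":                                   # follow at most one pointer
            page_no, slot_no = value
            kind, value = self.pages[page_no][slot_no]
        return value

    def move(self, tid, new_page_no):
        page_no, slot_no = tid
        kind, value = self.pages[page_no][slot_no]
        if kind == "fwd":                                   # already moved once before
            old_page, old_slot = value
            _, data = self.pages[old_page][old_slot]
            self.pages[old_page][old_slot] = ("free", None)
        else:                                               # Case 1: TID points to record
            data = value
        new_location = self.insert(new_page_no, data)
        self.pages[page_no][slot_no] = ("fwd", new_location)


store = Store(3)
tid = store.insert(0, "Rec A")
store.move(tid, 1)
store.move(tid, 2)
```

The TID handed out at insertion stays valid after both moves, so an index that stores it never needs updating; the price is one extra page access for moved records.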
Other Topics
separating records
spanned vs. unspanned
sequencing
indirection
CS 525
Notes 3
Deletion
Block
(1) Insertion/Deletion
(2) Buffer Management
(3) Comparison of Schemes
CS 525
56
Notes 3
Rx
57
CS 525
Notes 3
58
Options:
Options:
(for re-use)
Need a way to mark:
special characters
delete field
in map
CS 525
Notes 3
59
CS 525
Notes 3
60
10
CS 525
61
Notes 3
CS 525
Notes 3
62
CS 525
63
Notes 3
CS 525
Notes 3
64
Logical IDs
map
A block
ID
This space
never re-used
CS 525
Notes 3
65
LOC
Never reuse
ID 7788 nor
space in map...
7788
CS 525
Notes 3
66
11
Insert
Insert
CS 525
Notes 3
67
CS 525
Notes 3
68
Notes 3
70
Interesting problems:
How much free space to leave in each
block, track, cylinder?
How often do I reorganize file + overflow?
CS 525
Notes 3
69
Free
space
CS 525
Buffer Manager
Buffer Management
For Caching of Disk Blocks
Buffer Replacement Strategies
E.g., LRU, clock
Pinned blocks
Forced output
Double buffering
Swizzling
CS 525
in Notes02
Notes 3
71
Notes 3
72
12
Goals
Bookkeeping
Replacement strategy
If page is requested but buffer is full:
Which page to evict (remove) from the buffer?
FIFO
LRU
Clock
Clock Example
[Figure: Page 0 to Page 4 arranged in a circle, each with a Reference bit]
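A minimal sketch of the clock replacement strategy: frames form a circle, each with a reference bit, and the hand clears bits until it finds a frame whose bit is already 0. This is one common variant (a newly loaded page gets its bit set to 1); the class name `ClockBuffer` and its interface are assumptions.

```python
class ClockBuffer:
    def __init__(self, num_frames):
        self.frames = [None] * num_frames
        self.ref_bits = [0] * num_frames
        self.hand = 0

    def access(self, page):
        """Return the evicted page, or None on a hit or when an empty frame was used."""
        if page in self.frames:
            self.ref_bits[self.frames.index(page)] = 1      # hit: set reference bit
            return None
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % len(self.frames)
            if self.frames[slot] is None or self.ref_bits[slot] == 0:
                evicted = self.frames[slot]
                self.frames[slot] = page
                self.ref_bits[slot] = 1
                return evicted
            self.ref_bits[slot] = 0                         # second chance used up


buf = ClockBuffer(3)
evicted = [buf.access(p) for p in [0, 1, 2, 3, 1, 4]]
```

Page 1 survives the second eviction because its reference bit was set by the hit just before, which is how clock approximates LRU with one bit per frame.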
Memory
Disk
block 1
block 1
Rec A
CS 525
79
Notes 3
CS 525
block 2
80
Notes 3
Swizzling
Memory
Disk
block 1
block 1
block 2
Swizzling
Rec A
CS 525
Rec A
block 2
81
Notes 3
CS 525
82
Notes 3
Row Store
id1 cust1 prod1 store1 price1 date1 qty1
id2 cust2 prod2 store2 price2 date2 qty2
id3 cust3 prod3 store3 price3 date3 qty3
Column Store
id1 id2 id3 id4 ... (stored alongside each column)
cust1 cust2 cust3 cust4 ...
prod1 prod2 prod3 prod4 ...
price1 price2 price3 price4 ...
qty1 qty2 qty3 qty4 ...
Comparison
Notes 3
85
Issues:
Flexibility
Complexity
CS 525
Space Utilization
CS 525
87
Example
86
Performance
Notes 3
Notes 3
CS 525
Notes 3
88
Summary
Records
Blocks
Files
Memory
DBMS
CS 525
Notes 3
89
CS 525
Notes 3
90
15
Next
How to find a record quickly,
given a key
CS 525
Notes 3
91
16
Part 04
04: Indexing
record
value
Boris Glavic
value
Notes 4 - Indexing
CS 525
Notes 4 - Indexing
Query Types:
Point queries:
Range queries:
Input: value interval [low,high] of attr A
Output: all tuples with a value low <= v < high in attribute A
Index Considerations:
Complexity of Operations
E.g., insert is O(log(n)) worst-case
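Both query types can be sketched over a sorted list of attribute values, using binary search from Python's `bisect` module. The function names and the sample keys are assumptions; the range semantics follow the `low <= v < high` definition above.

```python
from bisect import bisect_left


def point_query(sorted_keys, value):
    """Positions of all entries equal to value."""
    i = bisect_left(sorted_keys, value)          # O(log n) to find the first match
    j = i
    while j < len(sorted_keys) and sorted_keys[j] == value:
        j += 1
    return list(range(i, j))


def range_query(sorted_keys, low, high):
    """Positions of all entries v with low <= v < high."""
    return list(range(bisect_left(sorted_keys, low), bisect_left(sorted_keys, high)))


keys = [10, 20, 30, 30, 40, 50]                  # hypothetical sorted attribute values
hits = point_query(keys, 30)
in_range = range_query(keys, 20, 40)
```

Locating the start takes O(log n) comparisons; the remaining cost is proportional to the number of results.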
Topics
Conventional indexes
B-trees
Hashing schemes
Advanced Index Techniques
Sequential File
10 20 | 30 40 | 50 60 | 70 80 | 90 100
Dense Index
[Figure: dense index with one entry per key (10, 20, 30, 40, ...) pointing into the Sequential File]
Sparse Index
[Figure: sparse index with one entry per block (10, 30, 50, 70, 90, 110, ...) pointing into the Sequential File, and a second sparse level (10, 90, 170, 250, 330, 410, 490, 570) above it]
Comment:
{FILE,INDEX} may be contiguous
or not (blocks chained)
Question:
Notes on pointers:
[Figure: index keys K1, K2, K3, K4 pointing to records R1, R2, R3, R4]
say: 1024 B per block
if we want K3 block: get it at offset (3-1) x 1024 = 2048 bytes
Terms
Next:
Duplicate keys
Deletion/Insertion
Secondary indexes
Duplicate keys
[Figures: sequential file with duplicate keys (10, 20, 30, 40, 45 with repeats) under a dense index with one entry per record, a dense index with one entry per distinct key (10, 20, 30, 40), and sparse indexes]
careful if looking for 20 or 30!
should this be 40?
Duplicate values, primary index
Summary
Deletion/Insertion
[Figures: sparse and dense index examples over a sequential file with keys 10 to 80 (index continuing to 150)]
delete record 40
delete record 30
insert record 34
insert record 15
insert record 25
overflow blocks
(reorganize later...)
Secondary indexes
[Figure: file not ordered on the indexed field; Sequence field values 30, 50, 20, 70, 80, 40, 100, 10, 90, 60]
Sparse index
[Figure: sparse index entries 30, 20, 80, 100, 90, ... over the unordered file]
Dense index
[Figure: dense index entries 10, 20, 30, 40, 50, 60, 70, ... over the unordered file, with a sparse high level (10, 50, 90, ...)]
[Figure: secondary index over a file with duplicate keys (20, 10, 20, 40, 10, 40, 10, 40, 30, 40)]
Problem:
excess overhead!
- disk space
- search time
another option...
Problem:
variable size records in index!
Another idea:
Chain records with same key?
Problems:
- Need to add fields to records
- Need to follow chain to know records
buckets
Records
EMP (name,dept,floor,...)
[Figure: buckets for Toy and for 2nd (Floor index) pointing to EMP records]
Documents
...the cat is fat ...
...was raining cats and dogs...
...Fido the dog ...
Inverted lists
[Figure: inverted lists for terms such as cat and dog pointing to Documents]
IR QUERIES
Find articles with cat and dog
Find articles with cat or dog
Find articles with cat and not dog
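The three IR queries reduce to set operations on inverted lists. A minimal sketch using the three sample documents; the `NORMALIZE` table is a made-up stand-in for real stemming, and the function name `build_inverted_lists` is an assumption.

```python
import re
from collections import defaultdict

NORMALIZE = {"cats": "cat", "dogs": "dog"}    # stand-in for real stemming


def build_inverted_lists(documents):
    """Map each term to the set of document ids that contain it."""
    lists = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in re.findall(r"[a-z]+", text.lower()):
            lists[NORMALIZE.get(word, word)].add(doc_id)
    return lists


documents = [
    "...the cat is fat ...",
    "...was raining cats and dogs...",
    "...Fido the dog ...",
]
lists = build_inverted_lists(documents)
cat_and_dog = lists["cat"] & lists["dog"]     # intersection of the two lists
cat_or_dog = lists["cat"] | lists["dog"]      # union
cat_not_dog = lists["cat"] - lists["dog"]     # difference
```

The same intersection of buckets answers a query that combines two secondary indexes, since only the pointer lists are touched and not the records themselves.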
Summary so far
Conventional index
Basic Ideas: sparse, dense, multi-level
Duplicate Keys
Deletion/Insertion
Secondary indexes
Buckets of Postings List
Conventional indexes
Advantage:
- Simple
- Index is sequential file, good for scans
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance
Example
[Figure: Index (sequential), continuous, with keys 10 to 90 and free space; inserted keys 31 to 39 end up in an overflow area (not sequential)]
Outline:
Conventional indexes
B-Trees
NEXT
Hashing schemes
Advanced Index Techniques
B+-tree Motivation
B+-tree Properties
Large nodes:
However
Balance:
B+Tree Example
n=3
Root: 100
Non-leaf nodes: 30 | 120 150 180
Leaves: 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200
Sample non-leaf
57 81 95
to keys < 57 | to keys 57 <= k < 81 | to keys 81 <= k < 95 | to keys >= 95
Sample leaf
57 81 95
to next leaf in sequence
To record with key 57
To record with key 81
To record with key 95
In textbook's notation
n=3
Leaf: 30 35
Non-leaf: 30
n+1 pointers
n keys
(fixed)
Non-leaf: ceil((n+1)/2) pointers
Leaf: floor((n+1)/2) pointers
n=3
Full node: Non-leaf 120 150 180, Leaf 3 5 11
Min. node: Non-leaf 30, Leaf 30 35
Number of pointers/keys for B+tree
                      Max ptrs  Max keys  Min ptrs          Min keys
Non-leaf (non-root)   n+1       n         ceil((n+1)/2)     ceil((n+1)/2) - 1
Leaf (non-root)       n+1       n         floor((n+1)/2)    floor((n+1)/2)
Root                  n+1       n         1                 1
Search Algorithm
Search Example
k = 120
n=3
Remarks Search
If n is large, e.g., 500
Keys inside node are sorted
-> use binary search to find i
Performance considerations
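A search that uses binary search inside each node can be sketched as follows. The tree is the n=3 example with root 100, rebuilt by hand; the `Node` class, the `leaf` helper, and the `rec<key>` strings standing in for record pointers are assumptions. In a non-leaf node, keys smaller than a separator go left and keys greater than or equal to it go right.

```python
from bisect import bisect_left, bisect_right


class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys
        self.children = children      # set for non-leaf nodes
        self.records = records        # set for leaves


def leaf(*keys):
    return Node(list(keys), records=[f"rec{k}" for k in keys])


def search(node, k):
    while node.children is not None:
        i = bisect_right(node.keys, k)        # binary search for the child to follow
        node = node.children[i]
    i = bisect_left(node.keys, k)             # binary search inside the leaf
    if i < len(node.keys) and node.keys[i] == k:
        return node.records[i]
    return None


root = Node([100], children=[
    Node([30], children=[leaf(3, 5, 11), leaf(30, 35)]),
    Node([120, 150, 180], children=[
        leaf(100, 101, 110), leaf(120, 130), leaf(150, 156, 179), leaf(180, 200)]),
])
found = search(root, 120)
```

With n = 500 a node lookup needs about 9 comparisons instead of up to 500, while the number of node (block) accesses stays equal to the height of the tree.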
[Figure: example B+tree with n=3 and root 100]
[Figures: B+tree insertion examples with n=3, including a leaf split, a non-leaf split, and a new root]
Insertion Algorithm
Leaf is full
  Split leaf into two nodes (new leaf)
  Insert new leaf's smallest key into parent
Parent is full
  Split parent
  Insert median key into parent
Root is full
  Split root
  Create new root with two pointers and single key
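The leaf-level step of the insertion algorithm can be sketched on plain key lists. This covers only the leaf split, not the propagation into the parent; the function name `insert_into_leaf` and its return convention are assumptions, and the split point (left half gets ceil((n+1)/2) keys) is one common choice.

```python
from bisect import insort


def insert_into_leaf(leaf_keys, key, n):
    """Return (left_keys, right_keys, key_for_parent); the last two are None without a split."""
    keys = list(leaf_keys)
    insort(keys, key)                     # keep keys sorted
    if len(keys) <= n:                    # leaf has space
        return keys, None, None
    split = (len(keys) + 1) // 2          # leaf is full: split into two nodes
    left, right = keys[:split], keys[split:]
    return left, right, right[0]          # new leaf's smallest key goes to the parent


no_split = insert_into_leaf([30, 31], 32, n=3)
with_split = insert_into_leaf([3, 5, 11], 7, n=3)
```

After the split both leaves hold at least floor((n+1)/2) keys, so the minimum-occupancy rule still holds.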
[Figures: B+tree deletion examples with n=4: Delete 50, Delete 37 (new root)]
Deletion Algorithm
Ref # 1 claims:
- Concurrency control harder in B-Trees
- B-tree consumes more space
Static index: 127 keys per index block
(127+1) x 4 = 512 Bytes
-> pointers in index implicit!
up to 127 blocks
B-tree: 63 keys per index block (k1, k2, ..., k63, next)
up to 63 blocks
Size comparison (Ref. #1)
Static Index
# data blocks            height
2 -> 127                 2
128 -> 16,129            3
16,130 -> 2,048,383      4
B-tree
# data blocks            height
2 -> 63                  2
64 -> 3,969              3
3,970 -> 250,047         4
250,048 -> 15,752,961    5
Ref. #2 conclusion
B-trees better!!
Buffering
B-tree: has fixed buffer requirements
Static index: must read several overflow blocks to be efficient
(large & variable size buffers needed for this)
Speaking of buffering
Is LRU a good policy for B+tree buffers?
Of course not!
Should try to keep root in memory at all times
(and perhaps some nodes from second level)
Interesting problem:
For B+tree, how large should n be?
Sample assumptions:
Can get:
n_opt
Idea:
Avoid duplicate keys
Have record pointers in non-leaf nodes
[Figure: B-tree non-leaf node K1 P1, K2 P2, K3 P3; pointers to record with K1, with K2, with K3; subtrees to keys < K1, K1<x<K2, K2<x<K3, > K3]
B-tree example
n=2
Root: 65 125
Non-leaf nodes: 25 45 | 85 105 | 145 165
Leaves: 10 20 | 30 40 | 50 60 | 70 80 | 90 100 | 110 120 | 130 140 | 150 160 | 170 180
sequence pointers
not useful now!
Note on inserts
[Figure: n=3, leaf 10 20 30; Afterwards: 10, 20, 25 30]
[Table: MAX and MIN numbers of Tree Ptrs, Rec Ptrs, and Keys for Non-leaf non-root, Leaf non-root, Root non-leaf, and Root Leaf nodes]
Tradeoffs:
But note:
If blocks are fixed size
(due to disk and buffering restrictions)
Example:
- Pointers: 4 bytes
- Keys: 4 bytes
- Blocks: 100 bytes (just example)
- Look at full 2 level tree
B-tree:
B+tree:
So...
B+tree: 156 records
B-tree: 8 records + 108 records, Total = 116
Conclusion:
For fixed block size,
B+ tree is better because it is bushier
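The record counts for a full 2 level tree can be recomputed from the example sizes (4-byte pointers, 4-byte keys, 100-byte blocks). The node layouts assumed here are: a B-tree non-leaf holds k keys, k record pointers, and k+1 tree pointers; a B+tree non-leaf holds k keys and k+1 tree pointers; a B+tree leaf holds k keys, k record pointers, and one sequence pointer. The helper name `max_keys` is made up.

```python
POINTER, KEY, BLOCK = 4, 4, 100     # sizes in bytes, from the example


def max_keys(bytes_per_key, extra_bytes):
    """Largest k such that k * bytes_per_key + extra_bytes fits into one block."""
    return (BLOCK - extra_bytes) // bytes_per_key


# B-tree: records are reachable from the root as well as from the leaves.
btree_root_keys = max_keys(KEY + 2 * POINTER, POINTER)
btree_leaf_keys = max_keys(KEY + POINTER, 0)
btree_leaf_records = (btree_root_keys + 1) * btree_leaf_keys
btree_total = btree_root_keys + btree_leaf_records

# B+tree: records are reachable from the leaves only.
bplus_root_keys = max_keys(KEY + POINTER, POINTER)
bplus_leaf_keys = max_keys(KEY + POINTER, POINTER)
bplus_total = (bplus_root_keys + 1) * bplus_leaf_keys
```

Dropping the record pointers from the non-leaf node raises the root's fan-out from 9 to 13 children, which is where the extra records come from.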
An Interesting Problem...
[Table: days indexed with two indexes; I1 changes from 1,2,3,4,5 to 11,2,3,4,5, then 11,12,3,4,5, then 11,12,13,4,5, while I2 stays 6,7,8,9,10]
[Table: days indexed with four indexes I1 (1,2,3), I2 (4,5,6), I3 (7,8,9), I4 (10, then 10,11, ...); later I1 restarts with 13, then 13,14, then 13,14,15]
advantage: no deletions
disadvantage: approximate windows
One solution
Simple
Unnecessarily restrictive
Not feasible for high concurrency
Lock Nodes
Read
Write
Improvements?
Reading
Writing
Lock each node on search for key
Release all locks on parents of node if the node is safe
Outline/summary
Conventional Indexes
Sparse vs. dense
Primary vs. secondary
B trees
B+trees vs. B-trees
B+trees vs. indexed sequential
Hashing schemes
--> Next
Advanced Index Techniques
Hashing
Boris Glavic
Notes 5 - Hashing
key -> h(key)
Buckets
(typically 1 disk block)
Two alternatives
(1) key -> h(key) -> records
(2) key -> h(key) -> Index -> record (key 1)
EXAMPLE 2 records/bucket
INSERT:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0
h(K)
Notes 5 - Hashing
Notes 5 - Hashing
Within a bucket:
CS 525
Expected number of
keys/bucket is the
same for all buckets
CS 525
2
3
10
EXAMPLE 2 records/bucket
INSERT:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0
Notes 5 - Hashing
a
c
h(e) = 1
CS 525
Notes 5 - Hashing
11
CS 525
Notes 5 - Hashing
12
EXAMPLE 2 records/bucket
INSERT:
h(a) = 1
h(b) = 2
h(c) = 1
h(d) = 0
0
1
EXAMPLE: deletion
Delete:
e
f
d
a
c
b
c
f
g
h(e) = 1
CS 525
Notes 5 - Hashing
13
CS 525
EXAMPLE: deletion
Delete:
e
f
c
0
1
2
3
CS 525
b
c
e
Notes 5 - Hashing
14
EXAMPLE: deletion
f
g
Notes 5 - Hashing
Delete:
e
f
c
b
c d
e
2
3
maybe move
g up
15
CS 525
f
g
Notes 5 - Hashing
maybe move
g up
16
Rule of thumb:
Rule of thumb:
CS 525
Notes 5 - Hashing
17
CS 525
Notes 5 - Hashing
18
CS 525
Notes 5 - Hashing
19
CS 525
Notes 5 - Hashing
20
h(K)[i ]
to bucket
.
.
.
CS 525
Notes 5 - Hashing
21
1
0001
CS 525
i= 1
1
0001
1
1001
1010 1100
Insert 1010
Insert 1010
Notes 5 - Hashing
22
1
1001
1100
CS 525
Notes 5 - Hashing
23
CS 525
1
1100
Notes 5 - Hashing
24
Example continued
i= 2
i=2
1
0001
00
00
01
01
Insert 1010
10
10
1 2
1001
1010 1100
11
11
Insert:
New directory
1 2
1100
1
0001
0111
2
1001
1010
2
1100
0000
CS 525
Notes 5 - Hashing
25
Example continued
i= 2
CS 525
Notes 5 - Hashing
Example continued
0000
0001
i= 2
00
01
10
11
0111
01
10
2
1001
11
0111
00
27
CS 525
i= 2
00
0111 2
01
10
10
11
11
1001 2
1001
CS 525
2
1100
Notes 5 - Hashing
28
Example continued
0000 2
0001
01
Insert:
2
1001
0000
Notes 5 - Hashing
Example continued
i= 2
1 2
0001 0111
0111
1010
Insert:
2
1100
0000
CS 525
2
0000
0001
00
1
0001 0111
0111
1010
Insert:
26
0111 2
1001
1001
1010 1001 2
1010
Insert:
1100 2
Notes 5 - Hashing
0000 2
0001
1001
29
CS 525
1010
1100 2
Notes 5 - Hashing
30
Example continued
i=3
0000 2
0001
i= 2
00
000
0111 2
011
10
1001 3
11
1001
100
1010 1001 2 3
1010
101
110
1100 2
1001
CS 525
Notes 5 - Hashing
No merging of blocks
Merge blocks
and cut directory if possible
(Reverse insert procedure)
010
01
Insert:
001
111
31
CS 525
Notes 5 - Hashing
32
Deletion example:
if we split:
insert 1100
1
1101
1100
2
1100
1100
CS 525
Notes 5 - Hashing
33
CS 525
Summary
+
insert 1100
1
1101
1101
1
1101
1100
CS 525
Notes 5 - Hashing
35
Notes 5 - Hashing
34
Extensible hashing
1100
CS 525
Notes 5 - Hashing
36
Summary
+
Linear hashing
Extensible hashing
Two ideas:
b
01110101
grows
Indirection
CS 525
Notes 5 - Hashing
37
CS 525
Notes 5 - Hashing
Linear hashing
38
i =2, 2 keys/bucket
0000
0101
1010
1111
01
10
11
m = 01 (max used block)
00
CS 525
Notes 5 - Hashing
Future
growth
buckets
39
CS 525
i =2, 2 keys/bucket
Notes 5 - Hashing
40
i =2, 2 keys/bucket
insert 0101
0000
0101
1010
1111
00
Future
growth
buckets
CS 525
If h(k)[i] <= m, then
look at bucket h(k)[i]
else, look at bucket h(k)[i] - 2^(i-1)
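The lookup rule can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and it assumes that h(k)[i] denotes the i low-order bits of the hash value, as the growth direction on the earlier slide suggests.

```python
def linear_hash_bucket(hash_value: int, i: int, m: int) -> int:
    """Return the bucket that currently holds keys with this hash value."""
    suffix = hash_value & ((1 << i) - 1)   # h(k)[i]: the i low-order bits (assumption)
    if suffix <= m:
        return suffix                      # bucket already exists
    return suffix - (1 << (i - 1))         # bucket not created yet: drop the leading bit


# Slide example: i = 2, m = 01, so only buckets 00 and 01 exist.
for key in (0b0000, 0b0101, 0b1010, 0b1111):
    print(f"{key:04b} -> bucket {linear_hash_bucket(key, 2, 0b01):02b}")
```

With i = 2 and m = 01, keys 1010 and 1111 land in buckets 00 and 01 until buckets 10 and 11 are created.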
Notes 5 - Hashing
0101
1010
1111
00
01
10
11
m = 01 (max used block)
Rule
0000
41
Future
growth
buckets
01
10
11
m = 01 (max used block)
Notes 5 - Hashing
42
i =2, 2 keys/bucket
insert 0101
0101
0000
1010
00
Note
In textbook, n is used instead of m
n=m+1
Future
growth
buckets
0101
1111
n=10
01
10
11
m = 01 (max used block)
0000
0101
1010
1111
00
CS 525
Notes 5 - Hashing
0000
0101
1010
1111
00
i =2, 2 keys/bucket
Future
growth
buckets
Notes 5 - Hashing
0000
0101
1010
1111
CS 525
45
Notes 5 - Hashing
0000
0101
1010
1111
00
Future
growth
buckets
0000
0101
1010
1111
00
01
10
11
m = 01 (max used block)
10
47
CS 525
Future
growth
buckets
1010
Notes 5 - Hashing
0101
1010
i =2, 2 keys/bucket
insert 0101
44
01
10
11
m = 01 (max used block)
10
CS 525
i =2, 2 keys/bucket
Notes 5 - Hashing
01
10
11
m = 01 (max used block)
CS 525
01
10
11
m = 01 (max used block)
CS 525
00
43
Future
growth
buckets
46
i =2, 2 keys/bucket
insert 0101
Future
growth
buckets
1010
01
10
11
m = 01 (max used block)
10
11
Notes 5 - Hashing
48
i =2, 2 keys/bucket
insert 0101
i=2
0000
1010
00
0101
0101
1111
1010
Future
growth
buckets
1111
0000
00
01
10
11
m = 01 (max used block)
10
11
CS 525
0 01
101
1010
1111
0000
11
...
50
Notes 5 - Hashing
0101
1010
1111
0101
010
110
0 11
111
...
0 00
100
CS 525
10
i=23
0101
0 00
100
01
i=23
0101
1111
CS 525
0000
1010
49
Notes 5 - Hashing
0101
0101
0 01
101
010
110
100
0 11
111
...
51
Notes 5 - Hashing
CS 525
Notes 5 - Hashing
52
i=23
0101
1010
0101
0101
1111
0101
0 00
100
0 01
101
010
110
100
0 11
111
U = (# used slots) / (total # of slots)
101
...
Notes 5 - Hashing
53
CS 525
Notes 5 - Hashing
54
Summary
U = (# used slots) / (total # of slots)
CS 525
Notes 5 - Hashing
55
CS 525
Need to move
m here
Would waste
space...
Notes 5 - Hashing
57
Next:
Notes 5 - Hashing
56
Hashing
- How it works
- Dynamic hashing
- Extensible
- Linear
CS 525
Notes 5 - Hashing
58
Indexing vs Hashing
Hashing good for probes given key
e.g.,
SELECT
FROM R
WHERE R.A = 5
-> Point Queries
Indexing vs Hashing
Index definition in SQL
Multiple key access
CS 525
CS 525
Summary
Very full
Very empty
Linear Hashing
Notes 5 - Hashing
59
CS 525
Notes 5 - Hashing
60
10
Indexing vs Hashing
INDEXING (Including B Trees) good for
Range Searches:
e.g.,
SELECT
FROM R
WHERE R.A > 5
-> Range Queries
CS 525
Notes 5 - Hashing
61
OR PARAMETERS
CS 525
Notes 5 - Hashing
62
Notes 5 - Hashing
63
CS 525
Notes 5 - Hashing
64
Multi-key Index
Strategy I:
I1
CS 525
Notes 5 - Hashing
65
CS 525
Notes 5 - Hashing
66
11
Strategy II:
Strategy III:
Toy
Sal
> 50k
CS 525
67
Notes 5 - Hashing
I3
Notes 5 - Hashing
68
Example
Record
Name=Joe
DEPT=Sales
SAL=15k
12k
15k
15k
19k
Dept
Index
I1
CS 525
Example
Art
Sales
Toy
One idea:
Find RECs Dept = Sales AND SAL = 20k
Find RECs Dept = Sales AND SAL > 20k
Find RECs Dept = Sales
Find RECs SAL = 20k
Salary
Index
CS 525
69
Notes 5 - Hashing
Interesting application:
Notes 5 - Hashing
70
Queries:
Geographic Data
y
DATA:
<X1,Y1, Attributes>
<X2,Y2, Attributes>
...
CS 525
CS 525
Notes 5 - Hashing
71
CS 525
Notes 5 - Hashing
72
12
Example
Example
h
b
n
l
10
j
k
20
l
j
k
c
g
10
CS 525
73
Notes 5 - Hashing
Example
40
25
20
Example
20
40
10
j
k
10
o
m
20
c
g
10
25
15 35
20
30
b
f
20
74
Notes 5 - Hashing
h
n
20
15 35
CS 525
30
10
20
10
j
k
o
m
20
c
g
10
20
15
15
CS 525
75
Notes 5 - Hashing
Example
40
25
20
20
h i
CS 525
d e
10
j
k
o
m
c
g
25
n o
20
CS 525
15
l
d e
10
j
k
o
m
10
h i
j k
77
15 35
20
20
5
a b
20
i
h
30
15
Notes 5 - Hashing
40
10
15
15
j k
Example
10
5
d
b
20
15 35
76
Notes 5 - Hashing
30
10
CS 525
c
g
20
a b
78
13
Next
Queries
Find points with Yi > 20
Find points with Xi < 5
Find points close to i = <12,38>
Find points close to b = <7,24>
Notes 5 - Hashing
79
CS 525
Notes 5 - Hashing
80
14
1/27/2014
CS 525: Advanced Database Organization
Slides: adapted from a course taught by Hector Garcia-Molina, Stanford InfoLab
CS 525
Notes 6 - More Indices
Recap
We have discussed
Conventional Indices
B-trees
Hashing
Trade-offs
Multi-key indices
Multi-dimensional indices (but no example)
Today
Multi-dimensional index structures
kd-Trees (very similar to example before)
Grid File (Grid Index)
Quad Trees
R Trees
Partitioned Hash
...
Bitmap indices
Tries
Grid Index
[Figure: grid with Key 1 values V1, V2, ..., Vn on one axis and Key 2 values X1, X2, ..., Xn on the other]
CLAIM
key 1 = Vi AND key 2 = Xj
key 1 = Vi
key 2 = Xj
And also ranges.
E.g., key 1 >= Vi AND key 2 < Xj
pos(i, j) =
CS 525
Notes 5 - Hashing
i, j
position S+0
0,0
0,1
0,2
0,3
1,0
1,1
1,2
1,3
2,0
2,1
2,2
2,3
3,0
position S+1
position S+3
position S+4
position S+9
CS 525
50K
CS 525
position S+4
position S+9
Notes 5 - Hashing
10
Grid files
LinearScale
CS 525
position S+3
20K50K
position S+2
contains
pointers to
buckets
Grid
1
position S+1
*Grid only
020K
position S+0
Notes 5 - Hashing
Salary
Notes 5 - Hashing
i, j
0,0
0,1
0,2
0,3
1,0
1,1
1,2
1,3
2,0
2,1
2,2
2,3
3,0
With indirection:
Buckets
CS 525
Buckets
pos(i, j) = S + iN + j
X1 X2 X3
max number of
i values N=4
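A minimal sketch of the address computation pos(i, j) = S + iN + j, assuming a row-major layout in which N is the number of slots stored per value of i. The listed positions (1,0 at S+4 and 2,1 at S+9) are consistent with N = 4. The function and parameter names are illustrative only.

```python
def pos(i: int, j: int, start: int = 0, n: int = 4) -> int:
    """Position of grid cell (i, j) in a linear array that begins at `start`."""
    return start + i * n + j


# Reproduce the positions shown on the slide, relative to S = 0.
for cell in [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (2, 1)]:
    print(cell, "-> position S+%d" % pos(*cell))
```

Because the position is computed arithmetically, no search over the grid is needed. The cost is that every cell occupies a slot even when it is empty.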
position S+2
Notes 5 - Hashing
Toy
Sales
Personnel
11
CS 525
Notes 5 - Hashing
12
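Partitioned hash function
Idea: split the bucket address (e.g., 0101101110010) into one part computed by h1 from Key1 and one part computed by h2 from Key2
EX:
h1(toy) = 0, h1(sales) = 1, h1(art) = 1
h2(10k) = 01, h2(20k) = 11, h2(30k) = 01, h2(40k) = 00
Buckets: 000, 001, 010, 011, 100, 101, 110, 111
Insert <Fred,toy,10k>, <Joe,sales,10k>, <Sally,art,30k>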
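The example can be replayed with a small Python sketch. The two dictionaries hard-code the h1 and h2 values from the slide. The helper names are hypothetical, and the assumption is that the bucket address is the h1 bit followed by the two h2 bits.

```python
# Hash values taken from the slide example.
H1 = {"toy": "0", "sales": "1", "art": "1"}
H2 = {"10k": "01", "20k": "11", "30k": "01", "40k": "00"}


def bucket(dept: str, sal: str) -> str:
    """Bucket address = h1(dept) followed by h2(sal)."""
    return H1[dept] + H2[sal]


def candidate_buckets(dept=None, sal=None):
    """Buckets that must be inspected when only some of the keys are given."""
    prefixes = [H1[dept]] if dept is not None else ["0", "1"]
    suffixes = [H2[sal]] if sal is not None else ["00", "01", "10", "11"]
    return sorted(p + s for p in prefixes for s in suffixes)


print(bucket("toy", "10k"))                      # Fred
print(candidate_buckets(dept="sales", sal="40k"))
print(candidate_buckets(sal="30k"))
print(candidate_buckets(dept="sales"))
```

A query that fixes both attributes touches one bucket. A query on salary alone touches two, and a query on department alone touches four.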
CS 525
Notes 5 - Hashing
13
CS 525
EX:
=0
=1
=1
=01
=11
=01
=00
000
001
010
011
100
101
110
111
<Fred>
<Joe><Sally>
h1(toy)
h1(sales)
h1(art)
.
h2(10k)
h2(20k)
h2(30k)
h2(40k)
14
000
001
010
011
100
101
110
111
=0
=1
=1
=01
=11
=01
=00
<Fred>
<Joe><Jan>
<Mary>
<Sally>
<Tom><Bill>
<Andy>
<Fred,toy,10k>,<Joe,sales,10k>
<Sally,art,30k>
CS 525
Notes 5 - Hashing
Find Emp. with Dept. = Sales AND Sal = 40k
15
CS 525
EX:
Notes 5 - Hashing
16
EX:
h1(toy)
h1(sales)
h1(art)
.
h2(10k)
h2(20k)
h2(30k)
h2(40k)
=0
=1
=1
=01
=11
=01
=00
000
001
010
011
100
101
110
111
<Fred>
<Joe><Jan>
<Mary>
<Sally>
<Tom><Bill>
<Andy>
h1(toy)
h1(sales)
h1(art)
.
h2(10k)
h2(20k)
h2(30k)
h2(40k)
.
Find Emp. with Dept. = Sales AND Sal = 40k
CS 525
Notes 5 - Hashing
EX:
h1(toy)
h1(sales)
h1(art)
.
h2(10k)
h2(20k)
h2(30k)
h2(40k)
Insert
<Fred,toy,10k>,<Joe,sales,10k>
<Sally,art,30k>
Notes 5 - Hashing
17
CS 525
000
001
010
011
100
101
110
111
=0
=1
=1
=01
=11
=01
=00
<Fred>
<Joe><Jan>
<Mary>
<Sally>
<Tom><Bill>
<Andy>
Find Emp. with Sal = 30k
Notes 5 - Hashing
18
EX:
EX:
h1(toy)
h1(sales)
h1(art)
.
h2(10k)
h2(20k)
h2(30k)
h2(40k)
.
000
001
010
011
100
101
110
111
=0
=1
=1
=01
=11
=01
=00
h1(toy)
h1(sales)
h1(art)
.
h2(10k)
h2(20k)
h2(30k)
h2(40k)
<Fred>
<Joe><Jan>
<Mary>
<Sally>
<Tom><Bill>
<Andy>
Find Emp. with Sal = 30k
CS 525
Notes 5 - Hashing
19
000
001
010
011
100
101
110
111
=0
=1
=1
=01
=11
=01
=00
<Sally>
<Tom><Bill>
<Andy>
CS 525
Notes 5 - Hashing
20
R-tree
h1(toy)
h1(sales)
h1(art)
.
h2(10k)
h2(20k)
h2(30k)
h2(40k)
000
001
010
011
100
101
110
111
=0
=1
=1
=01
=11
=01
=00
<Fred>
Nodes can store up to M entries
<Joe><Jan>
<Mary>
Minimum fill requirement (depends on variant)
Each node corresponds to a rectangle in n-dimensional space
<Sally>
Minimum Bounding Rectangle (MBR) of its children
<Tom><Bill>
<Andy>
MBRs of siblings are allowed to overlap
Different from B-trees
Find Emp. with Dept. = Sales
CS 525
Notes 5 - Hashing
balanced
21
CS525
[57]
[915]
[1319]
[2024]
[1216]
[24]
Notes6MoreIndices
22
R-tree Search
Point Search
Data Space
CS525
<Mary>
Find Emp. with Dept. = Sales
EX:
<Fred>
<Joe><Jan>
[5]
[6]
[7]
[24]
[20]
[24]
Search for p = <xi,yi>
Keep list of potential nodes
[13]
[14]
[18]
[19]
[4]
[2]
[2]
[3]
[9]
[11]
[15]
[15]
[16]
[12]
Notes6MoreIndices
23
Needed because of overlap
Traverse to child if MBR of child contains p
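The point search can be sketched as follows. This is a simplified illustration rather than a full R-tree: the Node class, its field names, and the two-leaf example tree are made up. Only the idea of keeping a list of candidate nodes whose MBR contains p comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    mbr: tuple                                     # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)   # inner node: child nodes
    points: list = field(default_factory=list)     # leaf node: stored (x, y) points


def contains(mbr, p):
    xmin, ymin, xmax, ymax = mbr
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def point_search(root, p):
    """Return True if point p is stored somewhere below root."""
    pending = [root]                 # list of potential nodes
    while pending:
        node = pending.pop()
        if not contains(node.mbr, p):
            continue                 # p cannot be in this subtree
        if p in node.points:
            return True
        pending.extend(node.children)
    return False


# Two sibling leaves whose MBRs overlap in the square [5,10] x [5,10].
leaf_a = Node(mbr=(0, 0, 10, 10), points=[(2, 3)])
leaf_b = Node(mbr=(5, 5, 15, 15), points=[(7, 8)])
root = Node(mbr=(0, 0, 15, 15), children=[leaf_a, leaf_b])
print(point_search(root, (7, 8)), point_search(root, (6, 6)))
```

The point (6, 6) falls into both leaf MBRs, so both leaves are visited before the search can report a miss. This is why a list of candidates is needed instead of a single root-to-leaf path.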
CS525
Notes6MoreIndices
24
R-tree Search
Point Search
Notes6MoreIndices
25
CS525
R-tree Insert
Similar to B-tree, but more complex
Overlap -> multiple choices where to add entry
Split harder because more choice how to split node (compare B-tree = 1 choice)
Choose one for insert (heuristic, e.g., the one that would grow the least)
Continue until leaf is found
Notes6MoreIndices
27
1) Find leaf node that contains entry
2) Delete entry
3) Leaf node underflow?
Remove leaf node and cache entries
Adapt parents
Reinsert deleted entries
Notes6MoreIndices
[24]
29
[5]
[6]
[7]
[24]
[20]
[24]
[13]
[14]
[18]
[19]
[4]
[2]
[2]
[3]
[9]
[11]
[15]
[15]
[16]
[12]
Notes6MoreIndices
26
2) Insert into leaf
3) Leaf is full? -> split
Find best split (minimum overlap between new nodes) is hard (O(2^M))
Use linear or quadratic heuristics (original paper)
4) Adapt parents if necessary
CS525
R-tree Delete
CS525
[1319]
R-tree Insert
1) Find potential subtrees for current node
CS525
[915]
[1216]
Data Space
Search for points in region = <[xmin - xmax], [ymin - ymax]>
Keep list of potential nodes
Traverse to child if MBR of child overlaps with query region
CS525
[57]
[2024]
Search<5,24>
Notes6MoreIndices
28
Bitmap Index
Domain of values D = {d1, ..., dn}
Gender {male, female}
Age {1, ..., 120?}
Use one vector of bits for each value
One bit for each record
0: record has different value in this attribute
1: record has this value
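A small Python sketch of the idea, using plain lists of 0/1 as bit vectors. The record order follows the toddler example used on the following slides. The ages other than Gertrud's are not recoverable from the slides, so the values below are invented purely for illustration.

```python
def build_bitmaps(values):
    """One bit vector per distinct value; bit p is 1 if record p has that value."""
    bitmaps = {}
    for position, value in enumerate(values):
        bitmaps.setdefault(value, [0] * len(values))[position] = 1
    return bitmaps


names = ["Peter", "Gertrud", "Joe", "Marry"]
ages = [1, 2, 3, 1]                        # only Gertrud's age (2) is from the slide
genders = ["male", "female", "male", "female"]

age_bm = build_bitmaps(ages)
gender_bm = build_bitmaps(genders)

both = [a & g for a, g in zip(age_bm[2], gender_bm["female"])]     # age 2 AND female
either = [a | g for a, g in zip(age_bm[2], gender_bm["female"])]   # age 2 OR female
print(both, [n for n, bit in zip(names, both) if bit])
print(either, [n for n, bit in zip(names, either) if bit])
```

A real implementation would pack the bits into machine words so that one AND or OR instruction processes many records at once.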
CS525
Notes6MoreIndices
30
Bitmap Index Example
Toddlers (Name, Age, Gender):
Peter, male
Gertrud, 2, female
Joe, male
Marry, female
Bitmaps: one vector per Age value; Gender: male, female
Find all toddlers with age 2 and sex female:
Bitwise and between vectors
Result: 0 1 0 0
CS525
Notes6MoreIndices
31
CS525
Notes6MoreIndices
Bitmap Index Example
Find all toddlers with age 2 or sex female:
Bitwise or between vectors
Result: 0 1 0 1
Compression
Observation:
Each record has one value in indexed attribute
For N records and domain of size |D|
Only 1/|D| bits are 1
-> waste of space
Solution
Compress data
Need to make sure that and and or is still fast
CS525
Notes6MoreIndices
33
CS525
Run length encoding (RLE)
Instead of actual 0-1 sequence encode length of 0 or 1 runs
One bit to indicate whether 0/1 run + several bits to encode run length
But how many bits to use to encode a run length?
Notes6MoreIndices
34
RLE Example
0001000011101111 (2 bytes)
3,1,4,3,1,4 (6 bytes)
-> if we use one byte to encode a run we have 7 bits for length = max run length is 128 (127)
Gamma codes or similar to have variable number of bits
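The run lengths in the example can be computed with a short Python sketch. The function name is illustrative. It returns (bit, length) pairs rather than a packed byte encoding.

```python
from itertools import groupby


def run_lengths(bits: str):
    """Split a 0/1 string into (bit, run length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


runs = run_lengths("0001000011101111")
print(runs)
print([length for _, length in runs])
```

With one byte per run, the six runs take 6 bytes although the raw sequence needs only 2 bytes. This is the motivation for variable-length codes for the run lengths.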
CS525
Notes6MoreIndices
35
CS525
Notes6MoreIndices
36
Elias Gamma Codes
x = 2^N + (x mod 2^N)
Write N as N zeros followed by one 1
Write (x mod 2^N) as N bit number
18 = 2^4 + 2 = 000010010
0001000011101111 (2 bytes)
3,1,4,3,1,4 (6 bytes)
011100100011100100 (3 bytes)
Hybrid Encoding
Run length encoding
Can waste space
And/or run length not aligned to byte/word boundaries
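The encoding rule can be checked with a short Python sketch. The function name is illustrative, and the code is returned as a string of '0'/'1' characters rather than packed bits.

```python
def elias_gamma(x: int) -> str:
    """Elias gamma code of a positive integer as a string of bits."""
    if x < 1:
        raise ValueError("Elias gamma codes are defined for x >= 1")
    n = x.bit_length() - 1                 # largest N with 2^N <= x
    remainder = x - (1 << n)               # x mod 2^N
    suffix = format(remainder, "b").zfill(n) if n > 0 else ""
    return "0" * n + "1" + suffix          # N zeros, one 1, then N bits


print(elias_gamma(18))
print("".join(elias_gamma(r) for r in [3, 1, 4, 3, 1, 4]))
```

Encoding the run lengths 3,1,4,3,1,4 this way yields 18 bits, which fit into the 3 bytes quoted on the slide.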
CS525
Notes6MoreIndices
37
Encode some bytes of sequence as is and only store long runs as run length
EWAH
BBC (that's what Oracle uses)
CS525
Extended Word-aligned Hybrid (EWAH)
Segment sequence in machine words (64 bit)
Use two types of words to encode
Literal words, taken directly from input sequence
Run words
Part of the word is used to encode a run
Part of the word is used to encode how many literals follow
Bitmap Indices
Fast for read-intensive workloads
Used a lot in data warehousing
Often built on the fly during query processing
As we will see later in class
00000000
00000000
00101000
11111111
00100001
00101000
10010001
11000010
CS525
Notes6MoreIndices
11000010
39
CS525
Trie
From Retrieval
Tree index structure
Keys are sequences of values from a domain D
D = {0,1}
D = {a,b,c,...,z}
Key size may or may not be fixed
Store 4-byte integers using D = {0,1} (32 elements)
Strings using D = {a,...,z} (arbitrary length)
Trie
Each node has pointers to |D| child nodes
One for each value of D
Searching for a key k = [d1, ..., dn]
Start at the root
Follow child for value di
CS525
Notes6MoreIndices
41
CS525
Notes6MoreIndices
42
Trie Example
Words: bar, ball, in
[Figure: trie with paths b-a-r (1), b-a-l-l (2), i-n (3)]
Search for bald
Fail!
Tries Implementation
1) Each node has an array of child pointers
2) Each node has a list or hash table of child pointers
3) Array compression schemes derived from compressed DFA representations
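A minimal Python sketch of implementation option 2, where each node keeps a hash table (a dict) of child pointers. Class and function names are illustrative. The stored numbers 1, 2, 3 are the labels attached to bar, ball, and in in the example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # value from D -> child node
        self.record = None    # payload stored where a key ends


def trie_insert(root, key, record):
    node = root
    for d in key:
        node = node.children.setdefault(d, TrieNode())
    node.record = record


def trie_search(root, key):
    """Follow one child per key element; return None as soon as a child is missing."""
    node = root
    for d in key:
        node = node.children.get(d)
        if node is None:
            return None
    return node.record


root = TrieNode()
for number, word in enumerate(["bar", "ball", "in"], start=1):
    trie_insert(root, word, number)

print(trie_search(root, "ball"), trie_search(root, "bald"))
```

The search for bald follows b, a, l and then fails because the second l node has no child for d.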
CS525
Notes6MoreIndices
44
Summary
Discussion:
-
Conventional Indices
B-trees
Hashing (extensible, linear)
SQL Index Definition
Index vs. Hash
Multiple Key Access
Multi Dimensional Indices
Variations: Grid, R-tree,
- Partitioned Hash
- Bitmap indices and compression
- Tries
CS 525
Notes 5 - Hashing
45
Query Processing
Q Query Plan
CS 525
Query Processing
Example
Q Query Plan
Select B,D
From R,S
Where R.A = 'c' AND S.E = 2 AND R.C = S.C
Others?
CS 525
R A
10
S C
CS 525
R A
10
10
20
20
10
30
35
40
45
50
10
20
20
10
30
35
40
45
50
Answer
CS 525
CS 525
S C
B
2
D
x
RXS
One idea
CS 525
Bingo!
10
10
a
.
.
10
20
10
10
10
10
a
.
.
10
20
C
.
.
10
10
Ex: Plan I
B,D
c S.E=2 R.C=S.C
CS 525
R.A=
C
.
.
Got one...
CS 525
RXS
CS 525
10
Another idea:
Ex: Plan I
Plan II
B,D
R.A=
natural join
c S.E=2 R.C=S.C
R.A =
S.E = 2
B,D
S
c S.E=2 R.C = S.C
Notes 7 - Query Processing
(RXS)]
11
CS 525
12
A B C
(R)
a 1 10
A B C
b 1 20
c 2 10
(S)
C D E
C D E
10 x 2
10 x 2
20 y 2
c 2 10
20 y 2
30 z 2
d 2 35
30 z 2
40 x 1
e 3 45
50 y 3
Plan III
Use R.A and S.C Indexes
(1) Use R.A index to select R tuples
with R.A = c
(2) For each R.C value found, use S.C
index to find matching tuples
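Plan III can be sketched in Python with dictionaries standing in for the two indexes. The tuples are the R and S instances from the example. The check S.E = 2 and the projection on B, D are applied to each matching pair, as the trace on the following slides does. Function names are illustrative.

```python
from collections import defaultdict

R = [("a", 1, 10), ("b", 1, 20), ("c", 2, 10), ("d", 2, 35), ("e", 3, 45)]   # (A, B, C)
S = [(10, "x", 2), (20, "y", 2), (30, "z", 2), (40, "x", 1), (50, "y", 3)]   # (C, D, E)


def build_index(relation, column):
    """Map each value of one column to the tuples that contain it."""
    index = defaultdict(list)
    for row in relation:
        index[row[column]].append(row)
    return index


def plan_iii(r, s):
    index_ra = build_index(r, 0)          # index I1 on R.A
    index_sc = build_index(s, 0)          # index I2 on S.C
    result = []
    for (_, b, c) in index_ra.get("c", []):        # (1) R tuples with R.A = 'c'
        for (_, d, e) in index_sc.get(c, []):      # (2) matching S tuples via S.C
            if e == 2:                             # check S.E = 2
                result.append((b, d))              # project on B, D
    return result


print(plan_iii(R, S))
```

Only one R tuple and one S tuple are touched, whereas the cross-product plan looks at all 25 combinations.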
CS 525
CS 525
13
Plan III
S
A
A B C
A B C
I1
b 1 20
20 y 2
c 2 10
30 z 2
d 2 35
40 x 1
e 3 45
50 y 3
<c,2,10>
16
S
A= c
I2
C D E
10 x 2
a 1 10
b 1 20
I2
I1
S
A= c
a 1 10
CS 525
15
C D E
A B C
10 x 2
a 1 10
20 y 2
b 1 20
I1
C
I2
C D E
10 x 2
<c,2,10> <10,x,2>
20 y 2
c 2 10
30 z 2
c 2 10
d 2 35
40 x 1
d 2 35
40 x 1
e 3 45
50 y 3
e 3 45
50 y 3
CS 525
14
30 z 2
17
CS 525
18
R
A= c
A B C
C D E
A B C
10 x 2
a 1 10
20 y 2
b 1 20
check=2? 30 z 2
c 2 10
40 x 1
d 2 35
50 y 3
e 3 45
I2
I1
a 1 10
b 1 20
c 2 10
d 2 35
<c,2,10> <10,x,2>
output: <2,x>
e 3 45
CS 525
S
A= c
19
C D E
I2
I1
10 x 2
<c,2,10> <10,x,2>
20 y 2
check=2? 30 z 2
40 x 1
output: <2,x>
50 y 3
next tuple:
<c,7,15>
CS 525
20
[Figure: query processing pipeline. SQL query -> parse -> parse tree -> convert -> ... -> estimate costs -> {(P1,C1),(P2,C2)...} -> pick best -> Pi -> execute -> answer; statistics feed the cost estimates]
21
CS 525
22
<Query>
SELECT title
FROM StarsIn
WHERE starName IN (
SELECT name
FROM MovieStar
WHERE birthdate LIKE '%1960'
);
<SFW>
SELECT <SelList>
title
StarsIn
<SelList>
<Attribute>
name
CS 525
23
CS 525
<FromList>
<RelName>
SELECT
FROM
<Attribute>
FROM
WHERE
<Condition>
<Tuple> IN <Query>
<Attribute>
<FromList>
<RelName>
MovieStar
Notes 7 - Query Processing
( <Query> )
starName
<SFW>
WHERE
<Condition>
%1960
24
title
title
starName=name
StarsIn
<condition>
<tuple>
IN
name
birthdate LIKE
<attribute>
starName
birthdate LIKE
%1960
MovieStar
25
%1960
MovieStar
name
StarsIn
Example:
26
title
Question:
Push project to
StarsIn?
starName=name
name
birthdate LIKE
StarsIn
%1960
MovieStar
MovieStar
Fig. 7.20: An improvement on fig. 7.18.
CS 525
Example:
27
CS 525
28
Hash join
SEQ scan
StarsIn
Parameters:
Select Condition,...
P1
P2
Pn
C1
C2
Cn
MovieStar
Pick best!
CS 525
29
CS 525
30
[Figure: query processing pipeline. SQL query -> parse -> parse tree -> convert -> ... -> {P1,P2,..} -> estimate costs -> {(P1,C1),(P2,C2)...} -> pick best -> Pi -> execute -> answer; statistics feed the cost estimates]
CS 525
CS 525
3. Conversion
1. Parsing
2. Analysis
Usually intertwined
The internal representation is used to
store analysis information
Create an initial representation and
complete during analysis
CS 525
CS 525
Parsing
SQL -> Parse Tree
Covered in compiler courses and books
Here only short overview
CS 525
SQL Standard
Standardized language
86, 89, 92, 99, 03, 06, 08, 11
SELECT title
FROM StarsIn
WHERE starName IN (
  SELECT name
  FROM MovieStar
  WHERE birthdate LIKE '%1960'
);
CS 525
CS 525
<Query>
<Query Block>
SELECT <SelList>
FROM
<FromList>
<Attribute>
<RelName>
title
StarsIn
WHERE
<Condition>
<Tuple> IN <Query>
<Attribute>
( <Query> )
starName
SELECT
<SelList>
<Attribute>
name
FROM
<FromList>
<RelName>
WHERE
<Condition>
MovieStar
CS 525
<Query Block>
birthDate
%1960
Query Blocks
10
SELECT clause
SELECT (1 + 2) AS result
result
3
Renaming: (R.a + 2) AS x
CS 525
11
CS 525
12
SELECT extrChar(p.name) AS n
FROM person p
result
person
gender
initial
name
name
gender
Joe
male
Joe
Joe
male
Jim
male
Jim
Jim
male
13
CS 525
gender
gender
Joe
male
male
Jim
male
CS 525
i
m
14
FROM clause
List of table expressions
Access to relations
Subqueries (need alias)
Join expressions
Table functions
Renaming of relations and columns
result
person
name
15
CS 525
16
FROM R
-access table R
FROM R, S
-access tables R and S
FROM R JOIN S ON (R.a = S.b)
-join tables R and S on condition (R.a = S.b)
FROM R x
FROM R AS x
-access table R and assign alias x
FROM R x(c,d)
FROM R AS x(c,d)
-using aliases x for R and c,d for its attributes
FROM (R JOIN S t ON (R.a = t.b)), T
-join R and S, and access T
FROM (R JOIN S ON (R.a = S.b)) JOIN T
-join tables R and S and result with T
FROM create_sequence(1,100) AS seq(a)
-call table function
CS 525
CS 525
J
o
e
person
name
CS 525
17
18
FROM
  (SELECT count(*) FROM employee)
  AS empcnt(cnt)
SELECT *
FROM create_sequence(1,3) AS seq(a)
result
a
1
2
3
CS 525
CS 525
19
name
dep
IT
103
Joe
IT
Support
2506
Jim
Marketing
Correlation
Reference attributes from other FROM
clause item
Attributes of ith entry only available in j > i
Semantics:
For each row in result of ith entry:
Substitute correlated attributes with value from
current row and evaluate query
CS 525
21
Correlation - Example
result
employee
name
dep
Joe
IT
Jim
Marketing
CS 525
chr
SELECT name
FROM (SELECT max(salary) maxsal
      FROM employee) AS m,
     (SELECT name
      FROM employee x
      WHERE x.salary = m.maxsal) AS e
Joe
Joe
Joe
Jim
name
salary
Jim
Joe
20,000
Jim
30,000
23
22
Correlation - Example
20
CS 525
employee
CS 525
result
name
Jim
Notes 8 - Parsing and Analysis
24
WHERE clause
WHERE R.a = 3
-comparison between attribute and constant
WHERE (R.a > 5) AND (R.a < 10)
-range query using boolean AND
WHERE R.a = S.b
-comparison between two attributes
WHERE (R.a * 2) > (S.b - 3)
-using operators
A condition
Attribute references
Constants
Operators (boolean)
Functions
Nested subquery expressions
25
Nested Subqueries
Nesting a query within an expression
Correlation allowed
CS 525
26
27
CS 525
Scalar subquery
Subquery that returns one result tuple
How to check?
-> Runtime error
28
Existential Quantification
<expr> IN <subquery>
Evaluates to true if <expr> equal to at least one of the results of the subquery
SELECT *
FROM R
WHERE R.a = (SELECT count(*) FROM S)
SELECT *
FROM users
WHERE name IN (SELECT name FROM blacklist)
CS 525
CS 525
29
30
Existential Quantification
EXISTS <subquery>
SELECT *
FROM users u
WHERE EXISTS (SELECT * FROM blacklist b
              WHERE b.name = u.name)
Existential Quantification
<expr> <op> ANY <subquery>
SELECT *
FROM users
WHERE name = ANY (SELECT name FROM blacklist)
CS 525
CS 525
31
Universal Quantification
<expr> <op> ALL <subquery>
Evaluates to true if <expr> <op> <tuple> evaluates to true for all result tuples
Op is any comparison operator: =, <, >, ...
SELECT *
FROM nation
WHERE nname = ALL (SELECT nation FROM blacklist)
CS 525
33
32
IT
2000
Jim
IT
300
Bob
HR
100
Alice
HR
10000
Patrice
HR
10000
CS 525
GROUP BY clause
A list of expressions
result
dep
Name
IT
Joe
HR
Alice
HR
Patrice
34
GROUP BY restrictions
If group-by is used then
Same as WHERE
No restriction to boolean
DBMS has to know how to compare = for
data type
35
CS 525
36
CS 525
37
HAVING clause
A boolean expression
Applied after grouping and aggregation
Only references aggregation expressions
and group by expressions
CS 525
39
ORDER BY clause
A list of expressions
Semantics: Order the result on these
expressions
CS 525
41
numF
CS 525
Joe
Jim
Joe
Peter
Jim
Joe
Jim
Peter
Peter
Joe
38
CS 525
40
CS 525
42
Window functions
More flexible grouping
Return both aggregated results and input
values
CS 525
43
CS 525
Analysis Goals
SELECT *
FROM R
WHERE R.a + 3 > 5
Table R exists?
Expand *: which attributes in R?
R.a is a column?
Type of constants 3, 5?
Operator + for types of R.a and 3 exists?
Operator > for types of result of + and 5 exists?
Rewriting
Unfolding views
Notes 8 - Parsing and Analysis
45
Database Catalog
Stores information about database
objects
Aliases:
Information Schema
System tables
Data Dictionary
CS 525
44
Semantic checks
Semantic checks
CS 525
CS 525
46
Schema, DB
Hierarchical structuring of data
Data types
Comparison operators
physical representation
Functions to (de)serialize to string
47
CS 525
48
Type Casts
Similar to automatic type conversion in
programming languages
Expression: R.a + 3.0
Say R.a is of type integer
Triggers
Stored Procedures
CS 525
CS 525
Scope checks
50
View Unfolding
CS 525
CS 525
51
52
SELECT *
FROM totalSalary
WHERE total > 10000
SELECT *
FROM (SELECT name,
             salary + bonus AS total
      FROM employee) AS totalSalary
WHERE total > 10000
CS 525
53
CS 525
54
Analysis Summary
Perform semantic checks
Catalog lookups (tables, functions, types)
Scope checks
View unfolding
Generate internal representation during
analysis
CS 525
55
CS 525
Conversion
Should be useful for analysis
Should be useful for optimization
Internal representation
Relational algebra
Query tree/graph models
E.g., QGM (Query Graph Model) in Starburst
Notes 8 - Parsing and Analysis
57
Formal language
Good for studying logical optimization
and query equivalence (containment)
Not informative enough for analysis
No datatype representation in algebra
expressions
No meta-data
CS 525
Other Internal
Representations
Practical implementations
Mostly following structure of SQL query
blocks
Store data type and meta-data (where
necessary)
CS 525
56
Relational Algebra
CS 525
59
58
Canonical Translation to
Relational Algebra
TEXTBOOK version of conversion
Given an SQL query
Return an equivalent relational algebra
expression
CS 525
60
10
Relation Schema
A set of attribute name-datatype pairs
Relation (instance)
Input(s): relation
Output: relation
-> Composable
Tuple
List of attribute value pairs (or function
from attribute name to value)
CS 525
61
CS 525
62
Set semantics:
Relations are Sets
Used in most theoretical work
Bag semantics
Relations are Multi-Sets
Each element (tuple) can appear more than
once
63
CS 525
Bag
Name
Purchase
Name
Purchase
Peter
Guitar
Peter
Guitar
Joe
Drum
Peter
Guitar
Alice
Bass
Joe
Drum
Alice
Bass
Alice
Bass
CS 525
65
64
Operators
Selection
Renaming
Projection
Joins
Theta, natural, cross-product, outer, anti
Aggregation
Duplicate removal
Set operations
CS 525
66
11
Selection
Syntax: σ_c (R)
R is input
C is a condition
Semantics:
Return all tuples that match condition C
Set: { t | t ∈ R AND t fulfills C }
Bag: { t^n | t^n ∈ R AND t fulfills C }
Selection Example
σ_{a>5} (R)
CS 525
67
13
14
12
14
CS 525
Renaming
ρ_{c←a} (R)
R is input
A is list of attribute renamings b ← a
Semantics:
Applies renaming from A to inputs
Set: { t.A | t ∈ R }
Bag: { (t.A)^n | t^n ∈ R }
69
13
13
12
12
14
14
CS 525
70
Projection Example
π_b (R)
Syntax: π_A (R)
R is input
A is list of projection expressions
Standard: only attributes in A
Semantics:
Project all inputs on projection expressions
Set: { t.A | t ∈ R }
Bag: { (t.A)^n | t^n ∈ R }
Notes 8 - Parsing and Analysis
Result
Projection
CS 525
68
Renaming Example
Syntax: ρ_A (R)
CS 525
Result
71
CS 525
Result
13
13
12
12
14
14
72
12
Cross Product
Syntax: R X S
R X S
CS 525
13
13
12
13
13
12
12
12
CS 525
73
Join
Syntax: R
13
12
12
CS 525
75
76
Result
Semantics:
Result
CS 525
Natural Join
Syntax: R
a=d
Semantics:
74
Join Example
CS 525
Result
Semantics:
13
12
12
CS 525
78
13
Left-outer Join
Syntax: R ⟕_c S
R join S
t ∈ R without match, fill S attributes with NULL
{ (t,s) | t ∈ R AND s ∈ S AND (t,s) matches C }
union
{ (t, NULL(S)) | t ∈ R AND NOT exists s ∈ S: (t,s) matches C }
Notes 8 - Parsing and Analysis
13
13
NULL
NULL
12
12
CS 525
79
13
NULL
NULL
12
12
NULL
NULL
CS 525
81
82
Result
Semantics:
CS 525
80
Full-outer Join
Syntax: R
a=d
Semantics:
CS 525
R join S
s ∈ S without match, fill R attributes with NULL
{ (t,s) | t ∈ R AND s ∈ S AND (t,s) matches C }
union
{ (NULL(R),s) | s ∈ S AND NOT exists t ∈ R: (t,s) matches C }
Result
Right-outer Join
Syntax: R
Semantics:
CS 525
a=d
83
a=d
S
Result
R
a
13
13
NULL
NULL
12
NULL
NULL
CS 525
12
NULL
NULL
84
14
Semijoin
Syntax: R ⋉ S and R ⋊ S
Semantics:
All tuples from R that have a matching tuple from relation S on the common attributes A
{ t | t ∈ R AND exists s ∈ S: t.A = s.A }
Semijoin Example
CS 525
13
12
12
CS 525
85
Antijoin
R and S are inputs
R ▷ S
Semantics:
All tuples from R that have no matching tuple from relation S on the common attributes A
{ t | t ∈ R AND NOT exists s ∈ S: t.A = s.A }
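The two definitions translate almost literally into Python. The sketch below represents tuples as dictionaries and takes the list of common attributes as a parameter. The relation contents are invented for illustration.

```python
def _key(row, attrs):
    return tuple(row[a] for a in attrs)


def semijoin(r, s, attrs):
    """Tuples of r that have a match in s on the common attributes."""
    s_keys = {_key(row, attrs) for row in s}
    return [row for row in r if _key(row, attrs) in s_keys]


def antijoin(r, s, attrs):
    """Tuples of r that have no match in s on the common attributes."""
    s_keys = {_key(row, attrs) for row in s}
    return [row for row in r if _key(row, attrs) not in s_keys]


R = [{"a": 1, "b": 13}, {"a": 2, "b": 12}, {"a": 3, "b": 12}]   # hypothetical data
S = [{"a": 1, "c": 5}, {"a": 3, "c": 4}, {"a": 3, "c": 7}]

print(semijoin(R, S, ["a"]))
print(antijoin(R, S, ["a"]))
```

Each R tuple appears at most once in the semijoin result even if it has several join partners in S. Together with the antijoin result it partitions R.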
87
13
13
12
CS 525
88
Syntax: _G α_A (R)
Semantics:
Build groups of tuples according to G and compute the aggregation functions from each group
{ (t.G, agg(G(t))) | t ∈ R }
G(t) = { t' | t' ∈ R AND t'.G = t.G }
Aggregation Example
_b α_sum(a) (R)
Notes 8 - Parsing and Analysis
Result
R
a
Aggregation
CS 525
86
Antijoin Example
Syntax: R S
CS 525
Result
R
a
89
CS 525
Result
sum(a)
90
15
Duplicate Removal
Syntax:(R)
R is input
Semantics:
CS 525
91
Result
13
13
13
14
14
CS 525
Set operations
Union
Input: R and S
Syntax: R
U S
R and S are union-compatible inputs
Semantics:
Types
Union
Intersection
Set difference
CS 525
93
CS 525
Union Example
R
92
94
Intersection
Syntax: R ∩ S
Semantics:
Set: { t | t ∈ R AND t ∈ S }
Bag: { t^min(n,m) | t^n ∈ R AND t^m ∈ S }
1
3
CS 525
95
CS 525
96
16
Intersection Example
R ∩ S
Set Difference
Syntax: R - S
Semantics:
Set: { t | t ∈ R AND NOT t ∈ S }
Bag: { t^(n-m) | t^n ∈ R AND t^m ∈ S }
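Bag intersection and bag difference can be tried out with Python's collections.Counter, which stores a multiplicity per element. The bags below are invented for illustration. Note that Counter subtraction drops elements whose count would become zero or negative, which corresponds to reading n - m as max(0, n - m).

```python
from collections import Counter

R = Counter({"x": 3, "y": 1})     # bag {x, x, x, y}, hypothetical
S = Counter({"x": 1, "z": 2})     # bag {x, z, z}, hypothetical

intersection = R & S              # multiplicity min(n, m)
difference = R - S                # multiplicity n - m, never below zero

print(sorted(intersection.elements()))
print(sorted(difference.elements()))
```

Under set semantics the same operations reduce to the usual set intersection and difference, since every multiplicity is then 0 or 1.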
CS 525
97
CS 525
Result
R
a
98
Canonical Translation to
Relational Algebra
CS 525
99
CS 525
100
Canonical Translation
Canonical Translation
101
CS 525
102
17
Set Operations
UNION ALL into union
UNION duplicate removal over union
INTERSECT ALL into intersection
INTERSECT add duplicate removal
EXCEPT ALL into set difference
EXCEPT apply duplicate removal to
inputs and then apply set difference
CS 525
103
SELECT sum(R.a)
FROM R
GROUP BY b
sum(a)
Bsum(a)
R
CS 525
104
a =b
a=b
headcnt count(*)
depcount(*)
Employee
CS 525
105
CS 525
106
CS 525
107
18
[Figure: query processing pipeline. SQL query -> parse -> parse tree -> convert -> ... -> {P1,P2,..} -> estimate costs -> {(P1,C1),(P2,C2)...} -> pick best -> Pi -> execute -> answer; statistics feed the cost estimates]
CS 525
CS 525
Query Optimization
Query Optimization
Estimate Costs
without indexes
with indexes
CS 525
CS 525
Query Equivalence
CS 525
CS 525
Rules:
R
(R
S =
S)
T
S
R
=R
(S
Note:
Carry attribute names in results, so
order is not important
Can also write as trees, e.g.:
T)
T
R
CS 525
Rules:
R
(R
S =
S)
T
S
R
=R
(S
T)
CS 525
Rules: Selects
σ_{p1∧p2} (R) =
σ_{p1∨p2} (R) =
RxS=SxR
(R x S) x T = R x (S x T)
RUS=SUR
R U (S U T) = (R U S) U T
CS 525
CS 525
10
Rules: Selects
σ_{p1∧p2} (R) = σ_{p1} [ σ_{p2} (R) ]
σ_{p1∨p2} (R) = [ σ_{p1} (R) ] U [ σ_{p2} (R) ]
11
R = {a,a,b,b,b,c}
S = {b,b,c,c,d}
RUS = ?
CS 525
12
R = {a,a,b,b,b,c}
S = {b,b,c,c,d}
RUS = ?
Example: R={a,a,b,b,b,c}
P1 satisfied by a,b; P2 satisfied by b,c
Option 1 SUM
RUS = {a,a,b,b,b,b,b,c,c,c,d}
Option 2 MAX
RUS = {a,a,b,b,b,c,c,d}
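Both union options can be reproduced with collections.Counter, where + adds multiplicities (SUM) and | takes the larger one (MAX). The sketch also applies the two options to the selection example with P1 satisfied by a,b and P2 satisfied by b,c, to show which option keeps the rule for disjunctive selections valid. Variable names are illustrative.

```python
from collections import Counter

R = Counter("aabbbc")
S = Counter("bbccd")

print("SUM:", "".join(sorted((R + S).elements())))
print("MAX:", "".join(sorted((R | S).elements())))

# Selection example: P1 is satisfied by a and b, P2 by b and c.
sel_p1 = Counter({t: n for t, n in R.items() if t in "ab"})
sel_p2 = Counter({t: n for t, n in R.items() if t in "bc"})
sel_p1_or_p2 = Counter({t: n for t, n in R.items() if t in "abc"})

print("rule holds with SUM:", sel_p1 + sel_p2 == sel_p1_or_p2)
print("rule holds with MAX:", sel_p1 | sel_p2 == sel_p1_or_p2)
```

With the SUM option the three b tuples are counted twice on the right-hand side, so the rule for selections with OR does not carry over to bags.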
CS 525
13
15
CS 525
14
Rep ()
T1 =
yr,state Senators;
T2 =
yr,state Reps
T1
Yr
97
99
98
T2
Yr
99
99
98
State
CA
CA
AZ
State
CA
CA
CA
Union?
CS 525
16
Rules: Project
Executive Decision
-> Use SUM option for bag unions
-> Some rules cannot be used for bags
xy (R) =
CS 525
17
CS 525
18
Rules: Project
Rules: Project
xy (R) = x [y (R)]
xy (R) = x [y (R)]
CS 525
Rules: +
19
CS 525
Rules: +
combined
20
combined
(R
S) =
(R
S) =
CS 525
Rules: +
combined
21
(continued)
CS 525
S) =
S) =
S) =
Notes 9 - Logical Optimization
(R
S) =
(R
S) =
CS 525
(R)]
S
q
(S)]
22
Do one:
σ_{p∧q} (R ⋈ S) = [σ_p (R)] ⋈ [σ_q (S)]
σ_{p∧q∧m} (R ⋈ S) = σ_m [(σ_p R) ⋈ (σ_q S)]
σ_{p∨q} (R ⋈ S) = [(σ_p R) ⋈ S] U [R ⋈ (σ_q S)]
pq (R
pqm (R
pvq (R
23
CS 525
24
Rules:
pq (R S) =
p [q (R S) ] =
p [ R q (S) ] =
[q (S)]
[p (R)]
CS 525
Rules:
x[p (R) ] =
CS 525
25
, combined
Rules:
{p [ x
x[p (R) ] =
CS 525
Rules:
, combined
(R)
]}
27
combined
26
, combined
xz
x[p (R) ] = x {p [ x
CS 525
Rules:
(R)
]}
28
combined
xy (R
xy (R
S) =
S) =
xy{[xz (R) ]
CS 525
29
CS 525
[yz (S) ]}
30
xy {p (R
S)}
xy {p (R
S)}
xy {p [xz (R)
yz (S)]}
z = z U {attributes used in P
CS 525
31
CS 525
33
S S
Rules
32
, U combined:
p (R X S) = ?
CS 525
35
CS 525
34
Conventional wisdom:
do projects early
Example: R(A,B,C,D,E) x={E}
P: (A=3) (B= cat )
x {p (R)}
CS 525
vs.
E {p{ABE(R)}}
36
Bottom line:
B = cat
A=3
37
CS 525
38
Pushing Selections
More transformations
Eliminate common sub-expressions
Detect constant expressions
Other operations: duplicate elimination
Idea:
Join conditions equate attributes
For parts of algebra tree (scope) store
which attributes have to be the same
Called Equivalence classes
b=3 (R
CS 525
39
CS 525
Outer-Joins
S) =
b=3 (R)
b=c
c=3
(S)
40
Summary Equivalences
Associativity: (R ∘ S) ∘ T ≡ R ∘ (S ∘ T)
Commutativity: R ∘ S ≡ S ∘ R
Distributivity: (R ∘ S) ⊗ T ≡ (R ⊗ T) ∘ (S ⊗ T)
Difference between Set and Bag Equivalences
Only some equivalence are useful
Not commutative
R ⟕ S ≢ S ⟕ R
p condition over attributes in A
A list of attributes from R
σ_p (R ⟕_{A=B} S) ≡ σ_p (R) ⟕_{A=B} S
Not σ_p (R ⟕_{A=B} S) ≡ R ⟕_{A=B} σ_p (S)
CS 525
CS 525
b=c
41
42
transformations
good transformations
CS 525
43
CS 525
Example
R
cat 1 10 a
cat 1 20 b
dog 1 30 a
T(R) : # tuples in R
S(R) : # of bytes in each R tuple
B(R): # of blocks to hold all R tuples
V(R, A) : # distinct values in R
for attribute A
CS 525
Example
R
cat 1 10 a
cat 1 20 b
dog 1 30 a
dog 1 40 c
bat 1 50 d
45
A: 20 byte string
B: 4 byte integer
C: 8 byte date
D: 5 byte string
dog 1 40 c
44
A: 20 byte string
B: 4 byte integer
C: 8 byte date
D: 5 byte string
bat 1 50 d
CS 525
46
T(R) = 5
S(R) = 37
V(R,A) = 3
V(R,C) = 5
V(R,D) = 4
V(R,B) = 1
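The statistics of the example relation can be recomputed with a few lines of Python. The tuples and attribute widths are the ones given in the example. The dictionary and function names are illustrative.

```python
ATTRIBUTES = ["A", "B", "C", "D"]
WIDTH_BYTES = {"A": 20, "B": 4, "C": 8, "D": 5}    # attribute sizes from the example

R = [
    ("cat", 1, 10, "a"),
    ("cat", 1, 20, "b"),
    ("dog", 1, 30, "a"),
    ("dog", 1, 40, "c"),
    ("bat", 1, 50, "d"),
]


def T(relation):
    """Number of tuples."""
    return len(relation)


def S(widths):
    """Bytes per tuple."""
    return sum(widths.values())


def V(relation, attribute):
    """Number of distinct values of one attribute."""
    column = ATTRIBUTES.index(attribute)
    return len({row[column] for row in relation})


print(T(R), S(WIDTH_BYTES), {a: V(R, a) for a in ATTRIBUTES})
```

A real system keeps such statistics in the catalog and refreshes them periodically instead of scanning the relation for every query.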
CS 525
47
CS 525
48
W = R1 x R2: T(W) = T(R1) * T(R2), S(W) = S(R1) + S(R2)
W = σ_{A=a} (R): S(W) = S(R), T(W) = ?
CS 525
Example
R
A B C D
cat 1 10 a
cat 1 20 b
dog 1 30 a
dog 1 40 c
49
V(R,A)=3
V(R,B)=1
V(R,C)=5
V(R,D)=4
CS 525
z=val(R)
CS 525
50
V(R,A)=3
V(R,B)=1
V(R,C)=5
V(R,D)=4
A B C D
cat 1 10 a
cat 1 20 b
dog 1 30 a
dog 1 40 c
bat 1 50 d
W=
T(W) =
Notes 9 - Logical Optimization
Example
R
bat 1 50 d
W=
A=a (R)
51
CS 525
$W = \sigma_{Z=val}(R)$
$T(W) = \frac{T(R)}{V(R,Z)}$
Assumption:
Alternate Assumption:
CS 525
CS 525
53
54
Example
R
cat 1 10 a
cat 1 20 b
dog 1 30 a
dog 1 40 c
Alternate assumption
V(R,A)=3 DOM(R,A)=10
V(R,B)=1 DOM(R,B)=10
V(R,C)=5 DOM(R,C)=10
V(R,D)=4 DOM(R,D)=10
bat 1 50 d
W=
z=val(R)
CS 525
T(W) = ?
Example
R
55
56
Selection cardinality
A B C D
cat 1 10 a
cat 1 20 b
dog 1 30 a
dog 1 40 c
Alternate assumption
V(R,A)=3 DOM(R,A)=10
V(R,B)=1 DOM(R,B)=10
V(R,C)=5 DOM(R,C)=10
V(R,D)=4 DOM(R,D)=10
bat 1 50 d
W=
CS 525
$W = \sigma_{Z=val}(R)$
$T(W) = \frac{T(R)}{DOM(R,Z)}$
What about $W = \sigma_{Z \ge val}(R)$?
$T(W) = ?$
Solution # 1:
$T(W) = T(R)/2$
Solution # 2:
$T(W) = T(R)/3$
Example R
Z: Min=1, Max=20, V(R,Z)=10
$W = \sigma_{Z \ge 15}(R)$
$f = \frac{20-15+1}{20-1+1} = \frac{6}{20}$ (fraction of range)
$T(W) = f \times T(R)$
Equivalently:
$f \times V(R,Z)$ = fraction of distinct values
$T(W) = [f \times V(R,Z)] \times \frac{T(R)}{V(R,Z)} = f \times T(R)$
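The selection estimates above can be written down as a small sketch. The function names are hypothetical, and the value T(R) = 100 used for the range example is an assumption (the slides do not give T(R) for that relation).

```python
def estimate_equality(t_r, v_r_z):
    """T(W) for W = sigma_{Z=val}(R), assuming tuples spread evenly over V(R,Z) values."""
    return t_r / v_r_z


def estimate_equality_dom(t_r, dom_r_z):
    """Alternate assumption: values spread evenly over the domain DOM(R,Z)."""
    return t_r / dom_r_z


def estimate_range_ge(t_r, z_min, z_max, val):
    """T(W) for W = sigma_{Z>=val}(R) as (fraction of range) * T(R)."""
    # Multiply before dividing so integer inputs stay exact as long as possible.
    return t_r * (z_max - val + 1) / (z_max - z_min + 1)


# Example relation R from the notes: T(R) = 5, V(R,B) = 1, DOM(R,B) = 10.
print(estimate_equality(5, 1))
print(estimate_equality_dom(5, 10))
# Range example: Min=1, Max=20, Z >= 15, with an assumed T(R) = 100.
print(estimate_range_ge(100, 1, 20, 15))
```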
CS 525
CS 525
63
R2
Let x = attributes of R1
y = attributes of R2
64
R2
Let x = attributes of R1
y = attributes of R2
Case 1
XY=
Same as R1 x R2
CS 525
65
CS 525
66
11
Case 2
R1
W = R1
B
R2
R2
XY=A
W = R1
Case 2
R1
R2
R2
XY=A
Assumption:
$V(R_1,A) \le V(R_2,A) \Rightarrow$ Every A value in R1 is in R2
$V(R_2,A) \le V(R_1,A) \Rightarrow$ Every A value in R2 is in R1
CS 525
CS 525
67
Take
1 tuple
R2
Take
1 tuple
Match
68
R2
Match
CS 525
$V(R_1,A) \le V(R_2,A)$: $T(W) = \frac{T(R_2)\,T(R_1)}{V(R_2,A)}$
$V(R_2,A) \le V(R_1,A)$: $T(W) = \frac{T(R_2)\,T(R_1)}{V(R_1,A)}$
In general, $W = R_1 \bowtie R_2$:
$T(W) = \frac{T(R_2)\,T(R_1)}{\max\{V(R_1,A), V(R_2,A)\}}$
[A is common attribute]
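A minimal sketch of the general join-size estimate, under the containment assumption stated above. The function name is hypothetical. The sample call uses T(R1) = 1000, T(R2) = 2000 and a larger distinct-value count of 200, as in the partial-result example of these notes; the smaller count of 100 is an assumption.

```python
def estimate_join_size(t_r1, t_r2, v_r1_a, v_r2_a):
    """T(W) for W = R1 join R2 on common attribute A."""
    return t_r1 * t_r2 / max(v_r1_a, v_r2_a)


print(estimate_join_size(1000, 2000, 100, 200))
```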
CS 525
71
CS 525
72
12
In all cases:
R2 A
size of attribute A
73
CS 525
74
AB (R)
R2
A=aB=b (R)
R
S with common attribs. A,B,C
Union, intersection, diff,
Treat as relation U
T(U) = T(R1)/V(R1,A)
S(U) = S(R1)
75
CS 525
Example
R1 A
To estimate Vs
B C D
cat 1 10 10
CS 525
77
cat 1 20 20
dog 1 30 10
dog 1 40 30
bat 1 50 10
CS 525
76
V(R1,A)=3
V(R1,B)=1
V(R1,C)=5
V(R1,D)=3
U=
A=a (R1)
78
13
Example
R1 A
cat 1 10 10
cat 1 20 20
dog 1 30 10
dog 1 40 30
bat 1 50 10
V(R1,A)=3
V(R1,B)=1
V(R1,C)=5
V(R1,D)=3
U=
Possible Guess
V(U,A)
V(U,B)
U=
A=a (R)
=1
= V(R,B)
A=a (R1)
T(R1)
V(R1,A)
For Joins
U = R1(A,B)
79
R2(A,C)
CS 525
Example:
Z = R1(A,B)
CS 525
Partial Result: U = R1
$T(U) = \frac{1000 \times 2000}{200}$
CS 525
R3(C,D)
CS 525
R2
Z=U
83
R2(B,C)
R1
81
V(U,A) = 50
V(U,B) = 100
V(U,C) = 300
80
82
R3
CS 525
84
14
Approximating Distributions
Summarize the distribution
Approximating Distributions
Parameterized distribution
Concerns
Histograms
85
86
Maintaining Statistics
Parameterized
Distribution
Histograms
Use Sampling?
CS 525
87
number of tuples
in R with A value
in given range
20
10
10
20
30
CS 525
88
40
#B
|B|
A=val(R) = ?
CS 525
89
CS 525
90
15
T(R2) T(R1)
max{ V(R1,A), V(R2,A) }
T(W) =
buckets
#B(R2) #B(R1)
|B|
91
CS 525
92
Equi-width
All buckets cover the same number of values (ranges of equal width)
Easy, but inaccurate
93
Construct Equi-depth Histograms
Sort input
Determine size of buckets
#tuples / #buckets
Example 3 buckets
Input: 1, 5, 44, 6, 10, 12, 3, 6, 7
Sorted: 1, 3, 5, 6, 6, 7, 10, 12, 44
Buckets: [1-5] [6-8] [9-44]
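The construction steps above can be sketched as follows. This is an illustrative version with a hypothetical function name; it returns the values per bucket, while the slide picks bucket boundaries that lie between neighbouring buckets.

```python
import math


def equi_depth_buckets(values, num_buckets):
    """Sort the input and cut it into buckets holding (almost) the same number of tuples."""
    data = sorted(values)
    size = math.ceil(len(data) / num_buckets)  # tuples per bucket
    return [data[i:i + size] for i in range(0, len(data), size)]


buckets = equi_depth_buckets([1, 5, 44, 6, 10, 12, 3, 6, 7], 3)
for b in buckets:
    print(b[0], b[-1], len(b))  # low value, high value, tuple count
```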
CS 525
95
CS 525
94
Advanced Techniques
Wavelets
Approximate Histograms
Sampling Techniques
Compressed Histograms
CS 525
96
16
Summary
Outline
Don't forget:
Statistics must be kept up to date
(cost?)
CS 525
97
done!
next
CS 525
98
17
SQL query
parse
parse tree
convert
apply laws
improved l.q.p
answer
execute
statistics
Pi
pick best
{(P1,C1),(P2,C2)...}
estimate costs
{P1,P2,..}
CS 525
CS 525
Query Execution
Here only:
how to implement operators
what are the costs of implementations
how to implement queries
Data flow between operators
Next part:
How to choose good plan
CS 525
Execution Plan
A tree (DAG) of physical operators that
implement a query
May use indices
May create temporary relations
May create indices on the fly
May use auxiliary operations such as
sorting
CS 525
Estimating IOs:
Count # of disk blocks that must be
read (or written) to execute query plan
If not
Assume fixed memory available for
buffering pages
Count I/O operations
Real systems combine this with CPU
estimations
Clustered index
10
15
17
19
35
37
CS 525
(External) Sorting
Joins (Nested Loop, Merge, Hash, )
Aggregation (Sorting, Hash)
Selection, Projection (Index, Scan)
Union, Set Difference
Intersection
Duplicate Elimination
CS 525
Operator Profiles
Algorithm
In-memory complexity: e.g., $O(n^2)$
Memory requirements
Runtime based on available memory
11
10
Execution Strategies
Compiled
Translate into C/C++/Assembler code
Compile, link, and execute code
Interpreted
Generic operator implementations
Generic executor
Operators Overview
12
Iterator Model
CS 525
CS 525
13
14
Open
Close
Next
Return next result tuple
[Figure: tree of Iterator operators, each with open, next, and close]
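A minimal sketch of the iterator model, assuming each operator exposes open, next, and close and that next returns None when the input is exhausted. The class names Scan and Select and the helper run are hypothetical.

```python
class Scan:
    """Leaf operator: iterates over an in-memory list of tuples."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = None

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None  # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        self.pos = None


class Select:
    """Filters the tuples produced by its child operator."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


def run(root):
    """Generic executor: pull tuples from the root until it is exhausted."""
    root.open()
    out = []
    row = root.next()
    while row is not None:
        out.append(row)
        row = root.next()
    root.close()
    return out


plan = Select(Scan([("cat", 1), ("dog", 2), ("bat", 3)]), lambda r: r[1] >= 2)
print(run(plan))
```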
Parallelism
Intra-Operator Parallelism
Iterator Model
Merge-Sort
CS 525
19
Scan
Partition tables
Each process scans one partition
CS 525
Inter-Operator Parallelism
Each process executes one or more
operators
Pipelining
Push-based query execution
Chain operators to directly produce results
Pipeline-breakers
Operators that need to consume the whole
input (or large parts) before producing outputs
CS 525
Key
21
20
Pipelining Communication
Queues
Operators push their results to queues
Operators read their inputs from queues
Direct call
Operator calls its parent in the tree with
results
Within one process
CS 525
Pipelines
22
Pipeline-breakers
Append to queue
Dequeue
Direct Call
Sorting
All operators that apply sorting
Aggregation
Set Difference
Some implementations of
Join
Union
CS 525
23
CS 525
24
Operators Overview
Sorting
(External) Sorting
Joins (Nested Loop, Merge, Hash, )
Aggregation (Sorting, Hash)
Selection, Projection (Index, Scan)
Union, Set Difference
Intersection
Duplicate Elimination
CS 525
CS 525
25
In-memory sorting
Algorithms from data structures 101
Quick sort
Merge sort
Heap sort
Intro sort
CS 525
26
External sorting
Problem:
Sort N pages of data with M pages of
memory
Solutions?
27
CS 525
28
First Idea
Merging Runs
Now what?
M
CS 525
29
CS 525
30
Merging Runs
31
merge
write
CS 525
read
R1
read
R2
CS 525
Input
2 3 6 5 7 4 10 1 11 12 13 14 20 40 22 21
Pass 0
2 3 | 5 6 | 4 7 | 1 10 | 11 12 | 13 14 | 20 40 | 21 22
Pass 1
2 3 5 6 | 1 4 7 10 | 11 12 13 14 | 20 21 22 40
Pass 2
1 2 3 4 5 6 7 10 | 11 12 13 14 20 21 22 40
Pass 3
1 2 3 4 5 6 7 10 11 12 13 14 20 21 22 40
Merging Runs
[Figure: runs R1, R2, ..., RM-1 are read, merged, and the result is written]
#I/Os
$2\,B(R) \cdot (1 + \log_{M-1}(B(R)/M))$ I/Os
[Figure: plot for M = 17, 129, 257, 513, 1025 over input sizes from 100 to 1,000,000,000]
Merge
Scenario
Page size 4KB
1TB of data (250,000,000)
10MB of buffer for sorting (250)
Passes
4 passes
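The pass count for this scenario can be checked with a short sketch. It assumes pass 0 produces runs of M pages and every later pass merges M-1 runs at a time; the function names are hypothetical, and the inputs are the page counts given in parentheses above.

```python
import math


def external_sort_passes(num_pages, mem_pages):
    """Number of passes, counting run generation (pass 0) as one pass."""
    runs = math.ceil(num_pages / mem_pages)  # sorted runs after pass 0
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (mem_pages - 1))  # (M-1)-way merge
        passes += 1
    return passes


def external_sort_ios(num_pages, mem_pages):
    """Every pass reads and writes each page once."""
    return 2 * num_pages * external_sort_passes(num_pages, mem_pages)


print(external_sort_passes(250_000_000, 250))
```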
CS 525
38
Merging M runs
To choose next element to output
Have to compare M elements
-> complexity linear in M: O(M)
39
CS 525
Priority Queue
40
Min-Heap
Example: { 1, 4, 7, 10 }
10
7
CS 525
41
CS 525
42
Min-Heap Insertion
Min-Heap Dequeue
insert(e)!
pop-smallest!
CS 525
43
Insertion
44
Dequeue
Insert 3
Insert at first free position
1
4
10
3
CS 525
10
10
3
CS 525
45
Dequeue
10
46
Min/Max-Heap Complexity
Heap is a complete tree
Height is O(log2(n))
Insertion
3
4
7
10
10
10
Dequeue
Maximal height of the tree switches
-> O(log2(n))
CS 525
47
CS 525
48
Min-Heap Implementation
Full tree
1
8
9
Compute positions
$\mathrm{Parent}(n) = \lfloor (n-1)/2 \rfloor$
$\mathrm{Children}(n) = 2n + 1,\ 2n + 2$
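A sketch of an array-based min-heap that uses these position formulas (0-based positions). The function names are hypothetical; the sample run inserts 3 into the heap built from { 1, 4, 7, 10 } and then dequeues the smallest element.

```python
def parent(n):
    return (n - 1) // 2


def children(n):
    return 2 * n + 1, 2 * n + 2


def heap_insert(heap, e):
    """Insert at the first free position, then swap upwards while smaller than the parent."""
    heap.append(e)
    n = len(heap) - 1
    while n > 0 and heap[n] < heap[parent(n)]:
        p = parent(n)
        heap[n], heap[p] = heap[p], heap[n]
        n = p


def heap_pop_smallest(heap):
    """Remove the root, move the last element to the root, then swap downwards."""
    smallest = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        n = 0
        while True:
            left, right = children(n)
            m = n
            if left < len(heap) and heap[left] < heap[m]:
                m = left
            if right < len(heap) and heap[right] < heap[m]:
                m = right
            if m == n:
                break
            heap[n], heap[m] = heap[m], heap[n]
            n = m
    return smallest


h = []
for e in [1, 4, 7, 10]:
    heap_insert(h, e)
heap_insert(h, 3)
print(h)
print(heap_pop_smallest(h), h)
```

Both operations walk at most one root-to-leaf path, which matches the logarithmic bound stated for insertion and dequeue.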
2
1
4
1
CS 525
3
10
7
10
12
1
4
10
7
Notes 10 - Query Execution
11
8
10
12
11
13
13
51
CS 525
Replace elements
Remove smallest element from heap
If it is larger than the last element written to the current run, write it to the current run
Else create a new run
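The run generation described above can be sketched with Python's heapq module. This follows the simplified rule on the slide (start a new run as soon as the smallest heap element is smaller than the last one written); the function name and the sample input are assumptions.

```python
import heapq
from itertools import islice

_END = object()  # sentinel marking the end of the input


def generate_runs(values, heap_size):
    """Produce sorted runs using a heap of at most heap_size elements."""
    it = iter(values)
    heap = list(islice(it, heap_size))
    heapq.heapify(heap)
    runs, current = [], []
    while heap:
        smallest = heapq.heappop(heap)
        if current and smallest < current[-1]:
            runs.append(current)  # cannot extend the current run: start a new one
            current = []
        current.append(smallest)
        nxt = next(it, _END)
        if nxt is not _END:
            heapq.heappush(heap, nxt)  # replace the removed element
    if current:
        runs.append(current)
    return runs


print(generate_runs([2, 3, 6, 5, 7, 4, 10, 1], 3))
```

With a heap of 3 elements the first run already holds 3 tuples and later runs may grow longer than the heap, which is the point of generating runs this way.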
52
1
6
50
10
12
CS 525
6
11
13
CS 525
49
5
7
2
3
4
12
15
1
2
7
53
CS 525
54
5
7
2
3
4
15
12
1
5
7
2
3
4
15
12
1
CS 525
2
3
4
5
7
15
CS 525
CS 525
12
2
3
4
5
7
1
15
59
56
5
7
2
3
4
15
12
1
CS 525
57
2
3
CS 525
55
3
7
2
3
4
5
7
12
15
58
CS 525
12
2
3
4
5
7
15
60
10
Approach
Traverse from root to first leaf: HT(I)
Follow sibling pointers: |R| / K
Read data blocks: B(R)
CS 525
CS 525
61
I/O Operations
62
Unclustered B+-tree?
In worst-case we have
K * B(R)
K = 500
500 * B(R) = 250 merge passes
CS 525
CS 525
63
64
Sorting Comparison
B(R) = number of blocks of R
M = number of available memory blocks
#RB = records per page
HT = height of B+-tree (logarithmic)
K = number of keys per leaf node
Operators Overview
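Property: Runtime
  Ext. Mergesort: $O(N \log_{M-1}(N))$
  B+ (clustered): $O(N)$
  B+ (unclustered): $O(N)$
Property: #I/O (random)
  Ext. Mergesort: $2\,B(R) \cdot (1 + \log_{M-1}(B(R)/M))$
  B+ (clustered): HT + |R| / K + B(R)
  B+ (unclustered): HT + |R| / K + K * #RB
Property: Memory
  B+ (clustered): 1 (better HT + X)
  B+ (unclustered): 1 (better HT + X)
Property: Disk Space
  Ext. Mergesort: 2 B(R)
Property: Variants
  Ext. Mergesort: 1) Merge with heap 2) Run generation with heap 3) Larger Buffer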
65
(External) Sorting
Joins (Nested Loop, Merge, Hash, )
Aggregation (Sorting, Hash)
Selection, Projection (Index, Scan)
Union, Set Difference
Intersection
Duplicate Elimination
CS 525
66
11
Scan
Operators Overview
(External) Sorting
Joins (Nested Loop, Merge, Hash, )
Aggregation (Sorting, Hash)
Selection, Projection (Index, Scan)
Union, Set Difference
Intersection
Duplicate Elimination
Variants
Sequential
Scan through all tuples of relation
Index
Use index to find tuples that match selection
CS 525
CS 525
67
Options
Transformations: R1
Joint algorithms:
R2, R2
R1
Nested loop
Merge join
Join with index
Hash join
68
69
CS 525
70
Procedure Output-Tuples
While (R1{ i }.C = R2{ j }.C) ∧ (i ≤ T(R1)) do
  [jj ← j;
   while (R1{ i }.C = R2{ jj }.C) ∧ (jj ≤ T(R2)) do
     [output pair R1{ i }, R2{ jj };
      jj ← jj+1 ]
   i ← i+1 ]
Applicable to:
C is conjunction of equalities or </>:
A1 = B1 AND … AND An = Bn
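A runnable sketch of merge join for an equality condition on sorted inputs, with the Output-Tuples procedure inlined. The function name and the key-extractor parameters are hypothetical; the sample keys are the R1{i}.C and R2{j}.C values from the example in these notes.

```python
def merge_join(r1, r2, key1=lambda t: t, key2=lambda t: t):
    """Join two inputs that are already sorted on their join keys."""
    out = []
    i = j = 0
    while i < len(r1) and j < len(r2):
        a, b = key1(r1[i]), key2(r2[j])
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Output-Tuples: pair every R1 tuple of this key with every R2 tuple of this key.
            while i < len(r1) and key1(r1[i]) == b:
                jj = j
                while jj < len(r2) and key2(r2[jj]) == b:
                    out.append((r1[i], r2[jj]))
                    jj += 1
                i += 1
            j = jj  # skip the R2 group that was just matched
    return out


result = merge_join([10, 20, 20, 30, 40], [5, 20, 20, 30, 30, 50, 52])
print(len(result), result)
```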
CS 525
71
CS 525
72
12
Example
i  R1{i}.C    j  R2{j}.C
1  10         1  5
2  20         2  20
3  20         3  20
4  30         4  30
5  40         5  30
              6  50
              7  52
CS 525
73
For each r ∈ R1 do
  [ X ← index(R2, C, r.C)
    for each s ∈ X do
      output (r,s) pair]
Note: X ← index(rel, attr, value)
then X = set of rel tuples with attr = value
Applicable to:
C is conjunction of equalities
A1 = B1 AND … AND An = Bn
Algorithm
(1) Hash R1 tuples into G buckets
(2) Hash R2 tuples into H buckets
(3) For i = 0 to k do
match tuples in Gi, Hi buckets
Simple example
hash: even/odd
R1: 2 4 3 5 8 9
R2: 5 4 12 3 13 8 11 14
Buckets
Even: R1: 2 4 8    R2: 4 12 8 14
Odd:  R1: 3 5 9    R2: 5 3 13 11
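The three hash-join steps can be sketched in memory as follows. The function name is hypothetical and the join key is assumed to be the tuple value itself; with two buckets and the modulo hash this reproduces the even/odd example.

```python
from collections import defaultdict


def hash_join(r1, r2, num_buckets):
    """Partition both inputs with the same hash function, then join bucket by bucket."""
    g = defaultdict(list)  # buckets for R1
    h = defaultdict(list)  # buckets for R2
    for r in r1:
        g[r % num_buckets].append(r)
    for s in r2:
        h[s % num_buckets].append(s)
    out = []
    for i in range(num_buckets):
        # Only tuples that landed in buckets with the same number can match.
        for r in g[i]:
            for s in h[i]:
                if r == s:
                    out.append((r, s))
    return out


print(sorted(hash_join([2, 4, 3, 5, 8, 9], [5, 4, 12, 3, 13, 8, 11, 14], 2)))
```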
CS 525
78
13
R2
Example 1(a)
Nested Loop Join R1
R2
CS 525
79
Can we do better?
CS 525
80
Can we do better?
Use
(1)
(2)
(3)
CS 525
81
CS 525
our memory
Read 100 blocks of R1
Read all of R2 (using 1 block) + join
Repeat until done
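A sketch of the I/O count for this chunked nested loop, assuming the outer relation is read once in chunks and the inner relation is read completely for every chunk. The function name is hypothetical; B(R1) = 1000 and B(R2) = 500 are the block counts used for the sort costs in these notes, and the chunk size of 100 blocks comes from the steps above.

```python
import math


def block_nested_loop_ios(b_outer, b_inner, chunk_blocks):
    """I/Os = read the outer once + read the inner once per outer chunk."""
    chunks = math.ceil(b_outer / chunk_blocks)
    return b_outer + chunks * b_inner


print(block_nested_loop_ios(1000, 500, 100))  # R1 as outer relation
print(block_nested_loop_ios(500, 1000, 100))  # reversed join order, R2 as outer relation
```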
82
CS 525
83
CS 525
84
14
Can we do better?
Can we do better?
Reverse join order: R2
R1
CS 525
CS 525
85
R2
CS 525
87
86
CS 525
88
R1
Note
M-k
M-k
M-k
R2
k
CS 525
89
CS 525
90
15
Memory
Memory
R1
..
R1
R2
R2
..
R1
..
R1
..
R2
R2
CS 525
91
B=C
Output: (a,1,1,X)
B=C
ZR
CS 525
ZS
B=C
ZS
ZR
CS 525
93
Output: (b,1,1,X)
S
92
94
B=C
S
Output: (a,2,2,y)
CS 525
ZS
ZR
95
CS 525
ZS
ZR
96
16
B=C
B=C
S
R.B > S.C: advance ZS
Output: (a,2,2,e)
S
ZR
CS 525
ZS
B=C
98
B=C
ZR
CS 525
97
ZS
S
R.B < S.C: advance ZR
S
ZR
CS 525
ZS
CS 525
99
B=C
ZR
100
S
R.B < S.C: DONE
S
R
A
CS 525
ZS
ZS
ZR
Notes 10 - Query Execution
101
CS 525
102
17
Sorted
Chunks
sorted
chunks
...
R1
R2
Memory
...
- Read chunk
- Sort in memory
- Write to disk
...
Memory
CS 525
103
Cost: Sort
Each tuple is read,written,
read, written
so...
Sort cost R1: 4 x 1,000 = 4,000
Sort cost R2: 4 x 500 = 2,000
CS 525
105
CS 525
107
104
CS 525
But say
R1 = 10,000 blocks
R2 = 5,000 blocks
106
contiguous
not ordered
108
18
CS 525
...
R1
In general:
109
CS 525
110
In general:
In general:
CS 525
111
CS 525
112
In our example
R1
Join?
113
sorted runs
CS 525
114
So far
Nested Loop: 5500
Merge join: 1500
Sort+Merge Join: 7500 → 4500
R1.C Index: 5500 → 3050 → 550
R2.C Index: ________
123
CS 525
100
memory
10 blocks
CS 525
R1
...
...
...
R1
124
...
CS 525
125
CS 525
126
21
Cost:
Cost:
Bucketize:
Read R1 + write
Bucketize:
Read R1 + write
Read R2 + write
Join:
Read R2 + write
Join:
Read R1, R2
Read R1, R2
CS 525
CS 525
127
128
S
Notes 10 - Query Execution
129
CS 525
CS 525
131
130
Duality Hashing-Sorting
Both partition inputs
Until input fits into memory
Logarithmic number of phases in
memory size
CS 525
132
22
E.g., k=33
E.g., k=33
R1 buckets = 31 blocks
keep 2 in memory
memory
in
R1
R1 buckets = 31 blocks
keep 2 in memory
memory
G0
in
R1
31
Memory use:
G0
31 buffers
G1
31 buffers
Output
33-2 buffers
R1 input 1
Total
94 buffers
6 buffers to spare!!
G0
31
G1
33-2=31
...
...
G1
33-2=31
CS 525
133
Next: Bucketize R2
G0
memory
R2 buckets
R1 buckets
16
...
...
33-2=31
33-2=31
135
Cost
Bucketize R1 = 1000 + 31×31 = 1961
To bucketize R2, only write 31 buckets:
so, cost = 500 + 31×16 = 996
To compare join (2 buckets already done)
read 31×31 + 31×16 = 1457
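The arithmetic above can be reproduced with a short sketch. It assumes k = 33 buckets of which 2 R1 buckets stay in memory, and bucket sizes rounded up to whole blocks (31 blocks for R1, 16 for R2); the function name is hypothetical.

```python
import math


def hybrid_hash_join_cost(b_r1, b_r2, k, in_memory):
    """I/O cost when in_memory of the k R1 buckets are kept in memory."""
    on_disk = k - in_memory              # buckets that are written to disk
    r1_bucket = math.ceil(b_r1 / k)      # blocks per R1 bucket
    r2_bucket = math.ceil(b_r2 / k)      # blocks per R2 bucket
    bucketize_r1 = b_r1 + on_disk * r1_bucket
    bucketize_r2 = b_r2 + on_disk * r2_bucket
    join = on_disk * r1_bucket + on_disk * r2_bucket
    return {
        "bucketize_r1": bucketize_r1,
        "bucketize_r2": bucketize_r2,
        "join": join,
        "total": bucketize_r1 + bucketize_r2 + join,
    }


print(hybrid_hash_join_cost(1000, 500, 33, 2))
```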
one full R2
bucket
Gi
31
G1
CS 525
ans
out
one R1
buffer
CS 525
R2 buckets
R1 buckets
16
31
33-2=31
...
memory
R2
134
...
CS 525
33-2=31
136
R1
memory
G0
R1
in
G0
G1
OR...
137
CS 525
138
So far:
Read R2: 500
Read R1: 1000
Get tuples: 100
Total: 1600

Iterate: 5500
Merge join: 1500
Sort+merge join: 7500
R1.C index: 5500 → 550
R2.C index: _____
Build R1.C index: _____
Build R2.C index: _____
Hash join: 4500+
with trick, R1 first: 4414
with trick, R2 first: _____
Hash join, pointers: 1600
143
144
24
Summary
Nested Loop ok for small relations
(relative to memory size)
Need for complex join condition
CS 525
CS 525
145
146
Join Comparison
Ni= number of tuples in Ri
B(Ri) = number of blocks of Ri
#P = number of partition steps for hash join
Pij = average number of join partners
Algorithm
#I/O
Memory
Disk Space
Nested Loop
(block)
B(R1) / (M-1) * !
[min(B(R),M-1)
+ B(R2)]
B(R1) + N1 * P12
B(Index) + 2
Merge (sorted)
B(R1) + B(R2)
Max tuples =
Merge (unsorted)
B(R1) + B(R2)+
(sort 1 pass)
sort
B(R1) + B(R2)
Hash
(2#P + 1) (B(R1) +
B(R2))
root(max(B(R1),
B(R2)), #P + 1)
~B(R1) + B(R2)
CS 525
147
Type of Condition
Example
Nested Loop
any
a LIKE %hello%
Supported by index:
Equi-join (hash)
Equi or range (B-tree)
a=b
a<b
Merge
a < b, a = b AND c = d
Hash
Equi-join
a=b
CS 525
Outer Joins
B=C
Output: (a,1,1,X)
S
R
C
d
e
ZR
ZS
Hash
If no matching tuple fill with NULL
CS 525
148
149
CS 525
150
25
B=C
B=C
ZR
ZS
CS 525
CS 525
151
CS 525
Groups
Determined by equality of group-by
attributes
154
init()!
Initialize state
update(tuple)!
sum(a)
CS 525
Aggregation Function
Interface
Aggregation Example
SELECT sum(a),b!
FROM R!
GROUP BY b!
152
Aggregation
CS 525
153
ZS
Operators Overview
(External) Sorting
Joins (Nested Loop, Merge, Hash, )
Aggregation (Sorting, Hash)
Selection, Projection (Index, Scan)
Union, Set Difference
Intersection
Duplicate Elimination
ZR
close()!
Return result and clean-up
155
CS 525
156
26
Implementation SUM(A)
init()
  sum := 0
update(tuple)
  sum += tuple.A
close()
  return sum
Aggregation Implementations
Sorting
Hashing
CS 525
CS 525
157
158
Aggregation Example
Grouping by sorting
Similar to Merge join
Sort R on group-by attribute
Scan through sorted input
SELECT sum(a),b!
FROM R!
GROUP BY b!
Otherwise
Call update()
CS 525
SELECT sum(a),b!
FROM R!
GROUP BY b!
b
1
1
160
SELECT sum(a),b!
FROM R!
GROUP BY b!
update(3,1)
Aggregation Example
CS 525
CS 525
159
Aggregation Example
init()
sort
161
CS 525
update(3,1)
6
162
27
Aggregation Example
Grouping by Hashing
SELECT sum(a),b!
FROM R!
GROUP BY b!
a
Group by changed!
close(), init(), update(4,2)
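A sketch of grouping by sorting for SELECT sum(a), b FROM R GROUP BY b, using the init/update/close interface. The class and function names are hypothetical, and the sample relation R is an assumption chosen to match the update(3,1) and update(4,2) calls shown in the example.

```python
class Sum:
    """Aggregation function with the init / update / close interface."""

    def init(self):
        self.total = 0

    def update(self, value):
        self.total += value

    def close(self):
        return self.total


def group_by_sorting(rows, agg):
    """rows are (a, b) tuples; returns one (aggregate, b) tuple per group."""
    out = []
    current = None
    started = False
    for a, b in sorted(rows, key=lambda r: r[1]):  # sort on the group-by attribute
        if not started or b != current:
            if started:
                out.append((agg.close(), current))  # group changed: output the finished group
            agg.init()
            current = b
            started = True
        agg.update(a)
    if started:
        out.append((agg.close(), current))  # output the last group
    return out


print(group_by_sorting([(3, 1), (4, 2), (3, 1)], Sum()))
```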
CS 525
output
163
CS 525
Aggregation Example
SELECT sum(a),b!
FROM R!
GROUP BY b!
Aggregation Example
SELECT sum(a),b!
FROM R!
GROUP BY b!
Init() and update(3,1)
CS 525
165
CS 525
Aggregation Example
SELECT sum(a),b!
FROM R!
GROUP BY b!
Init() and update(4,2)
a
4
3
SELECT sum(a),b!
FROM R!
GROUP BY b!
update(3,1)
b
1
167
166
Aggregation Example
CS 525
164
CS 525
6
4
168
28
Aggregation Example
SELECT sum(a), b
FROM R
GROUP BY b
Loop through hash table entries
Output tuples
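Grouping by hashing for the same query can be sketched with a dictionary as the hash table. The function name and the sample relation R are assumptions; the running sum per group plays the role of init() and update(), and the final loop over the table entries corresponds to close().

```python
def group_by_hashing(rows):
    """rows are (a, b) tuples; keeps one running sum per group in a hash table."""
    table = {}
    for a, b in rows:
        # First tuple of a group initializes the sum, later tuples update it.
        table[b] = table.get(b, 0) + a
    # No output is possible before all inputs have been processed.
    return [(total, b) for b, total in table.items()]


print(group_by_hashing([(3, 1), (4, 2), (3, 1)]))
```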
Aggregation Summary
Hashing
No sorting -> no extra I/O
Hash table has to fit into memory
No outputs before all inputs have been
processed
Sorting
No memory required
Output one group at a time
Notes 10 - Query Execution
169
CS 525
Operators Overview
170
Duplicate Elimination
(External) Sorting
Joins (Nested Loop, Merge, Hash, )
Aggregation (Sorting, Hash)
Selection, Projection (Index, Scan)
Union, Set Difference
Intersection
Duplicate Elimination
CS 525
CS 525
171
Hash
Directly output tuple and use hash table only to
avoid outputting duplicates
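A small sketch of hash-based duplicate elimination as described above: each tuple is output the first time it is seen, and a hash set is consulted only to suppress repeats. The function name is hypothetical.

```python
def distinct(rows):
    """Yield each tuple once, in order of first appearance."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row  # output immediately, no need to wait for the end of the input


print(list(distinct([1, 2, 1, 3, 2])))
```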
Operators Overview
(External) Sorting
Joins (Nested Loop, Merge, Hash, )
Aggregation (Sorting, Hash)
Selection, Projection (Index, Scan)
Union, Set Difference
Intersection
Duplicate Elimination
CS 525
173
172
Set Operations
Can be modeled as join
with different output requirements
CS 525
174
29
Union
Intersection
Set version
Bag union
Set union
Apply duplicate removal to result
Bag version
Join + project + min(i,j)
Aggregate min(count(i),count(j))
CS 525
175
CS 525
Set Difference
176
Summary
Operator implementations
Using aggregation
Cost estimations
I/O
memory
CS 525
Joins!
Other operators
177
CS 525
178
Next
Query Optimization Physical
-> How to efficiently choose an
efficient plan
CS 525
179
30
SQL query
parse
parse tree
convert
answer
execute
apply laws
improved l.q.p
estimate result sizes
l.q.p. +sizes
statistics
Pi
pick best
{(P1,C1),(P2,C2)...}
estimate costs
{P1,P2,..}
CS 525
CS 525
Cost of Query
Cost of Query
Parse + Analyze
Parse + Analyze
Optimization Find plan
Execution
Return results to client
Execution
Execute plan
CS 525
Physical Optimization
Apply after applying heuristics in logical
optimization
1) Enumerate potential execution plans
All?
Subset
Physical Optimization
To apply pruning in the search for the
best plan
Steps 1 and 2 have to be interleaved
Prune parts of the search space
if we know that it cannot contain any plan that
is better than what we found so far
2) Cost plans
What cost function?
CS 525
CS 525
Example Query
SELECT e.name
FROM Employee e,
     EmpDep ed,
     Department d
WHERE e.name = ed.emp
  AND ed.dep = d.dep
  AND d.dep = 'CS'
name
dep=dep
SELECT e.name
FROM Employee e,
     EmpDep ed,
     Department d
WHERE e.name = ed.emp
  AND ed.dep = d.dep
  AND d.dep = 'CS'
name
dep=dep
NL
name=emp dep=CS
Employee
CS 525
EmpDep
name=emp
MJ
SSEmployee SSEmpDep
Department
Cost Model
CS 525
SELECT e.name!
Cost?
FROM Employee e, !
Need input
EmpDep ed, !
Department d!
WHERE e.name = ed.emp " !
!AND ed.dep = d.dep!
"AND d.dep = CS!
#disk I/O
CPU cost
Response time
Total execution time
Cost of operators
name
size!
dep=dep
NL
name=emp
SSEmployee SSEmpDep
CS 525
IS
dep=CS
Department
10
Plan Enumeration
For each operator in the query
Precision
Incorrect cost-estimation -> choose
suboptimal plan
CS 525
Department
MJ
dep=CS
Cost factors
CS 525
IS
11
12
500
10000
10000
500
10000
dep=dep
1
30
10000
10000
10000
dep=CS ED
500
dep=dep
name=emp dep=CS
name=emp
ED
500
500
dep=dep
name=emp dep=CS
10000
10000
ED
30
CS 525
13
500
500
dep=dep
3
1100
10
x name=emp
dep=CS
10000
ED
1050 x 1
name=emp
500
dep=dep
1001 x 1
1
30
14
Plan Enumeration
10000
10000
dep=CS ED
10000
10000
dep=dep
30
101 x 10
500
30
CS 525
name=emp
10000
10000
dep=CS ED
All
Consider all potential plans of a certain
type (discussed later)
Prune only if sure
Heuristics
Apply heuristics to prune search space
Randomized Algorithms
30
D
CS 525
15
CS 525
Heuristics
Minimum Selectivity, Intermediate result size,
KBZ-Algorithm, AB-Algorithm
Randomized
Genetic Algorithms
Simulated Annealing
Notes 11 - Physical Optimization
16
CS 525
17
Equivalences Equi-Join
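1. $R \bowtie_{a=b} S \equiv S \bowtie_{a=b} R$
2. $(R \bowtie_{a=b} S) \bowtie_{c=d} T \equiv R \bowtie_{a=b} (S \bowtie_{c=d} T)$?
3. $\sigma_{a=b}(R \times S) \equiv R \bowtie_{a=b} S$?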
CS 525
18
Equi-Join Equivalences
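$(R \bowtie_{a=b} S) \bowtie_{c=d} T \equiv R \bowtie_{a=b} (S \bowtie_{c=d} T)$
What if c is attribute of R?
$(R \bowtie_{a=b} S) \bowtie_{c=d} T \equiv R \bowtie_{a=b \wedge c=d} (S \times T)$
$\sigma_{a=b}(R \times S) \equiv R \bowtie_{a=b} S$?
Only useful if a is from R and b is from S (or vice versa)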
CS 525
19
R X S
Result size is $O(n^2)$
Cannot be better than $O(n^2)$
Agenda
20
Join Graph
Assumptions
Only equi-joins (a = b)
CS 525
21
CS 525
Join Graph
22
SELECT e.name
FROM Employee e,
     EmpDep ed,
     Department d
WHERE e.name = ed.emp
  AND ed.dep = d.dep
  AND d.dep = 'CS'
Department
EmpDep
Employee
23
CS 525
24
dep=CS
Department
name=emp
EmpDep
dep=dep
Employee
R
CS 525
25
CS 525
Chain queries
Cycle queries
CS 525
Tree queries
Star queries
b=c
T
26
Chain queries
Clique queries
27
CS 525
SELECT *!
FROM R,S,T,U!
WHERE R.a = S.a!
!AND R.b = T.b!
"AND R.c = U.c"
29
28
Star queries
CS 525
a=b
CS 525
Tree queries
30
Cycle queries
CS 525
Clique queries
31
CS 525
32
Assumption
12 orders
CS 525
33
R
T
S
S
34
k joins and n - k - 1 joins
$C_n = \sum_{k=0}^{n-1} C_k \times C_{n-k-1}$
$C_0 = 1$
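The recurrence can be evaluated directly. The sketch below also counts join trees as n! orderings of the relations times the number of tree shapes with n-1 joins; this is a standard counting argument assumed here, and the function names are hypothetical.

```python
from math import factorial


def catalan(n):
    """C_n from the recurrence C_n = sum_{k=0}^{n-1} C_k * C_{n-k-1}, with C_0 = 1."""
    c = [1] + [0] * n
    for m in range(1, n + 1):
        c[m] = sum(c[k] * c[m - k - 1] for k in range(m))
    return c[n]


def bushy_join_trees(num_relations):
    """Tree shapes with num_relations - 1 joins, times all orderings of the leaves."""
    return factorial(num_relations) * catalan(num_relations - 1)


print([catalan(i) for i in range(5)])
print([bushy_join_trees(n) for n in (3, 4, 5)])
```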
CS 525
35
CS 525
36
CS 525
37
12
120
1,680
30,240
665,280
17,297,280
17,643,225,600
10
670,442,572,800
11
28,158,588,057,600
CS 525
#relations
Example consider
Nested loop
Merge
Hash
Notes 11 - Physical Optimization
39
CS 525
#join trees
108
3240
136,080
7,348,320
484,989,120
37,829,151,360
115,757,203,161,600
10
13,196,321,160,422,400
11
1,662,736,466,213,222,400
Notes 11 - Physical Optimization
40
CS 525
38
CS 525
#join trees
Dynamic programming
A*-search
41
CS 525
42
What is dynamic
programming?
Dynamic Programming
Assumption: Principle of Optimality
To compute the global optimal plan it is
only necessary to consider the optimal
solutions for its sub-queries
Memoize
Depends on cost-function
CS 525
CS 525
F(4)
F(3)
Fib(n)
{
  if (n = 0) return 0
  else if (n = 1) return 1
  else return Fib(n-1) + Fib(n-2)
}
CS 525
44
45
Complexity
F(2)
F(1)
CS 525
F(2)
F(1)
F(1)
F(0)
F(0)
46
Number of calls
C(n) = C(n-1) + C(n-2) + 1 = 2 Fib(n+1) - 1, with C(0) = C(1) = 1
$O(2^n)$
47
CS 525
48
What do we gain?
O(n) instead of $O(2^n)$
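A small sketch contrasting the naive recursion with a memoized version. The call counter is an addition for illustration, and functools.lru_cache is used here as one possible way to memoize.

```python
from functools import lru_cache


def fib_naive(n, counter):
    """Plain recursion; counter[0] records how many calls were made."""
    counter[0] += 1
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Each distinct argument is computed once and then looked up."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


calls = [0]
print(fib_naive(4, calls), calls[0])
print(fib_memo(30))
```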
F(4)
F(3)
F(2)
F(1)
CS 525
F(0)
49
CS 525
51
CS 525
possible_joins(plan, plan)
Enumerate all join variants (merge, NL, …) between the input plans
50
DP Join Enumeration
optPlan ← Map({R},{plan})

find_join_dp(q(R1,…,Rn))
{
  for i=1 to n
    optPlan[{Ri}] ← access_paths(Ri)
  for i=2 to n
    foreach S ⊆ {R1,…,Rn} with |S|=i
      optPlan[S] ← ∅
      foreach O ⊂ S with O ≠ ∅
        optPlan[S] ← optPlan[S] ∪
          possible_joins(optPlan[O], optPlan[S\O])
      prune_plans(optPlan[S])
  return optPlan[{R1,…,Rn}]
}
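A runnable sketch of the enumeration above under a toy cost model. The cost model (sum of intermediate result sizes, a single join variant, and made-up cardinalities and selectivities) is an assumption for illustration only; pruning keeps just the cheapest plan per subset.

```python
from itertools import combinations


def dp_join_order(cards, selectivity):
    """cards: {relation: #tuples}; selectivity: {frozenset({a, b}): factor}.

    Returns (cost, size, plan) for joining all relations, where plan is a nested tuple.
    """
    rels = sorted(cards)
    best = {frozenset([r]): (0.0, float(cards[r]), r) for r in rels}
    for i in range(2, len(rels) + 1):
        for subset in combinations(rels, i):
            s = frozenset(subset)
            for k in range(1, i):
                for left in combinations(subset, k):
                    o = frozenset(left)
                    rest = s - o
                    l_cost, l_size, l_plan = best[o]
                    r_cost, r_size, r_plan = best[rest]
                    sel = 1.0
                    for a in o:
                        for b in rest:
                            sel *= selectivity.get(frozenset((a, b)), 1.0)
                    size = l_size * r_size * sel
                    cost = l_cost + r_cost + size
                    # prune_plans: keep only the cheapest plan for this subset
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, size, (l_plan, r_plan))
    return best[frozenset(rels)]


cards = {"R": 1000, "S": 100, "T": 10}
sels = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
print(dp_join_order(cards, sels))
```

As in the pseudocode, every split of a subset is visited from both sides, so each pair (O, S\O) is considered twice.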
CS 525
52
DP-JE Complexity
Time: $O(3^n)$
Space: $O(2^n)$
Still too much for a large number of joins
(10-20)
prune_plans({plan})
Only keep cheapest plan from input set
CS 525
53
CS 525
54
U
S
R
zig-zag
S
R
CS 525
Number of Join-Trees
Right-deep
bushy
T U S
+left
+ zig-zag
+right
CS 525
55
56
12
120
24
1,680
120
30,240
720
665,280
5040
17,297,280
40,320
17,643,225,600
362,880
10
670,442,572,800
3,628,800
11
28,158,588,057,600
39,916,800
57
Reduced search-space
Each join is with input relation
->can use index joins
->easy to pipe-line
59
U
S
R
CS 525
CS 525
58
Interesting Orders
Number of interesting orders is usually
small
->Extend DP join enumeration to keep
track of interesting orders
Determine interesting orders
For each sub-query store best-plan for
each interesting order
CS 525
60
10
HJ
HJ
T
R
HJ
T
HJ
R
HJ
S
T
best
HJ
HJ
{R,S}
HJ
HJ
S
CS 525
{R,S}
{S,T}
HJ
HJ
CS 525
61
HJ
HJ
HJ
{S,T}
{R,T}
HJ
62
Heuristic method
"
Not best
MJ
CS 525
HJ
{R,T}
HJ
MJ
MJ
63
Space: $O(n^2)$
find_join_dp(q(R1,…,Rn))
{
  for i=1 to n
    plans ← plans ∪ access_paths(Ri)
  for i=n to 2
    cheapest = argmin_{j,k ∈ {1,…,n}} (cost(Pj ⋈ Pk))
    plans ← plans \ {Pj,Pk} ∪ {Pj ⋈ Pk}
  return plans // single plan left
}
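A sketch of this greedy heuristic with a toy cost model. The cost model (sum of intermediate result sizes) and the sample statistics are assumptions; the function name is hypothetical.

```python
from itertools import combinations


def greedy_join_order(cards, selectivity):
    """Repeatedly join the pair of current plans whose join is cheapest.

    Each plan is a tuple (relations, size, tree, cost).
    """
    plans = [(frozenset([r]), float(c), r, 0.0) for r, c in sorted(cards.items())]

    def join(p, q):
        sel = 1.0
        for a in p[0]:
            for b in q[0]:
                sel *= selectivity.get(frozenset((a, b)), 1.0)
        size = p[1] * q[1] * sel
        return (p[0] | q[0], size, (p[2], q[2]), p[3] + q[3] + size)

    while len(plans) > 1:
        p, q = min(combinations(plans, 2), key=lambda pq: join(*pq)[3])
        plans = [x for x in plans if x is not p and x is not q] + [join(p, q)]
    return plans[0]  # single plan left


cards = {"R": 1000, "S": 100, "T": 10}
sels = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
rels, size, tree, cost = greedy_join_order(cards, sels)
print(tree, cost)
```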
CS 525
64
Randomized Join-Algorithms
Iterative improvement
Simulated annealing
Tabu-search
Genetic algorithms
CS 525
65
CS 525
66
11
Transformative Approach
Start from (random) complete solutions
Apply transformations to generate new
solutions
Direct application of equivalences
Commutativity
Associativity
Combined equivalences
E.g., (R S) T T (S R)
CS 525
67
CS 525
Iterative Improvement
improve(q(R1,…,Rn))
{
  best ← random_plan(q)
  while (not reached time limit)
    curplan ← random_plan(q)
    do
      prevplan ← curplan
      curplan ← apply_random_trans(prevplan)
    while (cost(curplan) < cost(prevplan))
    if (cost(prevplan) < cost(best))
      best ← prevplan
  return best
}
CS 525
69
Simulated Annealing
SA(q(R1,…,Rn))
{
  best ← random_plan(q)
  curplan ← best
  t ← t_init // temperature
  while (t > 0)
    newplan ← apply_random_trans(curplan)
    if cost(newplan) < cost(curplan)
      curplan ← newplan
    else if random() < e^(-(cost(newplan)-cost(curplan))/t)
      curplan ← newplan
    if (cost(curplan) < cost(best))
      best ← curplan
    reduce(t)
  return best
}
71
68
Iterative Improvement
Easy to get stuck in local minimum
Idea: Allow transformations that result
in more expensive plans with the hope
to move out of local minima
->Simulated Annealing
CS 525
70
Simulated Annealing
SA(q(R1,…,Rn))
{
  best ← random_plan(q)
  curplan ← best
  t ← t_init // temperature
  while (t > 0) // until cooled down
    newplan ← apply_random_trans(curplan)
    if cost(newplan) < cost(curplan)
      curplan ← newplan
    else if random() < e^(-(cost(newplan)-cost(curplan))/t) // probability to take bad plan, based on temp.
      curplan ← newplan
    if (cost(curplan) < cost(best))
      best ← curplan
    reduce(t) // reduce chance to jump
  return best
}
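A sketch of simulated annealing over left-deep join orders. Everything specific here is an assumption: plans are permutations of the relations, the random transformation swaps two positions, the cost is the sum of intermediate result sizes, and the temperature is reduced geometrically until it falls below a small threshold.

```python
import math
import random


def plan_cost(order, cards, selectivity):
    """Sum of intermediate result sizes of a left-deep plan."""
    joined = [order[0]]
    size = float(cards[order[0]])
    total = 0.0
    for rel in order[1:]:
        sel = 1.0
        for other in joined:
            sel *= selectivity.get(frozenset((rel, other)), 1.0)
        size = size * cards[rel] * sel
        total += size
        joined.append(rel)
    return total


def simulated_annealing(cards, selectivity, t_init=1000.0, cooling=0.95, t_min=1e-3, seed=0):
    rng = random.Random(seed)
    cur = sorted(cards)
    rng.shuffle(cur)  # random start plan
    best = list(cur)
    t = t_init
    while t > t_min:  # until cooled down
        new = list(cur)
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]  # random transformation
        delta = plan_cost(new, cards, selectivity) - plan_cost(cur, cards, selectivity)
        # Always accept better plans; accept worse ones with a temperature-dependent probability.
        if delta < 0 or rng.random() < math.exp(-delta / t):
            cur = new
        if plan_cost(cur, cards, selectivity) < plan_cost(best, cards, selectivity):
            best = list(cur)
        t *= cooling  # reduce(t)
    return best, plan_cost(best, cards, selectivity)


cards = {"R": 1000, "S": 100, "T": 10}
sels = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
print(simulated_annealing(cards, sels))
```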
72
12
Genetic Algorithms
Represent solutions as sequences
(strings) = genome
Start with random population of
solutions
Iterations = Generations
R3
R4
73
R1 R2
CS 525
Mutation
74
Cross-over
Sub-set exchange
For two solutions find subsequences of equal length with the same set of relations
Example
J1 = 5632478 and J2 = 5674328
Generate J' = 5643278
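A sketch of the sub-set exchange on genomes stored as strings. The function name and its parameters (start position and length of the window in J1) are hypothetical; the call reproduces the example above.

```python
def subset_exchange(j1, j2, start, length):
    """Replace j1[start:start+length] by a window of j2 that contains the same relations.

    Returns None if j2 has no window of that length with the same set of relations.
    """
    window = j1[start:start + length]
    for pos in range(len(j2) - length + 1):
        candidate = j2[pos:pos + length]
        if sorted(candidate) == sorted(window):
            return j1[:start] + candidate + j1[start + length:]
    return None


print(subset_exchange("5632478", "5674328", 2, 3))
```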
CS 525
75
CS 525
76
77
CS 525
78
13
Iterative Improvement
Simulated Annealing
Genetic Algorithms
Random transformation
Mixing solutions (crossover)
Probabilistic chance to keep solution based on cost
CS 525
CS 525
79
SQL query
82
From Join-Enumeration to
Plan Enumeration
parse tree
convert
answer
CS 525
81
parse
apply laws
improved l.q.p
80
From Join-Enumeration to
Plan Enumeration
CS 525
execute
statistics
Pi
pick best
{(P1,C1),(P2,C2)...}
estimate costs
{P1,P2,..}
CS 525
83
CS 525
84
14
QGM example
SELECT name, city
FROM
  (SELECT *
   FROM person) AS p,
  (SELECT *
   FROM address) AS a
WHERE p.addrId = a.id
85
Postgres Example
{QUERY
CS 525
86
:commandType 1
:querySource 0
:canSetTag true
:utilityStmt <>
:resultRelation 0
:intoClause <>
:hasAggs false
:hasSubLinks false
:rtable (
{RTE
:alias
{ALIAS
:aliasname p
:colnames <>
}
:eref
{ALIAS
:aliasname p
:colnames ("name" "addrid")
}
:rtekind 1
:subquery
{QUERY
:commandType 1
:querySource 0
:canSetTag true
Determine order
CS 525
87
89
CS 525
88
Subquery Pull-up
SELECT name, city!
FROM!
"(SELECT * !
"FROM person) AS p,!
"(SELECT * !
"FROM address) AS a!
WHERE p.addrId = a.id!
CS 525
90
15
Parameterized Queries
Problem
Naïve approach
Example
Webshop
Typical operation: Retrieve product with all
user comments for that product
Same query modulo product id
CS 525
Parameterized Queries
91
Materialized View
Store common parts of the query
-> Optimizing a query with materialized
views
-> Separate topic not covered here
CS 525
93
CS 525
95
92
Parameterized Queries
How to represent varying parts of a
query
Parameters
Query planned with parameters assumed
to be unknown
For execution replace parameters with
concrete values
CS 525
PREPARE statement
In SQL
94
Nested Subqueries
SELECT name
FROM person p
WHERE EXISTS (SELECT newspaper
              FROM hasRead h
              WHERE h.name = p.name
                AND h.newspaper = 'Tribune')
CS 525
96
16
If no correlations:
Execute once and cache results
For correlations:
Create plan for query with parameters
CS 525
97
CS 525
99
q ← outer query
q' ← inner query
result ← execute(q)
foreach tuple t in result
  q_t ← q'(t) // parameterize q' with values from t
  result' ← execute(q_t)
  evaluate_nested_condition(t, result')
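The evaluation loop above can be sketched for the EXISTS query over person and hasRead. The data is the example data from these notes; the function names are hypothetical, and Python lists stand in for the relations.

```python
person = [("Alice", "female"), ("Bob", "male"), ("Joe", "male")]
has_read = [("Alice", "Tribune"), ("Alice", "Courier"), ("Joe", "Courier")]


def inner_query(name):
    """Inner query, parameterized with the name of the current outer tuple."""
    return [paper for n, paper in has_read if n == name and paper == "Tribune"]


def exists_query():
    out = []
    for name, _gender in person:      # foreach tuple t in the outer result
        result = inner_query(name)    # execute the parameterized inner query
        if result:                    # EXISTS is true if the inner result is non-empty
            out.append(name)
    return out


print(exists_query())
```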
CS 525
SELECT newspaper
FROM hasRead h
WHERE h.name = p.name
  AND h.newspaper = 'Tribune'
person
name   gender
Alice  female
Bob    male
Joe    male
hasRead
name   newspaper
Alice  Tribune
Alice  Courier
Joe    Courier
CS 525
person
hasRead
SELECT newspaper !
FROM hasRead h !
WHERE h.name = Alice!
AND h.newspaper
"
= Tribune)!
hasRead
gender
name
newspaper
name
gender
name
newspaper
Alice
female
Alice
Tribune
Alice
female
Alice
Tribune
Bob
male
Alice
Courier
Bob
male
Alice
Courier
Joe
male
Joe
Courier
Joe
male
Joe
Courier
101
100
name
CS 525
98
SELECT name!
FROM person p!
WHERE EXISTS (SELECT newspaper !
"
"
FROM hasRead h !
"
"
WHERE h.name = p.name "
"!
AND h.newspaper = Tribune)!
CS 525
102
17
person
SELECT newspaper !
FROM hasRead h !
WHERE h.name = p.name !
AND h.newspaper
"
= Tribune)!
hasRead
person
result
EXISTS evaluates to
true!!
!
Output(Alice)!
hasRead
result
name
gender
name
newspaper
newspaper
name
gender
name
newspaper
newspaper
Alice
female
Alice
Tribune
Tribune
Alice
female
Alice
Tribune
Tribune
Bob
male
Alice
Courier
Bob
male
Alice
Courier
Joe
male
Joe
Courier
Joe
male
Joe
Courier
CS 525
CS 525
103
person
hasRead
name
gender
name
newspaper
Alice
female
Alice
Bob
male
Joe
male
CS 525
person
result
hasRead
name
gender
name
newspaper
Tribune
Alice
female
Alice
Tribune
Alice
Courier
Bob
male
Alice
Courier
Joe
Courier
Joe
male
Joe
Courier
newspaper
105
104
CS 525
result
newspaper
106
If correlated
Improve:
Plan once and substitute parameters
EXISTS: stop processing after first result
IN/ANY: stop after first match
107
Decorrelation:
Turn correlations into join expressions
CS 525
108
18
Equivalences
N-type Nesting
Properties
CS 525
CS 525
109
Example
A-type Nesting
Properties
SELECT name
FROM orders o
WHERE o.amount = (SELECT max(amount)
                  FROM orders i)
CS 525
J-type Nesting
111
Example
CS 525
JA-type Nesting
Properties
Expression equality comparison
Nested query uses equality comparison
with correlated attribute
Nested query uses aggregation and no
GROUP BY
Example
CS 525
110
Properties
Example
SELECT name
FROM orders o
WHERE o.cust IN (SELECT cId
                 FROM customer
                 WHERE region = 'USA')
SELECT name
FROM orders o
WHERE o.amount = (SELECT max(amount)
                  FROM orders i
                  WHERE i.cust = o.cust)
Notes 11 - Physical Optimization
113
SELECT name
FROM orders o
WHERE o.amount IN (SELECT amount
                   FROM orders i
                   WHERE i.cust = o.cust
                     AND i.shop = 'New York')
Unnesting A-type
Move nested query to FROM clause
Turn nested condition (op ANY, IN) into
op with result attribute of nested query
CS 525
114
19
Unnesting N/J-type
Move nested query to FROM clause
Add DISTINCT to SELECT clause of
nested query
Turn equality comparison with
correlated attributes into join conditions
Turn nested condition (op ANY, IN) into
op with result attribute of nested query
CS 525
115
Example
1. To FROM
clause
2. Add
DISTINCT
3. Correlation
to join
4. Nesting
condition to
join
CS 525
Example
1. To FROM
clause
2. Add
DISTINCT
3. Correlation
to join
4. Nesting
condition to
join
CS 525
SELECT name!
FROM orders o, !
(SELECT DISTINCT amount!
FROM orders i!
WHERE i.cust = o.cust
!
AND i.shop = New York) AS sub!
117
1. To FROM
clause
2. Add
DISTINCT
3. Correlation
to join
4. Nesting
condition to
join
CS 525
Example
1. To FROM
clause
2. Add
DISTINCT
3. Correlation
to join
4. Nesting
condition to
join
CS 525
116
SELECT name
FROM orders o
WHERE o.amount IN (SELECT amount
                   FROM orders i
                   WHERE i.cust = o.cust
                     AND i.shop = 'New York')
SELECT name
FROM orders o,
     (SELECT DISTINCT amount, cust
      FROM orders i
      WHERE i.shop = 'New York') AS sub
WHERE sub.cust = o.cust
Notes 11 - Physical Optimization
118
Unnesting JA-type
SELECT name
FROM orders o
WHERE o.amount IN (SELECT amount
                   FROM orders i
                   WHERE i.cust = o.cust
                     AND i.shop = 'New York')
SELECT name
FROM orders o,
     (SELECT DISTINCT amount, cust
      FROM orders i
      WHERE i.shop = 'New York') AS sub
WHERE sub.cust = o.cust
  AND o.amount = sub.amount
Notes 11 - Physical Optimization
SELECT name!
FROM orders o, !
(SELECT amount!
FROM orders i!
WHERE i.cust = o.cust
!
AND i.shop = New York) AS sub!
Example
SELECT name!
FROM orders o!
WHERE o.amount IN (SELECT amount!
"
" FROM orders i!
"
" WHERE i.cust = o.cust
!
AND i.shop = New York)!
SELECT name!
FROM orders o!
WHERE o.amount IN (SELECT amount!
"
" FROM orders i!
"
" WHERE i.cust = o.cust
!
AND i.shop = New York)!
119
120
20
Example
1. To FROM
clause
2. Introduce
GROUP BY
and join
conditions
3. Nesting
condition to
join
Example
SELECT name!
FROM orders o!
WHERE o.amount = (SELECT max(amount)!
"
"
FROM orders i!
"
"
WHERE i.cust = o.cust)!
SELECT name!
FROM orders o,!
(SELECT max(amount)!
FROM orders I!
WHERE i.cust = o.cust) sub!
CS 525
1. To FROM
clause
2. Introduce
GROUP BY
and join
conditions
3. Nesting
condition to
join
SELECT name
FROM orders o,
     (SELECT max(amount) AS ma, i.cust
      FROM orders i
      GROUP BY i.cust) sub
WHERE sub.cust = o.cust
  AND o.amount = sub.ma
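The JA-type rewrite can be checked on a small instance. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout (name, cust, amount), the sample rows, and the explicit alias `AS cust` are assumptions made for illustration and are not part of the slides.

```python
import sqlite3

def run_both(rows):
    """Run the nested and the unnested query on the same data and return both results."""
    con = sqlite3.connect(":memory:")
    # Hypothetical schema: only the column names come from the queries above.
    con.execute("CREATE TABLE orders (name TEXT, cust INTEGER, amount INTEGER)")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    nested = """
        SELECT name FROM orders o
        WHERE o.amount = (SELECT max(amount) FROM orders i
                          WHERE i.cust = o.cust)
    """
    unnested = """
        SELECT name FROM orders o,
             (SELECT max(amount) AS ma, i.cust AS cust
              FROM orders i GROUP BY i.cust) sub
        WHERE sub.cust = o.cust AND o.amount = sub.ma
    """
    a = sorted(r[0] for r in con.execute(nested))
    b = sorted(r[0] for r in con.execute(unnested))
    con.close()
    return a, b

# Made-up sample data: customer 1 has a unique maximum, customer 2 has a tie.
sample = [("pen", 1, 5), ("ink", 1, 9), ("cup", 2, 3), ("mug", 2, 3)]
nested_result, unnested_result = run_both(sample)
print(nested_result, unnested_result)
```

Both queries return the same names here, including both rows of the tie. This is a check on one example, not a proof of equivalence.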
N(orders) = 1,000,000
V(cust,orders) = 10,000
S(orders) = 1/10 block
M = 10,000
122
SELECT name!
FROM orders o!
WHERE o.amount = (SELECT max(amount)!
"
"
FROM orders i!
"
"
WHERE i.cust = o.cust)!
CS 525
SELECT name
FROM orders o,
     (SELECT max(amount) AS ma, i.cust
      FROM orders i
      GROUP BY i.cust) sub
WHERE sub.cust = o.cust
CS 525
121
Example
1. To FROM
clause
2. Introduce
GROUP BY
and join
conditions
3. Nesting
condition to
join
SELECT name!
FROM orders o!
WHERE o.amount = (SELECT max(amount)!
"
"
FROM orders i!
"
"
WHERE i.cust = o.cust)!
N(orders) =
1,000,000
V(cust,orders) =
10,000
S(orders) =
1/10 block
CS 525
123
SELECT name!
FROM orders o!
WHERE o.amount = (SELECT max(amount)!
"
"
FROM orders i!
"
"
WHERE i.cust = o.cust)!
Inner query:
N(orders) = 1,000,000
V(cust,orders) = 10,000
S(orders) = 1/10 block
M = 10,000
Inner queries:
SELECT name!
FROM orders o!
WHERE o.amount = (SELECT max(amount)!
"
"
FROM orders i!
"
"
WHERE i.cust = o.cust)!
SELECT name!
FROM orders o,!
(SELECT max(amount) AS ma, i.cust!
FROM orders i!
GROUP BY i.cust) sub!
WHERE sub.cust = o.cust!
AND o.amount = sub.ma!
Notes 11 - Physical Optimization
124
SELECT name!
FROM orders o,!
(SELECT max(amount) AS ma, i.cust!
FROM orders i!
GROUP BY i.cust) sub!
WHERE sub.cust = o.cust!
AND o.amount = sub.ma!
Outer query:
125
CS 525
126
21
CS 525
Notes 12 - Transaction
Management
EMP
Name    Age
White   52
Green   3421
Gray    1
Definition:
Consistent state: satisfies all constraints
Consistent DB: DB in consistent state
account
Acct #
balance deleted?
Notes 12 - Transaction
Management
Reality
DB
CS 525
Notes 12 - Transaction
Management
CS 525
a2
TOT
CS 525
.
.
50
.
.
1000
.
.
150
.
.
1000
Notes 12 - Transaction
Management
.
.
150
.
.
1100
11
Notes 12 - Transaction
Management
10
Transactions
Transaction: a sequence of
operations, executed by one
concurrent client, that preserves
consistency
Consistent DB
CS 525
Consistent DB
Notes 12 - Transaction
Management
Correctness
13
Big assumption:
If T starts with consistent state +
T executes in isolation
T leaves consistent state
CS 525
Notes 12 - Transaction
Management
14
Transactions - ACID
(informally)
Atomicity
Either all or no commands of transaction are executed
(their changes are persisted in the DB)
Consistency
After transaction DB is consistent (if before consistent)
Isolation
Transactions are running isolated from each other
Durability
Modifications of transactions are never lost
Part 13 (Recovery):
due to failures
Data sharing
CS 525
systems analysts
Notes 12 - Transaction
Management
17
CS 525
Notes 12 - Transaction
Management
18
Data Items:
CS 525
Notes 12 - Transaction
Management
19
CS 525
Notes 12 - Transaction
Management
20
Operations:
Operations:
CS 525
Notes 12 - Transaction
Management
21
Constraint: A=B
T1: A ← A × 2
    B ← B × 2
CS 525
Notes 12 - Transaction
Management
A: 8
B: 8
memory
CS 525
Notes 12 - Transaction
Management
23
22
CS 525
disk
Notes 12 - Transaction
Management
24
A: 8
B: 8
memory
memory
disk
CS 525
Notes 12 - Transaction
Management
25
Transactions in SQL
CS 525
time
Example
BEGIN WORK;!
UPDATE accounts !
SET bal = bal + 40 !
WHERE acc = 10;!
!
!
!
!
27
Bank customer
transfers money
from account 9
to account 10
BEGIN WORK;!
UPDATE accounts!
SET bal = bal * 1.05;!
COMMIT;!
UPDATE accounts !
SET bal = bal - 40 !
WHERE acc = 9;!
COMMIT;!
CS 525
Notes 12 - Transaction
Management
BEGIN WORK;!
UPDATE accounts!
SET bal = bal * 1.05;!
COMMIT;!
UPDATE accounts !
SET bal = bal - 40 !
WHERE acc = 9;!
COMMIT;!
ABORT/ROLLBACK
Notes 12 - Transaction
Management
26
Example
COMMIT
time
disk
Notes 12 - Transaction
Management
BEGIN WORK;!
UPDATE accounts !
SET bal = bal + 40 !
WHERE acc = 10;!
BEGIN WORK
CS 525
A: 8 16
B: 8
CS 525
time
Notes 12 - Transaction
Management
Example
BEGIN WORK;!
UPDATE accounts !
SET bal = bal + 40 !
WHERE acc = 10;!
!
!
!
!
28
BEGIN WORK;!
UPDATE accounts!
SET bal = bal * 1.05;!
COMMIT;!
UPDATE accounts !
SET bal = bal - 40 !
WHERE acc = 9;!
COMMIT;!
29
CS 525
Notes 12 - Transaction
Management
30
Potential Problems:
1. Transactions are interrupted
   No reduction in bal of acc 9
   Only some accounts got interest
2. Interleaving of transactions
   Acc 9 gets too much interest (before 40 has been deducted)
CS 525
Notes 12 - Transaction
Management
31
CS 525
Notes 12 - Transaction
Management
32
T1 = r1(a10),w1(a10),r1(a9),w1(a9),c1
T2 = r2(a1),w2(a1),r2(a2),w2(a2),r2(a9),w2(a9),r2(a10),w2(a10),c2
BEGIN WORK;!
UPDATE accounts !
SET bal = bal + 40 !
WHERE acc = 10;!
BEGIN WORK;!
UPDATE accounts !
SET bal = bal + 40 !
WHERE acc = 10;!
!
!
!
!
!
!
!
UPDATE accounts !
SET bal = bal - 40 !
WHERE acc = 9;!
COMMIT;!
UPDATE accounts !
SET bal = bal - 40 !
WHERE acc = 9;!
COMMIT;!
CS 525
Notes 12 - Transaction
Management
33
CS 525
Schedules
Notes 12 - Transaction
Management
34
Note
Notes 12 - Transaction
Management
35
CS 525
Notes 12 - Transaction
Management
36
Conflicting Operations
Two operations are conflicting if
They belong to different transactions
At least one of them is a write
Both are accessing the same data item
Intuition
The order of execution for conflicting
operations can influence result!
CS 525
Notes 12 - Transaction
Management
37
Conflicting Operations
Examples
w1(X), r2(X) are conflicting
w1(X), w2(Y) are not conflicting
r1(X), r2(X) are not conflicting
w1(X), w1(X) are not conflicting
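The conflict test can be written as a small predicate. This is a minimal sketch; the `Op` tuple and the function name are hypothetical helpers, and the "different transactions" condition is taken from the last example above (w1(X), w1(X) are not conflicting).

```python
from typing import NamedTuple

class Op(NamedTuple):
    kind: str   # "r" for read, "w" for write
    txn: int    # transaction number, e.g. 1 for w1(X)
    item: str   # data item, e.g. "X"

def conflicting(p: Op, q: Op) -> bool:
    """Two operations conflict if they come from different transactions,
    access the same data item, and at least one of them is a write."""
    return p.txn != q.txn and p.item == q.item and "w" in (p.kind, q.kind)

# The four examples from the slide.
print(conflicting(Op("w", 1, "X"), Op("r", 2, "X")))
print(conflicting(Op("w", 1, "X"), Op("w", 2, "Y")))
print(conflicting(Op("r", 1, "X"), Op("r", 2, "X")))
print(conflicting(Op("w", 1, "X"), Op("w", 1, "X")))
```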
T1 = r1(a10),w1(a10),r1(a9),w1(a9),c1
T2 = r2(a1),w2(a1),r2(a2),w2(a2),r2(a9),w2(a9),r2(a10),w2(a10),c2
Conflicting operations
Complete Schedule
S=r2(a1),r1(a10),w2(a1),r2(a2),w1(a10),w2(a2),r2(a9),w2(a9),
r1(a9),w1(a9),c1,r2(a10),w2(a10),c2
Incomplete Schedule
S=r2(a1),r1(a10),w2(a1),w1(a10)!
S1 = w2(a1) w1(a10)!
Not a Schedule
S=r2(a1),r1(a10),c1 !
CS 525
S2 = w1(a1) w2(a10)!
Why Schedules?
Study properties of different execution
orders
Easy/Possible to recover after failure
Isolation
-> preserve ACID properties
Notes 12 - Transaction
Management
43
Now
CS 525: Advanced Database
Organization
Crash recovery
CS 525
Correctness
CS 525
(informally)
Transaction bug
DBMS bug
Hardware failure
e.g., disk crash alters balance of account
Data sharing
e.g.: T1: give 10% raise to programmers
T2: change programmers
CS 525
Recovery
CS 525
Events
CS 525
CS 525
systems analysts
Desired
Undesired
Expected
Unexpected
memory
CS 525
processor
disk
CS 525
Undesired Unexpected:
Everything else!
Examples:
Disk data is lost
Memory lost without CPU halt
CPU implodes wiping out universe.
Undesired Unexpected:
CS 525
Everything else!
CS 525
10
Storage hierarchy
11
Memory
Disk
DB Buffer
CS 525
12
Operations:
Operations:
CS 525
13
Constraint: A=B
T1: A ← A × 2
    B ← B × 2
CS 525
A: 8
B: 8
memory
CS 525
15
CS 525
disk
16
A: 8
B: 8
memory
CS 525
14
memory
disk
17
A: 8 16
B: 8
CS 525
disk
18
Need atomicity:
execute all actions of a transaction or
none at all
19
CS 525
20
T1: Read (A,t); t ← t×2
    Write (A,t);
    Read (B,t); t ← t×2
    Write (B,t);
    Output (A);
    failure!
    Output (B);
T1 is unfinished
-> need to undo the write to A to recover to consistent state
We need to either
Store additional data to be able to Undo/Redo
Avoid ending up in situations where we need to
Undo/Redo
A: 8 16
B: 8
memory
CS 525
21
Logging
CS 525
disk
22
We need to know
Which operations have been executed
Which operations are reflected on disk
CS 525
23
CS 525
24
CS 525
25
            Force               No force
No steal    No Undo, No Redo    No Undo, Redo
Steal       Undo, No Redo       Undo, Redo
27
26
CS 525
28
T1 = w1(X),c1 !
T2 = r2(X),w2(X),c2 !
Recoverable Schedules
CS 525
29
S1 = w1(X),r2(X),w2(X),c1,c2 !
Nonrecoverable Schedule
S2 = w1(X),r2(X),w2(X),c2,c1 !
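The difference between S1 and S2 can be checked mechanically. This is a minimal sketch, assuming the usual definition (a transaction may commit only after every transaction it read from has committed); the parser and function names are hypothetical, and reads after the writer's abort are simply ignored.

```python
import re

def parse(schedule: str):
    """Turn 'w1(X),r2(X),c1' into [('w', 1, 'X'), ('r', 2, 'X'), ('c', 1, None)]."""
    return [(k, int(t), x or None)
            for k, t, x in re.findall(r"([rwca])(\d+)(?:\((\w+)\))?", schedule)]

def is_recoverable(schedule: str) -> bool:
    last_writer = {}    # data item -> transaction that wrote it most recently
    reads_from = set()  # (reader, writer) pairs
    finished = {}       # transaction -> "c" (committed) or "a" (aborted)
    for kind, txn, item in parse(schedule):
        if kind == "w":
            last_writer[item] = txn
        elif kind == "r":
            writer = last_writer.get(item)
            # Simplification: a read after the writer aborted is not tracked.
            if writer is not None and writer != txn and finished.get(writer) != "a":
                reads_from.add((txn, writer))
        elif kind == "c":
            # Every transaction that txn read from must already be committed.
            if any(finished.get(w) != "c" for r, w in reads_from if r == txn):
                return False
            finished[txn] = "c"
        else:
            finished[txn] = "a"
    return True

print(is_recoverable("w1(X),r2(X),w2(X),c1,c2"))  # S1
print(is_recoverable("w1(X),r2(X),w2(X),c2,c1"))  # S2
```

In S2, T2 reads X from T1 and commits first, so a later abort of T1 could no longer be repaired.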
CS 525
Notes 12 - Transaction
Management
30
Cascading Abort
Cascadeless Schedules
CS 525
31
CS 525
T1 = w1(X),c1 !
T1 = w1(X),a1 !
T2 = r2(X),w2(X),c2 !
T2 = r2(X),w2(X),c2 !
Consider what
happens if T1
aborts!
S1 = w1(X),a1,r2(X),w2(X),c2 !
S2 = w1(X),r2(X),w2(X),c1,c2 !
Nonrecoverable Schedule
Nonrecoverable Schedule
S3 = w1(X),r2(X),w2(X),c2,c1 !
Notes 12 - Transaction
Management
32
CS 525
S3 = w1(X),r2(X),w2(X),c2,a1 !
33
CS 525
Notes 12 - Transaction
Management
34
T1 = w1(X),c1 !
T2 = r2(X),w2(X),c2 !
Strict Schedules
S1 = w1(X),c1,r2(X),w2(X),c2 !
Nonrecoverable Schedule
S3 = w1(X),r2(X),w2(X),c2,c1 !
CS 525
35
CS 525
Notes 12 - Transaction
Management
36
All schedules
Compare Classes
Recoverable schedules
ST ⊂ CL ⊂ RC ⊂ ALL
Cascadeless schedules
Strict
schedules
CS 525
37
CS 525
38
(immediate
modification)
CS 525
CS 525
39
Undo logging
One solution: undo logging
(immediate
modification)
A:8
B:8
41
CS 525
A=B
A:8
B:8
memory
CS 525
(Immediate modification)
40
disk
log
42
Undo logging
Undo logging
(Immediate modification)
(Immediate modification)
A=B
A=B
<T1, start>
<T1, A, 8>
A:8 16
B:8 16
memory
CS 525
A:8 16
B:8 16
A:8
B:8
disk
Undo logging
A:8 16
B:8 16
A:8 16
B:8 16
CS 525
A=B
<T1, start>
<T1, A, 8>
<T1, B, 8>
<T1, commit>
A:8 16
B:8 16
memory
log
45
log
44
(Immediate modification)
<T1, start>
<T1, A, 8>
<T1, B, 8>
disk
disk
Undo logging
A=B
A:8 16
B:8 16
memory
CS 525
CS 525
(Immediate modification)
A:8 16
B:8
memory
log
43
<T1, start>
<T1, A, 8>
<T1, B, 8>
disk
log
46
One complication
One complication
memory
A: 8 16
B: 8 16
Log:
<T1,start>
<T1, A, 8>
<T1, B, 8>
CS 525
A: 8
B: 8
memory
DB
Log
47
A: 8 16
B: 8 16
Log:
<T1,start>
<T1, A, 8>
<T1, B, 8>
CS 525
A: 8 16
B: 8
DB
BAD STATE
#1
Log
48
One complication
memory
A: 8 16
B: 8
A: 8 16
B: 8 16
Log:
<T1,start>
<T1, A, 8>
<T1, B, 8>
<T1, commit>
CS 525
DB
BAD STATE
#2
...
Log
<T1, B, 8>
<T1, commit>
Recovery rules:
49
Undo logging
CS 525
Recovery rules:
50
Undo logging
CS 525
Recovery rules:
51
Undo logging
- write (X, v)
- output (X)
CS 525
53
52
Question
CS 525
time/log
T2 write A
54
Undo is idempotent
55
CS 525
Logical operations
Operation to execute for undo/redo
E.g., delete record x
56
To discuss:
Redo logging
Undo/redo logging, why both?
Real world actions
Checkpoints
Media failures
Hybrid (Physiological)
Delete record x from page y
CS 525
CS 525
57
58
A: 8
B: 8
memory
A: 8 16
B: 8 16
A: 8
B: 8
DB
memory
<T1, start>
<T1, A, 16>
<T1, B, 16>
<T1, commit>
A: 8
B: 8
DB
LOG
CS 525
59
LOG
CS 525
60
10
A: 8 16
B: 8 16
<T1, start>
<T1, A, 16>
<T1, B, 16>
<T1, commit>
output
A: 8 16
B: 8 16
A: 8 16
B: 8 16
DB
memory
<T1, start>
<T1, A, 16>
<T1, B, 16>
<T1, commit>
<T1, end>
output
A: 8 16
B: 8 16
DB
memory
LOG
CS 525
61
Recovery rules:
63
Redo logging
IS THIS CORRECT??
CS 525
Recovery rules:
LOG
CS 525
65
62
Redo logging
CS 525
Recovery rules:
64
Redo logging
66
11
CS 525
Actions:
write X
output X
write X
output X
write X
output X
write X
output X
combined <end> (checkpoint)
69
CS 525
...
<T3,C,21>
...
<T2,commit>
...
<T2,B,17>
...
Checkpoint
<T1,commit>
<T1,A,16>
...
68
Periodically:
(1) Do not accept new transactions
(2) Wait until all transactions finish
(3) Flush all log records to disk (log)
(4) Flush all buffers to disk (DB) (do not discard buffers)
(5) Write checkpoint record on disk (log)
(6) Resume transaction processing
...
Actions:
write X
output X
write X
output X
write X
output X
write X
output X
Solution: Checkpoint
CS 525
CS 525
67
70
Advantage of Checkpoints
Limits recovery to parts of the log after
the checkpoint
Crash
Source of backups
If we backup checkpoints we can use them
for media recovery!
CS 525
71
CS 525
72
12
Checkpoints Justification
Checkpoint should be consistent DB
state
No active transactions
Key drawbacks:
Undo logging:
cannot bring backup DB copies up to date
Redo logging:
need to keep all modified blocks in memory
until commit
73
CS 525
75
CS 525
CS 525
76
Checkpoint Cost
...
...
...
...
<T1, commit>
...
<checkpoint>
log (disk):
...
74
Rules
CS 525
77
Crash
CS 525
78
13
Non-quiesce checkpoint
L
O
G
Start-ckpt
active TR:
Ti,T2,...
end
ckpt
...
no T1 commit
...
L
O
G
...
...
for
undo
79
CS 525
...
T1,a
Undo T1
CS 525
...
Ckpt
T1
Ckpt
end
...
81
...
T1b
...
Ckpt
end
...
80
...
T1b
Example
L
O
G
...
T1
ckpt-s
T1
ckptT1
T1
...
...
...
...
...
...
a
T1
b
end
c
cmt
CS 525
82
T1
ckpt-s
T1
ckptT1
T1
...
...
...
...
...
...
a
T1
b
end
c
cmt
L
O ...
G
ckpt
start
...
ckpt
end
...
T1
T1
ckpt... start ...
...
b
c
start
of latest
valid
checkpoint
CS 525
Ckpt
T1
(undo a,b)
Example
L
O
G
...
dirty buffer
pool pages
flushed
CS 525
L
O
G
T1,a
...
83
CS 525
84
14
Recovery process:
Backwards pass
Forward pass
forward pass
Notes 13 - Failure and Recovery
85
CS 525
Solution
86
ATM
Give$$
(amt, Tid, time)
lastTid:
time:
give(amt)
CS 525
87
CS 525
88
A: 16
A: 16
89
CS 525
90
15
Example #2
X1
CS 525
X3
X2
91
CS 525
active
database
log
92
log
backup
database
Redundant writes,
Single reads
db
dump
last
needed
undo
checkpoint
time
CS 525
93
CS 525
94
Underlying Ideas
Keep track of state of pages by relating them to
entries in the log
WAL
Recovery in three phases
Analysis, Redo, Undo
DB2
MSSQL
CS 525
95
CS 525
96
16
LSN
Entry type
Update, compensation, commit, ...
TID
Transaction identifier
PrevLSN
LSN of previous log record for same transaction
UndoNxtLSN
Next undo operation for CLR (later!)
Undo/Redo data
Data needed to undo/redo the update
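The log entry fields can be pictured as a record type. This is a sketch only: the field names follow the list above, while the Python types, the `undo_chain` helper, and reading "-A=3+A=6" in the later examples as undo value A=3 and redo value A=6 are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LogRecord:
    lsn: int                      # log sequence number
    entry_type: str               # "U" (update), "CLR" (compensation), "commit", ...
    tid: int                      # transaction identifier
    prev_lsn: Optional[int]       # previous log record of the same transaction
    undo_nxt_lsn: Optional[int]   # next record to undo (used by CLRs)
    undo: Optional[str] = None    # data needed to undo the update
    redo: Optional[str] = None    # data needed to redo the update

def undo_chain(log, last_lsn):
    """Follow PrevLSN pointers backwards from last_lsn, newest record first."""
    by_lsn = {rec.lsn: rec for rec in log}
    lsn = last_lsn
    while lsn is not None:
        yield lsn
        lsn = by_lsn[lsn].prev_lsn

# Four update records of transaction 1, modeled on the undo example in these notes.
log = [
    LogRecord(1, "U", 1, None, None, undo="A=3", redo="A=6"),
    LogRecord(2, "U", 1, 1, None, undo="B=10", redo="B=5"),
    LogRecord(3, "U", 1, 2, None, undo="C=5", redo="C=10"),
    LogRecord(4, "U", 1, 3, None, undo="A=6", redo="A=4"),
]
print(list(undo_chain(log, 4)))
```

The PrevLSN chain is what lets a rollback visit exactly the records of one transaction without scanning the whole log.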
CS 525
97
CS 525
CS 525
99
Entries <PageID,RecLSN>
Whenever a page is first fixed in the buffer
pool with intention to modify
98
Forward Processing
Normal operations when no ROLLBACK is
required
WAL: write redo/undo log record for each action
of a transaction
100
101
CS 525
102
17
Transaction Table:
<1, U, -, ->
Transaction Table
Page_LSN:
LSN of last
modification to page
T1=r1(A),A=A*2,w1(A)!
buffer
TransID
Persistent
log
disk
State
Commit state
LastLSN
UndoNxtLSN
A: 16
B: 16
CS 525
Transaction Table:
<1, U, -, ->
CS 525
103
disk
13
A: 16
B: 16
A: 16
B: 16
105
CS 525
Update page
buffer
14
14
A: 32
B: 16
A: 32
B: 16
CS 525
T1=r1(A),A=A*2,w1(A)!
A: 16
B: 16
Persistent
log
<13,U,2,10,-,-A=3+A=16>
13
<14,U,1,-,-,-A=16+A=32>
107
106
disk
<13,U,2,10,-,-A=3+A=16>
13
<13,U,2,10,-,-A=3+A=16>
A: 16
B: 16
Transaction Table:
<1, U, 14, 14>
Persistent
log
disk
Persistent
log
13
<14,U,1,-,-,-A=16+A=32>
T1=r1(A),A=A*2,w1(A)!
<14,U,1,-,-,-A=16+A=32>
disk
<13,U,2,10,-,-A=3+A=16>
13
buffer
buffer
A: 16
B: 16
Transaction Table:
<1, U, 14, 14>
104
T1=r1(A),A=A*2,w1(A)!
Persistent
log
13
CS 525
Transaction Table:
<1, U, -, ->
T1=r1(A),A=A*2,w1(A)!
buffer
<13,U,2,10,-,-A=3+A=16>
13
CS 525
A: 16
B: 16
108
18
CS 525
Transaction Table:
<1, U, 4, 4>
109
CS 525
Transaction Table:
<1, U, 5, 3>
Undo T1
buffer
4
A: 4
B: 5
<1,U,1,-,-,-A=3+A=6>
<2,U,1,1,-,-B=10+B=5>
<3,U,1,2,-,-C=5+C=10>
<4,U,1,3,-,-A=6+A=4>
<5,CLR,1,-,3,+A=6>
C: 10
D: 20
CS 525
Transaction Table:
<1, U, 6, 2>
111
CS 525
Transaction Table:
<1, U, 7, 1>
Undo T1
112
Undo T1
buffer
buffer
5
CS 525
A: 6
B: 5
C: 10
D: 20
<1,U,1,-,-,-A=3+A=6>
<2,U,1,1,-,-B=10+B=5>
<3,U,1,2,-,-C=5+C=10>
<4,U,1,3,-,-A=6+A=4>
<5,CLR,1,-,3,+A=6>
<6,CLR,1,-,2,+C=5>
Undo T1
buffer
<1,U,1,-,-,-A=3+A=6>
<2,U,1,1,-,-B=10+B=5>
<3,U,1,2,-,-C=5+C=10>
<4,U,1,3,-,-A=6+A=4>
110
A: 6
B: 5
<1,U,1,-,-,-A=3+A=6>
<2,U,1,1,-,-B=10+B=5>
<3,U,1,2,-,-C=5+C=10>
<4,U,1,3,-,-A=6+A=4>
<5,CLR,1,-,3,+A=6>
<6,CLR,1,-,2,+C=5>
<7,CLR,1,-,1,+B=10>
C: 5
D: 20
113
CS 525
A: 6
B: 10
6
C: 5
D: 20
114
19
Transaction Table:
<1, U, 8, ->
Undo T1
buffer
Begin of checkpoint
Write begin_cp log entry
Write end_cp log entry with
A: 3
B: 10
<1,U,1,-,-,-A=3+A=6>
<2,U,1,1,-,-B=10+B=5>
<3,U,1,2,-,-C=5+C=10>
<4,U,1,3,-,-A=6+A=4>
<5,CLR,1,-,3,+A=6>
<6,CLR,1,-,2,+C=5>
<7,CLR,1,-,1,+B=10>
<8,CLR,1,-,-,+A=3>
CS 525
Master Record
C: 5
D: 20
CS 525
Restart Recovery
1. Analysis Phase
2. Redo Phase
3. Undo Phase
CS 525
Analysis Phase
1) Determine LSN of last checkpoint
using Master Record
2) Get Dirty Page Table and Transaction
Table from checkpoint end record
3) RedoLSN = min(RecLSN) from Dirty
Page Table or checkpoint LSN if no dirty
page
117
CS 525
Analysis Phase
4) Scan log forward starting from
RedoLSN
If necessary: Add Page to Dirty Page Table
Add Transaction to Transaction Table or update
LastLSN
Result
Transaction Table
Transactions to be later undone
RedoLSN
Log entry to start Redo Phase
119
118
Analysis Phase
CS 525
116
CS 525
120
20
Redo Phase
Redo Phase
121
CS 525
Redo Phase
122
Undo Phase
Result
CS 525
123
CS 525
Undo Phase
124
Idempotence?
Redo
We are not logging during Redo so
repeated Redo will result in the same state
Undo
If we see CLRs we do not undo this action
again
CS 525
125
CS 525
126
21
T1 = w1(A),
T2 =
r(A), w(A)!
Redo
Log
<1,begin(T1), ->
<2,begin(T2), ->
<3,write(A,T1),1>
<4,write(X,T2),2>
<5,write(B,T1),3>
<6,write(C,T1),5>
<7,write(A,T1),6>
<8,commit(T1),7>
<9,write(A,T2),4>
Undo
If we see CLRs we do not undo this action
again
CS 525
T1 = w1(A),
T2 =
CS 525
T1 = w1(A),
Log
<1,begin(T1), ->
<2,begin(T2), ->
<3,write(A,T1),1>
<4,write(X,T2),2>
<5,write(B,T1),3>
<6,write(C,T1),5>
<7,write(A,T1),6>
<8,commit(T1),7>
<9,write(A,T2),4>
129
CS 525
Log
<1,begin(T1), ->
<2,begin(T2), ->
<3,write(A,T1),1>
<4,write(X,T2),2>
<5,write(B,T1),3>
<6,write(C,T1),5>
<7,write(A,T1),6>
<8,commit(T1),7>
<9,write(A,T2),4>
131
w1(X),
r(A), w(A)!
T1 = w1(A),
r(A), w(A)!
T2 =
128
CS 525
T1 = w1(A),
r(A), w(A)!
Analysis Phase:
- start at log entry 1
- add T1 to transaction table (rec. 1)
- add T2 to transaction table (rec. 2)
- add A to dirty page table (RecLSN 3)
- add X to dirty page table (RecLSN 4)
- add B to dirty page table (RecLSN 5)
- add C to dirty page table (RecLSN 6)
- remove T1 from Transaction Table (rec. 8)
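The analysis steps listed above can be replayed in code. This is a minimal sketch under simplifying assumptions: there is no checkpoint, so the scan starts at log entry 1; data items stand in for pages; and the tuple layout of the log is a hypothetical encoding of the example log.

```python
# (LSN, kind, transaction, item) for the nine log entries of the example.
log = [
    (1, "begin", "T1", None),
    (2, "begin", "T2", None),
    (3, "write", "T1", "A"),
    (4, "write", "T2", "X"),
    (5, "write", "T1", "B"),
    (6, "write", "T1", "C"),
    (7, "write", "T1", "A"),
    (8, "commit", "T1", None),
    (9, "write", "T2", "A"),
]

def analysis(log):
    """Forward scan that rebuilds the transaction table and the dirty page table."""
    txn_table = {}    # transaction -> LastLSN
    dirty_pages = {}  # page -> RecLSN (LSN of the first record that dirtied it)
    for lsn, kind, txn, page in log:
        if kind == "commit":
            txn_table.pop(txn, None)       # committed: nothing to undo later
            continue
        txn_table[txn] = lsn               # add transaction or update LastLSN
        if kind == "write":
            dirty_pages.setdefault(page, lsn)
    redo_lsn = min(dirty_pages.values(), default=None)
    return txn_table, dirty_pages, redo_lsn

txn_table, dirty_pages, redo_lsn = analysis(log)
print(txn_table, dirty_pages, redo_lsn)
```

After the scan only T2 remains in the transaction table, so T2 is the transaction to be undone, and redo starts at the smallest RecLSN.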
T2 =
CS 525
127
T2 =
Log
<1,begin(T1), ->
<2,begin(T2), ->
<3,write(A,T1),1>
<4,write(X,T2),2>
<5,write(B,T1),3>
<6,write(C,T1),5>
<7,write(A,T1),6>
<8,commit(T1),7>
<9,write(A,T2),4>
130
r(A), w(A)!
CS 525
Log
<1,begin(T1), ->
<2,begin(T2), ->
<3,write(A,T1),1>
<4,write(X,T2),2>
<5,write(B,T1),3>
<6,write(C,T1),5>
<7,write(A,T1),6>
<8,commit(T1),7>
<9,write(A,T2),4>
<10,CLR(A,T2),4>
<11,CLR(X,T2),->
132
22
Media Recovery
What if the disks where the log or DB is stored
fail?
-> keep backups of log + DB state
133
CS 525
Log Backup
134
Backup DB state
CS 525
CS 525
135
136
Summary
Consistency of data
One source of problems: failures
- Logging
- Redundancy
CS 525
137
23
T1
14: Concurrency
Control
T2
Tn
DB
(consistency
constraints)
Boris Glavic
Slides: adapted from a course taught by
Hector Garcia-Molina, Stanford InfoLab
CS 525
CS 525
T1: Read(A)
    A ← A+100
    Write(A)
    Read(B)
    B ← B+100
    Write(B)
T2: Read(A)
    A ← A×2
    Write(A)
    Read(B)
    B ← B×2
    Write(B)
Constraint: A=B
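The effect of different execution orders of T1 and T2 can be simulated directly. This is a minimal sketch: the starting values A = B = 25 are the ones used in the schedules that follow, each step is treated as one Read, compute, Write unit on a single item, and the function and variable names are hypothetical.

```python
def t1(value):
    """T1 adds 100 to the item it touches."""
    return value + 100

def t2(value):
    """T2 doubles the item it touches."""
    return value * 2

def run(schedule, a=25, b=25):
    """Apply (transaction, item) steps in order and return the final (A, B)."""
    db = {"A": a, "B": b}
    for txn, item in schedule:
        db[item] = txn(db[item])
    return db["A"], db["B"]

serial_t1_t2 = [(t1, "A"), (t1, "B"), (t2, "A"), (t2, "B")]
serial_t2_t1 = [(t2, "A"), (t2, "B"), (t1, "A"), (t1, "B")]
interleaved_ok = [(t1, "A"), (t2, "A"), (t1, "B"), (t2, "B")]
interleaved_bad = [(t1, "A"), (t2, "A"), (t2, "B"), (t1, "B")]

for name, schedule in [("T1 then T2", serial_t1_t2), ("T2 then T1", serial_t2_t1),
                       ("interleaved, same order on A and B", interleaved_ok),
                       ("interleaved, opposite order on B", interleaved_bad)]:
    final_a, final_b = run(schedule)
    print(name, final_a, final_b, "A=B holds" if final_a == final_b else "A=B violated")
```

Only the last order breaks the constraint A=B: T1 precedes T2 on A, but T2 precedes T1 on B.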
CS 525
Schedule A
Schedule B
T1
Read(A); A A+100
Write(A);
Read(B); B B+100;
Write(B);
A
25
T2
B
25
T1
T2
Read(A);A A2;
125
Write(A);
Read(B);B B2;
125
Read(A);A A2;
Write(A);
250
Read(B);B B2;
Write(B);
250
CS 525
Schedule A
Example:
CS 525
250
250
Write(B);
Read(A); A A+100
Write(A);
Read(B); B B+100;
Write(B);
CS 525
Schedule B
Schedule C
T1
A
25
T2
B
25
Read(A);A A2;
50
Write(A);
T1
Read(A); A A+100
Write(A);
T2
Read(A);A A2;
Read(B);B B2;
50
Write(B);
Read(A); A A+100
Write(A);
Read(B); B B+100;
Write(B);
150
Read(B);B B2;
150
CS 525
Write(A);
Read(B); B B+100;
Write(B);
150
150
Write(B);
CS 525
Schedule C
Schedule D
T1
Read(A); A A+100
Write(A);
A
25
T2
B
25
125
T1
Read(A); A A+100
Write(A);
T2
Read(A);A A2;
Read(A);A A2;
250
Write(A);
Read(B); B B+100;
Write(B);
Write(A);
Read(B);B B2;
125
Read(B);B B2;
Write(B);
250
CS 525
250
250
Write(B);
Read(B); B B+100;
Write(B);
CS 525
Schedule D
A
25
T2
B
25
125
T1
Read(A); A A+100
Write(A);
Read(A);A A2;
Write(A);
Read(B);B B2;
Read(B);B B1;
50
Write(B);
Read(B); B B+100;
Write(B);
250
Notes 14 - Concurrency Control
11
T2
Read(A);A A1;
250
Write(A);
10
Same as Schedule D
but with new T2
Schedule E
T1
Read(A); A A+100
Write(A);
CS 525
150
150
Write(B);
Read(B); B B+100;
Write(B);
CS 525
12
Same as Schedule D
but with new T2
Schedule E
T1
Read(A); A A+100
Write(A);
A
25
T2
Serial Schedules
B
25
125
Read(A);A A1;
125
Write(A);
Read(B);B B1;
Read(B); B B+100;
Write(B);
125
CS 525
25
Write(B);
125
125
13
CS 525
14
T1 = r1(A),w1(A),r1(B),w1(B),c1 !
T2 = r2(A),w2(A),r2(B),w2(B),c2 !
Serial Schedule
No transactions are interleaved
S1 = r2(A),w2(A),r2(B),w2(B),c2,r1(A),w1(A),r1(B),w1(B),c1 !
Nonserial Schedule
S2 = r2(A),w2(A),r1(A),w1(A),r2(B),w2(B),c2,r1(B),w1(B),c1 !
!
CS 525
15
CS 525
Notes 12 - Transaction
Management
16
Compare Classes
Cascadeless (CL)
S ⊂ ST ⊂ CL ⊂ RC ⊂ ALL
Strict (ST)
Serial (S)
Abbreviations
S = Serial
ST = Strict
CL = Cascadeless
RC = Recoverable
ALL = all possible schedules
CS 525
17
CS 525
18
No concurrency!
CS 525
19
Outline
Since serial schedules have good
properties we would like our schedules
to behave like (be equivalent to) serial
schedules
1. Need to define equivalence based solely
on order of operations
2. Need to define class of schedules which is
equivalent to serial schedule
3. Need to design scheduler that guarantees
that we only get these good schedules
CS 525
21
CS 525
Example:
Sc=r1(A)w1(A)r2(A)w2(A)r1(B)w1(B)r2(B)w2(B)
Sc' = r1(A)w1(A)r1(B)w1(B)r2(A)w2(A)r2(B)w2(B)
T1
CS 525
as a matter of fact,
T2 must precede T1
in any equivalent schedule,
i.e., T2 → T1
Notes 14 - Concurrency Control
23
T2
22
T2 → T1
Also, T1 → T2
T1
CS 525
20
CS 525
T2
Sd cannot be rearranged
into a serial schedule
Sd is not equivalent to
any serial schedule
Sd is bad
Notes 14 - Concurrency Control
24
Returning to Sc
Sc=r1(A)w1(A)r2(A)w2(A)r1(B)w1(B)r2(B)w2(B)
T1 → T2    T1 → T2
no cycles ⇒ Sc is equivalent to a
serial schedule
(in this case T1,T2)
CS 525
25
CS 525
26
Concepts
Transaction: sequence of ri(x), wi(x) actions
Conflicting actions (pairs): (r1(A), w2(A)), (w2(A), r1(A)), (w1(A), w2(A))
Schedule: represents chronological order
in which actions are executed
Serial schedule: no interleaving of actions
or transactions
CS 525
27
Ti issues System
read(x,t) issues
input(x)
CS 525
Input(X)
t
completes
x
time
28
Input(X)
t
completes
x
time
T2 issues
input(B)
System
write(B,S)
completes
issues
System
output(B)
issues
output(B)
BS
input(B)
completes
CS 525
29
CS 525
30
end r1(A)
start r1(A)
time
end w2(A)
start w2(A)
end r1(A)
time
end w2(A)
31
CS 525
Outline
Since serial schedules have good
properties we would like our schedules
to behave like (be equivalent to) serial
schedules
32
Conflict Equivalence
Define equivalence based on the order
of conflicting actions
33
34
Outline
Definition
S1, S2 are conflict equivalent schedules
if S1 can be transformed into S2 by a
series of swaps on non-conflicting
actions.
Alternatively:
If the order of conflicting actions in S1
and S2 is the same
CS 525
CS 525
35
36
How to check?
Definition
A schedule is conflict serializable (CSR) if
it is conflict equivalent to some serial
schedule.
37
Precedence graph P(S) (S is schedule)
Nodes: transactions in S
Arcs: Ti → Tj whenever
- pi(A), qj(A) are actions in S
- pi(A) <S qj(A)
- at least one of pi, qj is a write
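The construction of the precedence graph and the cycle test can be sketched as follows. The string format for schedules, the function names, and the restriction of arcs to distinct transactions (i ≠ j) are assumptions of this sketch.

```python
import re
from itertools import combinations

def precedence_graph(schedule: str):
    """Return the set of arcs (i, j) meaning Ti -> Tj for a schedule like 'r1(A)w2(A)'."""
    ops = [(k, int(t), x) for k, t, x in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]
    arcs = set()
    # combinations() keeps schedule order, so the first action precedes the second.
    for (k1, t1, x1), (k2, t2, x2) in combinations(ops, 2):
        if t1 != t2 and x1 == x2 and "w" in (k1, k2):
            arcs.add((t1, t2))
    return arcs

def has_cycle(arcs) -> bool:
    """Depth-first search for a cycle in the directed graph given by arcs."""
    graph = {}
    for u, v in arcs:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> "active" (on the current path) or "done"

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state.get(nxt) == "active" or (nxt not in state and visit(nxt)):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)

sc = "r1(A)w1(A)r2(A)w2(A)r1(B)w1(B)r2(B)w2(B)"
arcs = precedence_graph(sc)
print(arcs, "cyclic" if has_cycle(arcs) else "acyclic")
```

For Sc the only arc is T1 → T2, so the graph is acyclic.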
CS 525
38
Exercise:
What is P(S) for
S = w3(A) w2(C) r1(A) w1(B) r1(C) w2(A) r4(A) w4(D)
Is S serializable?
CS 525
39
Exercise:
40
Exercise:
T1
T2
T1
T2
T3
T4
T3
T4
Is S serializable?
CS 525
CS 525
Is S serializable?
41
CS 525
42
Another Exercise:
Another Exercise:
CS 525
43
CS 525
T1
T2
T3
T4
44
Lemma
Lemma
CS 525
45
CS 525
46
CS 525
47
S1=w1(A) r2(A)
w2(B) r1(B)
S2=r2(A) w1(A)
r1(B) w2(B)
CS 525
48
Theorem
Theorem
P(S1) acyclic ⟺ S1 conflict serializable
(⇐) Assume S1 is conflict serializable
⇒ ∃ Ss: Ss, S1 conflict equivalent
⇒ P(Ss) = P(S1)
⇒ P(S1) acyclic since P(Ss) is acyclic
CS 525
49
CS 525
Dirty reads
Non-repeatable reads
Phantom reads (later)
52
T2
Non-repeatable Read
T1
Read(A), A += 100
Write(A);
T2
Read(A), A +=200
Abort
Write(A);
A=50
T1: A = 150
A = 150
T2: A = 350
S1 = r1(A),w1(A),r2(A),a1,w2(A) !
CS 525
CS 525
51
T1
Dirty Read
50
Dirty Read
CS 525
53
CS 525
54
T1
Inconsistent Read
T1
Read(A)
T2
A = 100
Read(A), A /= 2
Write(A)
Commit
A = 50
A = 50
Read(A)
Commit
S1
CS 525
T2
= r1(A),r2(A),w2(A),c2,r1(A),c1 !
Notes 14 - Concurrency Control
55
CS 525
57
DB
CS 525
59
58
A locking protocol
CS 525
56
li (A)
ui (A)
T2
scheduler
CS 525
lock
table
60
10
Rule #2
61
Legal scheduler
no lj(A)
4) Only one transaction can hold a lock
on A at the same time
CS 525
62
Exercise:
Exercise:
S1 = l1(A)l1(B)r1(A)w1(B)l2(B)u1(A)u1(B)
r2(B)w2(B)u2(B)l3(B)r3(B)u3(B)
S1 = l1(A)l1(B)r1(A)w1(B)l2(B)u1(A)u1(B)
r2(B)w2(B)u2(B)l3(B)r3(B)u3(B)
S2 = l1(A)r1(A)w1(B)u1(A)u1(B)
l2(B)r2(B)w2(B)l3(B)r3(B)u3(B)
S2 = l1(A)r1(A)w1(B)u1(A)u1(B)
l2(B)r2(B)w2(B)l3(B)r3(B)u3(B)
S3 = l1(A)r1(A)u1(A)l1(B)w1(B)u1(B)
l2(B)r2(B)w2(B)u2(B)l3(B)r3(B)u3(B)
S3 = l1(A)r1(A)u1(A)l1(B)w1(B)u1(B)
l2(B)r2(B)w2(B)u2(B)l3(B)r3(B)u3(B)
CS 525
63
64
Schedule F
T1                              T2                              A    B
                                                                25   25
l1(A);Read(A)
A ← A+100;Write(A);u1(A)                                        125
                                l2(A);Read(A)
                                A ← A×2;Write(A);u2(A)          250
                                l2(B);Read(B)
                                B ← B×2;Write(B);u2(B)               50
l1(B);Read(B)
B ← B+100;Write(B);u1(B)                                             150
                                                                250  150
CS 525
66
11
[Figure: number of locks held by Ti over time; it rises during the growing phase and falls back to no locks during the shrinking phase]
68
Schedule G
Schedule G
T1
l1(A);Read(A)
A A+100;Write(A)
l1(B); u1(A)
T2
delayed
T1
l1(A);Read(A)
A A+100;Write(A)
l1(B); u1(A)
l2(A);Read(A)
A Ax2;Write(A);l2(B)
T2
delayed
l2(A);Read(A)
A Ax2;Write(A);l2(B)
Read(B);B B+100
Write(B); u1(B)
CS 525
69
Schedule G
CS 525
Schedule H
T1
l1(A);Read(A)
A A+100;Write(A)
l1(B); u1(A)
70
(T2 reversed)
T2
delayed
l2(A);Read(A)
A Ax2;Write(A);l2(B)
T1
l1(A); Read(A)
A A+100;Write(A)
l1(B)
delayed
Read(B);B B+100
Write(B); u1(B)
T2
l2(B);Read(B)
B Bx2;Write(B)
l2(A)
delayed
l2(B); u2(A);Read(B)
B Bx2;Write(B);u2(B);
CS 525
71
CS 525
72
12
Deadlock
Two or more transactions are waiting
for each other to release a lock
In the example
T1 is waiting for T2 and is making no
progress
T2 is waiting for T1 and is making no
progress
-> if we do not do anything they would
wait forever
CS 525
73
E.g., Schedule H =
This space intentionally
left blank!
CS 525
Next step:
74
CS 525
75
CS 525
76
CS 525
77
CS 525
78
13
Lemma
Ti → Tj in S ⇒ SH(Ti) <S SH(Tj)
Proof of lemma:
Ti → Tj means that
S = … pi(A) … qj(A) …; p,q conflict
By rules 1,2:
S = … pi(A) … ui(A) … lj(A) … qj(A) …
By rule 3:
SH(Ti) is at or before ui(A), and SH(Tj) is after lj(A)
So, SH(Ti) <S SH(Tj)
CS 525
80
81
CS 525
82
CS 525
83
Serial (S)
CS 525
84
14
85
CS 525
86
Shared locks
So far:
S = ...l1(A) r1(A) u1(A) l2(A) r2(A) u2(A)
Do not conflict
CS 525
87
Shared locks
So far:
S = ...l1(A) r1(A) u1(A) l2(A) r2(A) u2(A)
Do not conflict
Instead:
S=... ls1(A) r1(A) ls2(A) r2(A) ... us1(A) us2(A)
CS 525
89
CS 525
88
Lock actions
l-ti(A): lock A in t mode (t is S or X)
u-ti(A): unlock t mode (t is S or X)
Shorthand:
ui(A): unlock whatever modes
Ti has locked A
CS 525
90
15
Rule #1
CS 525
91
CS 525
no l-Xj(A)
S = ... l-Xi(A)
93
Rule # 3
Compatibility matrix
Comp   S      X
S      true   false
X      false  false
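A scheduler can consult the S/X compatibility matrix before granting a lock. This is a minimal sketch; the `LockTable` class and its methods are hypothetical, and a request that cannot be granted simply returns False instead of queueing the transaction.

```python
# COMP[(mode already held, mode requested)] for shared (S) and exclusive (X) locks.
COMP = {("S", "S"): True, ("S", "X"): False,
        ("X", "S"): False, ("X", "X"): False}

class LockTable:
    def __init__(self):
        self.held = {}  # data item -> {transaction: mode}

    def request(self, txn, item, mode):
        """Grant the lock if it is compatible with all locks held by other transactions."""
        holders = self.held.setdefault(item, {})
        for other, other_mode in holders.items():
            if other != txn and not COMP[(other_mode, mode)]:
                return False  # a real scheduler would delay txn here
        # Keep the stronger mode if txn already holds an X lock on the item.
        if holders.get(txn) != "X":
            holders[txn] = mode
        return True

    def release(self, txn, item):
        """Drop whatever lock txn holds on item (the ui(A) shorthand)."""
        self.held.get(item, {}).pop(txn, None)

table = LockTable()
print(table.request(1, "A", "S"))   # first shared lock
print(table.request(2, "A", "S"))   # second shared lock on the same item
print(table.request(3, "A", "X"))   # exclusive request while S locks are held
```

Two shared locks coexist, while the exclusive request has to wait until both are released.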
95
ui(A)
no l-Xj(A)
no l-Sj(A)
Think of
- Get 2nd lock on A, or
- Drop S, get X lock
Comp
92
S = ....l-Si(A) ui(A)
Option 2: Upgrade
CS 525
94
2PL transactions
CS 525
96
16
Detail:
l-ti(A), l-rj(A) do not conflict if comp(t,r)
l-ti(A), u-rj(A) do not conflict if comp(t,r)
CS 525
97
CS 525
98
S
X
I
INi(A)
CS 525
99
CS 525
100
Update locks
Comp   S   X   I
S      T   F   F
X      F   F   F
I      F   F   T
101
102
17
Solution
Comp                     New request
Lock already held in     S      X   U
S                        T      F   T
X                        F      F   F
U                        TorF   F   F
104
CS 525
105
CS 525
106
S1=...l-S1(A)l-S2(A)l-U3(A) l-S4(A)?
l-U4(A)?
107
CS 525
108
18
#
locks
time
109
CS 525
110
Scheduler, part I
2PL (2PL)
lock
table
SS2PL (SS2PL)
l(A),Read(A),l(B),Write(B)
Scheduler, part II
Serial (S)
Read(A),Write(B)
DB
CS 525
Lock table
111
CS 525
112
Conceptually
A
H
...
...
CS 525
...
A
B
C
113
CS 525
114
19
T1
no
T2
no
T3
yes
Relation A
Relation B
Tuple A
Disk
block
A
Disk
block
B
DB
DB
...
Tuple B
Tuple C
...
Object:A
Group mode:U
Waiting:yes
List:
...
To other T3
records
DB
CS 525
115
CS 525
116
CS 525
117
CS 525
Example
Stall 2
Stall 3
118
R1
Stall 4
t1
t2
t4
t3
restroom
hall
CS 525
119
CS 525
120
20
Example
Example
T1(IS)
T1(IS) , T2(S)
R1
t1
t2
R1
t1
t4
t3
T1(S)
CS 525
t2
T1(S)
121
CS 525
Example (b)
Example
T1(IS) , T2(IX)
R1
t2
R1
t1
t4
t3
T1(S)
CS 525
t2
T2(IX)
T1(S)
123
CS 525
Multiple granularity
Comp                Requestor
Holder    IS   IX   S    SIX  X
IS        T    T    T    T    F
IX        T    T    F    F    F
S         T    F    T    F    F
SIX       T    F    F    F    F
X         F    F    F    F    F
126
21
Parent locked in    Child can be locked in
IS                  IS, S
IX                  IS, S, IX, X, SIX
S                   none
SIX                 X, IX, [SIX]
X                   none
[ ]: not necessary
CS 525
127
CS 525
128
Exercise:
Rules
(1) Follow multiple granularity comp function
(2) Lock root of tree first, any mode
(3) Node Q can be locked by Ti in S or IS only if
parent(Q) locked by Ti in IX or IS
(4) Node Q can be locked by Ti in X,SIX,IX only
if parent(Q) locked by Ti in IX,SIX
(5) Ti is two-phase
(6) Ti can unlock node Q only if none of Q's
children are locked by Ti
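Rules (2) to (4) above amount to a lookup on the mode that Ti holds on the parent node. This is a minimal sketch; the function name and the convention of passing None for the root are assumptions, and rules (1), (5), and (6) are not modeled.

```python
# Modes Ti must hold on parent(Q) before locking Q in the given mode.
REQUIRED_PARENT = {
    "IS": {"IX", "IS"},    # rule (3)
    "S": {"IX", "IS"},     # rule (3)
    "IX": {"IX", "SIX"},   # rule (4)
    "SIX": {"IX", "SIX"},  # rule (4)
    "X": {"IX", "SIX"},    # rule (4)
}

def may_lock(node_mode, parent_mode):
    """Check rules (2)-(4); parent_mode is the mode Ti holds on parent(Q),
    or None when Q is the root of the tree."""
    if parent_mode is None:
        return True  # rule (2): the root may be locked first, in any mode
    return parent_mode in REQUIRED_PARENT[node_mode]

print(may_lock("IX", None))   # lock the root in IX
print(may_lock("X", "IX"))    # X on a child below an IX lock
print(may_lock("X", "IS"))    # X below an IS lock
```

The last call is rejected because an IS lock on the parent only announces read locks further down the tree.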
CS 525
129
R1
t1
T1(IX) t2
T1(X) f2.1
CS 525
t4
t3
f2.2
f3.1
f3.2
130
Exercise:
Exercise:
T1(IX)
T1(IS)
R1
t1
CS 525
T1(X)
t2
f2.1
f2.2
f3.1
t1
t4
t3
f3.2
131
R1
CS 525
T1(S)
t2
f2.1
f2.2
t4
t3
f3.1
f3.2
132
22
Exercise:
Exercise:
T1(SIX)
T1(SIX)
R1
t1
T1(IX) t2
T1(X) f2.1
CS 525
f2.2
f3.1
f3.2
133
...
Z
135
f3.2
134
E# Name
55 Smith
75 Jones
CS 525
136
T2
S2(o1)
S2(o2)
Check Constraint
...
CS 525
f3.1
...
o1
o2
CS 525
f2.2
Insert
T1(X) f2.1
t4
t3
CS 525
T1(IX) t2
t1
t4
t3
R1
Insert o3[08,Obama,..]
Insert o4[08,McCain,..]
137
CS 525
138
23
Back to example
Solution
T1: Insert<04,Kerry>
T1
X1(R)
CS 525
t2
X2(R)
X2(R)
Check constraint
Oops! e# = 04 already in R!
t3
139
CS 525
E#=5
CS 525
140
...
Index
100<E#<200
Index
0<E#<100
E#=2
delayed
Check constraint
Insert<04,Kerry>
U(R)
T2: Insert<04,Bush>
T2
...
E#=107
E#=109
141
...
CS 525
142
Example
Next:
Tree-based concurrency control
Validation concurrency control
A
B
D
E
CS 525
143
CS 525
144
24
Example
Example
A
B
T1 lock
T1 lock
C
T1 lock
T1 lock
D
T1 lock
C
T1 lock
D
F
F
can we release A lock
145
CS 525
146
T1 lock
C
T1 lock
D
E
CS 525
D
F
147
CS 525
148
A
T1 lock
Root
B
T1 lock
Q
D
E
CS 525
149
Ti Tj
150
25
CS 525
151
CS 525
152
T1 S lock(released)
T1 S lock(released)
T1 X lock (released)
B
T1 X lock (released)
B
T1 S lock (held)
D
E
T1 S lock (held)
D
F
E
T1 X lock (will get)
CS 525
153
C
T2 reads:
B modified by T1
F not yet modified by T1
F
T1 X lock (will get)
CS 525
154
CS 525
155
Deadlocks (again)
Before we assumed that we are able to
detect deadlocks and resolve them
Now two options
(1) Deadlock detection (and resolving)
(2) Deadlock prevention
CS 525
156
26
Deadlock Prevention
Option 1:
[Figure: number of locks held plotted over time]
Deadlock Prevention
Option 2:
Deadlock Prevention
Option 3 (Preemption)
Deadlock Prevention
Option 4:
Timeout-based Scheme
Simple scheme
Hard to find a good value of X
Too high: long wait times for a transaction
before it eventually gets aborted
Too low: too many transactions that are not
deadlocked get aborted
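The timeout idea can be sketched with Python's standard `threading.Lock`, assuming X is the maximum time in seconds a transaction may wait for a lock. The names `TransactionAborted` and `lock_or_abort` are hypothetical and only serve this illustration.

```python
import threading


class TransactionAborted(Exception):
    """Raised when a lock request waited longer than the timeout X."""


def lock_or_abort(lock, timeout_x):
    """Wait at most timeout_x seconds for lock; abort the caller otherwise.

    A real system would also release the locks the transaction already holds
    and roll back its changes; this sketch only signals the abort.
    """
    if not lock.acquire(timeout=timeout_x):
        raise TransactionAborted(f"waited more than {timeout_x} s for a lock")


# Example: the lock is already taken, so a second request times out.
item_lock = threading.Lock()
item_lock.acquire()
try:
    lock_or_abort(item_lock, 0.05)
    timed_out = False
except TransactionAborted:
    timed_out = True
item_lock.release()
```

Note that the request above is aborted although no deadlock exists, which is exactly the risk of choosing X too low.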
[Figure: transactions T1, T3, T4, T5]
(2) Validate
check if schedule so far is serializable
(3) Write
if validate ok, write to DB
Key idea
Make validation atomic
If T1, T2, T3, ... is the validation order, then
the resulting schedule will be conflict
equivalent to Ss = T1 T2 T3 ...
Example of what validation must prevent:
RS(T2)={B}      RS(T3)={A,B}
WS(T2)={B,D}    WS(T3)={C}
WS(T2) ∩ RS(T3) = {B} ≠ ∅
Timeline: T2 start, T3 start, T2 validated, T3 validated
Allow:
same RS and WS as above
Timeline: T2 start, T2 validated, T2 finish phase 3, T3 start, T3 validated
Another thing validation must prevent:
RS(T2)={A}      RS(T3)={A,B}
WS(T2)={D,E}    WS(T3)={C,D}
WS(T2) ∩ WS(T3) = {D} ≠ ∅
Timeline: T2 validated, T3 validated, T2 finish
Allow:
same RS and WS as above
Timeline: T2 validated, T2 finish, T3 validated
Check (Tj):
For Ti ∈ VAL - IGNORE (Tj) DO
IF [ WS(Ti) ∩ RS(Tj) ≠ ∅ OR Ti ∉ FIN ] THEN RETURN false;
RETURN true;
Improving Check(Tj)
For Ti ∈ VAL - IGNORE (Tj) DO
IF [ WS(Ti) ∩ RS(Tj) ≠ ∅ OR
(Ti ∉ FIN AND WS(Ti) ∩ WS(Tj) ≠ ∅)] THEN RETURN false;
RETURN true;
Exercise:
[Figure: timeline marking start, validate and finish of each transaction]
U: RS(U)={B}
WS(U)={D}
V: RS(V)={B}
WS(V)={D,E}
W: RS(W)={A,D}
WS(W)={A,C}
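The validation test can be written as a few lines of Python. This is a sketch of the usual textbook formulation, assuming that VAL is the set of already validated transactions, FIN the set that has finished its write phase, and IGNORE(Tj) the set that finished before Tj started; the argument names are illustrative. Tj fails if an earlier validated Ti wrote something Tj read, or if Ti is still writing and both write a common item.

```python
def check(tj, val, ignore, fin, rs, ws):
    """Improved validation test for transaction tj.

    val    -- transactions already validated (VAL)
    ignore -- transactions that finished before tj started (IGNORE(tj))
    fin    -- transactions that finished their write phase (FIN)
    rs, ws -- dictionaries mapping a transaction to its read / write set
    """
    for ti in val - ignore:
        if ws[ti] & rs[tj]:
            return False  # tj may have read a value ti overwrites
        if ti not in fin and ws[ti] & ws[tj]:
            return False  # writes of ti and tj could be applied out of order
    return True


# First example: WS(T2) and RS(T3) overlap in B.
rs1 = {"T2": {"B"}, "T3": {"A", "B"}}
ws1 = {"T2": {"B", "D"}, "T3": {"C"}}
overlapping = check("T3", {"T2"}, set(), set(), rs1, ws1)
after_finish = check("T3", {"T2"}, {"T2"}, {"T2"}, rs1, ws1)

# Second example: WS(T2) and WS(T3) overlap in D.
rs2 = {"T2": {"A"}, "T3": {"A", "B"}}
ws2 = {"T2": {"D", "E"}, "T3": {"C", "D"}}
t2_still_writing = check("T3", {"T2"}, set(), set(), rs2, ws2)
t2_finished = check("T3", {"T2"}, set(), {"T2"}, rs2, ws2)
```

In both examples the rejected case is the one validation must prevent, and the accepted case is the one it should allow.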
Is Validation = 2PL?
[Figure: four diagrams showing possible relationships between the sets of schedules allowed by Val and by 2PL]
Multiversioning Concurrency
Control (MVCC)
Keep old versions of data items and use
them to increase concurrency
Each write creates a new version of the
written data item
Use version numbers or timestamps to
identify versions
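A minimal sketch of a multi-versioned data item, assuming integer timestamps identify versions and a reader sees the newest version that is not younger than its own timestamp. The class and method names are hypothetical, and commit status and garbage collection are deliberately ignored.

```python
class VersionedItem:
    """One data item that keeps every version ever written."""

    def __init__(self, initial, ts=0):
        # List of (timestamp, value) pairs, kept sorted by timestamp.
        self.versions = [(ts, initial)]

    def write(self, ts, value):
        """A write never overwrites; it adds a new version."""
        self.versions.append((ts, value))
        self.versions.sort(key=lambda version: version[0])

    def read(self, ts):
        """Return the newest value whose timestamp is <= ts."""
        visible = [value for version_ts, value in self.versions if version_ts <= ts]
        return visible[-1]


z = VersionedItem(5)      # version 0 holds the value 5
z.write(10, 3)            # a writer installs a new version at time 10
old_reader = z.read(7)    # a reader with timestamp 7 still sees 5
new_reader = z.read(12)   # a reader with timestamp 12 sees 3
```

Because the old version stays available, the reader with timestamp 7 does not have to wait for the writer.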
Multiversioning Concurrency
Control (MVCC)
Different transactions operate over
different versions of data items
-> readers never have to wait for writers
-> great for combined workloads
MVCC schemes
MVCC timestamp ordering
MVCC 2PL
Snapshot isolation (SI)
We will only cover this one
[Figure: snapshot isolation example over data items X, Y, Z, with transactions T2 and T3 and future transactions. Operations shown, in extraction order: W(Y := 1), Commit; R(Y) 1; W(X:=2), W(Z:=3), Commit; R(Z) 5; R(Y) 1; W(X:=3); Commit-Req; Abort. Time steps 6, 7 and 10 to 14 are labeled.]
Is that serializable?
Almost ;-)
There is still one type of conflict that
cannot occur in serializable schedules,
called write skew
Write Skew
Consider two data items A and B
A = 5, B = 5
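Write skew can be reproduced with a toy snapshot isolation store. The sketch below assumes the common first-committer-wins rule (a transaction aborts only if a concurrent, already committed transaction wrote one of the items it writes). The class `SnapshotDB` and the two updates A := A + B and B := A + B are assumptions chosen to make the anomaly visible.

```python
class SnapshotDB:
    """Toy snapshot isolation: reads use a private snapshot, commits use
    first-committer-wins on write sets."""

    def __init__(self, data):
        self.data = dict(data)   # latest committed values
        self.commit_log = []     # (commit timestamp, set of written keys)
        self.clock = 0

    def begin(self):
        return {"start": self.clock, "snapshot": dict(self.data), "writes": {}}

    def read(self, txn, key):
        return txn["writes"].get(key, txn["snapshot"][key])

    def write(self, txn, key, value):
        txn["writes"][key] = value

    def commit(self, txn):
        written = set(txn["writes"])
        for commit_ts, keys in self.commit_log:
            if commit_ts > txn["start"] and keys & written:
                return False     # a concurrent committer wrote the same item
        self.clock += 1
        self.data.update(txn["writes"])
        self.commit_log.append((self.clock, written))
        return True


db = SnapshotDB({"A": 5, "B": 5})
t1, t2 = db.begin(), db.begin()
db.write(t1, "A", db.read(t1, "A") + db.read(t1, "B"))  # T1: A := A + B
db.write(t2, "B", db.read(t2, "A") + db.read(t2, "B"))  # T2: B := A + B
both_commit = db.commit(t1) and db.commit(t2)
final_state = (db.data["A"], db.data["B"])
```

Both transactions commit because their write sets are disjoint, and the result is A = 10, B = 10. Running T1 before T2 serially gives A = 10, B = 15, and T2 before T1 gives A = 15, B = 10, so the snapshot isolation outcome matches no serial order.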
Write Skew
Example: Oracle
SI Discussion
Advantages
Readers and writers do not block each other
If we do not GC old row versions we can go back
to previous versions of the database -> Time
travel
E.g., show me the customer table as it was yesterday
Disadvantages
Storage overhead to keep old row versions
GC overhead
Not fully serializable (write skew is possible)
Summary
Have studied CC mechanisms used in practice
- 2PL variants
- Multiple lock granularity
- Deadlocks
- Tree (index) protocols
- Optimistic CC (Validation)
- Multiversioning Concurrency Control (MVCC)