CH 07

Space-time tradeoffs
For many problems some extra space really pays off

(extra space in tables - breathing room)
input enhancement
non-comparison-based sorting
auxiliary tables (shift tables for pattern matching)
prestructuring
hashing
indexing schemes (e.g., B-trees)
tables of information that do all the work

dynamic programming
Design and Analysis of Algorithms - Chapter 7
Sorting by Counting
Algorithm ComparisonCountingSort(A[0..n-1])
  for i ← 0 to n-1 do Count[i] ← 0
  for i ← 0 to n-2 do
    for j ← i+1 to n-1 do
      if A[i] < A[j] then Count[j] ← Count[j]+1
      else Count[i] ← Count[i]+1
  for i ← 0 to n-1 do S[Count[i]] ← A[i]
Example: 62 31 84 96 19 47
Efficiency: Θ(n²), since the algorithm makes exactly n(n-1)/2 key comparisons
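The pseudocode above can be sketched in Python as follows. The function name comparison_counting_sort is our own, and it returns the Count array alongside the sorted list only so the slide's example can be inspected; treat it as an illustration, not a reference implementation.

```python
def comparison_counting_sort(a):
    """Sort a list by counting, for each element, how many elements are smaller."""
    n = len(a)
    count = [0] * n
    # Compare every pair (i, j) with i < j exactly once: n(n-1)/2 comparisons.
    for i in range(n - 1):
        for j in range(i + 1, n):
            if a[i] < a[j]:
                count[j] += 1
            else:
                count[i] += 1
    # count[i] is now the final position of a[i] in the sorted list.
    s = [None] * n
    for i in range(n):
        s[count[i]] = a[i]
    return s, count


result, counts = comparison_counting_sort([62, 31, 84, 96, 19, 47])
print(counts)  # [3, 1, 4, 5, 0, 2]
print(result)  # [19, 31, 47, 62, 84, 96]
```

Because ties increment Count[i] (the earlier element), equal keys still receive distinct positions.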
Sorting by Counting (2)

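Algorithm DistributionCountingSort(A[0..n-1], l, u)
  // sorts an array of integers known to lie between l and u
  for j ← 0 to u-l do D[j] ← 0                         // initialize frequencies
  for i ← 0 to n-1 do D[A[i]-l] ← D[A[i]-l] + 1         // compute frequencies
  for j ← 1 to u-l do D[j] ← D[j-1] + D[j]              // reuse for distribution
  for i ← n-1 downto 0 do
    j ← A[i]-l; S[D[j]-1] ← A[i]; D[j] ← D[j]-1
Example: 13 11 12 13 12 12 (l = 11, u = 13)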
Efficiency: Θ(n + (u-l)), which is linear in n when the range of values u-l is O(n)
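A minimal Python sketch of DistributionCountingSort, assuming the bounds l and u are passed in explicitly (the name distribution_counting_sort is hypothetical). The right-to-left final pass is what keeps equal values in their original relative order.

```python
def distribution_counting_sort(a, l, u):
    """Stable sort of integers known to lie in the range l..u."""
    d = [0] * (u - l + 1)
    # Frequencies of each value.
    for x in a:
        d[x - l] += 1
    # Running sums: d[j] = number of elements <= l + j (distribution values).
    for j in range(1, u - l + 1):
        d[j] = d[j - 1] + d[j]
    s = [None] * len(a)
    # Scan right to left so equal elements keep their relative order.
    for i in range(len(a) - 1, -1, -1):
        j = a[i] - l
        s[d[j] - 1] = a[i]
        d[j] -= 1
    return s


print(distribution_counting_sort([13, 11, 12, 13, 12, 12], 11, 13))
# [11, 12, 12, 12, 13, 13]
```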
String matching
pattern: a string of m characters to search for

text: a (long) string of n characters to search in
Brute force algorithm:
1. Align the pattern at the beginning of the text
2. Moving from left to right, compare each character of the pattern to the corresponding character in the text until either all characters are found to match (successful search) or a mismatch is detected
3. While the pattern is not found and the text is not yet exhausted, realign the pattern one position to the right and repeat step 2.
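The three steps above can be sketched in Python as below (brute_force_match is a hypothetical name). In the worst case it makes m(n - m + 1) character comparisons, which is the cost the algorithms on the following slides try to reduce.

```python
def brute_force_match(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    # Steps 1 and 3: try every alignment, moving one position to the right.
    for i in range(n - m + 1):
        # Step 2: compare left to right until all match or a mismatch occurs.
        j = 0
        while j < m and pattern[j] == text[i + j]:
            j += 1
        if j == m:
            return i  # successful search
    return -1  # text exhausted without a match


print(brute_force_match("BAOBAB", "BESS_KNEW_ABOUT_BAOBABS"))  # 16
```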
String searching - History
1970: Cook shows (using finite-state machines) that the problem can be solved in time proportional to n+m
1976: Knuth and Pratt find an algorithm based on Cook's idea; Morris independently discovers the same algorithm in an attempt to avoid backing up over the text
At about the same time, Boyer and Moore find an algorithm that examines only a fraction of the text in most cases (by comparing characters in pattern and text from right to left, instead of left to right)
1980: Another algorithm, proposed by Rabin and Karp, virtually always runs in time proportional to n+m and has the advantages of extending easily to two-dimensional pattern matching and of being almost as simple as the brute-force method.
Horspool's Algorithm
A simplified version of the Boyer-Moore algorithm that retains two key insights:
compare pattern characters to text from right to left
given a pattern, create a shift table that determines how much to shift the pattern when a mismatch occurs (input enhancement)
How far to shift?

Look at the first (rightmost) character in the text that was compared. Three cases:
The character is not in the pattern
.....c...................... (c not in pattern)
BAOBAB
The character is in the pattern (but not at rightmost position)

.....O...................... (O occurs once in pattern)
BAOBAB
.....A...................... (A occurs twice in pattern)
BAOBAB
The rightmost characters produced a match

.....B......................
BAOBAB
Shift table: stores the number of characters to shift by, depending on the first character compared
Shift table
Constructed by scanning the pattern before the search begins
All entries are initialized to the length of the pattern, m.
For each character c occurring among the first m-1 characters of the pattern, set its entry to the distance from the rightmost such occurrence of c to the end of the pattern
Algorithm ShiftTable(P[0..m-1])
  // size is the number of characters in the alphabet
  for i ← 0 to size-1 do Table[i] ← m
  for j ← 0 to m-2 do Table[P[j]] ← m-1-j
  return Table
Shift table
Example for pattern BAOBAB:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
Then, after scanning BAOBAB:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6
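One way to build this table in Python, assuming the alphabet is supplied as a string and a dict stands in for the array indexed by character (shift_table is a hypothetical name). Later iterations of the loop overwrite earlier ones, which is how the rightmost occurrence wins.

```python
import string


def shift_table(pattern, alphabet):
    """Horspool shift table, stored as a dict keyed by character."""
    m = len(pattern)
    # Every character of the alphabet starts with a shift equal to m.
    table = {c: m for c in alphabet}
    # Scan the first m-1 pattern characters left to right; a later
    # (further right) occurrence overwrites an earlier one.
    for j in range(m - 1):
        table[pattern[j]] = m - 1 - j
    return table


table = shift_table("BAOBAB", string.ascii_uppercase)
print(table["A"], table["B"], table["O"], table["C"])  # 1 2 3 6
```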
The Algorithm
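Algorithm HorspoolMatching(P[0..m-1], T[0..n-1])
  Table ← ShiftTable(P[0..m-1])      // generate the table of shifts
  i ← m-1                            // text position of the pattern's right end
  while i ≤ n-1 do
    k ← 0                            // number of matched characters
    while k ≤ m-1 and P[m-1-k] = T[i-k] do
      k ← k+1
    if k = m return i-m+1
    else i ← i + Table[T[i]]
  return -1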
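A Python sketch of HorspoolMatching, under the assumption that the shift table is kept as a dict and any character missing from it shifts by m (equivalent to initializing every alphabet entry to m). The name horspool_matching is our own.

```python
def horspool_matching(pattern, text):
    """Return the index of the leftmost match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    # Shift table stored sparsely: characters absent from it shift by m.
    table = {}
    for j in range(m - 1):
        table[pattern[j]] = m - 1 - j
    i = m - 1  # text index aligned with the pattern's last character
    while i <= n - 1:
        k = 0  # number of matched characters, counted from the right
        while k <= m - 1 and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        # Shift by the entry for the text character under the pattern's end.
        i += table.get(text[i], m)
    return -1


print(horspool_matching("BAOBAB", "BESS_KNEW_ABOUT_BAOBABS"))  # 16
```

Note that the shift always uses T[i], the text character aligned with the last pattern character, regardless of where the mismatch occurred.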
Boyer-Moore algorithm
Based on the same two ideas:

compare pattern characters to text from right to
left
given a pattern, create a shift table that
determines how much to shift the pattern when
a mismatch occurs (input enhancement)
Uses an additional shift table (the good-suffix shift) that applies the same idea to the number of matched characters
The bad-symbol shift
Based on the Horspool idea of using the extra table

However, the shift is computed differently:
If c, the text character aligned with the last pattern character, is not in the pattern, shift by m characters, as in Horspool's algorithm (here c itself is the bad symbol)
If the mismatching character (the bad symbol) does not appear in the pattern, shift the pattern just past it
If the mismatching character (the bad symbol) appears in the pattern, shift so that an occurrence of the bad symbol in the pattern (lying to the left of the mismatching position) is aligned with the same text character
The bad-symbol shift - example

The bad symbol IS NOT in the pattern
...SER......................
BARBER
    BARBER
shift 4 positions
The bad symbol IS in the pattern
...AER......................
BARBER
  BARBER
shift 2 positions
This shift is given by d1 = max{t1(c) - k, 1}, where t1 is the Horspool shift table, c is the bad symbol, and k is the distance of the bad symbol from the end of the pattern (i.e., the number of matched characters)
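The d1 formula can be checked against both BARBER examples with a short Python sketch; bad_symbol_shift is a hypothetical helper that rebuilds the Horspool table t1 on every call for simplicity. The max(..., 1) guards the case where the rightmost occurrence of c lies to the right of the mismatch, which would otherwise give a zero or negative shift.

```python
def bad_symbol_shift(pattern, c, k):
    """d1 = max(t1(c) - k, 1): c is the mismatched text character and
    k is the number of pattern characters matched before the mismatch."""
    m = len(pattern)
    # Horspool table t1; characters not in pattern[0..m-2] default to m.
    t1 = {}
    for j in range(m - 1):
        t1[pattern[j]] = m - 1 - j
    return max(t1.get(c, m) - k, 1)


print(bad_symbol_shift("BARBER", "S", 2))  # 4  (S is not in the pattern)
print(bad_symbol_shift("BARBER", "A", 2))  # 2  (t1(A) = 4)
```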
The good-suffix shift - example

What happens if a matched suffix appears again in the pattern (e.g., ABRACADABRA)?
It is important to find another occurrence of the suffix that is preceded by a different character. Calculate the shift as the distance between the two occurrences of the suffix.
If there is no such occurrence, it is also important to find the longest prefix of size l < k that matches the suffix of size l. Calculate the shift as the distance between that prefix and the suffix.
The good-suffix shift example (2)

k   pattern   d2    pattern   d2
1   ABCBAB    2     BAOBAB    2
2   ABCBAB    4     BAOBAB    5
3   ABCBAB    4     BAOBAB    5
4   ABCBAB    4     BAOBAB    5
5   ABCBAB    4     BAOBAB    5
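The table can be reproduced with a deliberately naive Python sketch (good_suffix_table is a hypothetical name). For each k it tries shifts d = 1, 2, ..., m and keeps the smallest one under which the shifted pattern agrees with the k matched characters and does not place the same character that just mismatched opposite the mismatched text position; shifts that move part of the suffix past the pattern's start correspond to the prefix case. Practical implementations compute d2 in linear time; the cubic loop is chosen here only for readability.

```python
def good_suffix_table(pattern):
    """Naive computation of d2[k] for k = 1..m-1 matched characters."""
    m = len(pattern)
    d2 = {}
    for k in range(1, m):
        for d in range(1, m + 1):
            # After shifting right by d, pattern[i - d] faces the text
            # character that matched pattern[i]; it must agree where it exists.
            agrees = all(pattern[i - d] == pattern[i]
                         for i in range(m - k, m) if i - d >= 0)
            # The character now facing the mismatched text position must
            # differ from the pattern character that caused the mismatch.
            p = m - k - 1 - d
            if agrees and (p < 0 or pattern[p] != pattern[m - k - 1]):
                d2[k] = d
                break  # d = m always qualifies, so every k gets an entry
    return d2


print(good_suffix_table("ABCBAB"))  # {1: 2, 2: 4, 3: 4, 4: 4, 5: 4}
print(good_suffix_table("BAOBAB"))  # {1: 2, 2: 5, 3: 5, 4: 5, 5: 5}
```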
Final rule for Boyer-Moore
Calculate the shift as
d = d1              if k = 0
d = max(d1, d2)     if k > 0
where d1 = max{t1(c) - k, 1} and d2 is the good-suffix shift for k matched characters
Example
BESS_KNEW_ABOUT_BAOBABS
BAOBAB
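Combining both tables gives the Python sketch below, which runs the slide's example. It inlines the same naive d2 computation so the block stands alone; the name boyer_moore and the dict-based t1 are our assumptions. On this input the pattern is shifted by 6, 5 and 5 positions before it matches at index 16.

```python
def boyer_moore(pattern, text):
    """Return the index of the leftmost match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    # Bad-symbol table t1 (Horspool's table, stored sparsely; default m).
    t1 = {}
    for j in range(m - 1):
        t1[pattern[j]] = m - 1 - j
    # Good-suffix table d2, computed naively for readability.
    d2 = {}
    for k in range(1, m):
        for d in range(1, m + 1):
            agrees = all(pattern[i - d] == pattern[i]
                         for i in range(m - k, m) if i - d >= 0)
            p = m - k - 1 - d
            if agrees and (p < 0 or pattern[p] != pattern[m - k - 1]):
                d2[k] = d
                break
    i = m - 1  # text index aligned with the pattern's last character
    while i <= n - 1:
        k = 0  # number of matched characters, counted from the right
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        # c = text[i - k] is the bad symbol.
        d1 = max(t1.get(text[i - k], m) - k, 1)
        i += d1 if k == 0 else max(d1, d2[k])
    return -1


print(boyer_moore("BAOBAB", "BESS_KNEW_ABOUT_BAOBABS"))  # 16
```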
Hashing
A very efficient method for implementing a dictionary, i.e., a set with the operations:
insert
find
delete
Applications:
databases
symbol tables
Hash tables and hash functions
Hash table: an array with indices that correspond to buckets
Hash function: determines the bucket for each record
Example: student records, key=SSN. Hash function:
h(k) = k mod m
(k is a key and m is the number of buckets)
If m = 1000, where is the record with SSN = 315-17-4251 stored?
Hash function must:
be easy to compute
distribute keys evenly throughout the table
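Applying the slide's hash function to its example, assuming the dashes are dropped and the SSN is read as the integer 315174251 (h is a hypothetical name). With m = 1000 the bucket is simply the last three digits of the key.

```python
def h(key, m):
    """Division-method hash function from the slide: h(k) = k mod m."""
    return key % m


# Assumption: the SSN 315-17-4251 is used as the integer 315174251.
ssn = int("315-17-4251".replace("-", ""))
print(h(ssn, 1000))  # 251
```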
Collisions
If h(k1) = h(k2) then there is a collision.

Good hash functions result in fewer collisions.
Collisions can never be completely eliminated.
Two types of hashing handle collisions differently:
Open hashing: each bucket points to a linked list of all keys hashing to it.
Closed hashing: one key per bucket; in case of a collision, find another bucket for one of the keys
linear probing: use the next bucket (wrapping around to the start of the table)
double hashing: use a second hash function to compute the increment
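The two closed-hashing strategies differ only in the increment used to walk through the buckets. The sketch below assumes integer keys, h(k) = k mod m, and, purely as a hypothetical example, a second hash function s(k) = 1 + (k mod 5) for double hashing with a prime table size m = 7, so that every increment from 1 to 5 eventually visits every bucket. The names probe_sequence, insert and s are our own.

```python
def probe_sequence(key, m, step=None):
    """Buckets examined for key in a closed hash table with m buckets.

    Linear probing (step=None) moves to the next bucket each time; double
    hashing uses step(key) as the increment instead.
    """
    start = key % m                          # h(k) = k mod m
    inc = 1 if step is None else step(key)
    return [(start + i * inc) % m for i in range(m)]


def insert(table, key, step=None):
    """Store key in the first empty bucket of its probe sequence."""
    for b in probe_sequence(key, len(table), step):
        if table[b] is None:
            table[b] = key
            return b
    raise RuntimeError("hash table is full")


def s(k):
    """Hypothetical second hash function for double hashing."""
    return 1 + k % 5


linear = [None] * 7
double = [None] * 7
for key in (10, 17, 24):                     # all three keys hash to bucket 3
    print(key, insert(linear, key), insert(double, key, step=s))
# 10 3 3
# 17 4 6
# 24 5 1
```

Linear probing packs the colliding keys into a contiguous run (buckets 3, 4, 5), while double hashing scatters them, which is the usual motivation for the second hash function.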
Open hashing
If the hash function distributes keys uniformly, the average length of a linked list will be α = n/m (the load factor)
Average number of probes: successful search S ≈ 1 + α/2, unsuccessful search U = α
Worst case is still linear!
Open hashing still works if n>m.
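A minimal sketch of open hashing in Python, with Python lists standing in for the linked lists and h(k) = k mod m assumed as the hash function (ChainedHashTable is a hypothetical name). Inserting ten keys into three buckets shows that it keeps working when n > m; the average chain length is n/m = 10/3.

```python
class ChainedHashTable:
    """Open hashing (separate chaining): each bucket holds a list of keys."""

    def __init__(self, m):
        self.buckets = [[] for _ in range(m)]

    def _bucket(self, key):
        # h(k) = k mod m selects the chain for this key.
        return self.buckets[key % len(self.buckets)]

    def insert(self, key):
        bucket = self._bucket(key)
        if key not in bucket:
            bucket.append(key)

    def find(self, key):
        return key in self._bucket(key)

    def delete(self, key):
        bucket = self._bucket(key)
        if key in bucket:
            bucket.remove(key)


table = ChainedHashTable(3)
for key in range(10):               # n = 10 keys, m = 3 buckets, so n > m
    table.insert(key)
print(table.find(7), table.find(42))         # True False
print([len(b) for b in table.buckets])       # [4, 3, 3]
```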
Closed hashing
Does not work if n>m.

Avoids pointers.
Deletions are not straightforward.
Number of probes to insert/find/delete a key depends on the load factor α = n/m (hash table density); for linear probing:
successful search: S ≈ ½(1 + 1/(1 - α))
unsuccessful search: U ≈ ½(1 + 1/(1 - α)²)
As the table gets filled (α approaches 1), the number of probes increases dramatically.
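Evaluating the linear-probing estimates S ≈ ½(1 + 1/(1 - α)) and U ≈ ½(1 + 1/(1 - α)²) for a few load factors makes the growth concrete. linear_probing_probes is a hypothetical helper, and the numbers are the formulas' approximations, not measurements.

```python
def linear_probing_probes(alpha):
    """Approximate average probes for linear probing at load factor alpha < 1."""
    successful = 0.5 * (1 + 1 / (1 - alpha))
    unsuccessful = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    return successful, unsuccessful


for alpha in (0.5, 0.75, 0.9):
    s, u = linear_probing_probes(alpha)
    print(f"load factor {alpha:.0%}: successful ~ {s:.1f}, unsuccessful ~ {u:.1f}")
# load factor 50%: successful ~ 1.5, unsuccessful ~ 2.5
# load factor 75%: successful ~ 2.5, unsuccessful ~ 8.5
# load factor 90%: successful ~ 5.5, unsuccessful ~ 50.5
```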
