Princeton University COS 226 Algorithms and Data Structures Spring 2004 Kevin Wayne http://www.Princeton.EDU/~cos226
[Figure: the string a a c a a g t t t a c a a g c shown twice, with positions i and j and a length k marked.]
A Sorting Solution

Suffixes:

    a a c a a g t t t a c a a g c
    a c a a g t t t a c a a g c
    c a a g t t t a c a a g c
    a a g t t t a c a a g c
    a g t t t a c a a g c
    g t t t a c a a g c
    t t t a c a a g c
    t t a c a a g c
    t a c a a g c
    a c a a g c
    c a a g c
    a a g c
    a g c
    g c
    c

Sorted suffixes:

    a a c a a g t t t a c a a g c
    a a g c
    a a g t t t a c a a g c
    a c a a g c
    a c a a g t t t a c a a g c
    a g c
    a g t t t a c a a g c
    c
    c a a g c
    c a a g t t t a c a a g c
    g c
    g t t t a c a a g c
    t a c a a g c
    t t a c a a g c
    t t t a c a a g c

Suffix Sorting: Java Implementation

    import java.util.Arrays;

    public class SuffixSorter {
        public static void main(String[] args) {
            // read input
            In stdin = new In();
            String s = stdin.readAll();
            int N = s.length();

            // create suffixes (linear time)
            String[] suffixes = new String[N];
            for (int i = 0; i < N; i++)
                suffixes[i] = s.substring(i, N);

            // sort and find longest match (bottleneck)
            Arrays.sort(suffixes);
            findLongestMatch(suffixes);
        }
    }
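The slide calls findLongestMatch without showing its body. One way it could work, as a minimal sketch: after sorting, any repeated substring is a common prefix of two adjacent suffixes, so a single scan over neighbors is enough. The class name LongestRepeatSketch and the helpers lcp and longestRepeat are hypothetical names chosen for this illustration, not taken from the course code.

```java
import java.util.Arrays;

// Hypothetical helper class: the slide names findLongestMatch but does not define it.
final class LongestRepeatSketch {

    // Length of the longest common prefix of s and t.
    static int lcp(String s, String t) {
        int n = Math.min(s.length(), t.length());
        for (int i = 0; i < n; i++)
            if (s.charAt(i) != t.charAt(i)) return i;
        return n;
    }

    // Longest substring occurring at least twice in s (occurrences may overlap).
    static String longestRepeat(String s) {
        int n = s.length();
        String[] suffixes = new String[n];
        for (int i = 0; i < n; i++)
            suffixes[i] = s.substring(i);
        Arrays.sort(suffixes);

        // Compare each suffix only with its neighbor in sorted order.
        String best = "";
        for (int i = 1; i < n; i++) {
            int len = lcp(suffixes[i - 1], suffixes[i]);
            if (len > best.length())
                best = suffixes[i].substring(0, len);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestRepeat("aacaagtttacaagc"));  // prints "acaag"
    }
}
```

In the sorted list above, the adjacent pair a c a a g c and a c a a g t t t a c a a g c shares the prefix a c a a g, which is the answer for the example string. Note that in current Java versions substring copies characters, so building the suffix array of strings this way takes quadratic time and space rather than the linear time the slide annotates.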
String Sorting

Each column after the first is a stable sort of the previous column on one character, working from the last character to the first.

     i   input   last   middle   first
     0   dab     dab    dab      ace
     1   add     cab    cab      add
     2   cab     ebb    fad      bad
     3   fad     add    bad      bed
     4   fee     fad    dad      bee
     5   bad     bad    ebb      cab
     6   dad     dad    ace      dab
     7   bee     fed    add      dad
     8   fed     bed    fed      ebb
     9   bed     fee    bed      fad
    10   ebb     bee    fee      fed
    11   ace     ace    bee      fee

Key Indexed Counting

    for (int i = L; i <= R; i++)
        a[i] = temp[i - L];        // copy back

     i   a       count           temp
     0   dab     a 0     a 2     add
     1   add     b 2     b 5     ace
     2   cab     c 3     c 6     bad
     3   fad     d 1     d 8     bee
     4   fee     e 2     e 9     bed
     5   bad     f 1     f 12    cab
     6   dad     g 3     g 11    dab
     7   bee                     dad
     8   fed                     ebb
     9   bed                     fad
    10   ebb                     fee
    11   ace                     fed
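Only the final "copy back" loop of key-indexed counting survives on this slide. The following is a minimal sketch of all four steps under stated assumptions: characters are extended ASCII (radix 256), and the bounds are named lo and hi rather than the slide's L and R so that R can stand for the radix. The class and method names are hypothetical.

```java
// Hypothetical sketch of key-indexed counting on one character position.
final class KeyIndexedCountingSketch {
    static final int R = 256;   // assumed radix: extended ASCII

    // Stably sort a[lo..hi] by the character at position d.
    static void sortByChar(String[] a, int lo, int hi, int d) {
        int[] count = new int[R + 1];
        String[] temp = new String[hi - lo + 1];

        // 1. Count frequencies, stored one slot to the right.
        for (int i = lo; i <= hi; i++)
            count[a[i].charAt(d) + 1]++;

        // 2. Cumulate: count[c] becomes the first index in temp for character c.
        for (int r = 0; r < R; r++)
            count[r + 1] += count[r];

        // 3. Distribute, scanning left to right so equal keys keep their order.
        for (int i = lo; i <= hi; i++)
            temp[count[a[i].charAt(d)]++] = a[i];

        // 4. Copy back.
        for (int i = lo; i <= hi; i++)
            a[i] = temp[i - lo];
    }
}
```

Applied to the twelve strings in column a with d = 0, this produces the order shown in the temp column: add, ace, bad, bee, bed, cab, dab, dad, ebb, fad, fee, fed.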
LSD Radix Sort

    public static void lsd(String[] a, int lo, int hi) {
        for (int d = W-1; d >= 0; d--) {
            // do key-indexed counting sort on digit d
            ...
        }
    }

LSD Radix Sort: Correctness

Proof 2. (right-to-left)
  - If the characters not yet examined differ, it doesn't matter what we do now.
  - If the characters not yet examined agree, later pass won't affect order.
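The elided body of the loop can be filled in with a stable counting pass per character position. This is a sketch under assumptions, not the course's code: the string length is passed as a parameter w instead of the slide's W, all strings are assumed to have exactly that length, and the alphabet is assumed to be extended ASCII. The class name LsdSketch is hypothetical.

```java
// Hypothetical sketch of LSD radix sort for fixed-length strings.
final class LsdSketch {
    static final int R = 256;   // assumed radix: extended ASCII

    // Sort a[lo..hi]; every string must have length exactly w.
    static void lsd(String[] a, int lo, int hi, int w) {
        if (hi <= lo) return;
        String[] temp = new String[hi - lo + 1];

        for (int d = w - 1; d >= 0; d--) {
            // Key-indexed counting on digit d: count, cumulate, distribute, copy back.
            int[] count = new int[R + 1];
            for (int i = lo; i <= hi; i++)
                count[a[i].charAt(d) + 1]++;
            for (int r = 0; r < R; r++)
                count[r + 1] += count[r];
            for (int i = lo; i <= hi; i++)
                temp[count[a[i].charAt(d)]++] = a[i];
            for (int i = lo; i <= hi; i++)
                a[i] = temp[i - lo];
        }
    }
}
```

The correctness argument relies on each pass being stable: when two strings agree on digit d, the distribute step keeps them in the order established by the earlier passes on digits to the right.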
MSD Radix Sort Implementation

    private static void msd(String[] a, int lo, int hi, int d) {
        if (hi <= lo) return;

        // do key-indexed counting sort on digit d
        int[] count = new int[256+1];
        ...

String Sorting Performance

    Quicksort           W N log N    9.5
    LSD *               W(N + R)     -
    MSD                 W(N + R)     395
    MSD with cutoff     W(N + R)     6.8

    R = radix.
    W = max length of string.
    N = number of strings.

    * assumes fixed length strings.
    estimate
    probabilistic guarantee.
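The MSD code on the slide is cut off after the count array is allocated. A minimal sketch of how the rest could look, including a cutoff to insertion sort for small subarrays, follows. Several details are assumptions made for this illustration: the cutoff value of 8, the use of -1 as an end-of-string marker (hence a count array of size R+2 rather than the slide's 256+1), and a shared temp array. The class name MsdSketch is hypothetical.

```java
// Hypothetical sketch of MSD radix sort with a cutoff to insertion sort.
final class MsdSketch {
    static final int R = 256;      // assumed radix: extended ASCII
    static final int CUTOFF = 8;   // assumed cutoff; tune empirically

    // Character at position d, or -1 if the string has ended.
    static int charAt(String s, int d) {
        return d < s.length() ? s.charAt(d) : -1;
    }

    static void sort(String[] a) {
        msd(a, new String[a.length], 0, a.length - 1, 0);
    }

    static void msd(String[] a, String[] temp, int lo, int hi, int d) {
        if (hi <= lo) return;
        if (hi - lo < CUTOFF) { insertion(a, lo, hi, d); return; }

        // Key-indexed counting on digit d; slot 1 counts strings that have ended.
        int[] count = new int[R + 2];
        for (int i = lo; i <= hi; i++)
            count[charAt(a[i], d) + 2]++;
        for (int r = 0; r < R + 1; r++)
            count[r + 1] += count[r];
        for (int i = lo; i <= hi; i++)
            temp[count[charAt(a[i], d) + 1]++] = a[i];
        for (int i = lo; i <= hi; i++)
            a[i] = temp[i - lo];

        // After distribution, count[r] is the start of the block for character r.
        for (int r = 0; r < R; r++)
            msd(a, temp, lo + count[r], lo + count[r + 1] - 1, d + 1);
    }

    // Insertion sort on a[lo..hi], comparing from position d onward.
    static void insertion(String[] a, int lo, int hi, int d) {
        for (int i = lo + 1; i <= hi; i++)
            for (int j = i; j > lo && a[j].substring(d).compareTo(a[j - 1].substring(d)) < 0; j--) {
                String t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
    }
}
```

Without the cutoff, every tiny subarray still allocates and scans a count array of size proportional to R, which is a plausible reading of the gap between the 395 and 6.8 entries in the table.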
Correspondence With Sorting Algorithms

Correspondence between trees and sorting algorithms.
  - BSTs correspond to quicksort recursive partitioning structure.
  - R-way tries correspond to MSD radix sort.

3-Way Radix Quicksort

Idea 1. Use dth character to "sort" into 3 pieces instead of 256, and sort each piece recursively.
Idea 2. Keep all duplicates together in partitioning step.

[Figure: tree over string keys; surviving labels are by, the, shells, shore, sea, sells.]
Partition Algorithm
Recursive Structure of MSD Radix Sort vs. 3-Way Quicksort

3-way radix quicksort collapses empty links in MSD tree.

[Figure: MSD Recursion Tree]

3-Way Partitioning

3-way partitioning.
  - Natural way to deal with equal keys.
  - Partition elements into 3 parts:
      elements between i and j equal to partition element v
      no larger elements to left of i
      no smaller elements to right of j
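The three-part invariant above can be established in a single left-to-right scan. The sketch below uses Dijkstra's scan on an int array, which is one standard way to do it and not necessarily the partitioning code these slides go on to use. The class name, the choice of a[lo] as the partition element, and the returned pair {i, j} are assumptions for illustration.

```java
// Hypothetical sketch of 3-way partitioning (Dijkstra-style single scan).
final class ThreeWayPartitionSketch {

    // Rearranges a[lo..hi] around v = a[lo] and returns {i, j} such that
    // a[lo..i-1] < v, a[i..j] == v, and a[j+1..hi] > v.
    static int[] partition(int[] a, int lo, int hi) {
        int v = a[lo];
        int i = lo, k = lo, j = hi;
        // Invariant: a[lo..i-1] < v, a[i..k-1] == v, a[j+1..hi] > v; a[k..j] unexamined.
        while (k <= j) {
            if (a[k] < v)      swap(a, i++, k++);
            else if (a[k] > v) swap(a, k, j--);
            else               k++;
        }
        return new int[] { i, j };
    }

    static void swap(int[] a, int x, int y) {
        int t = a[x]; a[x] = a[y]; a[y] = t;
    }
}
```

On exit, no larger elements lie to the left of i and no smaller elements lie to the right of j, so a quicksort built on this only needs to recurse on a[lo..i-1] and a[j+1..hi].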
3-Way Partitioning

Elegant solution to Dutch national flag problem.
  - Partition elements into 4 parts:
      no larger elements to left of m

3-Way Radix Quicksort

    private static void quicksortX(String[] a, int lo, int hi, int d) {
        if (hi - lo <= 0) return;
        int i = lo-1, j = hi, p = lo-1, q = hi;
        char v = a[hi].charAt(d);
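The quicksortX listing breaks off after its first lines. As a complete but simplified illustration of the same idea, the sketch below partitions on the dth character with a single-scan 3-way partition instead of the four-part scheme (p, q, i, j) the slide begins, and it picks a[lo] rather than a[hi] as the partition string. Treating the end of a string as character -1 is an assumption added here so that strings of different lengths can be mixed. All names are hypothetical.

```java
// Hypothetical sketch of 3-way radix quicksort using a single-scan partition.
final class ThreeWayRadixQuicksortSketch {

    // Character at position d, or -1 if the string has ended.
    static int charAt(String s, int d) {
        return d < s.length() ? s.charAt(d) : -1;
    }

    static void sort(String[] a) {
        sort(a, 0, a.length - 1, 0);
    }

    // Sort a[lo..hi], whose strings all agree on their first d characters.
    static void sort(String[] a, int lo, int hi, int d) {
        if (hi <= lo) return;
        int v = charAt(a[lo], d);
        int lt = lo, gt = hi, i = lo + 1;
        // Invariant: a[lo..lt-1] < v, a[lt..i-1] == v, a[gt+1..hi] > v at position d.
        while (i <= gt) {
            int c = charAt(a[i], d);
            if (c < v)      swap(a, lt++, i++);
            else if (c > v) swap(a, i, gt--);
            else            i++;
        }
        sort(a, lo, lt - 1, d);                 // smaller dth character
        if (v >= 0) sort(a, lt, gt, d + 1);     // equal: move to next character
        sort(a, gt + 1, hi, d);                 // larger dth character
    }

    static void swap(String[] a, int x, int y) {
        String t = a[x]; a[x] = a[y]; a[y] = t;
    }
}
```

Only the middle piece advances to character d + 1, which is how duplicates of the current character stay together while the other two pieces are re-partitioned on the same position.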
String Sorting Performance

Suffix Sorting: Worst Case Input

Suffix Sorting in N log N Time: Key Idea

Suffix Sorting in N log N Time
Input: "babaaaabcbabaaaaa"
String Sorting Performance

    R = radix.
    W = max length of string.
    N = number of strings.

    * fixed length strings only
    estimate
    probabilistic guarantee
    suffix sorting only