
Ch 2 - Sorting Sedgewick, Algorithms 

 
SORTING  
We learn algorithms for sorting data (specifically, arrays), and compare which ones are best for which 
applications. These algorithms form the basis for many later algorithms.  
 
For our purposes, “sorting” is taken to mean “rearranging the elements of an array” into some order. The 
order is determined by each element’s ​key​: this allows us to abstract from primitive data types to any objects 
that have keys (e.g. Date objects). 
 
The Basic Template: Sorts Based On ​Comparable 
Our sort classes will each fit into the template on page 245. Each class will have:  
● a ​sort(Comparable[] a) ​method, which sorts the input.  
● a ​less(Comparable v, Comparable w)​ method, which returns true if v < w, false otherwise 
● an exch(Comparable[] a, int i, int j) method, which exchanges the values at indices i and j 
● an isSorted(Comparable[] a) method, to test if the array entries are in order 
● a ​main ​method, which reads strings from StdIn, sorts them, checks them, and prints (for tests) 
 
The ​isSorted(a) ​method allows us to ​assert isSorted(a) ​to certify that the sort is complete.  
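A minimal sketch of this template (following p. 245; the class name Example is a placeholder, and the book's StdIn-based main is omitted — in the book the helper methods are private):

```java
public class Example {
    public static void sort(Comparable[] a) {
        // a specific sorting algorithm goes here
    }

    static boolean less(Comparable v, Comparable w) {
        return v.compareTo(w) < 0;            // true if v < w
    }

    static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i];                  // swap the values
        a[i] = a[j];                          // at indices i and j
        a[j] = t;
    }

    static boolean isSorted(Comparable[] a) {
        for (int i = 1; i < a.length; i++)    // any out-of-order pair
            if (less(a[i], a[i-1])) return false;
        return true;
    }
}
```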
 
By implementing each sort under this system, we can meaningfully compare their efficiencies in terms of the 
number of ​compares ​(and the cost of each) and the number of ​exchanges ​(and their cost). Some 
implementations will use no exchanges (instead creating a copy of the array, for instance), and for those we 
will track array accesses.  
 
The code we use for these sorts will sort anything that implements the Comparable interface (including 
numeric wrapper types, Strings, etc). When we create our own classes, we implement the interface by 
including a compareTo() method that defines their ordering by returning a negative integer, zero, or a 
positive integer.  
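A sketch of such a class, in the spirit of the Date example (the class and field names here are illustrative, not the book's code):

```java
// Illustrative Comparable type: dates ordered by year, then month, then day.
public class SimpleDate implements Comparable<SimpleDate> {
    private final int year, month, day;

    public SimpleDate(int year, int month, int day) {
        this.year = year; this.month = month; this.day = day;
    }

    // Defines the natural order: negative if this < that,
    // zero if equal, positive if this > that.
    public int compareTo(SimpleDate that) {
        if (this.year  != that.year)  return Integer.compare(this.year,  that.year);
        if (this.month != that.month) return Integer.compare(this.month, that.month);
        return Integer.compare(this.day, that.day);
    }
}
```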
 
Fancy Sorting: The ​Comparator​ Interface 
The ​Comparator​ interface allows us to sort using an alternate order (as long as it is a ​total order​). Because 
this exists outside of the data type’s class, it decouples the definition of the data type from what it means to 
compare two objects of that type, allowing us even to edit it later.  
 
If we call our ​sort(Comparable[] a)​ method as before, it will sort using the “natural order” determined 
by its ​compareTo() ​method. However, we can override this by passing a second argument to ​sort(), 
determining the manner of sorting; it then uses the ​compare()​ method from ​Comparator​.  
 
Implementing this in code requires a nested class 
that implements the ​Comparator​ interface 
(which must implement the ​compare() 
method).  
 
Now the data can be sorted by multiple keys.  
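A sketch of this pattern — a Student class with two nested Comparator classes, so the same data can be sorted by either key (the class and field names are assumptions, not the book's code):

```java
import java.util.Comparator;

// Illustrative data type with two alternate orders.
public class Student {
    public final String name;
    public final int section;

    public Student(String name, int section) {
        this.name = name; this.section = section;
    }

    // Nested class implementing Comparator: order by name.
    public static class ByName implements Comparator<Student> {
        public int compare(Student v, Student w) {
            return v.name.compareTo(w.name);
        }
    }

    // A second nested class: order by section number.
    public static class BySection implements Comparator<Student> {
        public int compare(Student v, Student w) {
            return Integer.compare(v.section, w.section);
        }
    }
}
```

For example, `Arrays.sort(students, new Student.ByName())` sorts by name, while passing `new Student.BySection()` sorts the same array by section.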
 
Comparing Sorts: Overview  
Each of our sorts will be a compare-based sorting algorithm, with different costs and drawbacks. The 
algorithms are:  
 
- Elementary sorts: selection sort, insertion sort, shellsort 
- Easy to understand 
- Run in roughly quadratic time (on all inputs) 
- Are truly “in-place” —> they use less than c·logN extra memory 
- Stable = insertion sort ; Unstable = selection sort, shellsort 
 
- Classic sorts: mergesort, quicksort 
- More complex algorithms (recursion) 
- Run in linearithmic time (for quicksort, with high probability after a shuffle; its worst case is quadratic) 
- Mergesort requires extra space proportional to N; quicksort sorts in place 
- Stable = mergesort ; Unstable = quicksort 
 
Running Time 
For arbitrary input, we can analyze our algorithms’ speed compared to a lower bound. Trivially, sorting an 
array of size N requires at least N compares (you must touch all the data), but a more refined tree-based 
analysis reveals that any compare-based sort requires at least lg(N!) ~ NlgN compares in the worst case. 
When we find an algorithm whose upper bound is NlgN (such as mergesort), we know we have found an 
optimal algorithm. Note that this lower bound is for arbitrary input: for special inputs, faster algorithms may 
be possible! 
 
Also, note that the bare number of compares alone does not determine the fastest algorithm: other ideas, 
such as the amount of data movement, may factor in when comparing algorithms of similar speed.  
 
Space Required 
The classic sorts require extra space to hold auxiliary arrays (proportional to N), while the elementary sorts 
are truly “in-place” and require almost no extra memory. This makes them practical on certain types of 
hardware.  
 
Stability 
Particularly when sorting by 2 or more keys, the ​stability​ ​of a sort may be important. A sort is ​stable​ if it 
preserves the order of elements which have equal keys: for instance, if I sort by student name and then by 
section, each section will be in alphabetical order for a stable sort (but not an unstable one).  
 
Duplicate Keys 
Many data files that we will sort involve many ​duplicate keys:​ objects with equal values w.r.t. the variable we 
are sorting. This doesn’t matter for an algorithm like mergesort: but for quicksort, certain implementations 
go quadratic in the case of many dupes.  
 
System Sorts in Java 
Java’s ​Arrays.sort()​ method has different methods for each primitive type, for data types that 
implement ​Comparable​, and for a ​Comparator.  
● For primitive types, average performance is most important, so Java uses a tuned quicksort. 
● For ​objects​, stability and guaranteed performance win out, so Java uses a tuned mergesort.  
   
 
Elementary Sorts  
Selection Sort: find the smallest and move to the front. 
Exchanges: always N - no different for different inputs 
Compares: ~N²/2 - so inefficient for pre-sorted input 
QUADRATIC, unstable - minimal data movement 

int N = a.length;                     // N is the length of the array 
for (int i = 0; i < N; i++) {         // for each position in the array 
    int min = i;                      // assume the current entry is smallest 
    for (int j = i+1; j < N; j++)     // check every later array entry 
        if (less(a[j], a[min]))       // if the value is less than the min 
            min = j;                  // make it the new min 
    exch(a, i, min);                  // exchange min & current value 
} 

Insertion Sort: take elements one at a time & sort them. 
Exchanges: 0 to ~N²/2 - linear time for already sorted arrays 
Compares: N-1 to ~N²/2 - efficient for partially sorted arrays, adding a few items to a pre-sorted array, etc. 
QUADRATIC, stable 

int N = a.length;                     // N is the length of the array 
for (int i = 0; i < N; i++) {         // for each value in the array 
    for (int j = i; j > 0; j--) {     // check every leftward entry 
        if (less(a[j], a[j-1]))       // if it’s less than its neighbor 
            exch(a, j, j-1);          // swap them 
        else break;                   // otherwise stop: a[0..i] is now sorted 
    } 
} 

Shell Sort: insertion sort, modified to sort sub-sequences. 
Exchanges: ??? - fast unless array size is huge 
Compares: > N·log₃N - used in some hardware (tiny amt of code) 
SUBQUADRATIC, unstable - math is complex for performance 

int N = a.length;                     // N is the length of the array 
int h = 1; 
while (h < N/3) h = 3*h + 1;          // Knuth’s 3x+1 values of h 

while (h >= 1) { 
    // Go through the array for every h. 
    for (int i = h; i < N; i++) { 
        // Execute insertion sort at intervals of h. 
        for (int j = i; j >= h && less(a[j], a[j-h]); j -= h) 
            exch(a, j, j-h); 
    } 
    h = h/3;                          // Move to the next increment 
} 

 
Classic Sorts 
Mergesort 
break in half, (recursively) sort, and merge results 
LINEARITHMIC, stable 

Space considerations: gets big if you create a new array for each merge. Use a single auxiliary array 
instead--taking extra space proportional to N. 

This algorithm consists of two parts, hence its name: the merge and the sort. 

merge: takes two sorted subarrays (NOTE: check with assert) and combines them into a sorted array 

sort: (the recursive way) split in two, sort the left half, sort the right half, merge and return. 
(NOTE: sharing one aux array avoids zillions of aux arrays) 

sort: (the bottom-up way) pass through subarrays of size 1, then 2, 4, 8, 16… 

Improvements: 
(1) Use insertion sort for < 7 items. 
(2) Stop if already sorted (biggest item in left is smaller than smallest item in right). 
(3) Switch a and aux in every call. 

Quicksort 
shuffle, partition, sort each piece recursively 
LINEARITHMIC-ISH, unstable 

Space considerations: in-place! We can use an aux array to make it stable, but this is dumb: it destroys 
the advantage over mergesort. 

This algorithm consists of three parts: a shuffle, a partition, and a sort. 

shuffle: required for the performance guarantee 

partition: choose an element. Scan two pointers from left to right and right to left, exchanging 
values smaller & bigger than the element until a partition is formed. Place the element in its spot. 

sort: recursively sorts the left and right parts 

Improvements: 
(1) Use insertion sort for < 10 items. 
(2) Choose a pivot near the median. 
(3) Stop partitioning on equal keys or use three-way partitioning (BELOW). 
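Mergesort's two parts can be sketched as follows (a sketch in the spirit of the book's Merge class; less() is the template helper, reproduced here so the block stands alone):

```java
public class Merge {
    private static Comparable[] aux;   // single shared auxiliary array

    public static void sort(Comparable[] a) {
        aux = new Comparable[a.length];
        sort(a, 0, a.length - 1);
    }

    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        sort(a, lo, mid);              // sort the left half
        sort(a, mid + 1, hi);          // sort the right half
        merge(a, lo, mid, hi);         // merge the results
    }

    // Merge the sorted subarrays a[lo..mid] and a[mid+1..hi].
    private static void merge(Comparable[] a, int lo, int mid, int hi) {
        for (int k = lo; k <= hi; k++) aux[k] = a[k];        // copy to aux
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if      (i > mid)              a[k] = aux[j++];  // left half exhausted
            else if (j > hi)               a[k] = aux[i++];  // right half exhausted
            else if (less(aux[j], aux[i])) a[k] = aux[j++];  // take the smaller
            else                           a[k] = aux[i++];
        }
    }

    private static boolean less(Comparable v, Comparable w) {
        return v.compareTo(w) < 0;
    }
}
```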
 
 
Three-way Quicksort 
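The figure for this section is not reproduced here. A sketch following the book's Quick3way (Dijkstra 3-way partitioning): keys equal to the pivot are grouped in the middle and never re-examined, so the sort stays fast on inputs with many duplicate keys. The shuffle step is omitted for brevity:

```java
public class Quick3way {
    public static void sort(Comparable[] a) {
        // (the book shuffles a first, for the performance guarantee)
        sort(a, 0, a.length - 1);
    }

    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, gt = hi;                 // a[lo..lt-1] < v, a[gt+1..hi] > v
        Comparable v = a[lo];                 // the partitioning element
        int i = lo + 1;
        while (i <= gt) {
            int cmp = a[i].compareTo(v);
            if      (cmp < 0) exch(a, lt++, i++);  // smaller: move to the left part
            else if (cmp > 0) exch(a, i, gt--);    // bigger: move to the right part
            else              i++;                 // equal: leave in the middle
        }
        sort(a, lo, lt - 1);                  // sort keys < v
        sort(a, gt + 1, hi);                  // sort keys > v
    }

    private static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```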

 
   
 
Application: Shuffling 
Our goal is to shuffle an array of objects (cards, say) into a random order.  
 
● Implementation 1: Generate a random value for each card; sort the cards using these random values as keys.  
○ COST: requires a sort (~linearithmic time) 
 
● Implementation 2: The Knuth shuffle - go through each card. At position i, generate a random 
index r between 0 and i (inclusive). Swap the card at position i with the card at position r.  
○ COST: linear time (1 swap per card) 
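A sketch of the Knuth shuffle described above (java.util.Random stands in for the book's StdRandom):

```java
import java.util.Random;

public class Knuth {
    // Rearranges the array into a uniformly random order in linear time.
    public static void shuffle(Object[] a) {
        Random rand = new Random();
        for (int i = 0; i < a.length; i++) {
            int r = rand.nextInt(i + 1);          // uniform random index in [0, i]
            Object t = a[i]; a[i] = a[r]; a[r] = t;  // one swap per position
        }
    }
}
```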

 
 
Related Problem: Selection 
Our goal is to find the kth largest item in an array of N items.  
 
For this problem, the upper bound is NlogN (borrowed from sorting: if we sort the array, we can read off the 
kth item). The lower bound is N (we must look at all the items at least once). It’s possible that selection is as 
hard as sorting, or that it might be much easier--in fact, it is: on average, quick-select solves it in linear time.  
 
● Implementation: Quick-select, a quicksort variant, uses the same algorithm but does half the work. 
After partitioning on a random pivot j, the algorithm only recurses into the side of the array that k is in.  

 
○ COST: linear time on average (requires random shuffle), quadratic in worst case 
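A sketch of quick-select (here returning the item of rank k, i.e. the kth smallest with 0-indexed k; partition is the standard quicksort partition, and the shuffle supplies the average-case guarantee):

```java
import java.util.Random;

public class QuickSelect {
    public static Comparable select(Comparable[] a, int k) {
        shuffle(a);                          // needed for the linear-time guarantee
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi);
            if      (j < k) lo = j + 1;      // rank k lies to the right of j
            else if (j > k) hi = j - 1;      // rank k lies to the left of j
            else            return a[k];     // j == k: found it
        }
        return a[k];
    }

    // Standard quicksort partition around a[lo]; returns its final index.
    private static int partition(Comparable[] a, int lo, int hi) {
        Comparable v = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i].compareTo(v) < 0) if (i == hi) break;
            while (v.compareTo(a[--j]) < 0) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);                      // put v into its final spot
        return j;
    }

    private static void shuffle(Comparable[] a) {
        Random rand = new Random();
        for (int i = 0; i < a.length; i++) exch(a, i, rand.nextInt(i + 1));
    }

    private static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```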
 
Application: Convex Hull 
The convex hull of a set of points is the subset that forms the vertices of a polygon which encloses all the 
points. Important ideas for our algorithm: 
—> We can traverse the convex hull through a series of counter-clockwise turns 
—> The vertices of the convex hull appear in increasing order of polar angle with respect to p (the 
point with the lowest y-coordinate) 
 
 
The Graham scan: Choose p (the point with the smallest y-coordinate). Sort all the other points by their 
polar angle with respect to p. Consider the points in order from smallest to largest polar angle; whenever 
adding a point would fail to create a ccw turn, discard the previous point.  
 
● How do we find p? (sort by y-coordinate) 
● How do we order the points? (sort by polar angle) 
● How to determine if it is a ccw turn? * 
● How to handle degeneracies? (e.g. 3 points on line) 
 
* A tricky fix: use the cross product / determinant to give a 
signed area of the triangle. The sign gives us ccw / cw:  
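The figure is not reproduced here; a sketch of the ccw test via the signed area (the small Point class is an assumption — the book uses its own Point2D type):

```java
public class CCW {
    public static class Point {
        final double x, y;
        public Point(double x, double y) { this.x = x; this.y = y; }
    }

    // Twice the signed area of triangle a-b-c gives the turn direction:
    // +1 = counter-clockwise, -1 = clockwise, 0 = collinear.
    public static int ccw(Point a, Point b, Point c) {
        double area2 = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
        if      (area2 < 0) return -1;  // clockwise
        else if (area2 > 0) return +1;  // counter-clockwise
        else                return  0;  // collinear
    }
}
```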

 
 
An implementation of the polar order using the Comparator interface: 
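The figure with the implementation is not reproduced here; a sketch of the idea (not the book's exact code — the Comparator compares two points by polar angle about a fixed point p using the ccw/signed-area test, avoiding trigonometry):

```java
import java.util.Comparator;

public class PolarPoint {
    final double x, y;
    public PolarPoint(double x, double y) { this.x = x; this.y = y; }

    // Comparator ordering points by polar angle about this point (p).
    public Comparator<PolarPoint> polarOrder() {
        return new PolarOrder();
    }

    private class PolarOrder implements Comparator<PolarPoint> {
        public int compare(PolarPoint q1, PolarPoint q2) {
            double dy1 = q1.y - y, dy2 = q2.y - y;
            if (dy1 == 0 && dy2 == 0) {          // q1, q2 horizontal from p
                if      (q1.x >= x && q2.x < x) return -1;  // angle 0 before 180
                else if (q2.x >= x && q1.x < x) return +1;
                else                            return  0;
            }
            if (dy1 >= 0 && dy2 < 0) return -1;  // q1 above p, q2 below
            if (dy2 >= 0 && dy1 < 0) return +1;  // q2 above p, q1 below
            return -ccw(PolarPoint.this, q1, q2); // same side: use turn test
        }
    }

    // Signed-area turn test, as in the previous section.
    static int ccw(PolarPoint a, PolarPoint b, PolarPoint c) {
        double area2 = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
        return area2 < 0 ? -1 : area2 > 0 ? +1 : 0;
    }
}
```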

   
 
SEARCHING  
We learn algorithms for finding an item in a large collection. Like sorting, there are many different alternatives 
with different efficiencies depending on the underlying data structure.  
 
GRAPHS  
Graphs are abstract data structures that consist of items and connections (sometimes with weights or 
orientations). This chapter deals with algorithms for processing graphs.  
 
STRINGS 
Strings, or sequences of characters, are very important in modern computing. This chapter deals with algorithms 
for processing strings--particularly, how to implement searching, sorting, and other algorithms in ways that are 
more efficient for strings.  
 
CONTEXT 
This chapter relates the topics in this book to broader subjects, from scientific computing to the theory of 
computation.  
 
