SORTING
We learn algorithms for sorting data (specifically, arrays), and compare which ones are best for which
applications. These algorithms form the basis for many later algorithms.
For our purposes, “sorting” is taken to mean “rearranging the elements of an array” into some order. The
order is determined by each element’s key: this allows us to abstract from primitive data types to any objects
that have keys (e.g. Date objects).
The Basic Template: Sorts Based On Comparable
Our sort classes will each fit into the template on page 245. Each class will have:
● a sort(Comparable[] a) method, which sorts the input.
● a less(Comparable v, Comparable w) method, which returns true if v < w, false otherwise
● an exch(Comparable[] a, int i, int j) method, which exchanges the values at indices i and j
● an isSorted(Comparable[] a) method, to test whether the array entries are in order
● a main method, which reads strings from StdIn, sorts them, checks them, and prints (for tests)
The isSorted(a) method allows us to assert isSorted(a) to certify that the sort is complete.
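A concrete sketch of this template (method names follow the convention above; the sort() body here is selection sort, and main() uses a fixed array rather than StdIn so the sketch stays self-contained):

```java
import java.util.Arrays;

public class Selection {
    public static void sort(Comparable[] a) {
        for (int i = 0; i < a.length; i++) {
            int min = i;                        // index of smallest remaining entry
            for (int j = i + 1; j < a.length; j++)
                if (less(a[j], a[min])) min = j;
            exch(a, i, min);                    // move it to the front
        }
    }

    private static boolean less(Comparable v, Comparable w) {
        return v.compareTo(w) < 0;              // true if v < w
    }

    private static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static boolean isSorted(Comparable[] a) {
        for (int i = 1; i < a.length; i++)
            if (less(a[i], a[i - 1])) return false;
        return true;
    }

    public static void main(String[] args) {
        Comparable[] a = { "sort", "these", "strings", "please" };
        sort(a);
        assert isSorted(a);                     // certify the sort is complete
        System.out.println(Arrays.toString(a));
    }
}
```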
By implementing each sort under this system, we can meaningfully compare their efficiencies in terms of the
number of compares (and the cost of each) and the number of exchanges (and their cost). Some
implementations will use no exchanges (instead creating a copy of the array, for instance), and for those we
will track array accesses.
The code we use for these sorts will sort anything that implements the Comparable interface (including
numeric wrapper types, Strings, etc.). When we create our own classes, implementing the interface means including a compareTo() method, so that x.compareTo(y) defines their ordering by returning a negative, zero, or positive integer (conventionally -1, 0, or +1).
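For instance, a hypothetical Date class (the fields and comparison order are illustrative, not taken from the book's code) might implement the interface like this:

```java
// Defines a natural order for Date objects: chronological.
public class Date implements Comparable<Date> {
    private final int year, month, day;

    public Date(int y, int m, int d) { year = y; month = m; day = d; }

    // Returns negative, zero, or positive per the Comparable contract.
    public int compareTo(Date that) {
        if (this.year  != that.year)  return Integer.compare(this.year,  that.year);
        if (this.month != that.month) return Integer.compare(this.month, that.month);
        return Integer.compare(this.day, that.day);
    }
}
```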
Fancy Sorting: The Comparator Interface
The Comparator interface allows us to sort using an alternate order (as long as it is a total order). Because
this exists outside of the data type’s class, it decouples the definition of the data type from what it means to
compare two objects of that type, allowing us even to edit it later.
If we call our sort(Comparable[] a) method as before, it will sort using the “natural order” determined
by its compareTo() method. However, we can override this by passing a second argument to sort(),
determining the manner of sorting; it then uses the compare() method from Comparator.
Implementing this in code requires a nested class that implements the Comparator interface (which must implement the compare() method).
Now the data can be sorted by multiple keys.
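A sketch of that pattern (the Student class and its keys are illustrative assumptions, not the book's code): each alternate order is a nested class implementing Comparator, exposed as a static field we can pass to a sort.

```java
import java.util.Arrays;
import java.util.Comparator;

public class Student {
    // Alternate orders, usable as the second argument to a sort.
    public static final Comparator<Student> BY_NAME    = new ByName();
    public static final Comparator<Student> BY_SECTION = new BySection();

    final String name;
    final int section;

    public Student(String name, int section) { this.name = name; this.section = section; }

    // Nested classes implementing Comparator: each defines one total order.
    private static class ByName implements Comparator<Student> {
        public int compare(Student v, Student w) { return v.name.compareTo(w.name); }
    }

    private static class BySection implements Comparator<Student> {
        public int compare(Student v, Student w) { return Integer.compare(v.section, w.section); }
    }

    public static void main(String[] args) {
        Student[] a = { new Student("Rho", 2), new Student("Chi", 1) };
        Arrays.sort(a, BY_NAME);        // sort by the alternate key, not natural order
        System.out.println(a[0].name);
    }
}
```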
Ch 2 - Sorting Sedgewick, Algorithms
Comparing Sorts: Overview
Each of our sorts will be a compare-based sorting algorithm, with different costs and drawbacks. The
algorithms are:
- Elementary sorts: selection sort, insertion sort, shellsort
- Easy to understand
- Run in roughly quadratic time (shellsort does somewhat better)
- Are truly "in-place": they use less than c log N extra memory
- Stable: insertion sort; unstable: selection sort, shellsort
- Classic sorts: mergesort, quicksort
- More complex algorithms (recursion)
- Run in linearithmic time (mergesort on all inputs; quicksort on average, after a random shuffle)
- Require some extra space (mergesort needs an auxiliary array proportional to N)
- Stable = mergesort ; Unstable = quicksort
Running Time
For arbitrary input, we can compare our algorithms' speed to a lower bound. Trivially, sorting an array of
size N requires ~N compares (you must touch all the data), but a more careful decision-tree analysis reveals
that any compare-based sort requires ~lg(N!) ~ N lg N compares in the worst case. When we find an algorithm
whose upper bound is ~N lg N (such as mergesort), we know we have found an optimal algorithm. Note that
this lower bound is for arbitrary input: for special inputs, faster algorithms may be possible!
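The lg(N!) figure comes from a counting argument: a decision tree that sorts N distinct keys must have at least N! leaves (one per permutation), so its height, the worst-case number of compares, is at least lg(N!), and Stirling's approximation gives:

```latex
\lg(N!) \;=\; \sum_{k=1}^{N} \lg k \;\sim\; N \lg N - \frac{N}{\ln 2} \;\sim\; N \lg N
```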
Also, note that the bare number of compares alone does not determine the fastest algorithm: other ideas,
such as the amount of data movement, may factor in when comparing algorithms of similar speed.
Space Required
The classic sorts require extra space to hold auxiliary arrays (proportional to N), while the elementary sorts
are truly "in-place" and require almost no extra memory. This makes the elementary sorts practical when
memory is tight (e.g. on embedded hardware).
Stability
Particularly when sorting by 2 or more keys, the stability of a sort may be important. A sort is stable if it
preserves the order of elements which have equal keys: for instance, if I sort by student name and then by
section, each section will be in alphabetical order for a stable sort (but not an unstable one).
Duplicate Keys
Many data files that we will sort involve many duplicate keys: objects with equal values w.r.t. the variable we
are sorting on. This doesn't matter for an algorithm like mergesort, but for quicksort, certain implementations
go quadratic in the case of many duplicates.
System Sorts in Java
Java’s Arrays.sort() method has overloads for each primitive type, for data types that
implement Comparable, and for use with a Comparator.
● For primitive types, average performance is most important, so Java uses a tuned quicksort.
● For objects, stability and guaranteed performance win out, so Java uses a tuned mergesort.
Elementary Sorts
Selection Sort (find the smallest and move to the front)
- Exchanges: always N (no different for different inputs)
- Compares: ~N²/2 (so inefficient even for pre-sorted input)
- QUADRATIC, unstable; minimal data movement

Insertion Sort (take elements one at a time & sort them)
- Exchanges: 0 to ~N²/2 (zero, and linear time, for already-sorted arrays)
- Compares: N−1 to ~N²/2
- QUADRATIC, stable; efficient for partially sorted arrays, adding a few items to a pre-sorted array, etc.
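The insertion-sort behavior noted above, that a pre-sorted array costs only N−1 compares and 0 exchanges, comes from the inner loop's early exit; a minimal sketch:

```java
// Insertion sort: each new element is moved left until it meets a smaller
// one. The inner loop's condition fails immediately on sorted input, so
// a pre-sorted array costs only N-1 compares and 0 exchanges.
public class Insertion {
    public static void sort(Comparable[] a) {
        for (int i = 1; i < a.length; i++)
            for (int j = i; j > 0 && a[j].compareTo(a[j - 1]) < 0; j--) {
                Comparable t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;  // exchange
            }
    }
}
```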
Mergesort
This algorithm consists of two parts, hence its name: the merge and the sort.
- merge: takes two sorted subarrays (NOTE: check with assert) and combines them into a sorted array
- sort (the recursive way): split in two, sort the left half, sort the right half, merge and return
- sort (the bottom-up way): pass through subarrays of size 1, then 2, 4, 8, 16…
Improvements:
(1) Use insertion sort for < 7 items.
(2) Stop if already sorted (biggest item in left is smaller than smallest item in right).
(3) Switch a and aux in every call (NOTE: this avoids zillions of aux arrays).

Quicksort
This algorithm consists of three parts: a shuffle, a partition, and a sort.
- shuffle: required for the performance guarantee
- partition: choose an element. Scan two pointers from left to right and right to left, exchanging values smaller & bigger than the element until a partition is formed. Place the element in its spot.
- sort: recursively sorts the left and right parts
Improvements:
(1) Use insertion sort for < 10 items.
(2) Choose a pivot near the median.
(3) Stop partitioning on equal keys, or use three-way partitioning (BELOW).
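A sketch of mergesort's two parts as described above (close to the book's approach, though without the improvements; stability comes from taking the left entry on ties):

```java
public class Merge {
    // merge: combine sorted halves a[lo..mid] and a[mid+1..hi],
    // copying through a shared auxiliary array.
    private static void merge(Comparable[] a, Comparable[] aux, int lo, int mid, int hi) {
        for (int k = lo; k <= hi; k++) aux[k] = a[k];
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if      (i > mid)                      a[k] = aux[j++];  // left exhausted
            else if (j > hi)                       a[k] = aux[i++];  // right exhausted
            else if (aux[j].compareTo(aux[i]) < 0) a[k] = aux[j++];  // right is smaller
            else                                   a[k] = aux[i++];  // ties go left: stable
        }
    }

    // sort (the recursive way): split in two, sort each half, merge.
    private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);
        sort(a, aux, mid + 1, hi);
        merge(a, aux, lo, mid, hi);
    }

    public static void sort(Comparable[] a) {
        sort(a, new Comparable[a.length], 0, a.length - 1);  // one aux array, reused
    }
}
```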
Three-way Quicksort
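Assuming the standard Dijkstra three-way partitioning scheme, a sketch: one pass leaves a[lo..lt−1] < v, a[lt..gt] == v, and a[gt+1..hi] > v, so the recursion skips all keys equal to the pivot, which is what keeps quicksort linearithmic with many duplicate keys.

```java
public class Quick3way {
    public static void sort(Comparable[] a) { sort(a, 0, a.length - 1); }

    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        Comparable v = a[lo];            // pivot (shuffle the array first in practice)
        int lt = lo, i = lo + 1, gt = hi;
        while (i <= gt) {
            int cmp = a[i].compareTo(v);
            if      (cmp < 0) exch(a, lt++, i++);  // smaller: grow the < region
            else if (cmp > 0) exch(a, i, gt--);    // bigger: grow the > region
            else              i++;                 // equal to pivot: leave in place
        }
        sort(a, lo, lt - 1);             // keys equal to v are never touched again
        sort(a, gt + 1, hi);
    }

    private static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```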
Application: Shuffling
Our goal is to shuffle an array of objects (cards, say) into a random order.
● Implementation 1: Generate a random value for each card. Sort these random values into order.
○ COST: requires a sort (~linearithmic time)
● Implementation 2: The Knuth shuffle - go through the cards in order. At each position i, generate a random
index r less than or equal to i. Swap the cards at positions i and r.
○ COST: linear time (1 swap per card)
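A sketch of the Knuth (Fisher-Yates) shuffle as described, using java.util.Random:

```java
import java.util.Random;

public class Knuth {
    // At each position i, pick a uniformly random index r in [0, i] and
    // swap -- one swap per card, so the whole shuffle is linear time.
    public static void shuffle(Object[] a) {
        Random rnd = new Random();
        for (int i = 0; i < a.length; i++) {
            int r = rnd.nextInt(i + 1);    // uniform in [0, i]
            Object t = a[i]; a[i] = a[r]; a[r] = t;
        }
    }
}
```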
Related Problem: Selection
Our goal is to find the kth largest item in an array of N items.
For this problem, the upper bound is N log N (borrowed from sorting: if we sort the array, we can read off the kth item).
The lower bound is N (we must look at all the items at least once). It’s possible that selection is as hard as
sorting, or that it might be much easier; in fact, the quick-select algorithm achieves linear time on average.
● Implementation: Quick-select, a quicksort variant, uses the same algorithm but does half the work.
After choosing random pivot j, the algorithm only sorts the half of the array that k is in.
○ COST: linear time on average (requires random shuffle), quadratic in worst case
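A sketch of quick-select along these lines, written here to return the item of rank k (the (k+1)st smallest; finding the kth largest is symmetric). It uses the standard two-pointer partition and shuffles first for the average-case guarantee:

```java
import java.util.Arrays;
import java.util.Collections;

public class QuickSelect {
    public static Comparable select(Comparable[] a, int k) {
        Collections.shuffle(Arrays.asList(a));   // writes through to the array
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi);
            if      (j < k) lo = j + 1;          // rank k lies to the right
            else if (j > k) hi = j - 1;          // rank k lies to the left
            else            return a[k];
        }
        return a[k];
    }

    // Standard quicksort partition: returns the pivot's final index.
    private static int partition(Comparable[] a, int lo, int hi) {
        Comparable v = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i].compareTo(v) < 0) if (i == hi) break;
            while (v.compareTo(a[--j]) < 0) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);                          // put pivot in its place
        return j;
    }

    private static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```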
Application: Convex Hull
The convex hull of a set of points is the subset that forms the vertices of a polygon enclosing all the
points. Important ideas for our algorithm:
- We can traverse the convex hull through a series of counterclockwise (ccw) turns
- The vertices of the convex hull appear in increasing order of polar angle with respect to p (the
point with the lowest y-coordinate)
The Graham scan: Choose p (the point with the smallest y-coordinate). Sort all the other points by their
polar angle with p. Consider the points in order from smallest to largest polar angle. Whenever a point
fails to create a ccw turn, discard (really, the previous point, which cannot be on the hull).
● How do we find p? (sort by y-coordinate)
● How do we order the points? (sort by polar angle)
● How to determine if it is a ccw turn? *
● How to handle degeneracies? (e.g. 3 points on line)
* A tricky fix: use the cross product / determinant to compute the
signed area of the triangle. The sign tells us ccw vs. cw, and zero means the points are collinear.
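A sketch of that signed-area test (coordinates passed as plain ints here for simplicity):

```java
public class CCW {
    // Twice the signed area of triangle a-b-c via the cross product:
    // positive -> counterclockwise turn, negative -> clockwise, zero -> collinear.
    public static int ccw(int ax, int ay, int bx, int by, int cx, int cy) {
        long area2 = (long) (bx - ax) * (cy - ay) - (long) (by - ay) * (cx - ax);
        if (area2 > 0) return +1;   // counterclockwise
        if (area2 < 0) return -1;   // clockwise
        return 0;                   // collinear (the degenerate case)
    }
}
```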
An implementation of the polar order using the Comparator interface:
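The original figure is missing; the following is a simplified sketch of the idea (modeled loosely on the book's Point2D.polarOrder(); it reuses the ccw determinant instead of computing angles, and treats horizontal-collinear points as equal):

```java
import java.util.Comparator;

public class Point2D {
    private final double x, y;

    public Point2D(double x, double y) { this.x = x; this.y = y; }

    // Comparator ordering points by polar angle about this point.
    public Comparator<Point2D> polarOrder() { return new PolarOrder(); }

    private class PolarOrder implements Comparator<Point2D> {
        public int compare(Point2D q1, Point2D q2) {
            double dy1 = q1.y - y, dy2 = q2.y - y;
            if (dy1 == 0 && dy2 == 0) return 0;   // both horizontal (simplification)
            if (dy1 >= 0 && dy2 < 0) return -1;   // q1 above, q2 below: q1 first
            if (dy2 >= 0 && dy1 < 0) return +1;
            return -ccw(Point2D.this, q1, q2);    // both same side: use the turn test
        }
    }

    private static int ccw(Point2D a, Point2D b, Point2D c) {
        double area2 = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
        if (area2 > 0) return +1;
        if (area2 < 0) return -1;
        return 0;
    }
}
```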
SEARCHING
We learn algorithms for finding an item in a large collection. As with sorting, there are many alternatives
whose efficiency depends on the underlying data structure.
GRAPHS
Graphs are abstract data structures that consist of items and connections (sometimes with weights or
orientations). This chapter deals with algorithms for processing graphs.
STRINGS
Strings, or sequences of characters, are very important in modern computing. This chapter deals with algorithms
for processing strings--particularly, how to implement searching, sorting, and other algorithms in ways that are
more efficient for strings.
CONTEXT
This chapter relates the topics in this book to broader subjects, from scientific computing to the theory of
computation.