SORTING
We learn algorithms for sorting data (specifically, arrays), and compare which ones are best for which
applications. These algorithms form the basis for many later algorithms.
For our purposes, “sorting” is taken to mean “rearranging the elements of an array” into some order. The
order is determined by each element’s key: this allows us to abstract from primitive data types to any objects
that have keys (e.g. Date objects).
The Basic Template: Sorts Based On Comparable
Our sort classes will each fit into the template on page 245. Each class will have:
● a sort(Comparable[] a) method, which sorts the input.
● a less(Comparable v, Comparable w) method, which returns true if v < w, false otherwise
● an exch(Comparable[] a, int i, int j) method, which exchanges the values at indices i and j
● an isSorted(Comparable[] a) method, to test whether the array entries are in order
● a main method, which reads strings from StdIn, sorts them, checks them, and prints (for tests)
The isSorted(a) method allows us to assert isSorted(a) to certify that the sort is complete.
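A concrete sketch of this template (method names follow the convention above; the sort() body here is selection sort, and main() uses a fixed array rather than StdIn so the sketch stays self-contained):

```java
import java.util.Arrays;

public class Selection {
    public static void sort(Comparable[] a) {
        for (int i = 0; i < a.length; i++) {
            int min = i;                        // index of smallest remaining entry
            for (int j = i + 1; j < a.length; j++)
                if (less(a[j], a[min])) min = j;
            exch(a, i, min);                    // move it to the front
        }
    }

    private static boolean less(Comparable v, Comparable w) {
        return v.compareTo(w) < 0;              // true if v < w
    }

    private static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static boolean isSorted(Comparable[] a) {
        for (int i = 1; i < a.length; i++)
            if (less(a[i], a[i - 1])) return false;
        return true;
    }

    public static void main(String[] args) {
        Comparable[] a = { "sort", "these", "strings", "please" };
        sort(a);
        assert isSorted(a);                     // certify the sort is complete
        System.out.println(Arrays.toString(a));
    }
}
```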
By implementing each sort under this system, we can meaningfully compare their efficiencies in terms of the
number of compares (and the cost of each) and the number of exchanges (and their cost). Some
implementations will use no exchanges (instead creating a copy of the array, for instance), and for those we
will track array accesses.
The code we use for these sorts will sort anything that implements the Comparable interface (including
numeric wrapper types, Strings, etc.). When we create our own classes, implementing the interface means including a compareTo() method, so that x.compareTo(y) defines their ordering by returning a negative, zero, or positive integer (conventionally -1, 0, or +1).
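For instance, a hypothetical Date class (the fields and comparison order are illustrative, not taken from the book's code) might implement the interface like this:

```java
// Defines a natural order for Date objects: chronological.
public class Date implements Comparable<Date> {
    private final int year, month, day;

    public Date(int y, int m, int d) { year = y; month = m; day = d; }

    // Returns negative, zero, or positive per the Comparable contract.
    public int compareTo(Date that) {
        if (this.year  != that.year)  return Integer.compare(this.year,  that.year);
        if (this.month != that.month) return Integer.compare(this.month, that.month);
        return Integer.compare(this.day, that.day);
    }
}
```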
Fancy Sorting: The Comparator Interface
The Comparator interface allows us to sort using an alternate order (as long as it is a total order). Because
this exists outside of the data type’s class, it decouples the definition of the data type from what it means to
compare two objects of that type, allowing us even to edit it later.
If we call our sort(Comparable[] a) method as before, it will sort using the “natural order” determined
by its compareTo() method. However, we can override this by passing a second argument to sort(),
determining the manner of sorting; it then uses the compare() method from Comparator.
Implementing this in code requires a nested class that implements the Comparator interface (which must implement the compare() method).
Now the data can be sorted by multiple keys.
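A sketch of that pattern (the Student class and its keys are illustrative assumptions, not the book's code): each alternate order is a nested class implementing Comparator, exposed as a static field we can pass to a sort.

```java
import java.util.Arrays;
import java.util.Comparator;

public class Student {
    // Alternate orders, usable as the second argument to a sort.
    public static final Comparator<Student> BY_NAME    = new ByName();
    public static final Comparator<Student> BY_SECTION = new BySection();

    final String name;
    final int section;

    public Student(String name, int section) { this.name = name; this.section = section; }

    // Nested classes implementing Comparator: each defines one total order.
    private static class ByName implements Comparator<Student> {
        public int compare(Student v, Student w) { return v.name.compareTo(w.name); }
    }

    private static class BySection implements Comparator<Student> {
        public int compare(Student v, Student w) { return Integer.compare(v.section, w.section); }
    }

    public static void main(String[] args) {
        Student[] a = { new Student("Rho", 2), new Student("Chi", 1) };
        Arrays.sort(a, BY_NAME);        // sort by the alternate key, not natural order
        System.out.println(a[0].name);
    }
}
```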
Ch 2 - Sorting Sedgewick, Algorithms
Comparing Sorts: Overview
Each of our sorts will be a compare-based sorting algorithm, with different costs and drawbacks. The
algorithms are:
- Elementary sorts: selection sort, insertion sort, shellsort
- Easy to understand
- Run in roughly quadratic time (shellsort does somewhat better)
- Are truly "in-place": they use less than c log N extra memory
- Stable: insertion sort; unstable: selection sort, shellsort
- Classic sorts: mergesort, quicksort
- More complex algorithms (recursion)
- Run in linearithmic time (mergesort on all inputs; quicksort on average, after a random shuffle)
- Require some extra space (mergesort needs an auxiliary array proportional to N)
- Stable = mergesort ; Unstable = quicksort
Running Time
For arbitrary input, we can compare our algorithms' speed to a lower bound. Trivially, sorting an array of
size N requires ~N compares (you must touch all the data), but a more careful decision-tree analysis reveals
that any compare-based sort requires ~lg(N!) ~ N lg N compares in the worst case. When we find an algorithm
whose upper bound is ~N lg N (such as mergesort), we know we have found an optimal algorithm. Note that
this lower bound is for arbitrary input: for special inputs, faster algorithms may be possible!
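The lg(N!) figure comes from a counting argument: a decision tree that sorts N distinct keys must have at least N! leaves (one per permutation), so its height, the worst-case number of compares, is at least lg(N!), and Stirling's approximation gives:

```latex
\lg(N!) \;=\; \sum_{k=1}^{N} \lg k \;\sim\; N \lg N - \frac{N}{\ln 2} \;\sim\; N \lg N
```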
Also, note that the bare number of compares alone does not determine the fastest algorithm: other ideas,
such as the amount of data movement, may factor in when comparing algorithms of similar speed.
Space Required
The classic sorts require extra space to hold auxiliary arrays (proportional to N), while the elementary sorts
are truly "in-place" and require almost no extra memory. This makes the elementary sorts practical when
memory is tight (e.g. on embedded hardware).
Stability
Particularly when sorting by 2 or more keys, the stability of a sort may be important. A sort is stable if it
preserves the order of elements which have equal keys: for instance, if I sort by student name and then by
section, each section will be in alphabetical order for a stable sort (but not an unstable one).
Duplicate Keys
Many data files that we will sort involve many duplicate keys: objects with equal values w.r.t. the variable we
are sorting on. This doesn't matter for an algorithm like mergesort, but for quicksort, certain implementations
go quadratic in the case of many duplicates.
System Sorts in Java
Java’s Arrays.sort() method has overloads for each primitive type, for data types that
implement Comparable, and for use with a Comparator.
● For primitive types, average performance is most important, so Java uses a tuned quicksort.
● For objects, stability and guaranteed performance win out, so Java uses a tuned mergesort.
Elementary Sorts
Selection Sort (find the smallest and move to the front)
- Exchanges: always N (no different for different inputs)
- Compares: ~N²/2 (so inefficient even for pre-sorted input)
- QUADRATIC, unstable; minimal data movement

Insertion Sort (take elements one at a time & sort them)
- Exchanges: 0 to ~N²/2 (zero, and linear time, for already-sorted arrays)
- Compares: N−1 to ~N²/2
- QUADRATIC, stable; efficient for partially sorted arrays, adding a few items to a pre-sorted array, etc.
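The insertion-sort behavior noted above, that a pre-sorted array costs only N−1 compares and 0 exchanges, comes from the inner loop's early exit; a minimal sketch:

```java
// Insertion sort: each new element is moved left until it meets a smaller
// one. The inner loop's condition fails immediately on sorted input, so
// a pre-sorted array costs only N-1 compares and 0 exchanges.
public class Insertion {
    public static void sort(Comparable[] a) {
        for (int i = 1; i < a.length; i++)
            for (int j = i; j > 0 && a[j].compareTo(a[j - 1]) < 0; j--) {
                Comparable t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;  // exchange
            }
    }
}
```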
Mergesort
This algorithm consists of two parts, hence its name: the merge and the sort.
- merge: takes two sorted subarrays (NOTE: check with assert) and combines them into a sorted array
- sort (the recursive way): split in two, sort the left half, sort the right half, merge and return
- sort (the bottom-up way): pass through subarrays of size 1, then 2, 4, 8, 16…
Improvements:
(1) Use insertion sort for < 7 items.
(2) Stop if already sorted (biggest item in left is smaller than smallest item in right).
(3) Switch a and aux in every call (NOTE: this avoids zillions of aux arrays).

Quicksort
This algorithm consists of three parts: a shuffle, a partition, and a sort.
- shuffle: required for the performance guarantee
- partition: choose an element. Scan two pointers from left to right and right to left, exchanging values smaller & bigger than the element until a partition is formed. Place the element in its spot.
- sort: recursively sorts the left and right parts
Improvements:
(1) Use insertion sort for < 10 items.
(2) Choose a pivot near the median.
(3) Stop partitioning on equal keys, or use three-way partitioning (BELOW).
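A sketch of mergesort's two parts as described above (close to the book's approach, though without the improvements; stability comes from taking the left entry on ties):

```java
public class Merge {
    // merge: combine sorted halves a[lo..mid] and a[mid+1..hi],
    // copying through a shared auxiliary array.
    private static void merge(Comparable[] a, Comparable[] aux, int lo, int mid, int hi) {
        for (int k = lo; k <= hi; k++) aux[k] = a[k];
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if      (i > mid)                      a[k] = aux[j++];  // left exhausted
            else if (j > hi)                       a[k] = aux[i++];  // right exhausted
            else if (aux[j].compareTo(aux[i]) < 0) a[k] = aux[j++];  // right is smaller
            else                                   a[k] = aux[i++];  // ties go left: stable
        }
    }

    // sort (the recursive way): split in two, sort each half, merge.
    private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);
        sort(a, aux, mid + 1, hi);
        merge(a, aux, lo, mid, hi);
    }

    public static void sort(Comparable[] a) {
        sort(a, new Comparable[a.length], 0, a.length - 1);  // one aux array, reused
    }
}
```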
Three-way Quicksort
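Assuming the standard Dijkstra three-way partitioning scheme, a sketch: one pass leaves a[lo..lt−1] < v, a[lt..gt] == v, and a[gt+1..hi] > v, so the recursion skips all keys equal to the pivot, which is what keeps quicksort linearithmic with many duplicate keys.

```java
public class Quick3way {
    public static void sort(Comparable[] a) { sort(a, 0, a.length - 1); }

    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        Comparable v = a[lo];            // pivot (shuffle the array first in practice)
        int lt = lo, i = lo + 1, gt = hi;
        while (i <= gt) {
            int cmp = a[i].compareTo(v);
            if      (cmp < 0) exch(a, lt++, i++);  // smaller: grow the < region
            else if (cmp > 0) exch(a, i, gt--);    // bigger: grow the > region
            else              i++;                 // equal to pivot: leave in place
        }
        sort(a, lo, lt - 1);             // keys equal to v are never touched again
        sort(a, gt + 1, hi);
    }

    private static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```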
Application: Shuffling
Our goal is to shuffle an array of objects (cards, say) into a random order.
● Implementation 1: Generate a random value for each card. Sort these random values into order.
○ COST: requires a sort (~linearithmic time)
● Implementation 2: The Knuth shuffle - go through the cards in order. At each position i, generate a random
index r less than or equal to i. Swap the cards at positions i and r.
○ COST: linear time (1 swap per card)
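A sketch of the Knuth (Fisher-Yates) shuffle as described, using java.util.Random:

```java
import java.util.Random;

public class Knuth {
    // At each position i, pick a uniformly random index r in [0, i] and
    // swap -- one swap per card, so the whole shuffle is linear time.
    public static void shuffle(Object[] a) {
        Random rnd = new Random();
        for (int i = 0; i < a.length; i++) {
            int r = rnd.nextInt(i + 1);    // uniform in [0, i]
            Object t = a[i]; a[i] = a[r]; a[r] = t;
        }
    }
}
```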
Related Problem: Selection
Our goal is to find the kth largest item in an array of N items.
For this problem, the upper bound is N log N (borrowed from sorting: if we sort the array, we can read off the kth item).
The lower bound is N (we must look at all the items at least once). It’s possible that selection is as hard as
sorting, or that it might be much easier; in fact, the quick-select algorithm achieves linear time on average.
● Implementation: Quick-select, a quicksort variant, uses the same algorithm but does half the work.
After choosing random pivot j, the algorithm only sorts the half of the array that k is in.
○ COST: linear time on average (requires random shuffle), quadratic in worst case
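A sketch of quick-select along these lines, written here to return the item of rank k (the (k+1)st smallest; finding the kth largest is symmetric). It uses the standard two-pointer partition and shuffles first for the average-case guarantee:

```java
import java.util.Arrays;
import java.util.Collections;

public class QuickSelect {
    public static Comparable select(Comparable[] a, int k) {
        Collections.shuffle(Arrays.asList(a));   // writes through to the array
        int lo = 0, hi = a.length - 1;
        while (hi > lo) {
            int j = partition(a, lo, hi);
            if      (j < k) lo = j + 1;          // rank k lies to the right
            else if (j > k) hi = j - 1;          // rank k lies to the left
            else            return a[k];
        }
        return a[k];
    }

    // Standard quicksort partition: returns the pivot's final index.
    private static int partition(Comparable[] a, int lo, int hi) {
        Comparable v = a[lo];
        int i = lo, j = hi + 1;
        while (true) {
            while (a[++i].compareTo(v) < 0) if (i == hi) break;
            while (v.compareTo(a[--j]) < 0) if (j == lo) break;
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);                          // put pivot in its place
        return j;
    }

    private static void exch(Comparable[] a, int i, int j) {
        Comparable t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```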
Application: Convex Hull
The convex hull of a set of points is the subset that forms the vertices of a polygon enclosing all the
points. Important ideas for our algorithm:
- We can traverse the convex hull through a series of counterclockwise (ccw) turns
- The vertices of the convex hull appear in increasing order of polar angle with respect to p (the
point with the lowest y-coordinate)
The Graham scan: Choose p (the point with the smallest y-coordinate). Sort all the other points by their
polar angle with p. Consider the points in order from smallest to largest polar angle. Whenever a point
fails to create a ccw turn, discard (really, the previous point, which cannot be on the hull).
● How do we find p? (sort by y-coordinate)
● How do we order the points? (sort by polar angle)
● How to determine if it is a ccw turn? *
● How to handle degeneracies? (e.g. 3 points on line)
* A tricky fix: use the cross product / determinant to compute the
signed area of the triangle. The sign tells us ccw vs. cw, and zero means the points are collinear.
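A sketch of that signed-area test (coordinates passed as plain ints here for simplicity):

```java
public class CCW {
    // Twice the signed area of triangle a-b-c via the cross product:
    // positive -> counterclockwise turn, negative -> clockwise, zero -> collinear.
    public static int ccw(int ax, int ay, int bx, int by, int cx, int cy) {
        long area2 = (long) (bx - ax) * (cy - ay) - (long) (by - ay) * (cx - ax);
        if (area2 > 0) return +1;   // counterclockwise
        if (area2 < 0) return -1;   // clockwise
        return 0;                   // collinear (the degenerate case)
    }
}
```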
An implementation of the polar order using the Comparator interface:
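The original figure is missing; the following is a simplified sketch of the idea (modeled loosely on the book's Point2D.polarOrder(); it reuses the ccw determinant instead of computing angles, and treats horizontal-collinear points as equal):

```java
import java.util.Comparator;

public class Point2D {
    private final double x, y;

    public Point2D(double x, double y) { this.x = x; this.y = y; }

    // Comparator ordering points by polar angle about this point.
    public Comparator<Point2D> polarOrder() { return new PolarOrder(); }

    private class PolarOrder implements Comparator<Point2D> {
        public int compare(Point2D q1, Point2D q2) {
            double dy1 = q1.y - y, dy2 = q2.y - y;
            if (dy1 == 0 && dy2 == 0) return 0;   // both horizontal (simplification)
            if (dy1 >= 0 && dy2 < 0) return -1;   // q1 above, q2 below: q1 first
            if (dy2 >= 0 && dy1 < 0) return +1;
            return -ccw(Point2D.this, q1, q2);    // both same side: use the turn test
        }
    }

    private static int ccw(Point2D a, Point2D b, Point2D c) {
        double area2 = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
        if (area2 > 0) return +1;
        if (area2 < 0) return -1;
        return 0;
    }
}
```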
SEARCHING
We learn algorithms for finding an item in a large collection. As with sorting, there are many alternatives
whose efficiency depends on the underlying data structure.
GRAPHS
Graphs are abstract data structures that consist of items and connections (sometimes with weights or
orientations). This chapter deals with algorithms for processing graphs.
STRINGS
Strings, or sequences of characters, are very important in modern computing. This chapter deals with algorithms
for processing strings--particularly, how to implement searching, sorting, and other algorithms in ways that are
more efficient for strings.
CONTEXT
This chapter relates the topics in this book to broader subjects, from scientific computing to the theory of
computation.