Sorting - CH 2 & CH 7
Reasons for choosing this problem
Quite a few algorithms have been devised
that solve the problem
Learn how to choose among several algorithms
Learn how to improve a given algorithm
One of the few problems
for which we have developed algorithms
whose time complexities are as good as the lower bound

CH2(2.2, 2.4) & CH7

CH2. Divide-and-Conquer
Divide-and-Conquer :
Divide the problem into a number of subproblems
Conquer the subproblems by solving them recursively
If the subproblem sizes are small enough,
just solve the subproblems in a straightforward manner
Combine the solutions to the subproblems into the solution for the original problem

Top-down approach
Solution to a top-level instance of a problem is obtained by
going down and obtaining solutions to smaller instances
Can be implemented with recursive calls

2.2 Mergesort
2-way merging:
combine two sorted arrays into one sorted array

Divide array into two subarrays each with n/2 items
Conquer each subarray by sorting it
Unless array is sufficiently small, use recursion to do this
Combine the solutions to the subarrays
by merging them into a single sorted array

Ex. 27, 10, 12, 20, 25, 13, 15, 22
Mergesort Algorithm
void mergesort (int n, keytype S[]) {
    const int h = n / 2, m = n - h;
    keytype U[1..h], V[1..m];
    if (n > 1) {
        copy S[1] through S[h] to U[1] through U[h];
        copy S[h+1] through S[n] to V[1] through V[m];
        mergesort(h, U);
        mergesort(m, V);
        merge(h, m, U, V, S);
    }
}
Merge
Problem: Merge two sorted arrays into one sorted array
Input: positive integers h and m,
sorted arrays U[1..h] & V[1..m]
Output: a sorted array S[1..h+m]
containing the keys in U & V

Ex. U=10,12,20,27, V=13,15,22,25
Merge Algorithm
void merge (int h, int m, const keytype U[], const keytype V[], keytype S[]) {
    index i, j, k;
    i = 1; j = 1; k = 1;
    while (i <= h && j <= m) {
        if (U[i] < V[j]) {
            S[k] = U[i]; i++;
        }
        else {
            S[k] = V[j]; j++;
        }
        k++;
    }
    if (i > h) copy V[j] through V[m] to S[k] through S[h+m];
    else copy U[i] through U[h] to S[k] through S[h+m];
}
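The two algorithms above can be tried out with a minimal Python sketch. It assumes 0-based Python lists instead of the 1-based arrays of the pseudocode, and it returns new lists rather than writing into S; the function names simply mirror the slides.

```python
def merge(u, v):
    """Merge two sorted lists u and v into one new sorted list."""
    s = []
    i = j = 0
    while i < len(u) and j < len(v):
        if u[i] < v[j]:
            s.append(u[i])
            i += 1
        else:
            s.append(v[j])
            j += 1
    # At most one of the two lists still has items; copy them over.
    s.extend(u[i:])
    s.extend(v[j:])
    return s


def mergesort(s):
    """Return a sorted copy of s (not in place: u and v are new lists)."""
    n = len(s)
    if n <= 1:
        return list(s)
    h = n // 2
    u = mergesort(s[:h])   # conquer the first h items
    v = mergesort(s[h:])   # conquer the remaining n - h items
    return merge(u, v)     # combine


print(merge([10, 12, 20, 27], [13, 15, 22, 25]))
print(mergesort([27, 10, 12, 20, 25, 13, 15, 22]))
```

The two calls use the example inputs from the Merge and Mergesort slides.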
Space Complexity
In-Place
Does not use any extra space
beyond that needed to store the input
Algo 2.2 is not an in-place algo
because it uses arrays U & V besides input array S
It is possible to reduce the amount of extra space
to only one array containing n items
by doing much of the manipulation on the input array S

Mergesort2 Algorithm
void mergesort2 (index low, index high) {
    index mid;
    if (low < high) {
        mid = (low + high) / 2;
        mergesort2(low, mid);
        mergesort2(mid+1, high);
        merge2(low, mid, high);
    }
}
Merge2
Problem: Merge two sorted subarrays of S
created in Mergesort2
Input: indices low,mid,high
subarray of S indexed from low to high
The keys in array slots from low to mid are already sorted,
as are the keys in array slots from mid+1 to high
Output: subarray of S, indexed from low to high

Ex. S = 10,12,20,27, 13,15,22,25

Merge2 Algorithm
void merge2 (index low, index mid, index high) {
    index i, j, k;
    keytype U[low..high];    // local array needed for the merging
    i = low; j = mid + 1; k = low;
    while (i <= mid && j <= high) {
        if (S[i] < S[j]) {
            U[k] = S[i]; i++;
        }
        else {
            U[k] = S[j]; j++;
        }
        k++;
    }
    if (i > mid) copy S[j] through S[high] to U[k] through U[high];
    else copy S[i] through S[mid] to U[k] through U[high];
    copy U[low] through U[high] to S[low] through S[high];
}
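A rough Python sketch of Mergesort2 and Merge2 follows. It assumes 0-based indices and passes the list s explicitly (the slides treat S as a global array); only one temporary list u is used per merge.

```python
def merge2(s, low, mid, high):
    """Merge the sorted runs s[low..mid] and s[mid+1..high] back into s."""
    u = []                       # the single extra array
    i, j = low, mid + 1
    while i <= mid and j <= high:
        if s[i] < s[j]:
            u.append(s[i])
            i += 1
        else:
            u.append(s[j])
            j += 1
    u.extend(s[i:mid + 1])       # leftover items of the left run, if any
    u.extend(s[j:high + 1])      # leftover items of the right run, if any
    s[low:high + 1] = u          # copy U back to S


def mergesort2(s, low, high):
    """Sort s[low..high] in the input list itself."""
    if low < high:
        mid = (low + high) // 2
        mergesort2(s, low, mid)
        mergesort2(s, mid + 1, high)
        merge2(s, low, mid, high)


data = [27, 10, 12, 20, 25, 13, 15, 22]
mergesort2(data, 0, len(data) - 1)
print(data)
```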
2.4 Quicksort (Partition Exchange Sort)
Developed by Hoare, 1962
The array is partitioned by
placing all items smaller than some pivot item
before that item and
all items larger than or equal to the pivot item
after it
Each partition is sorted recursively

Ex. 15(pivot), 22, 13, 27, 12, 10, 20, 25
Quicksort Algorithm
void quicksort (index low, index high) {
    index pivotpoint;
    if (high > low) {
        partition(low, high, pivotpoint);
        quicksort(low, pivotpoint-1);
        quicksort(pivotpoint+1, high);
    }
}

Partition
Problem: Partition the array S for Quicksort
Input: indices low, high, subarray of S
indexed from low to high
Output: pivot point for the subarray
indexed from low to high

Ex. 15(pivot), 22, 13, 27, 12, 10, 20, 25
Partition Algorithm
void partition (index low, index high, index& pivotpoint) {
    index i, j; keytype pivotitem;
    pivotitem = S[low];                  // choose first item for pivotitem
    j = low;
    for (i = low + 1; i <= high; i++)
        if (S[i] < pivotitem) {
            j++;
            exchange S[i] and S[j];
        }
    pivotpoint = j;
    exchange S[low] and S[pivotpoint];   // put pivotitem at pivotpoint
}
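The Quicksort and Partition pseudocode can be exercised with the following Python sketch. It assumes 0-based indices, passes the list explicitly, and returns the pivot point instead of using a reference parameter.

```python
def partition(s, low, high):
    """Partition s[low..high] around s[low]; return the pivot's final index."""
    pivotitem = s[low]              # choose the first item as the pivot
    j = low
    for i in range(low + 1, high + 1):
        if s[i] < pivotitem:
            j += 1
            s[i], s[j] = s[j], s[i]
    s[low], s[j] = s[j], s[low]     # put the pivot item at the pivot point
    return j


def quicksort(s, low, high):
    """Sort s[low..high] in place."""
    if high > low:
        pivotpoint = partition(s, low, high)
        quicksort(s, low, pivotpoint - 1)
        quicksort(s, pivotpoint + 1, high)


data = [15, 22, 13, 27, 12, 10, 20, 25]
quicksort(data, 0, len(data) - 1)
print(data)
```

On the slide's example, the first call to partition leaves 15 at 0-based index 3 (the fourth slot), with 10, 13, 12 before it and 22, 27, 20, 25 after it.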
Quicksort Algorithm Analysis
Worst-Case Time Complexity
Basic operation: comparison of S[i] with pivotitem in partition
Input size: n, the number of items in the array S

Worst-Case Scenario: ???
T(n) = T(0) + T(n-1) + n - 1      // T(0) = 0
     = T(n-1) + n - 1, for n > 0
     = n(n-1)/2                   // Example B.16
W(n) ≤ T(n) = n(n-1)/2            // by using induction
     ∈ O(n^2)
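One way to check the n(n-1)/2 figure is to count the basic operation directly. The sketch below assumes that an already sorted array is a worst-case input for this first-item pivot rule (each partition then splits into sizes 0 and n-1). The helper name quicksort_comparisons is hypothetical, and an explicit stack replaces recursion only to avoid Python's recursion limit.

```python
def quicksort_comparisons(items):
    """Quicksort a copy of items; return the number of comparisons
    of s[i] with pivotitem made inside partition."""
    s = list(items)
    count = 0
    stack = [(0, len(s) - 1)]
    while stack:
        low, high = stack.pop()
        if high <= low:
            continue
        pivotitem = s[low]
        j = low
        for i in range(low + 1, high + 1):
            count += 1                      # the basic operation
            if s[i] < pivotitem:
                j += 1
                s[i], s[j] = s[j], s[i]
        s[low], s[j] = s[j], s[low]
        stack.append((low, j - 1))
        stack.append((j + 1, high))
    return count


for n in (1, 8, 100):
    print(n, quicksort_comparisons(range(n)), n * (n - 1) // 2)
```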
Quicksort Algorithm Analysis
Average-Case Time Complexity
Assume that
the value of pivotpoint returned by partition
is equally likely to be any of the numbers
from 1 through n

A(n) = ???
A(n) ∈ O(n lg n) // Example B.22

Reminder: Exchange Sort
void exchangesort (int n, keytype S[]) {
    index i, j;

    for (i = 1; i <= n-1; i++)
        for (j = i+1; j <= n; j++)
            if (S[j] < S[i])
                exchange S[i] and S[j];
}
W(n) ???
A(n) ???
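As a hint toward the two questions above, the comparisons can simply be counted. This Python sketch assumes 0-based indices and returns the count; note that the inner comparison runs for every (i, j) pair regardless of the input order.

```python
def exchangesort(s):
    """Sort s in place; return the number of key comparisons made."""
    n = len(s)
    comparisons = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            comparisons += 1
            if s[j] < s[i]:
                s[i], s[j] = s[j], s[i]
    return comparisons


data = [27, 10, 12, 20, 25, 13, 15, 22]
print(exchangesort(data), data)
```

For n = 8 the count is 8(8-1)/2 = 28 whether the input is sorted, reversed, or random.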
CH7. Computational Complexity - Sorting
It will take years to sort 1 billion keys using an O(n^2) algorithm
Suppose someone wanted 1 billion keys to be sorted in real time

There are two approaches to this problem
Try to develop a more efficient algorithm for the problem
Try to prove that a more efficient algorithm is impossible
Once we have such a proof, we know that
we should quit trying to obtain a faster algorithm
Actually, it has been proven that, for sorting only by comparisons of keys,
an algorithm better than O(n lg n) is not possible
7.1 Computational Complexity
Exchange sort: O(n^2)
This does not mean that
the problem of sorting requires time on the order of n^2
The function is a property of that one algorithm,
not necessarily a property of the problem

Mergesort: O(n lg n)

Computational Complexity
An important question is
whether it is possible to find an even more efficient algo

Computational complexity is the study of all possible algorithms
that can solve a given problem

A computational complexity analysis tries to
determine a lower bound
on the efficiency of all algorithms
for a given problem
Computational Complexity
Ex. Suppose a lower bound for a problem is Ω(n lg n)
It does not mean that it must be possible
to create an O(n lg n) algorithm for that problem
It means only that it is impossible
to create one that is better than O(n lg n)

Sorting problem is one of few problems
for which we have been successful in developing algos
whose time complexities are as good as lower bound
7.2 Insertion & Selection Sort
Insertion Sort

sort by inserting records in an existing sorted array
Ex) 8 4 2 7 9 5 13

void insertionsort (int n, keytype S[]) {
    index i, j;
    keytype x;
    for (i = 2; i <= n; i++) {
        x = S[i];
        j = i - 1;
        while (j > 0 && S[j] > x) {
            S[j+1] = S[j];
            j--;
        }
        S[j+1] = x;
    }
}
Insertion Sort Algorithm Analysis
Worst-Case Time Complexity: Number of Comparisons of Keys
Basic Operation: comparison of S[j] with x
For a given i,
the comparison (in the while-loop) is done at most i-1 times
Total number of comparisons is at most ???

i (5, 4, 3, 2, 1)
2 (4, 5, 3, 2, 1)
3 (3, 4, 5, 2, 1)
4 (2, 3, 4, 5, 1)
5 (1, 2, 3, 4, 5)
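The trace above can be reproduced with a small Python sketch of insertion sort that also counts the basic operation. It assumes 0-based indices, and the returned comparison count is an addition for illustration.

```python
def insertionsort(s):
    """Sort s in place; return the number of comparisons of s[j] with x."""
    comparisons = 0
    for i in range(1, len(s)):
        x = s[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # compare s[j] with x
            if s[j] <= x:
                break
            s[j + 1] = s[j]           # shift the larger key one slot right
            j -= 1
        s[j + 1] = x
    return comparisons


worst = [5, 4, 3, 2, 1]
print(insertionsort(worst), worst)
example = [8, 4, 2, 7, 9, 5, 13]
insertionsort(example)
print(example)
```

On the reversed input (5, 4, 3, 2, 1) the count is 1 + 2 + 3 + 4 = 10, which equals n(n-1)/2 for n = 5.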

Insertion Sort Algorithm Analysis
Extra Space Analysis:
The only space usage that increases with n is
the size of the input array
Therefore, the algo is an in-place sort
The extra space is in O(1)
Selection Sort
A slight modification of Exchange Sort
The assignments of records are significantly different
Simply keeps track of the index of the current smallest key
among the keys in the ith through the nth slots
After determining that record,
it exchanges it with the record in the ith slot
void selectionsort (int n, keytype S[]) {
    index i, j, smallest;
    for (i = 1; i <= n-1; i++) {
        smallest = i;
        for (j = i+1; j <= n; j++)
            if (S[j] < S[smallest])
                smallest = j;
        exchange S[i] and S[smallest];
    }
}
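A minimal Python sketch of selection sort, assuming 0-based indices. It also counts exchanges to illustrate the point about record assignments: there is one exchange per pass, so at most n-1 in total.

```python
def selectionsort(s):
    """Sort s in place; return the number of exchanges performed."""
    n = len(s)
    exchanges = 0
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):
            if s[j] < s[smallest]:
                smallest = j          # remember only the index
        s[i], s[smallest] = s[smallest], s[i]
        exchanges += 1
    return exchanges


data = [8, 4, 2, 7, 9, 5, 13]
print(selectionsort(data), data)
```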
7.3 Lower Bound - algorithms that remove at most one inversion per comparison
Because there are n! permutations of the first n positive integers,
there are n! different orderings of those integers
Denote a permutation by [k_1, k_2, ..., k_n],
where k_i is the integer at the ith position
An inversion in a permutation is
a pair (k_i, k_j) s.t. i < j and k_i > k_j

A permutation contains no inversion
iff it is the sorted ordering [1, 2, ..., n]
The task of sorting n distinct keys is
the removal of all inversions in a permutation
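The definition of an inversion translates directly into code. The sketch below is a brute-force count over all index pairs; the function name count_inversions and the sample permutation are chosen here for illustration.

```python
def count_inversions(perm):
    """Count pairs (i, j) with i < j and perm[i] > perm[j]."""
    n = len(perm)
    return sum(1
               for i in range(n)
               for j in range(i + 1, n)
               if perm[i] > perm[j])


print(count_inversions([3, 2, 4, 1, 6, 5]))   # inversions: (3,2) (3,1) (2,1) (4,1) (6,5)
print(count_inversions([1, 2, 3, 4, 5, 6]))   # sorted ordering
print(count_inversions([6, 5, 4, 3, 2, 1]))   # reverse ordering
```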
Lower Bound
Theorem 7.1
Any algorithm that
sorts n distinct keys only by comparisons of keys and
removes at most one inversion after each comparison
must
in the worst case
do at least n(n-1)/2 comparisons of keys and
on the average
do at least n(n-1)/4 comparisons of keys
Lower Bound
Proof
Case 1: Worst-Case
We need only show that
there is a permutation with n(n-1)/2 inversions,
because
when that permutation is the input,
any such algorithm will have to remove that many inversions
and therefore do at least that many comparisons
Such a permutation is [n, n-1, ..., 2, 1]
Lower Bound
Case 2: Average-Case
We pair the permutation [k_n, ..., k_2, k_1] with the permutation [k_1, k_2, ..., k_n]
(the first is the transpose of the second)
Let r and s be integers (between 1 and n) such that s > r

Given a permutation, the pair (s, r) is an inversion in
either the permutation or its transpose, but not in both
Because there are n(n-1)/2 such pairs of integers between 1 and n,
a permutation and its transpose together have exactly n(n-1)/2 inversions

So, the average number of inversions in a permutation and its transpose is

(1/2) × n(n-1)/2 = n(n-1)/4
Lower Bound
Therefore,
if we consider all permutations equally probable for input,
the avg no of inversions in the input is also n(n-1)/4

Because we assumed that
algo removes at most one inversion after each comparison,
on the avg
it must do at least this many comparisons
to remove all inversions
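For small n the averaging argument can be checked by brute force. The sketch below enumerates all n! permutations with itertools.permutations and uses exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def count_inversions(perm):
    """Count pairs (i, j) with i < j and perm[i] > perm[j]."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if perm[i] > perm[j])


def average_inversions(n):
    """Average number of inversions over all n! permutations of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    total = sum(count_inversions(p) for p in perms)
    return Fraction(total, len(perms))


for n in range(2, 7):
    print(n, average_inversions(n), Fraction(n * (n - 1), 4))
```

Each printed row should show the same value twice, matching n(n-1)/4.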
7.6 Heap Sort
Complete Binary Tree
All internal nodes have two children
All leaves have depth d
depth of a node: the no of edges in the unique path
from the root to that node
Essentially Complete Binary Tree
It is a CBT down to a depth of d - 1
The nodes with depth d are as far to the left as possible
[Figure: three example trees, labeled "CBT"; "CBT(X), ECBT"; and "X"]
Heap
A heap is a data structure such that
it is an ECBT (Essentially Complete Binary Tree)
The value stored at each node is greater than or equal to
the values stored at its children - Heap Property

Heap
Heap viewed as (a) a binary tree and (b) an array
[Figure (a): binary tree with root 18; nodes numbered 1 to 10 level by level, left to right]
(b) Index:  1  2  3  4  5  6  7  8  9 10
    Key:   18 16 12  8  7  9  3  2  4  1
parent(i) = floor(i/2)
lchild(i) = 2*i
rchild(i) = 2*i + 1

SiftDown(Heap)
Insert a node at the root above 2 children heaps
siftdown(H,1)
New value at root: 9
Index:  1  2  3  4  5  6  7  8  9 10
Key:    9 12 18  8  7 16  3  2  4  1
Assumption: Subtrees satisfy the heap property
Right child (18) is larger than left child (12)
Exchange root and right child
SiftDown(Heap)
siftdown (H, i)
    parent = node at the ith position
    largeChild = the larger of parent's children
    while (parent has a child and parent.key < largeChild.key)
        exchange parent.key and largeChild.key
        parent = largeChild
        largeChild = the larger of parent's children
SiftDown(Heap)
Index:  1  2  3  4  5  6  7  8  9 10
Key:   18 12  9  8  7 16  3  2  4  1
Parent is now the node at index 3 (key 9)
Left child (16) is larger
Exchange parent and left child
SiftDown(Heap)
Index:  1  2  3  4  5  6  7  8  9 10
Key:   18 12 16  8  7  9  3  2  4  1
Satisfy the heap property?
What is the run time to do siftdown?
In-Place?
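The siftdown walk-through can be replayed with a short Python sketch. To keep the slides' formulas lchild(i) = 2*i and rchild(i) = 2*i + 1, it assumes the heap lives in h[1..size] with h[0] unused; the explicit size parameter is an assumption added here.

```python
def siftdown(h, i, size):
    """Sift the key at index i down within the heap h[1..size] (h[0] unused)."""
    parent = i
    while 2 * parent <= size:                 # parent has at least one child
        large = 2 * parent                    # left child
        if large < size and h[large + 1] > h[large]:
            large += 1                        # right child is larger
        if h[parent] >= h[large]:
            break                             # heap property holds
        h[parent], h[large] = h[large], h[parent]
        parent = large


h = [None, 9, 12, 18, 8, 7, 16, 3, 2, 4, 1]
siftdown(h, 1, 10)
print(h[1:])
```

The loop follows a single root-to-leaf path, so the number of exchanges is bounded by the depth of the tree, and only a constant number of extra variables is used.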
MakeHeap(Heap)
makeheap(A)
    for i ← floor(length(A)/2) downto 1
        do siftdown(A, i)

Index:  1  2  3  4  5  6  7  8  9 10
A:     17  3  2  8  7  9 12 16  4 18
floor(A.length/2) = 5
Algorithm starts here (i = 5, key 7) in building heaps
siftdown makes the subtree rooted at index 5 a heap
MakeHeap(Heap)
Index:  1  2  3  4  5  6  7  8  9 10
A:     17  3  2  8 18  9 12 16  4  7
i = 4
The subtree rooted at index 5 is a heap
siftdown makes the subtree rooted at index 4 into a heap
MakeHeap(Heap)
Index:  1  2  3  4  5  6  7  8  9 10
A:     17  3  2 16 18  9 12  8  4  7
i = 3
The subtrees rooted at indices 4 and 5 are heaps
siftdown makes the subtree rooted at index 3 a heap
MakeHeap(Heap)
Index:  1  2  3  4  5  6  7  8  9 10
A:     17  3 12 16 18  9  2  8  4  7
i = 2
Siftdown on the subtree rooted at index 2 (indices 2, 4, 5, 8, 9, 10):
 3 16 18  8  4  7
18 16  3  8  4  7    (3 exchanged with its larger child 18)
18 16  7  8  4  3    (3 exchanged with its child 7)
MakeHeap(Heap)
Index:  1  2  3  4  5  6  7  8  9 10
A:     17 18 12 16  7  9  2  8  4  3
i = 1
After siftdown(A, 1):
A:     18 17 12 16  7  9  2  8  4  3
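The whole MakeHeap walk-through can be replayed with a Python sketch. It assumes the 1-based layout of the slides (A[0] unused) and repeats a siftdown helper so that the block is self-contained.

```python
def siftdown(h, i, size):
    """Sift the key at index i down within the heap h[1..size] (h[0] unused)."""
    parent = i
    while 2 * parent <= size:
        large = 2 * parent
        if large < size and h[large + 1] > h[large]:
            large += 1
        if h[parent] >= h[large]:
            break
        h[parent], h[large] = h[large], h[parent]
        parent = large


def makeheap(a):
    """Turn a[1..n] into a heap, working from the last internal node up."""
    n = len(a) - 1
    for i in range(n // 2, 0, -1):
        siftdown(a, i, n)
        print("i =", i, a[1:])       # array state after each siftdown


a = [None, 17, 3, 2, 8, 7, 9, 12, 16, 4, 18]
makeheap(a)
```

The printed states should match the arrays shown on the slides for i = 5 down to i = 1.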
Analysis of MakeHeap(Heap)
assume n = 2^d (depth: d)

Depth    #nodes     #sifts
0        2^0        d-1
1        2^1        d-2
2        2^2        d-3
:        :          :
j        2^j        d-j-1
:        :          :
d-1      2^(d-1)    0
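Analysis of MakeHeap(Heap)
Total #sifts : at most sum_{j=0}^{d-1} 2^j (d-j-1) = 2^d - d - 1

Actual upper bound : 2^d - d - 1 + d = 2^d - 1 = n - 1
(the single node at depth d adds at most d more sifts)

Total # comp. : at most 2(n - 1), two comparisons of keys per sift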
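The per-depth table (2^j nodes at depth j, each sifted through at most d-j-1 levels) can be summed numerically. The sketch below compares that sum with the closed form 2^d - d - 1; the function name total_sifts is hypothetical.

```python
def total_sifts(d):
    """Sum over depths j = 0..d-1 of (#nodes at depth j) * (max sifts per node)."""
    return sum(2 ** j * (d - j - 1) for j in range(d))


for d in range(1, 8):
    print(d, total_sifts(d), 2 ** d - d - 1)
```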
HeapSort
[Figure: Delete max on a heap of 11 keys stored in array A[1..11]; the root 11 is exchanged with the last key 6]
HeapSort
Heapsort (A)
1. makeheap(A)
2. for i ← length(A) downto 2 do
3.     exchange A[1] ↔ A[i]
4.     heapSize[A] ← heapSize[A] - 1
5.     siftdown(A, 1)

What is the worst case run time?
Extra Space?
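The five numbered steps can be sketched in Python as follows. The sketch assumes the 1-based layout of the slides (a[0] unused) and tracks heapSize in a local variable; siftdown and makeheap are repeated so the block runs on its own.

```python
def siftdown(h, i, size):
    """Sift the key at index i down within the heap h[1..size] (h[0] unused)."""
    parent = i
    while 2 * parent <= size:
        large = 2 * parent
        if large < size and h[large + 1] > h[large]:
            large += 1
        if h[parent] >= h[large]:
            break
        h[parent], h[large] = h[large], h[parent]
        parent = large


def makeheap(a, size):
    for i in range(size // 2, 0, -1):
        siftdown(a, i, size)


def heapsort(a):
    """Sort a[1..n] in place in nondecreasing order."""
    heapsize = len(a) - 1
    makeheap(a, heapsize)                  # step 1
    for i in range(heapsize, 1, -1):       # step 2
        a[1], a[i] = a[i], a[1]            # step 3: move current max to slot i
        heapsize -= 1                      # step 4
        siftdown(a, 1, heapsize)           # step 5


a = [None, 7, 2, 5, 3, 1, 5, 6, 11, 9, 8, 4]
heapsort(a)
print(a[1:])
```

Apart from a few index variables, all work happens inside the input array.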
Heap Insert
O(log n) time

Priority Queue
Insert & delete_max
[Figure: inserting key 10 into a heap with root 11; the new key is added as the last leaf and moved up toward the root]
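A priority queue with insert and delete_max can be sketched on top of the same array layout. This is a minimal illustration, assuming a Python list with index 0 unused; the sift-up loop in insert is an assumption about how the new key moves toward the root, since the slides show it only as a figure.

```python
def siftdown(h, i, size):
    """Sift the key at index i down within the heap h[1..size] (h[0] unused)."""
    parent = i
    while 2 * parent <= size:
        large = 2 * parent
        if large < size and h[large + 1] > h[large]:
            large += 1
        if h[parent] >= h[large]:
            break
        h[parent], h[large] = h[large], h[parent]
        parent = large


def insert(h, key):
    """Add key as the last leaf, then move it up while it beats its parent."""
    h.append(key)
    i = len(h) - 1
    while i > 1 and h[i // 2] < h[i]:
        h[i // 2], h[i] = h[i], h[i // 2]
        i //= 2


def delete_max(h):
    """Remove and return the largest key (the root)."""
    top = h[1]
    last = h.pop()
    if len(h) > 1:                 # heap still has keys: refill the root
        h[1] = last
        siftdown(h, 1, len(h) - 1)
    return top


pq = [None]
for key in (5, 11, 3, 10, 9):
    insert(pq, key)
print([delete_max(pq) for _ in range(5)])
```

Both operations walk one root-to-leaf path at most, which is where the logarithmic bound comes from.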

Comparison of O(n lg n) Sort Algorithms

Algorithm    Time Complexity                         Space Complexity
Merge        W(n) = O(n lg n), A(n) = O(n lg n)      O(n)
Quick        W(n) = O(n^2),    A(n) = O(n lg n)      O(lg n)
Heap         W(n) = O(n lg n), A(n) = O(n lg n)      O(1)

So far, the best sorting algorithms are O(n lg n) in the worst case.
CAN WE DO BETTER??
7.8 Lower Bounds for Comparison-Only Sortings

Can we develop sorting algorithms
whose time complexities are better than O(n lg n )?

As long as we limit ourselves to
sorting only by comparisons of keys,
such algorithms are not possible
Decision Trees for Sorting Algorithms

Sorting three keys(a,b,c)
At each node, a decision must be made as to which node to visit next
Sorted keys are stored at the leaves
There is a leaf for every permutation of three keys,
because the algorithm can sort every possible input of size 3
n! leaves
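A decision tree for three keys can be written out as nested comparisons. The sketch below is one possible tree, assuming distinct keys; each return statement is a leaf, and there are 3! = 6 of them.

```python
from itertools import permutations


def sort3(a, b, c):
    """Sort three distinct keys using at most three comparisons."""
    if a < b:
        if b < c:
            return (a, b, c)
        elif a < c:
            return (a, c, b)
        else:
            return (c, a, b)
    else:
        if a < c:
            return (b, a, c)
        elif b < c:
            return (b, c, a)
        else:
            return (c, b, a)


for p in permutations((1, 2, 3)):
    print(p, "->", sort3(*p))
```

The longest root-to-leaf path makes three comparisons, which is the depth of this tree.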
Lower Bounds for Comparison-Only Sortings

Lemma 7.2: The worst-case number of comparisons
done by a decision tree is equal to its depth

Lemma 7.3: If m is the number of leaves in a binary tree
and d is the depth, then d ≥ ⌈lg m⌉

Proof: Using induction on d, we show first that 2^d ≥ m
Induction Base:
d = 0: the tree is a single node, so m = 1 and 2^0 = 1 ≥ m
Induction Hypothesis:
Assume that 2^d ≥ m (where m is the number of leaves),
for any binary tree with depth d
Induction Step:
We need to show that 2^(d+1) ≥ m' (where m' is the number of leaves),
for any binary tree with depth d + 1
Removing the leaves at depth d + 1 gives a tree with depth d and m leaves, so
2^(d+1) = 2 × 2^d
        ≥ 2m (Induction Hypothesis)
        ≥ m' (Each parent can have at most two children)
Taking lg of both sides of 2^d ≥ m gives d ≥ lg m;
because d is an integer, d ≥ ⌈lg m⌉

Theorem 7.2: Any algorithm that sorts n distinct keys
only by comparison of keys must in the worst-case
do at least ⌈lg(n!)⌉ comparisons of keys

Theorem 7.3: Any algorithm that sorts n distinct keys
only by comparison of keys must in the worst-case
do at least ⌈n lg n - 1.45n⌉ comparisons of keys
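The two bounds can be tabulated for small n. In this sketch, min_comparisons is a hypothetical helper that computes ⌈lg(n!)⌉ exactly with integer arithmetic, while the second bound uses floating-point math.log2.

```python
import math


def min_comparisons(n):
    """Exact ceil(lg(n!)): for an integer m >= 1, (m - 1).bit_length() == ceil(lg m)."""
    return (math.factorial(n) - 1).bit_length()


def approx_bound(n):
    """ceil(n lg n - 1.45 n), the weaker closed-form bound."""
    return math.ceil(n * math.log2(n) - 1.45 * n)


for n in (3, 5, 8, 16, 1000):
    print(n, min_comparisons(n), approx_bound(n))
```

For n = 3 the first bound is 3, which agrees with the depth of a decision tree for three keys.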
