
Chapter 12- Sorting


12.1 Introduction
12.2 Types of Sorting
12.3 Bubble Sort
12.4 Insertion Sort
12.5 Selection Sort
12.6 Shell Sort
12.7 Heap Sort
12.8 Merge Sort
12.9 Quick Sort
12.10 Bucket Sort
12.11 Indirect Sorting
12.12 Summary
12.13 Key Terms
12.14 Review Questions


Objectives
To understand the concept of sorting
To understand the difference between internal and external sorting
To learn various sorting methods with examples, algorithms and running time

12.1 Introduction

Sorting refers to ordering data in an increasing or decreasing fashion according to some linear
relationship among the data items. Sorting can be done on names, numbers and records. Sorting
reduces the time needed to find an item. For example, it is relatively easy to look up the phone
number of a friend in a telephone directory because the names in the phone book have been
sorted into alphabetical order. This example clearly illustrates one of the main reasons that
sorting large quantities of information is desirable: sorting greatly improves the efficiency of
searching.

12.2 Types of Sorting

There are two types of sorting techniques:


 Internal sorting
 External sorting

In internal sorting all the data to sort is stored in memory at all times while sorting is in progress.
In external sorting data is stored outside memory (like on disk) and only loaded into memory in
small chunks. External sorting is usually applied in cases when data can't fit into memory
entirely.
Internal sorting can be performed using several methods:
 Bubble Sort
 Insertion Sort
 Selection Sort
 Shell Sort
 Heap Sort
 Merge Sort
 Quick Sort
 Bucket Sort


12.3 Bubble Sort

Bubble sort is a simple sorting algorithm that works by repeatedly stepping through the list to be
sorted, comparing each pair of adjacent items and swapping them if they are in the wrong order.
The pass through the list is repeated until no swaps are needed, which indicates that the list is
sorted. The algorithm gets its name from the way smaller elements "bubble" to the top of the
list. Because it only uses comparisons to operate on elements, it is a comparison sort.

Algorithm

void bubble( int a[], int n )
{
    int i, j;
    for( i = 0; i < n; i++ )              /* n passes over the array */
    {
        for( j = 1; j < (n - i); j++ )    /* compare adjacent pairs */
        {
            if( a[j-1] > a[j] )
                SWAP( a[j-1], a[j] );     /* swap if out of order */
        }
    }
}

Example
Let us take the array of numbers "5 1 4 2 8" and sort it from lowest number to greatest
number using bubble sort. Three passes will be required.
First Pass:
( 5 1 4 2 8 ) ⟶ ( 1 5 4 2 8 ), compare the first two elements and swap since 5 > 1.
( 1 5 4 2 8 ) ⟶ ( 1 4 5 2 8 ), swap since 5 > 4.
( 1 4 5 2 8 ) ⟶ ( 1 4 2 5 8 ), swap since 5 > 2.
( 1 4 2 5 8 ) ⟶ ( 1 4 2 5 8 ), these elements are already in order (8 > 5), so the algorithm
does not swap them.
Second Pass:
( 1 4 2 5 8 ) ⟶ ( 1 4 2 5 8 )
( 1 4 2 5 8 ) ⟶ ( 1 2 4 5 8 ), swap since 4 > 2.
( 1 2 4 5 8 ) ⟶ ( 1 2 4 5 8 )
( 1 2 4 5 8 ) ⟶ ( 1 2 4 5 8 )
Now the array is already sorted, but the algorithm does not know if it is completed. The
algorithm needs one whole pass without any swap to know it is sorted.
Third Pass:
( 1 2 4 5 8 ) ⟶ ( 1 2 4 5 8 )
( 1 2 4 5 8 ) ⟶ ( 1 2 4 5 8 )
( 1 2 4 5 8 ) ⟶ ( 1 2 4 5 8 )
( 1 2 4 5 8 ) ⟶ ( 1 2 4 5 8 )

Analysis
Bubble sort requires n − 1 passes: each pass places one item in its correct place. (The nth item
is then in the correct place as well.) The ith pass makes either i or n − i comparisons and moves.
So:
T(n) = 1 + 2 + 3 + … + (n − 1)
T(n) = n(n − 1)/2

12.4 Insertion Sort

Insertion sort is a simple comparison-based sorting algorithm. It is similar to bubble sort, but
more efficient because it typically performs fewer element comparisons. Every iteration of
insertion sort removes an element from the input data and inserts it into the correct position in
the already-sorted list, until no input elements remain. The choice of which element to remove
from the input is arbitrary, and can be made using almost any choice algorithm.

Example
Pass 1
5 4 3 2 1 ⟶4 5 3 2 1
Pass 2
4 5 3 2 1 ⟶ 4 3 5 2 1 ⟶3 4 5 2 1
Pass 3
3 4 5 2 1 ⟶ 3 4 2 5 1⟶ 3 2 4 5 1 ⟶2 3 4 5 1
Pass 4
2 3 4 5 1 ⟶ 2 3 4 1 5 ⟶ 2 3 1 4 5 ⟶ 2 1 3 4 5 ⟶1 2 3 4 5


Algorithm

void insertion_sort( input_type a[ ], unsigned int n )
{
    unsigned int j, p;
    input_type tmp;
    a[0] = MIN_DATA;              /* sentinel: stops the inner loop at the left end */
    for( p = 2; p <= n; p++ )     /* a[1..p-1] is already sorted */
    {
        tmp = a[p];
        for( j = p; tmp < a[j-1]; j-- )
            a[j] = a[j-1];        /* shift larger elements one slot right */
        a[j] = tmp;               /* insert into its correct position */
    }
}

Analysis
The worst-case input is an array sorted in reverse order. In this case every iteration of the inner
loop scans and shifts the entire sorted subsection of the array before inserting the next element,
so insertion sort has a quadratic running time, i.e., O(n²). The running time can be bounded by
the total number of comparisons made over the entire execution of the algorithm. Thus, the
worst-case comparisons in the respective passes are as follows: 1st ⟶ 1, 2nd ⟶ 2, 3rd ⟶ 3, …,
(n − 1)th pass ⟶ n − 1. Therefore, total comparisons = n(n − 1)/2, which implies O(n²) time
complexity. The best-case input is an array that is already sorted; in this case insertion sort has
a linear running time, i.e., Θ(n).

12.5 Selection Sort

Selection sort is also a simple comparison-based sorting algorithm. The algorithm works as follows:
1. Find the minimum value in the list.
2. Swap it with the value in the first position.
3. Repeat the steps above for the remainder of the list (starting at the second position and
advancing each time).
Selection sort is one of the O(n²) sorting algorithms, which makes it quite inefficient for sorting
large data volumes. Selection sort is notable for its programming simplicity, and it can
outperform other sorts in certain situations (see the complexity analysis for more details).


Steps
1. 1st iteration selects the smallest element in the array, and swaps it with the first element.
2. 2nd iteration selects the 2nd smallest element (which is the smallest element of the remaining
elements) and swaps it with the 2nd element.
3. The algorithm continues until the last iteration selects the 2nd largest element and swaps it
with the 2nd last index, leaving the largest element in the last index.
Effectively, the list is divided into two parts: the sublist of items already sorted, which is built
up from left to right and is found at the beginning, and the sublist of items remaining to be
sorted, occupying the remainder of the array.

Example
66 25 12 22 11
11 25 12 22 66
11 12 25 22 66
11 12 22 25 66
11 12 22 25 66

Algorithm

SELECTION_SORT (A)
for i ← 1 to n-1 do
    min_j ← i
    min_x ← A[i]
    for j ← i + 1 to n do
        if A[j] < min_x then
            min_j ← j
            min_x ← A[j]
    A[min_j] ← A[i]
    A[i] ← min_x

Performance
All inputs are worst-case inputs for selection sort, since each current element has to be compared
with the rest of the unsorted array. The running time can be bounded by the total number of
comparisons made in the entire execution of this algorithm. Thus, the worst-case comparisons
in the respective passes are as follows:
1st ⟶ n − 1
2nd ⟶ n − 2
…
(n − 1)th pass ⟶ 1
Therefore, total comparisons = n(n − 1)/2, which implies O(n²) time complexity.

12.6 Shell Sort

Shell sort works by comparing elements that are distant from each other, rather than only the
adjacent elements compared by simpler sorts. Shell sort uses a sequence h1, h2, …, ht called
the increment sequence. Any increment sequence will work as long as h1 = 1, but some choices
are better than others. Shell sort makes multiple passes through a list, sorting a number of
equally spaced subsequences using insertion sort. It improves on the efficiency of insertion
sort by quickly shifting values toward their final destination.

Shell sort is also known as diminishing increment sort. The distance between compared
elements decreases as the sorting algorithm runs, until the last phase, in which adjacent
elements are compared. After a phase with some increment hk, we have a[i] ≤ a[i + hk] for
every i: all elements spaced hk apart are sorted, and the file is said to be hk-sorted.

Empirical Analysis of Shell Sort

 Shell sort is efficient for medium-size lists; for very large lists the algorithm is not the
best choice. It is, however, the fastest of the O(n²)-class sorting algorithms.
 It is roughly 5 times faster than bubble sort and a little over twice as fast as insertion
sort, its closest competitor.
 Its disadvantage is that it is a more complex algorithm and not nearly as efficient as the
merge, heap, and quick sorts.
 Shell sort is still significantly slower than the merge, heap, and quick sorts, but its
relatively simple algorithm makes it a good choice for sorting lists of fewer than about
5000 items unless speed is important. It is also an excellent choice for repetitive sorting
of smaller lists.

Example
Original 81 94 11 96 12 35 17 95 28 58 41 75 15
--------------------------------------------------------------------------------------------
After 5-sort 35 17 11 28 12 41 75 15 96 58 81 94 95
After 3-sort 28 12 11 35 15 41 58 17 94 75 81 96 95
After 1-sort 11 12 15 17 28 35 41 58 75 81 94 95 96


Algorithm

void shellsort( input_type a[ ], unsigned int n )
{
    unsigned int i, j, increment;
    input_type tmp;
    for( increment = n/2; increment > 0; increment /= 2 )  /* diminishing increments */
        for( i = increment+1; i <= n; i++ )
        {
            tmp = a[i];
            for( j = i; j > increment; j -= increment )
                if( tmp < a[j-increment] )
                    a[j] = a[j-increment];  /* shift within the increment chain */
                else
                    break;
            a[j] = tmp;
        }
}

Analysis
 Best case: the best case for shell sort is when the array is already sorted in the right
order; far fewer comparisons are then needed.
 The running time of shell sort depends on the choice of increment sequence.
 The problem with Shell's increments is that pairs of increments are not necessarily
relatively prime, so the smaller increments can have little effect.

12.7 Heap Sort

Heap sort is a comparison-based sorting algorithm and a much more efficient version of
selection sort. Like selection sort, it works by determining the largest (or smallest) element of
the list, placing it at the end (or beginning) of the list, and then continuing with the rest of the
list, but it accomplishes this task efficiently by using a data structure called a heap. Once the
data list has been made into a heap, the root node is guaranteed to be the largest (or smallest)
element. When the root is removed (using deleteMin/deleteMax) and placed at the end of the
list, the heap is rearranged so that the largest of the remaining elements moves to the root.
Using the heap, finding the next largest element takes O(log n) time, instead of O(n) for a
linear scan as in simple selection sort. This allows heap sort to run in O(n log n) time.

Properties of heap data structure


 The structure property: the tree is a complete binary tree; that is, all levels of the tree,
except possibly the last one (deepest) are fully filled, and, if the last level of the tree is not
complete, the nodes of that level are filled from left to right.
 The heap property: each node is greater than or equal to each of its children according to
a comparison predicate defined for the data structure.

Steps
Step I: The user inputs the size of the heap (within a specified limit). The program generates a
corresponding binary tree with nodes having randomly generated key values.
Step II: Build-heap operation: let n be the number of nodes in the tree and i the key of a node.
The program uses the operation Heapify; when Heapify is called, both the left and right subtrees
of i are already heaps. The function of Heapify is to let i settle down to a position (by swapping
itself with the larger of its children whenever the heap property is not satisfied) until the heap
property is satisfied in the tree rooted at i.
Step III: Remove the maximum element: the program removes the largest element of the heap
(the root) by swapping it with the last element.
Step IV: The program executes Heapify on the new root so that the resulting tree satisfies the
heap property.
Step V: Go to Step III until the heap is empty.

Example
Given an array of 6 elements: 15, 19, 10, 7, 17, 16, sort it in ascending order using heap sort.
Steps:
1. Consider the values of the elements as priorities and build the heap tree.
2. Start deleteMax operations, storing each deleted element at the end of the heap array.

After performing step 2, the order of the elements will be opposite to the order in the heap tree.
Hence, if we want the elements to be sorted in ascending order, we need to build the heap tree
as a max-heap: the greatest element will have the highest priority.
Note that we use only one array, treating its parts differently:
a. when building the heap tree, part of the array is considered the heap,
and the rest is the original array;
b. when sorting, part of the array is the heap, and the rest is the sorted array.
This was indicated by colors in the original figures: white for the original array, blue for the
heap and red for the sorted array.
Here is the array: 15, 19, 10, 7, 17, 16


Step 1: Building the heap tree


The array represented as a tree, complete but not ordered:

Start with the rightmost node at height 1 - the node at position 3 = Size/2.
It has one greater child and has to be percolated down. After processing array[3] the situation is:

Next comes array[2]. Its children are smaller, so no percolation is needed. The last node to be
processed is array[1]. Its left child is the greater of the children.
The item at array[1] has to be percolated down to the left, swapped with array[2]. As a result the
situation is:

The children of array[2] are greater, and item 15 has to be moved down further, swapped with
array[5].


Now the tree is ordered, and the binary heap is built.


Step 2: Sorting - performing deleteMax operations:
1. Delete the top element 19.
Store 19 in a temporary place. A hole is created at the top

Swap 19 with the last element of the heap. As 10 will be adjusted in the heap, its cell will no
longer be a part of the heap. Instead it becomes a cell from the sorted array. Percolate down the
hole.

2. DeleteMax the top element 17


Store 17 in a temporary place. A hole is created at the top.


Swap 17 with the last element of the heap. As 10 will be adjusted in the heap, its cell will no
longer be a part of the heap. Instead it becomes a cell from the sorted array. The element 10 is
less than the children of the hole, and we percolate the hole down:

3. DeleteMax 16
Store 16 in a temporary place. A hole is created at the top


Swap 16 with the last element of the heap. As 7 will be adjusted in the heap, its cell will no
longer be a part of the heap. Instead it becomes a cell from the sorted array. Percolate the hole
down.

4. DeleteMax the top element 15


Store 15 in a temporary location. A hole is created.

Swap 15 with the last element of the heap. As 10 will be adjusted in the heap, its cell will no
longer be a part of the heap. Instead it becomes a position from the sorted array Store 10 in the
hole (10 is greater than the children of the hole)


5. DeleteMax the top element 10.


Remove 10 from the heap and store it into a temporary location.

Swap 10 with the last element of the heap. As 7 will be adjusted in the heap, its cell will no
longer be a part of the heap. Instead it becomes a cell from the sorted array. Store 7 in the hole
(as the only remaining element in the heap).

7 is the last element from the heap, so now the array is sorted.

Algorithm

void heapsort( input_type a[], unsigned int n )
{
    int i;
    for( i = n/2; i > 0; i-- )       /* build_heap */
        perc_down( a, i, n );
    for( i = n; i >= 2; i-- )
    {
        swap( &a[1], &a[i] );        /* delete_max */
        perc_down( a, 1, i-1 );
    }
}


void perc_down( input_type a[], unsigned int i, unsigned int n )
{
    unsigned int child;
    input_type tmp;
    for( tmp = a[i]; i*2 <= n; i = child )
    {
        child = i*2;
        if( ( child != n ) && ( a[child+1] > a[child] ) )
            child++;                 /* pick the larger of the two children */
        if( tmp < a[child] )
            a[i] = a[child];         /* move the child up */
        else
            break;
    }
    a[i] = tmp;
}

Analysis
The basic strategy is to build a binary heap of n elements; this stage takes O(n) time. We then
perform n delete_max operations: each deleted element is swapped into the slot freed at the end
of the array, so the array ends up sorted in place, without a second array. Since each delete_max
takes O(log n) time, the total running time is O(n log n).

12.8 Merge Sort

The merge sort algorithm is based on the classical divide-and-conquer paradigm.


DIVIDE: Partition the n-element sequence to be sorted into two subsequences of n/2 elements
each.
CONQUER: Sort the two subsequences recursively using the merge sort.
COMBINE: Merge the two sorted subsequences of size n/2 each to produce the sorted sequence
consisting of n elements.

Note that the recursion "bottoms out" when the sequence to be sorted is of unit length. Since
every sequence of length 1 is in sorted order, no further recursive call is necessary. The key
operation of the merge sort algorithm is the merging of the two sorted sequences in the
"combine" step. To perform the merging, we use an auxiliary procedure Merge(A, p, q, r), where
A is an array and p, q and r are indices numbering elements of the array, with p ≤ q < r. The
procedure assumes that the subarrays A[p..q] and A[q+1..r] are in sorted order, and merges them
to form a single sorted subarray that replaces the current subarray A[p..r]. Thus finally, we
obtain the sorted array A[1..n], which is the solution.

Figure 12.1 – Merge sort

Figure 12.2 – Merge sort Process


Example

Apply merge sort to sort the elements 38, 27, 43, 3, 9, 82, 10. Figure 12.2 shows how the
elements are divided and combined. Initially the list consists of 7 elements. Divide the list into
two halves: one half of the array contains 4 elements and the other half contains 3 elements.
This is a recursive process, so the left and right halves are further subdivided until each piece
contains only one element. After the dividing process is complete, we combine the sorted
pieces pairwise, in sorted order.

Figure 12.3 – Merge the two sorted array


Algorithm

void mergeSort( int numbers[], int temp[], int array_size )
{
    m_sort( numbers, temp, 0, array_size - 1 );
}

void m_sort( int numbers[], int temp[], int left, int right )
{
    int mid;
    if( right > left )
    {
        mid = (right + left) / 2;
        m_sort( numbers, temp, left, mid );        /* sort left half */
        m_sort( numbers, temp, mid+1, right );     /* sort right half */
        merge( numbers, temp, left, mid+1, right );
    }
}

void merge( int numbers[], int temp[], int left, int mid, int right )
{
    int i, left_end, num_elements, tmp_pos;
    left_end = mid - 1;
    tmp_pos = left;
    num_elements = right - left + 1;
    while( (left <= left_end) && (mid <= right) )  /* merge the two runs */
    {
        if( numbers[left] <= numbers[mid] )
        {
            temp[tmp_pos] = numbers[left];
            tmp_pos = tmp_pos + 1;
            left = left + 1;
        }
        else
        {
            temp[tmp_pos] = numbers[mid];
            tmp_pos = tmp_pos + 1;
            mid = mid + 1;
        }
    }
    while( left <= left_end )        /* copy any leftover left-run elements */
    {
        temp[tmp_pos] = numbers[left];
        left = left + 1;
        tmp_pos = tmp_pos + 1;
    }
    while( mid <= right )            /* copy any leftover right-run elements */
    {
        temp[tmp_pos] = numbers[mid];
        mid = mid + 1;
        tmp_pos = tmp_pos + 1;
    }
    for( i = 0; i < num_elements; i++ )  /* copy the merged run back */
    {
        numbers[right] = temp[right];
        right = right - 1;
    }
}

Analysis
Merge sort goes through the same steps independent of the data.
Let T(n) be the running time on a list of n elements.
T(1) = 1
T(n) = 2 × (running time on a list of n/2 elements) + linear merge
T(n) = 2T(n/2) + n
Brute force method:
T(n) = 2T(n/2) + n
     = 2[2T(n/4) + n/2] + n
     = 4T(n/4) + 2n
     = 4[2T(n/8) + n/4] + 2n
     = 8T(n/8) + 3n
In general, T(n) = 2^k T(n/2^k) + k·n,
and there are k = log n expansions to get to T(1):
T(n) = nT(1) + n log n
T(n) = n log n + n

Complexity
Best case analysis : O(n log n)
Average case analysis : O(n log n)
Worst case analysis : O(n log n)

12.9 Quick Sort

Quick sort is a divide-and-conquer algorithm. Quick sort first divides a large list into two
smaller sub-lists (the low elements and the high elements) and then recursively sorts the
sub-lists.
DIVIDE: The array A[p . . r] is partitioned (rearranged) into two nonempty subarrays A[p . . q]
and A[q+1 . . r]. The index q is computed as a part of this partitioning procedure.
CONQUER: The two subarrays A[p . . q] and A[q+1 . . r] are sorted by recursive calls to quick
sort.
COMBINE: Since the subarrays are sorted in place, no work is needed to combine them: the
entire array A is now sorted.

Figure 12.4 – Quick sort


The steps are:
1. Pick an element, called a pivot, from the list.
2. Reorder the list so that all elements with values less than the pivot come before the pivot,
while all elements with values greater than the pivot come after it (equal values can go
either way). After this partitioning, the pivot is in its final position. This is called the
partition operation.


3. Recursively sort the sub-list of lesser elements and the sub-list of greater elements.
The base cases of the recursion are lists of size zero or one, which never need to be sorted.

Example

Figure 12.5 – Example of Quick sort

Good points
 It is in-place, since it uses only a small auxiliary stack.
 It requires only O(n log n) time on average to sort n items.
 It has an extremely short inner loop.
 The algorithm has been subjected to thorough mathematical analysis, so very precise
statements can be made about performance.
Bad points
 It is recursive. If recursion is not available, the implementation is extremely
complicated.
 It requires quadratic (i.e., n²) time in the worst case.
 It is fragile, i.e., a simple mistake in the implementation can go unnoticed and cause it
to perform badly.

Algorithm

void q_sort( input_type a[], int left, int right )
{
    int i, j;
    input_type pivot;
    if( left + CUTOFF <= right )      /* small subarrays are left for insertion sort */
    {
        pivot = median3( a, left, right );  /* median-of-three pivot selection */
        i = left; j = right-1;
        for( ; ; )
        {
            while( a[++i] < pivot );
            while( a[--j] > pivot );
            if( i < j )
                swap( &a[i], &a[j] );
            else
                break;
        }
        swap( &a[i], &a[right-1] );   /* restore pivot */
        q_sort( a, left, i-1 );
        q_sort( a, i+1, right );
    }
}

Analysis of Quick Sort

The recurrence relation for the best-case performance is:
Cbest(n) = 2 Cbest(n/2) + n for n > 1, Cbest(1) = 0
To solve the recurrence, substitute n = 2^k; then we get
Cbest(2^k) = 2 Cbest(2^(k-1)) + 2^k
           = 2[2 Cbest(2^(k-2)) + 2^(k-1)] + 2^k
           = 2^2 Cbest(2^(k-2)) + 2^k + 2^k
           = 2^2 [2 Cbest(2^(k-3)) + 2^(k-2)] + 2^k + 2^k
           = 2^3 Cbest(2^(k-3)) + 3 · 2^k
           …
After i expansions,
Cbest(2^k) = 2^i Cbest(2^(k-i)) + i · 2^k
Replacing i = k, we get
Cbest(2^k) = k · 2^k
Substituting back n = 2^k,
Cbest(n) = n log n ∈ O(n log n)
The recurrence relation for the worst-case performance is:
Cworst(n) = (n+1) + n + … + 3 = (n+1)(n+2)/2 − 3 ∈ O(n²)
The average-case performance is:
Cavg(n) ≈ 2n ln n

Complexity
Best case analysis : O(n log n)
Average case analysis : O(n log n)
Worst case analysis : O(n²)

12.10 Bucket Sort

Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a number
of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or
by recursively applying the bucket sorting algorithm. It is a distribution sort, and is a cousin of
radix sort in the most to least significant digit flavor. Bucket sort is a generalization of
pigeonhole sort. Since bucket sort is not a comparison sort, the Ω(n log n) lower bound is
inapplicable. The computational complexity estimates involve the number of buckets.
Bucket sort works as follows:
1. Set up an array of initially empty "buckets."
2. Scatter: Go over the original array, putting each object in its bucket.
3. Sort each non-empty bucket.
4. Gather: Visit the buckets in order and put all elements back into the original array.

Example
Apply bucket sort to the elements 29, 25, 3, 49, 9, 37, 21, 43. The given numbers are placed
into buckets based on their range: if a number falls between 0 and 9, it goes into the first
bucket; if it falls between 10 and 19, into the second bucket; and so on.

Sort the numbers in each and every bucket and combine the elements from all the buckets.

Figure 12.6- Bucket Sort

Algorithm

Let n be the length of the input list A;

For each element A[i] of A
    if the bucket for A[i] is not empty
        insert A[i] into that bucket using insertion sort;
    else
        place A[i] into the bucket;
Concatenate the buckets in order into one sorted list;

Complexity
The complexity of bucket sort depends on the input. In the average case the complexity of the
algorithm is O(n + k), where n is the length of the input sequence and k is the number of
buckets. The problem is that its worst-case performance is O(n²), which makes it as slow as
bubble sort.

12.11 Indirect Sorting

Indirect sorting is also called external sorting. The term refers to sorting methods that are
employed when the data to be sorted is too large to fit in primary memory.


Characteristics of External Sorting


1. During the sort, some of the data must be stored externally. Typically the data will be
stored on tape or disk.
2. The cost of accessing data is significantly greater than either bookkeeping or comparison
costs.
3. There may be severe restrictions on access. For example, if tape is used, items must be
accessed sequentially.

Criteria for Developing an External Sorting Algorithm


1. Minimize number of times an item is accessed.
2. Access items in sequential order

12.11.1. Two Way Merge


The basic external sorting algorithm uses the merge routine from merge sort. Suppose we have
four tapes, Ta1, Ta2, Tb1, Tb2: two input tapes and two output tapes. Depending on the point
in the algorithm, the a and b tapes act as either input tapes or output tapes.

Figure 12.7 – Two Way Merge

Initially the file is on Ta1.


N records on Ta1
M records can fit in the memory
Step 1: Break the file into ⌈N/M⌉ blocks of size at most M.
Step 2: Sort the blocks:
o read a block, sort, store on Tb1
o read a block, sort, store on Tb2
o read a block, sort, store on Tb1
o etc., alternately writing on Tb1 and Tb2
Each sorted block is called a run.
Each output tape will contain half of the runs.
Step 3: Merge:
a. From Tb1, Tb2 to Ta1, Ta2.
Merge the first run on Tb1 and the first run on Tb2, and store the result on Ta1:
Read two records into main memory, compare, store the smaller on Ta1.
Read the next record (from Tb1 or Tb2, whichever tape contained the record
stored on Ta1), compare, store the smaller on Ta1, etc.
Merge the second run on Tb1 and the second run on Tb2, store the result on Ta2.
Merge the third run on Tb1 and the third run on Tb2, store the result on Ta1.
Etc., storing the results alternately on Ta1 and Ta2.
Now Ta1 and Ta2 will contain sorted runs twice the size of the previous runs on Tb1 and
Tb2.
b. From Ta1, Ta2 to Tb1, Tb2.
Merge the first run on Ta1 and the first run on Ta2, and store the result on Tb1.
Merge the second run on Ta1 and the second run on Ta2, store the result on Tb2.
Etc., merging and storing alternately on Tb1 and Tb2.
c. Repeat the process until only one run is obtained. This is the sorted file.

Example
N = 14, M = 3 (14 records on tape Ta1, memory capacity: 3 records.)
Ta1: 17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43
A. Sorting of runs:
1. Read 3 records in main memory, sort them and store them on Tb1:
17, 3, 29 -> 3, 17, 29
Tb1: 3, 17, 29
2. Read the next 3 records in main memory, sort them and store them on Tb2
56, 24, 18 -> 18, 24, 56
Tb2: 18, 24, 56
3. Read the next 3 records in main memory, sort them and store them on Tb1
4, 9, 10 -> 4, 9, 10
Tb1: 3, 17, 29, 4, 9, 10
4. Read the next 3 records in main memory, sort them and store them on Tb2
6, 45, 36 -> 6, 36, 45
Tb2: 18, 24, 56, 6, 36, 45
5. Read the next 3 records into main memory (there are only two records left),
sort them and store them on Tb1:
11, 43 -> 11, 43
Tb1: 3, 17, 29, 4, 9, 10, 11, 43
At the end of this process we will have three runs on Tb1 and two runs on Tb2:
Tb1: 3, 17, 29 | 4, 9, 10 | 11, 43
Tb2: 18, 24, 56 | 6, 36, 45 |


Figure 12.8 – Data Transfer between memory and disk


B. Merging of runs
B1. Merging runs of length 3 to obtain runs of length 6.
Source tapes: Tb1 and Tb2, result on Ta1 and Ta2.
Merge the first two runs (on Tb1 and Tb2) and store the result on Ta1.
Tb1: 3, 17, 29 | 4, 9, 10 | 11, 43
Tb2: 18, 24, 56 | 6, 36, 45 |

Read the first record from each run (3 from Tb1, 18 from Tb2), compare, and store the smaller
(3) on Ta1.

Read the next record (17) from Tb1, compare with 18, and store the smaller on Ta1.

Read the next record (29) from Tb1, compare with 18, and store the smaller on Ta1.

The last stored record (18) was from Tb2. So, we read the next record from Tb2 (24), compare
with the record in main memory-29 from Tb1, and store the smaller on Ta1.

The last stored record (24) was from Tb2. So, we read the next record from Tb2 (56), compare
with the record in main memory-29 from Tb1, and store the smaller on Ta1.


There are no more records in the first run on Tb1, so we write to Ta1 the record that was in
main memory (56) and the remaining records (if any) from Tb2.
Now Ta1 is: 3, 17, 18, 24, 29, 56

In a similar way the second run from Tb1 (4, 9, 10) is merged with the second run from Tb2
(6, 36, 45), and the result, 4, 6, 9, 10, 36, 45, is stored on Ta2.
Thus we have the first two runs on Ta1 and Ta2, each twice the size of the original runs.

Next we merge the third runs on Tb1 and Tb2 and store the result on Ta1. Since only Tb1
contains a third run, it is copied onto Ta1: 11, 43.

B2. Merging runs of length 6 to obtain runs of length 12.

Source tapes: Ta1 and Ta2. Result on Tb1 and Tb2.
After merging the first two runs from Ta1 and Ta2, we get a run of length 12, stored on
Tb1: 3, 4, 6, 9, 10, 17, 18, 24, 29, 36, 45, 56
The second set of runs is only one run (11, 43), copied to Tb2.
Now on each tape there is only one run. The last step is to merge these two runs and to
get the entire file sorted.
B3. Merging the last two runs.
The result is: 3, 4, 6, 9, 10, 11, 17, 18, 24, 29, 36, 43, 45, 56

Number of passes: ⌈log(N/M)⌉

In each pass the size of the runs is doubled, so we need ⌈log(N/M)⌉ merging passes to reach a
run equal in size to the original file. This run is the entire file, sorted.


In the example we needed three merging passes (B1, B2 and B3) because ⌈log(14/3)⌉ = 3.
The algorithm requires ⌈log(N/M)⌉ merging passes plus the initial run-constructing pass. Each
pass merges runs of length r to obtain runs of length 2r. The first runs are of length M; the last
run is of length N.

12.11.2. Multi-way merge


The basic algorithm is the 2-way merge, which uses 2 output tapes. If we have k tapes, the
number of passes is reduced to ⌈log_k(N/M)⌉. At a given merge step we merge the first k runs,
then the second k runs, and so on.
The task at each step is to find the smallest element among the k candidates.
Solution: priority queues.
Idea: take the smallest element from each of the first k runs and store them in main memory in
a heap. Then repeatedly output the smallest element from the heap; the output element is
replaced with the next element from the run it came from. When finished with the first set of
runs, do the same with the next set of runs.

Example
Ta1: 17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43
Assume that we have three tapes (k = 3) and the memory can hold three records.
A. Main memory sort
The first three records are read into memory, sorted and written on Tb1,
the second three records are read into memory, sorted and stored on Tb2,
finally the third three records are read into memory, sorted and stored on Tb3.
Now we have one run on each of the three tapes:
Tb1: 3, 17, 29
Tb2: 18, 24, 56
Tb3: 4, 9, 10
The next portion of three records is sorted into main memory
and stored as the second run on Tb1:
Tb1: 3, 17, 29, 6, 36, 45
The next portion, which is also the last one, is sorted and stored onto Tb2:
Tb2: 18, 24, 56, 11, 43
Nothing is stored on Tb3.
Thus, after the main memory sort, our tapes look like this:
Tb1: 3, 17, 29, | 6, 36, 45,
Tb2: 18, 24, 56, | 11, 43
Tb3: 4, 9, 10
B. Merging
B.1. Merging runs of length M to obtain runs of length k*M

In our example we merge runs of length 3


and the resulting runs would be of length 9.
a. We build a heap tree in main memory out of the first records in each tape.
These records are: 3, 18, and 4.
b. We take the smallest of them - 3, using the deleteMin operation,
and store it on tape Ta1.
The record '3' belonged to Tb1, so we read the next record from Tb1 - 17,
and insert it into the heap. Now the heap contains 18, 4, and 17.
c. The next deleteMin operation will output 4, and it will be stored on Ta1.
The record '4' comes from Tb3, so we read the next record '9' from Tb3
and insert it into the heap.
Now the heap contains 18, 17 and 9.
d. Proceeding in this way, the first three runs will be stored in sorted order on Ta1.
Ta1: 3, 4, 9, 10, 17, 18, 24, 29, 56
Now it is time to build a heap of the second three runs.
(In fact there are only two runs, and the run on Tb2 is incomplete.)
The resulting sorted run on Ta2 will be:
Ta2: 6, 11, 36, 43, 45
This finishes the first pass.
B.2. Building runs of length k*k*M
We have now only two tapes: Ta1 and Ta2.
o We build a heap of the first elements of the two tapes - 3 and 6,
and output the smallest element '3' to tape Tb1.
o Then we read the next record from the tape where the record '3' belonged - Ta1,
and insert it into the heap.
o Now the heap contains 6 and 4, and using the deleteMin operation
the smallest record - 4 is output to tape Tb1.
Proceeding in this way, the entire file will be sorted on tape Tb1.
Tb1: 3, 4, 6, 9, 10, 11, 17, 18, 24, 29, 36, 43, 45, 56
The number of merge passes for multiway merging is ⌈log_k(N/M)⌉.
In the example this is ⌈log₃(14/3)⌉ = 2.
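The heap-driven merge described in steps a–d can be sketched as follows, again as an in-memory simulation: each run is a list, and the heap holds (value, run index, position) triples so that after each deleteMin the next record from the same run replaces the one removed. The function name is illustrative.

```python
import heapq

def kway_merge(runs):
    """Merge k sorted runs using a heap of the current head of each run."""
    # Seed the heap with the first record of every non-empty run.
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        value, r, pos = heapq.heappop(heap)      # the deleteMin operation
        out.append(value)                        # write to the output tape
        if pos + 1 < len(runs[r]):               # replace with the next record
            heapq.heappush(heap, (runs[r][pos + 1], r, pos + 1))
    return out
```

Applied to the first runs on Tb1, Tb2 and Tb3 above, this reproduces the run written to Ta1 in step d; applied to the two second runs, it reproduces the run on Ta2.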

12.11.3. Replacement Selection


An alternative algorithm, replacement selection, allows for longer runs. A buffer is allocated in
memory to act as a holding place for several records. Initially, the buffer is filled. Then, the
following steps are repeated until the input is exhausted:
 Select the record with the smallest key that is >= the key of the last record written.
 If all keys are smaller than the key of the last record written, then we have reached the
end of a run. Select the record with the smallest key for the first record of the next run.

 Write the selected record.


 Replace the selected record with a new record from input.

Figure 12.9 illustrates replacement selection for a small file. The beginning of the file is to the
right of each frame. To keep things simple, we allocate a 2-record buffer; typically, such a
buffer would hold thousands of records. We load the buffer in step B and write the record with
the smallest key (6) in step C. It is replaced with the next record (key 8). We select the
smallest key >= 6 in step D, which is key 7. After writing key 7, we replace it with key 4. This
process repeats until step F, where the last key written was 8 and all remaining keys are less
than 8. At this point we terminate the run and start another.

Figure 12.9- Replacement Selection

This strategy simply utilizes an intermediate buffer to hold values until the appropriate time for
output. Using random numbers as input, the average length of a run is twice the length of the
buffer. However, if the data is somewhat ordered, runs can be extremely long. Thus, this method
is more effective than doing partial sorts.

12.12 Summary

For most general internal sorting applications, insertion sort, Shell sort, or quick sort will be
the method of choice, and the decision of which to use depends mostly on the size of the input.
Two versions of quick sort were given: the first uses a simple pivoting strategy and no cutoff,
and performs acceptably only because the input files were random; the second uses
median-of-three partitioning and a cutoff of ten. Further optimizations are possible but fairly
tricky to implement, and the routines could of course be recoded in assembly language. We have
made an honest attempt to code all routines efficiently, but performance can vary somewhat
from machine to machine. The highly optimized version of quick sort is as fast as Shell sort even
for very small input sizes. The improved version of quick sort still has an O(n²) worst case (one
exercise asks you to construct a small example), but the chances of this worst case appearing
are so negligible as not to be a factor. Heap sort, although an O(n log n) algorithm with an
apparently tight inner loop, is slower than Shell sort.

12.13 Key Terms

Bubble Sort, Insertion Sort, Selection Sort, Shell Sort, Heap Sort, Merge Sort, Quick Sort,
Bucket Sort, Indirect Sorting

12.14 Review Questions

Two Mark Questions


1. Define sorting
2. What are the types of sorting?
3. Define Internal Sorting. Give an example.
4. What is external sorting?
5. Write an algorithm for Insertion Sort.
6. Define bubble sort.
7. Define shell sort.
8. Define Heap sort.
9. What are the properties of Heap?
10. What do you mean by Quick Sort?
11. Define Quick Sort.
12. What is selection sort?
13. Define bucket sort.
Big Questions
1. Explain in detail about Merge sort with an example.
2. Using Quick sort, sort the following elements 4, 12, 65, 87, 45, 3, 67, 21, 33, 67, 8
3. What is heap sort? Apply the heap sort algorithm, sort the following elements 3,
65, 21, 7, 56, 98, 45, 22
4. Explain in detail about external sorting.
