
Sorting

Definition
Sorting is a basic operation in
computer science. Sorting refers to
the operation of arranging data in
some given sequence, i.e. increasing
order or decreasing order.

Basic terminology in sorting
Internal and external sorting:
Internal sorting:
An internal sort is any data sorting process that takes place
entirely within the main memory of a computer. This is possible
whenever the data to be sorted is small enough to all be held in
the main memory.
External sorting:
External sorting is required when the data being sorted does
not fit into the main memory of a computing device (usually
RAM) and a slower kind of memory (usually a hard drive)
needs to be used.

Sorting - what for ?

Example: accessing (finding a specific value in)
an unsorted and a sorted array.
Find the name of the person being 10 years old:

Age:   10    36     8     35     1
Name:  Bart  Homer  Lisa  Marge  Maggie

CIS 068

Sorting - what for ?

Unsorted: linear search
Worst case: try n rows => order of magnitude: O(n)
Average case: try n/2 rows => O(n)

Age:   10    36     1       35     8
Name:  Bart  Homer  Maggie  Marge  Lisa

Sorting - what for ?

Sorted: binary search
Worst case: try log(n) <= k <= log(n)+1 rows => O(log n)
Average case: O(log n)
(for a proof see e.g.
http://www.mcs.sdsmt.edu/~ecorwin/cs251/binavg/binavg.htm)

Age:   1       8     10    35     36
Name:  Maggie  Lisa  Bart  Marge  Homer

Sorting - what for ?

Sorting and then accessing is faster than accessing an
unsorted dataset, if multiple (= k) queries occur:

n*log(n) + k*log(n) < k * n    (if k is big enough)

Sorting is crucial to databases, databases are crucial
to data-management, data-management is crucial to
economy, economy is ... sorting seems to be pretty
important!
The question is WHAT (name or age?) and HOW to
sort.


Classification of sorting methods

Comparison-based methods:
  Insertion sorts: insertion sort, Shell sort
  Selection sorts: selection sort, heapsort (tree sorting; in a future lesson)
  Exchange sorts: bubble sort, quick sort
  Merge sorts
Distribution methods: radix sort

Internal sorts: insertion (insertion sort, Shell sort),
selection (selection sort, heapsort),
exchange (bubble sort, quick sort)
External sorts: natural, balanced and polyphase merge

Complexity measure
The complexity of a sorting algorithm
measures the running time as a function
of the number n of items to be sorted.
If a1, a2, ..., an is the set of data to be
sorted and b is an auxiliary location,
the operations counted are:
1) Comparisons: test whether ai < aj or ai < b
2) Interchanges: switch the contents
of ai and aj, or of ai and b
3) Assignments: b = ai, aj = b or aj = ai

Algorithm Measures
Best case: often when the data is already sorted.

Worst case: data completely disorganised, or in reverse
order.

Average case: random order.

Some sorting algorithms are the same for
all three cases; others can be tailored to
suit certain cases.

Quadratic Algorithms

Bubble Sort


Bubble sort
Pass through the array n-1 times,
where n is the number of data items in the
array.
For each pass:
compare each element in the array with its
successor, and interchange the two elements
if they are not in order.

The algorithm:

    void bubble(int x[], int n)
    {
        int i, j, temp;
        for (i = 0; i < n - 1; i++)
            for (j = 0; j < n - 1 - i; j++)  /* the last i elements are already in place */
                if (x[j] > x[j + 1]) {       /* out of order: interchange */
                    temp = x[j];
                    x[j] = x[j + 1];
                    x[j + 1] = temp;
                }
    }

Bubble Sort: Example

The famous method: bubble sort.
[Figure: one pass, and the array after completion of each pass]

Bubble Sort: Algorithm

(Bubble sort) BUBBLE(DATA, N)
DATA is an array with N elements.
1. Repeat Steps 2 and 3 for K = 1 to N-1
2.   Set PTR := 1 [initialize pass pointer PTR]
3.   Repeat while PTR <= N-K: [executes one pass]
       a) If DATA[PTR] > DATA[PTR+1], then
            interchange DATA[PTR] and DATA[PTR+1]
          [End of If]
       b) Set PTR := PTR + 1
     [End of inner loop]
   [End of Step 1 outer loop]
4. Exit.

Bubble Sort: Analysis

Number of comparisons (worst case):
(n-1) + (n-2) + ... + 3 + 2 + 1 => O(n²)

Number of comparisons (best case):
n - 1 => O(n)

Number of exchanges (worst case):
(n-1) + (n-2) + ... + 3 + 2 + 1 => O(n²)

Number of exchanges (best case):
0 => O(1)

Overall worst case: O(n²) + O(n²) = O(n²)

Quadratic Algorithms

Selection Sort


Selection Sort: Example

The brute force method: selection sort.

Selection Sort: Algorithm

last = n - 1

Algorithm:
For i = 0 .. last-1
    find the smallest element M in subarray i .. last
    if M != element at i: swap the elements
Next i

(this is for BASIC-freaks!)

Selection Sort: Analysis

Number of comparisons:
(n-1) + (n-2) + ... + 3 + 2 + 1 =
n * (n-1)/2 =
(n² - n)/2 => O(n²)

Number of exchanges (worst case):
n - 1 => O(n)

Overall (worst case): O(n²) + O(n) = O(n²)
(quadratic sort)

Insertion sort
Insertion Sort(A[MAXSIZE], N)
Let a be an array of n elements, temp a variable used to
interchange two values, k the total number of passes and j
another control variable.
1. Set k = 1.
2. For k = 1 to (n-1):
       set temp = a[k]
       set j = k-1
       while temp < a[j] and (j >= 0) perform the following steps:
           set a[j+1] = a[j]
           set j = j-1
       [End of loop structure]
       Assign the value of temp to a[j+1].
   [End of for loop structure]
3. Exit.

Example

Array: 25, 15, 30, 9, 99, 20, 26

Start:   25 15 30  9 99 20 26

Pass 1: a[1] < a[0], interchange:
         15 25 30  9 99 20 26

Pass 2: a[2] > a[1], remains the same:
         15 25 30  9 99 20 26

Pass 3: a[3] < a[0], a[1] and a[2], so insert a[3] before a[0]:
          9 15 25 30 99 20 26

Pass 4: a[4] > a[3], remains the same:
          9 15 25 30 99 20 26

Pass 5: a[5] < a[2], a[3] and a[4], so insert a[5] before a[2]:
          9 15 20 25 30 99 26

Pass 6: a[6] < a[4] and a[5], so insert a[6] before a[4]:
          9 15 20 25 26 30 99

After this we get the sorted array.

Insertion Sort: Analysis

Number of comparisons (worst case):
(n-1) + (n-2) + ... + 3 + 2 + 1 => O(n²)

Number of comparisons (best case):
n - 1 => O(n)

Number of exchanges (worst case):
(n-1) + (n-2) + ... + 3 + 2 + 1 => O(n²)

Number of exchanges (best case):
0 => O(1)

Overall worst case: O(n²) + O(n²) = O(n²)

Insertion Sort (continued)

Input size: more input means more time.

Running time: the number of primitive operations or
steps executed during a program's execution is the
running time of the algorithm.

Comparison of Quadratic Sorts

                   Comparisons          Exchanges
                   Best      Worst      Best      Worst
Selection Sort     O(n²)     O(n²)      O(1)      O(n)
Bubble Sort        O(n)      O(n²)      O(1)      O(n²)
Insertion Sort     O(n)      O(n²)      O(1)      O(n²)

Result: Quadratic Algorithms

                   advantage                      disadvantage
Selection Sort     if array is in total disorder  if array is presorted
Bubble Sort        if array is presorted          if array is in total disorder
Insertion Sort     if array is presorted          if array is in total disorder

Overall: O(n²) is not acceptable,
since there are n·log(n) algorithms!

n Log(n) Algorithms

Quick Sort

Quick sort

It is also called partition-exchange sort.
In each step, the original sequence is
partitioned into 3 parts:
a. all the items less than the partitioning
element
b. the partitioning element in its final position
c. all the items greater than the partitioning
element
The partitioning process then continues in the left
and right partitions.

OR Quicksort Algorithm
Given an array of n elements (e.g., integers):
If the array only contains one element, return.
Else:
  pick one element to use as the pivot
  partition the elements into two sub-arrays:
    elements less than or equal to the pivot
    elements greater than the pivot
  quicksort the two sub-arrays
  return the results

The partitioning in each step of quicksort

Pick one of the elements as the partitioning
element p, usually the first element of the
sequence.
To find the proper position for p while partitioning
the sequence into 3 parts:
a) quicksort employs two indexes, down and up
b) down goes from left to right to find elements greater
than p
c) up goes from right to left to find elements less than p
d) elements found by up and down are exchanged
e) this continues until up and down meet or pass each
other
f) the final position of p is then pointed to by up
g) exchange p with the element pointed to by up

Pick Pivot Element

There are a number of ways to pick the pivot
element. In this example, we will use the first
element in the array:

 40  20  10  80  60  50   7  30 100
[0] [1] [2] [3] [4] [5] [6] [7] [8]

Partitioning Array
Given a pivot, partition the elements of
the array such that the resulting array
consists of:
1. One sub-array that contains elements <=
pivot
2. Another sub-array that contains elements
> pivot

The sub-arrays are stored in the original
data array.
Partitioning loops through the array, swapping
elements as it goes.

The partition steps:

1. While data[too_big_index] <= data[pivot]:
       ++too_big_index
2. While data[too_small_index] > data[pivot]:
       --too_small_index
3. If too_big_index < too_small_index:
       swap data[too_big_index] and data[too_small_index]
4. While too_small_index > too_big_index, go to 1.
5. Swap data[too_small_index] and data[pivot_index]

Trace (pivot_index = 0, pivot value 40):

 40  20  10  80  60  50   7  30 100    too_big_index stops at 80 ([3]);
                                       too_small_index stops at 30 ([7]); swap

 40  20  10  30  60  50   7  80 100    too_big_index stops at 60 ([4]);
                                       too_small_index stops at 7 ([6]); swap

 40  20  10  30   7  50  60  80 100    too_big_index stops at 50 ([5]);
                                       too_small_index stops at 7 ([4]);
                                       the indexes have crossed, so swap the
                                       pivot with data[too_small_index]

  7  20  10  30  40  50  60  80 100    pivot_index = 4

Partition Result

  7  20  10  30   40   50  60  80 100
[0] [1] [2] [3]  [4]  [5] [6] [7] [8]
 <= data[pivot]  pivot  > data[pivot]

Recursion: Quicksort Sub-arrays

  7  20  10  30       50  60  80 100
[0] [1] [2] [3]      [5] [6] [7] [8]
 <= data[pivot]       > data[pivot]

Quicksort is now applied recursively to each sub-array.

Quicksort Analysis
Assume that keys are random,
uniformly distributed.
What is best case running time?

An example trace of quicksort on
25 57 48 37 12 92 86 33
Subsequent steps (parentheses mark the sub-sequences still to be
partitioned; in each step the pivot is the first element of its
sub-sequence):

 25  57  48  37  12  92  86  33
(12) 25 (48  37  57  92  86  33)     pivot 25 placed
 12  25 (48  37  33  92  86  57)     down and up exchanged 57 and 33
 12  25 (33  37) 48 (92  86  57)     pivot 48 placed
 12  25  33 (37) 48 (92  86  57)     pivot 33 placed
 12  25  33  37  48 (57  86) 92      pivot 92 placed
 12  25  33  37  48  57 (86) 92      pivot 57 placed
 12  25  33  37  48  57  86  92      sorted

Performance considerations of quicksort

Quicksort got its name because it quickly
puts an element into its proper position by
employing two indexes to speed up the
partitioning process and to minimize the
exchanges.
Each pass reduces the comparisons by about
a half; the total number of comparisons is
about O(n·log2 n).
It requires space for the recursive process,
or stacks for an iterative process;
this is about O(log2 n).

Quick Sort: Analysis

Exact analysis is beyond the scope of this course.

The complexity is O(n * log(n)):
  Optimal case: the pivot index splits the array into equal sizes.
  Worst case: size left = 0, size right = n-1 (presorted list).

Interesting case, a presorted list:
  Nothing is done, except (n+1) * n / 2 comparisons.
  Complexity grows up to O(n²)!
  The better the list is presorted, the worse the algorithm
  performs!

The pivot selection is crucial.

In practical situations, a finely tuned
implementation of quicksort beats most sort algorithms,
including sort algorithms whose theoretical complexity
is O(n log n) in the worst case.

Comparison to merge sort:
  Comparable best case performance
  No extra memory needed

Shellsort
We can look at the list as a set of interleaved
sublists
For example, the elements in the even
locations could be one list and the elements in
the odd locations the other list
Shellsort begins by sorting many small lists,
and increases their size and decreases their
number as it continues


Shellsort
One technique is to use decreasing powers
of 2, so that if the list has 64 elements, the
first pass would use 32 lists of 2 elements,
the second pass would use 16 lists of 4
elements, and so on
These lists would be sorted with an
insertion sort


Shellsort Example
8 sublists, 2 elements / sublist, increment = 8
4 sublists, 4 elements / sublist, increment = 4
2 sublists, 8 elements / sublist, increment = 2
1 sublist, 16 elements / sublist, increment = 1


Shellsort Algorithm
passes = floor(lg N)
while (passes >= 1) do
    increment = 2^passes - 1
    for start = 1 to increment do
        InsertionSort(list, N, start, increment)
    end for
    passes = passes - 1
end while

N = 15:
Pass 1: increment = 7, 7 calls, size = 2
Pass 2: increment = 3, 3 calls, size = 5
Pass 3: increment = 1, 1 call, size = 15


Use
Radix sort applies only to integers,
fixed-size strings, floating-point numbers, and
to "less than", "greater than" or
"lexicographic order" comparison
predicates, whereas comparison
sorts can accommodate different
orders.

