
COMP 250 Fall 2012

lecture 12 - mergesort

Oct. 4, 2012

Mergesort
In lecture 3, we saw the insertion sort algorithm for sorting n items. We saw that, in the worst case, this algorithm requires n(n−1)/2, or about n^2/2, operations. As discussed in the lecture slides, n^2 can be prohibitively large if the number of items to be sorted is large, e.g. if n = 2^20 ≈ 10^6, then n^2 ≈ 10^12. Today's machines run at about 10^9 operations per second (i.e. GHz), and so this means thousands of seconds to sort such a list (worst case).

We now consider an alternative sorting algorithm that runs much faster in the worst case. This algorithm is called mergesort. Here is the idea. If there is just one number to sort (n = 1), then do nothing. Otherwise, partition the list of n elements into two lists of about n/2 elements each, sort the two individual lists (recursively, using mergesort), and then merge the two sorted lists.

For example, suppose we have the list

    < 8, 10, 3, 11, 6, 1, 9, 7, 13, 2, 5, 4, 12 >.

We partition it into two lists

    < 8, 10, 3, 11, 6, 1 >        < 9, 7, 13, 2, 5, 4, 12 >

and sort these (by applying mergesort recursively):

    < 1, 3, 6, 8, 10, 11 >        < 2, 4, 5, 7, 9, 12, 13 >.

Then, we merge these two lists to obtain

    < 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 >.

Algorithm: mergesort(list)
Input: list of elements that can be indexed by position
Output: sorted list

    if (list.size = 1) then
        return list
    else
        mid ← (list.size − 1) / 2
        l1 ← list.getElements(0, mid)
        l2 ← list.getElements(mid + 1, list.size − 1)
        l1 ← mergesort(l1)
        l2 ← mergesort(l2)
        return merge(l1, l2)
    end if
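For concreteness, here is one way the pseudocode above (together with the merge algorithm given below) could be written in Java. This is only an illustrative sketch, not the course's official code: it is specialized to lists of integers for simplicity, and the class name MergeSortDemo is my own choice.

    import java.util.Arrays;
    import java.util.LinkedList;
    import java.util.List;

    public class MergeSortDemo {

        // Sort a list by splitting it in half, sorting each half
        // recursively, and merging the two sorted halves.
        public static List<Integer> mergesort(List<Integer> list) {
            if (list.size() <= 1) {
                return list;   // 0 or 1 elements: already sorted
            }
            int mid = (list.size() - 1) / 2;
            // copies of the elements at positions 0..mid and mid+1..size-1
            List<Integer> l1 = new LinkedList<>(list.subList(0, mid + 1));
            List<Integer> l2 = new LinkedList<>(list.subList(mid + 1, list.size()));
            return merge(mergesort(l1), mergesort(l2));
        }

        // Merge two sorted lists into one sorted list.
        public static List<Integer> merge(List<Integer> l1, List<Integer> l2) {
            List<Integer> l = new LinkedList<>();
            while (!l1.isEmpty() && !l2.isEmpty()) {
                if (l1.get(0) < l2.get(0)) {
                    l.add(l1.remove(0));   // smaller front element goes next
                } else {
                    l.add(l2.remove(0));
                }
            }
            while (!l1.isEmpty()) l.add(l1.remove(0));   // copy any leftovers
            while (!l2.isEmpty()) l.add(l2.remove(0));
            return l;
        }

        public static void main(String[] args) {
            List<Integer> list = new LinkedList<>(
                Arrays.asList(8, 10, 3, 11, 6, 1, 9, 7, 13, 2, 5, 4, 12));
            System.out.println(mergesort(list));   // prints [1, 2, ..., 13]
        }
    }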


Algorithm: merge(l1, l2)
Input: sorted sequences l1 and l2
Output: sorted sequence l containing the elements from l1 and l2

    initialize empty list l
    while l1 is not empty & l2 is not empty do
        if l1.first < l2.first then
            l.addLast( l1.remove(l1.first) )
        else
            l.addLast( l2.remove(l2.first) )
        end if
    end while
    while l1 is not empty do
        l.addLast( l1.remove(l1.first) )
    end while
    while l2 is not empty do
        l.addLast( l2.remove(l2.first) )
    end while
    return l

Note that I have written the algorithm using abstract List operations only. This has the advantage of getting you quickly to the main ideas of the algorithm: what is being computed and in which sequence. However, be aware that it has the disadvantage of hiding the implementation details, by which I mean the data structures (rather than code in some programming language) that you use. As we have seen, sometimes the choice of data structure can be important.

For example, compare an array versus a linked list implementation. The call getElements() is very different for these data structures. For the array, the partitioning into two lists could be done just by computing indices, which could be passed as extra parameters to the mergesort calls. To do the merge, the most obvious way would be to copy the elements to a second array. There is a clever way to organize these partitions and copies which allows you to use just one extra array (of the same size n as the original one). But the details of how to do this are not what I want to emphasize now, since they would obscure the more abstract ideas of the algorithm, so I have left them out. (A sketch of one way to do it appears below.)

Similarly, for a linked list implementation, there are details that one needs to address. One example is that the partitioning of a list into two requires you to split the list at the middle. But with a linked list, you don't have immediate access to the middle element. To find it, you have to scan for it by traversing the list from the beginning.
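To make the array-based remarks concrete, here is a sketch of how mergesort might be implemented on an array by passing index ranges as extra parameters instead of copying out sublists, with a single extra array of size n used for merging. The class and method names are my own choices, not part of the lecture, and this is one possible arrangement rather than the definitive one.

    public class ArrayMergeSort {

        // Sort a[left..right] (inclusive), using tmp as scratch space for merging.
        // The caller allocates tmp once, so only one extra array of size n is used.
        private static void mergesort(int[] a, int[] tmp, int left, int right) {
            if (left >= right) {
                return;                           // 0 or 1 elements: already sorted
            }
            int mid = (left + right) / 2;
            mergesort(a, tmp, left, mid);         // sort the left half
            mergesort(a, tmp, mid + 1, right);    // sort the right half
            merge(a, tmp, left, mid, right);      // merge the two sorted halves
        }

        // Merge the sorted ranges a[left..mid] and a[mid+1..right] back into a.
        private static void merge(int[] a, int[] tmp, int left, int mid, int right) {
            int i = left, j = mid + 1, k = left;
            while (i <= mid && j <= right) {
                tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];   // take the smaller front element
            }
            while (i <= mid)   tmp[k++] = a[i++];   // leftovers from the left half
            while (j <= right) tmp[k++] = a[j++];   // leftovers from the right half
            for (k = left; k <= right; k++) {
                a[k] = tmp[k];                      // copy the merged range back into a
            }
        }

        public static void mergesort(int[] a) {
            mergesort(a, new int[a.length], 0, a.length - 1);
        }
    }

Calling ArrayMergeSort.mergesort(a) sorts the array in place: the index parameters left and right take the place of the getElements() calls, and tmp plays the role of the "one extra array" mentioned above.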

mergesort and the call stack


In the lecture slides, I went over an example with a set of elaborate figures showing how the partitions and merges are done. I also discussed the sequence of calls that are made to mergesort and merge, and how the call stack evolves. I am not going to attempt to describe that example here. You should instead see the slides. Try to understand how the ordering of the calls (in red) is determined, and why the call stack evolves as I have drawn it. If you can do so, then you understand mergesort very well.
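If you want to see the ordering of the calls for yourself, one low-tech way is to instrument a mergesort with print statements. The snippet below is my own illustration (not taken from the slides): it prints each recursive call and return, indented by its depth, so the output mirrors how frames are pushed onto and popped off the call stack. The merge step is omitted so that only the call ordering is shown.

    public class MergeSortTrace {

        // Print when each recursive call starts and returns, indented by call depth.
        static void mergesort(int[] a, int left, int right, int depth) {
            StringBuilder indent = new StringBuilder();
            for (int d = 0; d < depth; d++) indent.append("  ");
            System.out.println(indent + "call mergesort on " + left + ".." + right);
            if (left < right) {
                int mid = (left + right) / 2;
                mergesort(a, left, mid, depth + 1);
                mergesort(a, mid + 1, right, depth + 1);
                // (a real implementation would merge a[left..mid] and a[mid+1..right] here)
            }
            System.out.println(indent + "return from " + left + ".." + right);
        }

        public static void main(String[] args) {
            mergesort(new int[]{8, 10, 3, 11, 6, 1}, 0, 5, 0);
        }
    }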

mergesort is n log n
Another point to note is that there are log n levels of the recursion, namely the number of levels is the number of times that you can divide n by 2 until you reach 1 element per list. As we will discuss a few lectures from now, the mergesort algorithm requires about n log n steps, namely at each of the log n levels of the recursion, the total number of operations you need to do is proportional to n.

To appreciate the difference between the worst case number of operations for insertion sort (say n^2/2) versus the worst case of mergesort (n log n), consider the following table.

         n           log n     n log n       n^2
    10^3 ≈ 2^10        10        10^4        10^6
    10^6 ≈ 2^20        20      2 × 10^7      10^12
    10^9 ≈ 2^30        30      3 × 10^10     10^18
    ...               ...        ...          ...
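For those who want a preview of the counting argument (the course covers it properly in a later lecture), here is a sketch of the standard way to get the n log n bound, written in LaTeX. Assume T(n) denotes the number of operations mergesort performs on n elements and that merging n elements costs at most cn operations for some constant c; then unrolling the recurrence gives:

    \begin{align*}
    T(n) &\le 2\,T(n/2) + cn \\
         &\le 4\,T(n/4) + 2cn \\
         &\;\;\vdots \\
         &\le 2^k\,T(n/2^k) + k\,cn.
    \end{align*}

Setting k = log_2 n (so that n/2^k = 1) gives T(n) <= n T(1) + c n log_2 n, which grows in proportion to n log n.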

Thus, the time it takes to run mergesort becomes significantly less than the time it takes to run insertion sort as n becomes large. Very roughly speaking, on a computer that runs 10^9 operations per second (which is typical these days), running mergesort on a problem of size n = 10^9 would take on the order of half a minute (about 3 × 10^10 operations), whereas running insertion sort would take decades (about 10^18 operations, i.e. about 10^9 seconds).

Note that mergesort always takes about c n log n operations, for some constant c, i.e. its best case is the same as its worst case.
