ITK279 Paper1

Paul Sethi
ITK279 - Califf

Timing Results of Various Sorting Algorithms

The sorting of data is extremely important in today's computer-based world. We need to be able to get meaningful results from databases with thousands, or even millions, of entries. Without sorting, entries appear in no particular order and any specific entry is extremely hard to find. Sorting data can be very time-consuming and take a lot of computational power, especially when the number of entries becomes very large. Furthermore, finding the best sorting algorithm is not a trivial task. A sorting algorithm is the exact procedure used to put data in order, and computer scientists and mathematicians have been creating and revising these algorithms for many years. The running time and efficiency of sorting algorithms can vary greatly. This paper examines the timings of four sorting algorithms: insertion sort, quicksort, shell sort, and bottom-up merge sort.

Insertion sort is a very simple sorting algorithm that divides the data set into two parts, one sorted and one unsorted. The sorted part starts with only the first item, which is trivially sorted. The algorithm then repeatedly takes the next unsorted item and inserts it into its correct place in the sorted part, until all of the items are in the sorted part.

Quicksort is a more complicated, recursive algorithm that picks a pivot element and partitions the list into two smaller lists, one containing items less than the pivot and one containing items greater than the pivot. It then partitions each of these lists in the same way, repeating the process until every sublist has at most one item, at which point the whole list is sorted.

Shell sort is an improved version of insertion sort that, instead of comparing only adjacent items, compares items much farther apart in the list. It divides the list into interleaved sublists, each containing the items that are a fixed distance (the gap) apart, and insertion-sorts each sublist. It then repeats the process with progressively smaller gaps until the gap is 1; that final pass is an ordinary insertion sort, which finishes quickly because the earlier passes have left the list nearly sorted.
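The descriptions of insertion sort and shell sort above can be sketched in Python as follows. This is a minimal illustration, not the code that was timed in this experiment, and the function names are hypothetical. The gap sequence 1, 8, 23, 77, ... (4^k + 3*2^(k-1) + 1, published by Sedgewick) is an assumption: it is one known sequence with an O(n^(4/3)) worst case, but the paper does not say which sequence its shell sort used.

```python
def insertion_sort(a):
    """Sort list a in place (ascending) with array-based insertion sort."""
    for i in range(1, len(a)):
        item = a[i]          # next item from the unsorted part
        j = i - 1
        # Shift larger items in the sorted part one slot to the right.
        while j >= 0 and a[j] > item:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = item      # drop the item into the gap that was opened
    return a


def sedgewick_gaps(n):
    """Assumed gap sequence 1, 8, 23, 77, ... below n, largest gap first."""
    gaps = [1]
    k = 1
    while True:
        g = 4 ** k + 3 * 2 ** (k - 1) + 1
        if g >= n:
            break
        gaps.append(g)
        k += 1
    return gaps[::-1]


def shell_sort(a, gaps=None):
    """Sort list a in place; gaps must be decreasing and end with 1."""
    if gaps is None:
        gaps = sedgewick_gaps(len(a))
    for gap in gaps:
        # Gapped insertion sort: each sublist of items `gap` apart is
        # insertion-sorted, so an item can jump `gap` slots per exchange.
        for i in range(gap, len(a)):
            item = a[i]
            j = i
            while j >= gap and a[j - gap] > item:
                a[j] = a[j - gap]
                j -= gap
            a[j] = item
    return a
```

Calling shell_sort with gaps=[1] performs a single gap-1 pass, which is exactly insertion_sort; that is the sense in which shell sort is an improved insertion sort.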
Bottom-up merge sort treats the list as a series of small sorted runs (in the simplest version, runs of a single item) and merges adjacent runs into sorted runs twice as long. These passes are repeated, merging the runs again and again, until the whole list is a single sorted run.

Timing tests were run on all of the algorithms mentioned. Each algorithm was run on data sets containing one thousand, ten thousand, one hundred thousand, and one million elements. For each size there were three different orderings: an already sorted list, a random list, and a reverse sorted list. This means that each sorting algorithm was run twelve times, three times on each of the four list sizes. All of the data elements were long integers. The timing data is attached and shown in Table 1.

The results make it clear that some sorting algorithms perform much better than others. Each sorting algorithm will be analyzed and compared in turn.

Insertion sort has an average-case and worst-case time complexity of O(n^2). This is much worse than the complexity of the more sophisticated sorting algorithms tested, so insertion sort should be expected to run much more slowly, and the test data reflects this. On small data sets insertion sort is acceptable, but on larger data sets it is extremely slow.

The insertion sort implementation used was array-based. Another option is a linked-list insertion sort. A linked list allows elements to be inserted into or deleted from a list without moving the rest of the elements. One might think that a linked list therefore allows a faster insertion sort, but this is not actually the case. Unlike an array, a linked list does not allow direct access to an arbitrary element; to reach an element in the middle of the list, the list must be traversed node by node. The linked list saves the work of shifting elements, but finding each insertion point still requires walking the list, so the number of comparisons remains O(n^2) and following pointers adds overhead of its own. While no test data is provided for it, the linked-list implementation would most likely be much slower than the array-based implementation of insertion sort.

Quicksort has an average-case time complexity of O(n log n), which is much faster than elementary sorts such as insertion sort. While this holds much of the time, quicksort has a worst-case complexity of O(n^2), which is just as slow as the elementary sorts.
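The effect of the pivot choice can be demonstrated with a short Python sketch. The paper does not say how its quicksort picked pivots, so this sketch assumes the common last-element pivot with Lomuto partitioning, alongside the median-of-three alternative the paper recommends. The function names are hypothetical, and instead of measuring time, each call returns the deepest recursion level it reached.

```python
def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around a[hi]; returns pivot index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]   # place the pivot between the two parts
    return i


def median_of_three_to_end(a, lo, hi):
    """Move the median of a[lo], a[mid], a[hi] into a[hi] as the pivot."""
    mid = (lo + hi) // 2
    # Order the three samples so that a[lo] <= a[mid] <= a[hi].
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    # The median now sits at mid; swap it to hi for partition().
    a[mid], a[hi] = a[hi], a[mid]


def quicksort(a, median3=False, lo=0, hi=None, depth=1):
    """Sort a in place; return the deepest recursion level reached."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return depth
    if median3:
        median_of_three_to_end(a, lo, hi)
    p = partition(a, lo, hi)
    left = quicksort(a, median3, lo, p - 1, depth + 1)
    right = quicksort(a, median3, p + 1, hi, depth + 1)
    return max(left, right)
```

On an already sorted list of n items, the last-element version recurses n levels deep, because every partition leaves one side empty, while the median-of-three version stays at about log2(n) levels. Python's default recursion limit of 1000 plays the role of the limited program stack here: sorting an already sorted list of a few thousand items with median3=False raises RecursionError, which is analogous to the crashes on the one-million-entry data sets.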
If the pivot is chosen from one end of the list, as the results suggest it was here, the worst case happens when the items are already sorted, in either regular or reverse order. This is very clear from the data presented: quicksort performs extremely well on random numbers, but poorly when the list is ordered in some way. Furthermore, the quicksort program crashed on two of the data sets containing one million entries. The crash and the poor performance have the same cause: on sorted data sets, the chosen pivot is the worst possible one. Each partition splits the data into an empty part and a part containing all of the remaining items, so each recursive call removes only the pivot from the problem. The depth of the recursion therefore grows linearly with the size of the list, and with one million entries the roughly one million nested calls on the program stack can cause a stack overflow. A better method for choosing the pivot, such as median-of-three, may be necessary to prevent this. With a better pivot, quicksort should perform close to O(n log n), even on large sorted or reverse-sorted data sets.

Shell sort's time complexity is more complicated to determine, because it depends on the gap sequence used in the algorithm. The gap sequence used for these tests gives a worst-case time complexity of O(n^(4/3)). Shell sort is technically an insertion sort, but it performs much better because it sorts items that are a gap apart. Regular insertion sort spends much of its time on comparisons and exchanges: the farther an item is from its sorted place, the more comparisons and exchanges are needed to move it there, one position at a time. Shell sort solves this problem by comparing and exchanging items that are far apart, so an item can move a long distance in a single step. While shell sort is a huge improvement on insertion sort, it still does not perform quite as well as the other non-elementary sorts, which have a complexity of O(n log n). This is because, as stated before, shell sort is still based on insertion sort. Overall, though, shell sort performs well, even on large data sets.

Merge sort has a best-case, worst-case, and average-case time complexity of O(n log n). Because none of the other algorithms tested guarantees O(n log n) in every case, merge sort should have the best overall performance, and the test data reflects this. Shell sort and quicksort did outperform merge sort on certain data sets: shell sort was faster when the data was already sorted in regular or reverse order, and quicksort was faster when the data was random. Even so, merge sort's average sorting time was consistently the fastest. Quicksort might outperform merge sort on average if a better pivot selection method were used, but this is beyond the scope of this experiment.
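Bottom-up merge sort, as described earlier, can be sketched in Python. This is a minimal illustration with hypothetical function names, not the implementation that was timed, and it returns a new sorted list rather than sorting in place. Each pass merges adjacent runs of length width into runs of length 2*width, so there are about log2(n) passes of O(n) work each, which is why the O(n log n) bound holds in every case. The dst buffer is the extra memory that merge sort needs.

```python
def merge(src, dst, lo, mid, hi):
    """Merge sorted runs src[lo:mid] and src[mid:hi] into dst[lo:hi]."""
    i, j = lo, mid
    for k in range(lo, hi):
        # Take from the left run while it has the smaller (or equal) item;
        # using <= keeps equal items in their original order (stable).
        if i < mid and (j >= hi or src[i] <= src[j]):
            dst[k] = src[i]
            i += 1
        else:
            dst[k] = src[j]
            j += 1


def bottom_up_merge_sort(a):
    """Return a sorted copy of a using iterative (bottom-up) merge sort."""
    n = len(a)
    src = list(a)
    dst = [None] * n      # auxiliary buffer: O(n) extra memory
    width = 1             # every single item starts as a sorted run
    while width < n:
        # One pass: merge each adjacent pair of runs of length `width`.
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            merge(src, dst, lo, mid, hi)
        src, dst = dst, src  # merged runs become the next pass's input
        width *= 2
    return src
```

No recursion is involved, so unlike the end-pivot quicksort above, the stack depth does not grow with the size of the list.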
Merge sort was found to be the best sort overall in this experiment, but the experiment still had limitations; for example, only one pivot selection method was tested for quicksort, and no linked-list insertion sort was timed. Overall, each sorting algorithm has its own advantages and disadvantages. The time complexity of an algorithm does not by itself dictate which algorithm is best for a specific situation. Other factors must be considered, such as how hard the algorithm is to implement and how much extra memory it requires. The elementary sorts are extremely easy to implement, and they perform well on small data sets, but they clearly should not be used on large data sets. If we know that our lists are close to being sorted, then quicksort with a pivot taken from one end of the list is not a good option either, because it performs very poorly on sorted lists. Shell sort is much better than insertion sort, given a good gap sequence, but it is harder to implement. Also, merge sort may seem like the best option overall, but the usual array-based version requires extra memory proportional to the size of the list. Regardless, time complexity is a major, and most likely the largest, factor in determining which algorithm to use. Each sorting algorithm may still have some use, but in today's world, an O(n log n) algorithm is very likely to be the best choice.
