Efficiency

For any given problem, it is quite possible that there is more than one algorithm that represents a correct solution. A good example of this is the problem of sorting. Dozens of different algorithms have been written to solve this problem. Given such a wide range of solutions, how can we determine which algorithm is the best one to use? To do this, we must analyze our algorithms in such a way that we can gauge the efficiency of each one. Once we have calculated the efficiency of an algorithm, we can compare our measurements and select the best solution.


Let's analyze the three sorting algorithms in this section and determine which one was the most efficient solution for sorting the list of seven numbers in our examples. To do this we need a way to measure the efficiency of our algorithms. We can actually measure the efficiency in two different ways: space efficiency and time efficiency. An algorithm that is space-efficient uses the least amount of computer memory to solve the problem of sorting. An algorithm that is time-efficient uses the least amount of time to solve the problem of sorting. Since most of the sorting operations are comparisons, copies, and swaps, we can count these operations and use our results as a measure of time efficiency. The table below summarizes our measures of time and space efficiency in sorting.
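Time efficiency — measured by counting the operations performed while sorting: comparisons, copies, and swaps
Space efficiency — measured by counting the memory cells the sort uses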



Let's begin our analysis by determining which sort was the most space-efficient. We discussed in our previous lesson that space efficiency can be measured by calculating the number of memory cells a particular sort requires. For simplicity, we will assume that a memory cell can hold one number. This means that a sort with five numbers would require at least five memory cells just to store the numbers in the computer. The total number of memory cells needed for sorting will depend on how many additional cells the algorithm requires to order the numbers.
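As a rough sketch (in Python; the function names are ours, and whether to count the temporary cell a swap uses is a small bookkeeping choice), the cell counts look like this:

    def cells_in_place(n):
        # Insertion and Selection Sort order the numbers within the
        # original list: one cell per number (a swap may use one
        # extra temporary cell).
        return n

    def cells_simple(n):
        # The Simple Sort copies the numbers into a second list of
        # the same size, so it needs two cells per number.
        return 2 * n

    for n in [7, 1000, 5000000]:
        print(n, cells_in_place(n), cells_simple(n))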


Now let's determine which sort was the most time-efficient. To do this, we will count the number of operations each sort performed. Most of the operations the computer performs during sorting fall into two groups: copying numbers and comparing numbers. The algorithm that requires the least copying and comparing will execute the fastest.


For the Insertion Sort and the Selection Sort, it will be easier to count the number of swaps that are done rather than the number of copies. Remember that the swap operation requires three copies. We can find the total number of copies that the algorithms perform by counting the number of swaps and multiplying by three. The Simple Sort does not use the swap operation, so you can count the number of copies directly.
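Here is that relationship as a small Python sketch: each swap is built from three copy operations, so an algorithm that performs s swaps performs 3 * s copies.

    def swap(items, i, j):
        # Swapping two cells takes three copy operations.
        temp = items[i]      # copy 1: into a temporary cell
        items[i] = items[j]  # copy 2
        items[j] = temp      # copy 3

    numbers = [30, 10, 20]
    swap(numbers, 0, 1)
    print(numbers)  # [10, 30, 20]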


Now that you have completed your calculations, let's summarize the results. We already know that the Insertion Sort and the Selection Sort were the most space-efficient, but we have yet to determine which sort was the most time-efficient. We will see that the answer is a little more complicated.



Notice that the Simple Sort required the fewest copies. We would expect this to be true since it does not swap numbers while sorting. Instead, the numbers are copied to a new list in the computer. This is a common tradeoff between time and space. Although the Simple Sort loses space efficiency by using two lists, it gains time efficiency because fewer copies are required. Of course, this does not mean that it is always best to use the Simple Sort to gain more speed. If we are trying to sort a list of 5 million names, the Simple Sort would use too much space in the computer's memory. It would be much better to swap items within the list rather than create two lists.


For the number of comparisons, the Selection Sort and Insertion Sort were nearly the same. The Simple Sort, however, required twice as many comparisons. We can see the reason for this difference by thinking about how the algorithms work. Each algorithm repeatedly searches for the smallest number and then places this number in the correct position. For the Insertion Sort and the Selection Sort, each iteration of this process reduces the unsorted section by one number. During the next search, the computer does not need to make as many comparisons to find the smallest number. The Simple Sort, however, replaces sorted numbers with a marker called MAX. Each time the computer searches for the smallest number, it must compare all seven memory cells. This approach is much less efficient.
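The difference is easy to see in a sketch. The Simple Sort version below is our own reading of the description above (it scans all seven cells on every pass and marks used cells with MAX), while the selection scan searches only the shrinking unsorted section:

    MAX = float('inf')  # stands in for the MAX marker

    def simple_sort_comparisons(items):
        source = list(items)           # second list: the space tradeoff
        result, comparisons = [], 0
        for _ in range(len(source)):
            smallest = 0
            for i in range(1, len(source)):   # scans the whole list every pass
                comparisons += 1
                if source[i] < source[smallest]:
                    smallest = i
            result.append(source[smallest])
            source[smallest] = MAX            # mark the cell as used
        return result, comparisons

    def selection_sort_comparisons(items):
        items, comparisons = list(items), 0
        for start in range(len(items) - 1):
            smallest = start
            for i in range(start + 1, len(items)):  # unsorted section shrinks
                comparisons += 1
                if items[i] < items[smallest]:
                    smallest = i
            items[start], items[smallest] = items[smallest], items[start]
        return items, comparisons

    data = [40, 10, 70, 30, 20, 60, 50]
    print(simple_sort_comparisons(data)[1])     # 42 comparisons: n * (n - 1)
    print(selection_sort_comparisons(data)[1])  # 21 comparisons: half as many

Whatever seven numbers we pick, the sketch prints 42 and 21: the two-to-one ratio holds regardless of the original order.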


Given the particular set of seven numbers we sorted, the Selection Sort was the most time-efficient. However, it is important to understand that this may not be true for every set of seven numbers. Consider the following example.



If we use the Insertion Sort on these numbers, only 8 comparisons and 1 swap would be needed to sort them. However, if we use the Selection Sort, 21 comparisons and 1 swap would be needed. In this case, the Insertion Sort is more efficient.
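Counting operations on a nearly sorted list shows the effect. The list below is an assumed example (the original numbers are not reproduced here), and the exact counts depend on the input and on the details of the insertion step, but the Selection Sort always performs the full 21 comparisons for seven items no matter how they are ordered:

    def insertion_sort_counts(items):
        items, comparisons, swaps = list(items), 0, 0
        for i in range(1, len(items)):
            j = i
            # Walk the new item left, swapping, until it is in place.
            while j > 0:
                comparisons += 1
                if items[j] < items[j - 1]:
                    items[j], items[j - 1] = items[j - 1], items[j]
                    swaps += 1
                    j -= 1
                else:
                    break
        return comparisons, swaps

    print(insertion_sort_counts([10, 20, 30, 40, 50, 70, 60]))  # (7, 1) for this list

Running the selection_sort_comparisons sketch from earlier on the same list still yields 21 comparisons.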


In our last lesson we saw that the measured time efficiency of a sorting algorithm can change depending upon the order of the items being sorted. Because of this, it is common to measure the performance of sorting algorithms under the worst possible conditions, or the worst case. The worst case is the particular order of items that requires the greatest number of comparisons and swaps to sort.


The measured time efficiency of an algorithm can also change as we increase the number of items sorted. For example, what would happen if we used our sorting algorithms to order a million numbers rather than seven? Which algorithm would be the most time-efficient? How could we measure the time efficiency of the sorts? To answer these questions, we could count the number of swaps and comparisons, but this would be extremely tedious. What we really need is a way of describing the number of swaps and comparisons with a formula. With such a formula we could easily recompute the time efficiency of the algorithm each time we varied the number of items to be sorted.


Once again, here are the formulas we derived for the number of comparisons and swaps the Selection Sort requires in the worst case. The number of items to be sorted is represented by n.
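Comparisons (worst case): (n - 1) + (n - 2) + ... + 1
Swaps (worst case): n - 1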


Notice that the formula for the number of comparisons is a summation, a series of terms added together. Each term represents the comparisons from one step of our sort. As we increase the number of items to be sorted, we also increase the number of terms in our formula. For example, suppose we were sorting six items and we wanted to know the maximum number of comparisons needed to sort them (i.e. the worst case). Here is how we would find the answer.



1. First, we write out the correct number of terms in our formula. Remember that n = 6 in this example. The last term in our formula is (n - 5) because this is equal to 1.

2. Now we substitute 6 for n and sum the values.
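(6 - 1) + (6 - 2) + (6 - 3) + (6 - 4) + (6 - 5)
= 5 + 4 + 3 + 2 + 1
= 15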



When sorting six items with the Selection Sort, the algorithm will need to perform 15 comparisons in the worst case. Since we computed the performance in the worst case, we know that the Selection Sort will never need more than 15 comparisons regardless of how the six numbers are originally ordered.


Now that we have formulas to describe the worst case time efficiency of our sorting algorithms, we can see how the algorithms will perform as we increase the number of items to be sorted. First, let's review our formulas. Notice that the formulas for swaps have been converted to copies by multiplying by 3. This allows us to correctly compare Simple Sort with the other two sorts.
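                    Comparisons (worst case)         Copies (worst case)
    Selection Sort  (n - 1) + (n - 2) + ... + 1      3 * (n - 1)
    Insertion Sort  (n - 1) + (n - 2) + ... + 1      3 * [(n - 1) + (n - 2) + ... + 1]
    Simple Sort     n * (n - 1)                      2 * n

(The Simple Sort's copy count here assumes one copy into the new list plus one MAX marker copy per pass.)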



Using our formulas, we will compute the number of comparisons and copies needed to sort several lists of items. The number of items in each list increases by a power of ten.
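A few lines of Python (a sketch built directly from the formulas above, including the assumed Simple Sort copy count) can generate such a table:

    def comparisons_in_place(n):
        # Insertion and Selection Sort: (n-1) + (n-2) + ... + 1
        return sum(range(1, n))

    def comparisons_simple(n):
        return n * (n - 1)

    def copies_selection(n):
        return 3 * (n - 1)           # 3 copies per swap, n - 1 swaps

    def copies_insertion(n):
        return 3 * sum(range(1, n))  # 3 copies per swap, up to (n-1) + ... + 1 swaps

    def copies_simple(n):
        return 2 * n                 # assumed: one copy out plus one MAX copy per pass

    for n in [10, 100, 1000, 10000]:
        print(n, comparisons_in_place(n), comparisons_simple(n),
              copies_selection(n), copies_insertion(n), copies_simple(n))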



From our table of comparisons, we can see two important facts. First, the Simple Sort always requires twice as many comparisons as the Insertion and Selection Sorts. This means we can rewrite the formula for these two sorts as follows: (n * (n-1)) / 2. This expression is much easier to use than the summation (n-1) + (n-2) + ... + 1, especially when n is large.
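We can verify the rewrite with a quick throwaway check:

    # The summation (n-1) + (n-2) + ... + 1 equals the closed form n * (n-1) / 2.
    for n in [2, 6, 7, 100, 1000]:
        assert sum(range(1, n)) == n * (n - 1) // 2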


Second, all three sorts require a tremendous number of comparisons as the number of items increases. In fact, we can see with the Simple Sort that the number of comparisons is very close to n². Take another look at the formula you derived for the number of comparisons the Simple Sort requires in the worst case. If we multiply the n into the parentheses, the formula becomes n² - n. Now you see why the total number of comparisons is very near n². You can also see why the number of comparisons needed is growing so quickly.



Now let's compare the number of copies each algorithm requires. Notice how quickly the number of copies required by the Insertion Sort is growing. This makes us suspect that the formula for the Insertion Sort contains an n² term. In fact, the formula does contain such a term, but we must rewrite the formula to see it. Using the relationship we discovered earlier, we can replace the summation (n-1) + (n-2) + ... + 1 with (n * (n-1)) / 2 and simplify.
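3 * [(n - 1) + (n - 2) + ... + 1]
= 3 * (n * (n - 1)) / 2
= (3n² - 3n) / 2
= 1.5n² - 1.5n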



The simplified formula shows the n² term that we suspected. Notice that the other two formulas only have an n term rather than an n² term. This is the reason that they grow much more slowly as the number of items increases.

