CSC 105, Grinnell College, Spring 2005: An Algorithmic and Social Overview of Computer Science

# Laboratory Exercise on Run-Time for Sorting Algorithms

## Summary

Previously, you have considered the efficiency of searching algorithms -- programs to search a collection of data for a specified item. Our work proceeded in three main steps:

1. We discussed in class how the linear search and binary search algorithms worked.
2. We analyzed the algorithms theoretically, to determine how the run time of the algorithms might change as the size of the data sets increased.
3. You conducted experiments to determine how long these algorithms took with actual data.

We now consider another common computing task: placing a collection of data in order -- in this case, in ascending order. Reordering of data as necessary, so the result is in ascending order, is called sorting. Here, we consider three sorting algorithms:

• insertion sort
• quicksort
• permutation sort

Each of these algorithms has been outlined in class.

## Insertion Sort and Quicksort

1. The insertion sort has been described in class. Write a paragraph or two explanation of how an insertion sort works.
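As a reminder of the idea discussed in class, here is a minimal Python sketch of an insertion sort. It is an illustration only, not the code inside sortTest; the function name `insertion_sort` and the sample data are made up for this example.

```python
def insertion_sort(items):
    """Return a new list holding the items in ascending order."""
    result = list(items)                 # work on a copy; the input is left alone
    for i in range(1, len(result)):
        current = result[i]              # next item to insert into the sorted part
        j = i - 1
        # Shift larger items one place to the right to open a gap.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current          # drop the item into the gap
    return result


print(insertion_sort([31, 4, 15, 9, 26]))   # prints [4, 9, 15, 26, 31]
```

Your written explanation should still be in your own words; the sketch is only meant to make the individual steps concrete.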

A quicksort is an effective, but more complex, sorting algorithm. Published by C. A. R. Hoare in 1962, the quicksort proceeds by placing a selected array element in its correct position, with the items located before the selected element smaller than it and the items located after it larger. The process is then repeated on the two smaller portions of the array: the part before the selected element and the part after it.

While the details require some care, the basic idea allows processing to divide the array roughly in half with each main step, provided the selected element lands near the middle -- in a way analogous to a binary search. The work involved in each main step is significantly greater for the quicksort than for the binary search. However, repeatedly dividing the array in half often yields considerable efficiencies.
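The basic step described above can be sketched in Python as follows. This is one possible version under stated assumptions: it selects the last element of each portion as the element to place, which may differ from the choice made in class or in sortTest, and the names `partition` and `quicksort` are chosen for this illustration.

```python
def partition(a, low, high):
    """Place a[high] in its correct position within a[low..high].

    Afterwards, smaller items sit before that position and the
    remaining items sit after it. Returns the final position.
    """
    pivot = a[high]                      # assumption: select the last element
    boundary = low                       # a[low..boundary-1] holds items < pivot
    for k in range(low, high):
        if a[k] < pivot:
            a[boundary], a[k] = a[k], a[boundary]
            boundary += 1
    a[boundary], a[high] = a[high], a[boundary]
    return boundary


def quicksort(a, low=0, high=None):
    """Sort the list a in place, in ascending order."""
    if high is None:
        high = len(a) - 1
    if low < high:
        p = partition(a, low, high)
        quicksort(a, low, p - 1)         # repeat on the part before position p
        quicksort(a, p + 1, high)        # repeat on the part after position p


data = [31, 4, 15, 9, 26]
quicksort(data)
print(data)                              # prints [4, 9, 15, 26, 31]
```

With this particular selection rule, the two parts are close to equal in size only when the selected element happens to fall near the middle of the values. The sketch is recursive, so it is intended for small lists.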

### Efficiency and Stability

When analyzing sorting algorithms, it often happens that the analysis of efficiency depends considerably on the nature of the data. For example, if the original data set already is almost ordered, a sorting algorithm may behave rather differently than if the data set originally contains random data or is ordered in the reverse direction. For this reason, the analysis for sorting algorithms often considers separate cases, depending on the original nature of the data.

2. Consider the insertion sort:

   1. Would the insertion sort be most efficient when the original data were already in ascending order, in random order, or in descending order? Briefly justify your answer.
   2. Would the insertion sort be least efficient when the original data were already in ascending order, in random order, or in descending order? Briefly justify your answer.

### Experiments Regarding Insertion Sort and Quicksort

The program sortTest performs both an insertion sort and quicksort for three types of data sets: data initially already in ascending order, data in random order, and data initially in descending order. In each case, the user specifies the number of data items to be sorted. The program then sorts the three data sets and records the time required for this work. As with our searching experiments, due to the speed of the computer and the limitations of the clocking mechanism available, we repeat each experiment several times -- in this case 500 rather than the 60,000 times done in the searching experiment. As with searching, this means that times are magnified, so we can easily see differences.
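The timing approach described above can be sketched in Python. This is not the source of sortTest; the helper names `make_data_sets` and `time_sort` are hypothetical, and Python's built-in `sorted` stands in for the sorting algorithm so that the example stays short.

```python
import random
import time


def make_data_sets(size):
    """Build the three kinds of data sets used in the experiments."""
    ascending = list(range(size))
    shuffled = ascending[:]
    random.shuffle(shuffled)             # random order
    descending = ascending[::-1]
    return {"ascending": ascending, "random": shuffled, "descending": descending}


def time_sort(sort_function, data, repetitions=500):
    """Return the total seconds needed to sort a copy of data many times."""
    start = time.perf_counter()
    for _ in range(repetitions):
        sort_function(list(data))        # fresh copy, so every run sees the same input
    return time.perf_counter() - start


for label, data in make_data_sets(1000).items():
    elapsed = time_sort(sorted, data)
    print(f"{label:>10}: {elapsed:.4f} seconds for 500 sorts")
```

Repeating the sort makes the total large enough for the clock to measure reliably. Note that the time to copy the data is included in each total in this sketch.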

The mechanics of running these sorting experiments are similar to those for the searching experiments:

• Log onto a computer in MathLAN, and open a terminal window.

• In the terminal window, type the command:

```
cd ~walker/105/labs
```
• The program itself can be run with the command

```
/opt/IBMJava2-142/bin/java sortTest
```
• The program will ask you to enter the size of the sequence to be sorted. Suggested values might be selected from the range from 500 to 10,000, although you can choose smaller or larger sizes for the array.

• As noted for the experiments on searching, after you run the above command the first time, you can use the upward-arrow key at your keyboard to retrieve the same command again. After hitting the upward-arrow key to get the desired command, hit Return or Enter to run the program again.

In this part of the lab, you are to gather experimental data regarding the times for sorting various size arrays using both the insertion sort and quicksort. In running your experiments, you should record separately the results for the different types of data sets.

3. Run sortTest for a variety of array sizes between 500 and 10,000, recording your results in each case in a spreadsheet.

4. Use the spreadsheet to plot the times for various array sizes. The horizontal axis should indicate the size of the array, and the vertical axis should indicate time. Construct separate graphs for sorting data initially in ascending order, in random order, and in descending order. You should conduct sufficient experiments, so that a fairly consistent pattern emerges.

5. Describe (in words) the nature of the graphs you have observed.

6. Examine the graphs to determine how the efficiency of the two sorting algorithms varies with different types of initial data.

   1. Which algorithm is better for data already almost ordered?
   2. Which algorithm is better for random data?
   3. Which algorithm is better for data initially in descending order?
   4. Which algorithm seems best overall? That is, which algorithm would you choose if you did not know what type of data might be encountered?

### The Permutation Sort

The permutation sort generates and checks all permutations of a data set until it finds one that is ordered. If the permutation sort is lucky, it may happen upon an ordered arrangement quickly. If the algorithm is unlucky, the permutation sort may try all permutations of the data set before finding the ordered one.

If a data set has n elements, then the number of permutations may be computed as follows:

• Any of the n items might come first.
• With one item selected to come first, any of the remaining n-1 items might come second.
• With the first two items selected for the start, any of the remaining n-2 items might come third.
• etc.

Altogether, the number of permutations possible for a data set with n elements is n*(n-1)*(n-2)*(n-3)*...*3*2*1. This value is called n factorial, written n!. Thus, in the worst case, a permutation sort might have to generate all n! permutations of the data before finding the one that is ordered.

For your information, the first 10 values of n! are given in the following table:

| n | n! |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 6 |
| 4 | 24 |
| 5 | 120 |
| 6 | 720 |
| 7 | 5,040 |
| 8 | 40,320 |
| 9 | 362,880 |
| 10 | 3,628,800 |

As you can see, the factorial function increases very rapidly!
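The values of n! can be reproduced with a few lines of Python. The function name `factorial` below is chosen for this illustration; it simply multiplies the numbers from 1 up to n, following the product n*(n-1)*...*2*1, and the result is printed next to the value from Python's standard `math.factorial` as a check.

```python
import math


def factorial(n):
    """Multiply 1 * 2 * ... * n."""
    product = 1
    for k in range(2, n + 1):
        product *= k
    return product


for n in range(1, 11):
    # The two columns should agree on every line.
    print(n, factorial(n), math.factorial(n))
```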

In the permutation sort, once a permutation is generated, then it must be examined to determine if it is ordered. This typically requires about n steps -- one for each item in the array.

Altogether, with generating and checking permutations, the amount of work for a permutation sort can be proportional to n * n!
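A minimal Python sketch of the generate-and-check idea follows. It is an illustration only, not the code of permutationSortTest: the names `is_ordered` and `permutation_sort` are made up here, and the order in which `itertools.permutations` produces arrangements may differ from the order used by the course program.

```python
from itertools import permutations


def is_ordered(seq):
    """Check neighboring pairs; this takes about n steps for n items."""
    return all(seq[i] <= seq[i + 1] for i in range(len(seq) - 1))


def permutation_sort(items):
    """Try permutations one by one until an ordered one is found.

    Returns the ordered list and the number of permutations tried.
    """
    tried = 0
    for candidate in permutations(items):
        tried += 1
        if is_ordered(candidate):
            return list(candidate), tried


print(permutation_sort([1, 2, 3]))   # lucky case: prints ([1, 2, 3], 1)
print(permutation_sort([3, 2, 1]))   # unlucky case: prints ([1, 2, 3], 6)
```

In this sketch, data that start in descending order are the unlucky case: the ordered arrangement is the last one generated, so all n! permutations are tried.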

Program permutationSortTest performs a permutation sort on different types of data, given the size of the array. Note that this program does not do this work multiple times, in contrast to the previous experiments you have run. The timings you record in these experiments reflect the actual time required to perform the sort just once.

7. Run the program permutationSortTest, following the same steps followed for sortTest above. In collecting data, use array sizes of 4, 5, 6, 7, 8, 9. (You can try 10 as well, but have something else to do as you are waiting.)

8. As with insertion sort and quicksort, use a spreadsheet to plot the times you recorded.

9. In reviewing your data, explain how the permutation sort provides an example of an algorithm that is caught by the combinatorial explosion.

## Work To Be Turned In

• Explanations for steps 1, 2, 5, 6, and 9.
• Graph(s) for parts 4 and 8.

This laboratory exercise coordinates with Chapter 6 of Walker, Henry M., The Tao of Computing: A Down-to-earth Approach to Computer Fluency, Jones and Bartlett, 2005.

This document is available on the World Wide Web as
```
http://www.walker.cs.grinnell.edu/fluency-book/labs/run-time-sorting.shtml
```

Created December 31, 2003; last revised March 1, 2005.