CSC 161, Grinnell College, Spring 2009: Imperative Problem Solving and Data Structures

# Pictorial Loop Invariants

## Goals

This laboratory exercise applies the concept of loop invariants to problems involving array structures, specifically to the binary search.

## Introduction

The idea of a binary search is reasonably straightforward, but 'Binary search is one of the trickiest "simple" algorithms to program correctly.' [e.g., Wikipedia, article on Binary Search]. In this reading, we will see that loop invariants provide a helpful tool that allows us to actually get the code right.

## The Binary Search Algorithm

Acknowledgement: The following four-paragraph description is a slightly edited version of Henry M. Walker, Computer Science 2: Principles of Software Engineering, Data Types, and Algorithms, Little, Brown, and Company, 1989, Section 10.1, p. 389, with programming examples translated from Pascal to C. This material is used with permission from the copyright holder.

### High-level Description

The binary search looks for an item within an array that has already been sorted. We begin with an array of data a[0], ..., a[size-1], and we wish to search for a particular item. The approach is to look for item in the middle of the array and make inferences about where to look next. Overall, the binary search allows us to cut the amount of data under consideration in half at each step.

To understand how this is done, we consider how we might look up a name in a telephone book. We begin by opening the telephone book to the middle. If we are lucky, we see the name on the page in front of us. However, even if we are unlucky, we can tell which half of the book contains the name.

Once we know which half the name is in, we turn to the middle of that half. Again, we might be lucky and find the name immediately. Otherwise, we can restrict our attention to just that part where the name must be. (We are now looking at just one-quarter of the original book.)

As we proceed in subsequent steps, we continue looking at the middle page of the remaining section and dividing that section in half, until we find the name or run out of pages to look at.
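A short calculation shows why halving pays off. The sketch below is a hypothetical helper of ours, not code from the reading: it assumes each probe discards the middle item and, in the worst case, keeps the larger remaining half, which holds at most n / 2 items (rounded down). Counting probes until nothing is left gives the worst-case number of probes for n sorted items.

```c
#include <assert.h>

/* Hypothetical helper: worst-case number of probes for a binary search of
   n sorted items, assuming each probe examines the middle item and at most
   n / 2 items (rounded down) remain afterward. */
int max_probes(int n) {
    assert(n >= 0);
    int probes = 0;
    while (n > 0) {
        probes++;      /* look at the middle item of the remaining n */
        n = n / 2;     /* at most half of the remaining items are left */
    }
    return probes;
}
```

For example, max_probes(1000) is 10 and max_probes(1000000) is 20, so even a very large telephone book needs only a few dozen looks.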

### Some Details of Specification

As we focus more closely on the binary search, we need to consider more clearly just what result we want when we are done. Here are two of the possibilities:

• Return true or false according to whether or not the item is present.
• Return the array index where item is found, or else the index of the first array value larger than item. (If item is larger than all values in the array, return the array size, the index just after the last array element.)

Here, we ask for the second result. In practice, if data are in the first part of a large array, then the index returned will indicate where to insert a new item so the array will remain ordered; we would just slide larger elements to the right within the large array and insert the new item.
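To make the second specification concrete, the sketch below pairs a deliberately simple linear-scan version of it with the insertion step just described. Both spec_index and insert_at are hypothetical helpers of ours, not code from the reading. The sketch assumes an int array sorted in increasing order and a caller that has left room for one more element; spec_index is mainly useful as a reference answer when testing a binary search.

```c
#include <assert.h>

/* Hypothetical reference version of the specification: returns the index
   of the first element that is >= item, that is, where item is found or
   the first element larger than item (size if item exceeds them all). */
int spec_index(const int a[], int size, int item) {
    int i = 0;
    while (i < size && a[i] < item)
        i++;
    return i;
}

/* Hypothetical helper: inserts item at index pos of a sorted array that
   currently holds count elements, sliding a[pos..count-1] one place right. */
void insert_at(int a[], int count, int capacity, int pos, int item) {
    assert(count < capacity && 0 <= pos && pos <= count);
    for (int i = count; i > pos; i--)
        a[i] = a[i - 1];
    a[pos] = item;
}
```

For example, inserting 4 into {1, 3, 5, 7} uses spec_index to get index 2 and leaves {1, 3, 4, 5, 7}.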

### Toward a Precise Loop Invariant

To describe processing, we first translate the algorithm into a general picture of the array. In this picture, elements at the left end of the array have been determined to be smaller than the desired item, and elements at the right end have been determined to be larger. The variables left and right mark the boundaries of these checked regions, and middle marks the location halfway between left and right.

Although this high-level picture presents a useful vision for the algorithm, three details require clarification:

1. Should left indicate the item just to the left or just to the right of the boundary of checked items? That is, has processing already checked a[left] and found a[left] < item, or has a[left] not yet been compared with item?
2. Should right indicate the item just to the left or just to the right of the boundary of checked items? That is, has processing already checked a[right] and found a[right] > item, or has a[right] not yet been compared with item?
3. If an odd number of items remain unchecked, then middle can indicate exactly the middle array element to be checked. However, if the number of items is even, should middle be rounded up or down? In C, the two likely computations are:

```c
middle = (left + right) / 2;      /* C's integer division truncates, so this rounds down for nonnegative indices */
middle = (left + right + 1) / 2;  /* adding 1 first makes the division round up */
```

For example, if six elements remain unprocessed, middle may be either the third or the fourth element of that segment. In coding the binary search, any combination of the choices for left and right can lead to correct code, provided the computation of middle is chosen to match. Difficulties arise, however, when a programmer does not carefully plan which picture to follow. When the meanings of variables change within the code, the code is likely to fail, at least in some cases, and fixing the identified errors often creates new ones.
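The two computations of middle can be compared directly. In the sketch below, middle_round_down and middle_round_up are illustrative names of ours for the two expressions above, and six_element_example checks the six-element case using the segment a[0..5].

```c
#include <assert.h>

/* Rounds down when left + right is nonnegative. */
int middle_round_down(int left, int right) { return (left + right) / 2; }

/* Adding 1 before dividing rounds up (again for nonnegative sums). */
int middle_round_up(int left, int right) { return (left + right + 1) / 2; }

/* Six unprocessed elements a[0..5]: the choices pick the third or fourth. */
void six_element_example(void) {
    assert(middle_round_down(0, 5) == 2);  /* third element */
    assert(middle_round_up(0, 5) == 3);    /* fourth element */
    /* With an odd count (five elements, a[0..4]) the two choices agree. */
    assert(middle_round_down(0, 4) == 2 && middle_round_up(0, 4) == 2);
}
```

Because C99 and later define integer division to truncate toward zero, these expressions round down and up as described only when the sum is nonnegative; for instance, (0 + -1) / 2 is 0, not -1.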

### Choosing a Loop Invariant: Version 1

To illustrate the use of pictorial loop invariants in developing code, we choose one combination of the choices above and develop the code. Then, to show that other choices can also work, we choose a different combination and develop code for it as well.

In this variation, we choose left and right to indicate the unprocessed items next to the two boundaries; we defer the choice of computation for middle until later.

#### Version 1: Initialization

With this choice of loop invariant, we initialize left and right to the extreme ends of the array which have not been processed:

```c
left = 0;
right = size - 1;
middle = ???  /* one of the computations above; does it matter? */
```

#### Version 1: Loop Guard

When we consider a guard for our loop, we need to decide when to continue and when to exit. To determine the right conditions, we extend our picture of the loop invariant to the point where the unprocessed area has shrunk to nothing. In that picture, right sits immediately to the left of left (left == right + 1). At first, this picture may seem peculiar, since left and right have moved past each other, but let's examine it carefully.

• All elements to the left of a[left] are smaller than item, so left must be to the right of the boundary between the small and large items.
• All elements to the right of a[right] are larger than item, so right must be to the left of the boundary.
• If left == right, there would still be one unprocessed element in the middle; neither a[left] nor a[right] (the same element) would have been examined.
• At the end, we want middle to be the location of the first element larger than item if no match occurs. Thus, if we do not find the desired item, then middle == left.
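One way to make this picture concrete is a function that tests each part of the Version 1 invariant directly. The sketch below is a hypothetical debugging aid (the name v1_invariant_holds is ours), intended to be called as assert(v1_invariant_holds(a, size, item, left, right)) at the top of each iteration. Its linear scans make it far slower than the search itself, so it is meant for testing only.

```c
#include <assert.h>  /* for assert(v1_invariant_holds(...)) in the search loop */

/* Returns 1 if the Version 1 picture holds: 0 <= left, right < size,
   left <= right + 1, every element before a[left] is smaller than item,
   and every element after a[right] is larger than item. */
int v1_invariant_holds(const int a[], int size, int item, int left, int right) {
    if (left < 0 || right >= size || left > right + 1)
        return 0;
    for (int i = 0; i < left; i++)            /* small region a[0..left-1] */
        if (!(a[i] < item))
            return 0;
    for (int i = right + 1; i < size; i++)    /* large region a[right+1..size-1] */
        if (!(a[i] > item))
            return 0;
    return 1;
}
```

With a = {2, 4, 6, 8, 10} and item = 7, both the starting state (left = 0, right = 4) and the final state (left = 3, right = 2) satisfy the invariant.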

Translating this picture into C code, we first identify the needed condition for continuing the loop. We only stop when right < left or when we have found the desired item, so the main loop should begin:

```c
while ((left <= right) && (a[middle] != item)) {
```

Within the loop, we will compare a[middle] with item and update either left or right, but what should the update value be? To maintain the loop invariant, left or right must move to an unprocessed index, and we have already checked a[middle]. Thus, we should move one position up or down from middle in our assignment:

```c
if (a[middle] < item)
    left = middle + 1;
else
    right = middle - 1;
```

Finally, what about the computation of middle? We have already noted that at the end we want middle == left. Also, from the picture, we know that at the end left == right + 1. Let's substitute this value for left in the two computations above:

```
Rounding down:
   middle = (left + right) / 2
          = (right + 1 + right) / 2      /* substituting left = right + 1 */
          = (2*right + 1) / 2
          = right + 1/2
          = right                        /* integer division drops the 1/2 (for right >= 0) */

Rounding up:
   middle = (left + right + 1) / 2
          = (right + 1 + right + 1) / 2  /* substituting left = right + 1 */
          = (2*right + 2) / 2
          = right + 2/2
          = right + 1
          = left
```

This shows that if we round up, middle will have the needed value, but if we round down, our computation will be off by one.
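The two substitutions can also be checked mechanically. The hypothetical function below (our name, not from the reading) loops over end states with left == right + 1 and asserts each conclusion of the derivation.

```c
#include <assert.h>

/* Checks the Version 1 end-state derivation for every end state
   left == right + 1 with 0 <= right < limit. */
void check_v1_end_state(int limit) {
    for (int right = 0; right < limit; right++) {
        int left = right + 1;
        assert((left + right) / 2 == right);     /* rounding down: off by one */
        assert((left + right + 1) / 2 == left);  /* rounding up: middle == left */
    }
}
```

One edge case is worth knowing: when item is smaller than every element, the loop ends with left == 0 and right == -1. Because C99 and later truncate integer division toward zero, (0 + -1) / 2 is 0, so rounding down happens to give the right answer there. For every right >= 0 it is off by one, which is enough to rule it out.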

Putting all the pieces together, we get the following code based on this loop invariant:

```c
/* Binary Search, Version 1 */
left = 0;
right = size - 1;
middle = (left + right + 1) / 2;  /* we must round up */
while ((left <= right) && (a[middle] != item)) {
    if (a[middle] < item)
        left = middle + 1;
    else
        right = middle - 1;
    middle = (left + right + 1) / 2;
}
```

As we have discussed, when the loop ends, either a[middle] == item or middle is the index at which item could be inserted to keep the array elements ordered.
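To test Version 1 as a unit, the fragment can be wrapped in a function. The sketch below assumes int indices and an int array sorted in increasing order; the names binary_search_v1 and check_v1 are ours. check_v1 compares the result with a linear scan for every item from -1 through 2 * size on the arrays {0, 2, 4, ...}, which exercises items that are found, items that fall between elements, and items beyond both ends.

```c
#include <assert.h>

/* Version 1 wrapped in a function (the name is ours). Returns middle:
   either a[middle] == item, or middle is the index of the first element
   larger than item (size if there is none). */
int binary_search_v1(const int a[], int size, int item) {
    int left = 0;
    int right = size - 1;
    int middle = (left + right + 1) / 2;   /* we must round up */
    /* Invariant: a[0..left-1] < item and a[right+1..size-1] > item. */
    while ((left <= right) && (a[middle] != item)) {
        if (a[middle] < item)
            left = middle + 1;
        else
            right = middle - 1;
        middle = (left + right + 1) / 2;
    }
    return middle;
}

/* Compares binary_search_v1 with a linear scan on the sorted arrays
   {0, 2, 4, ...} of sizes 0 through max_size (at most 64); odd items
   are never present. */
void check_v1(int max_size) {
    int a[64];
    assert(max_size <= 64);
    for (int size = 0; size <= max_size; size++) {
        for (int i = 0; i < size; i++)
            a[i] = 2 * i;
        for (int item = -1; item <= 2 * size; item++) {
            int expected = 0;              /* first index with a[i] >= item */
            while (expected < size && a[expected] < item)
                expected++;
            assert(binary_search_v1(a, size, item) == expected);
        }
    }
}
```

For example, with a = {1, 3, 5, 7}, searching for 5 returns 2 (a match), searching for 4 also returns 2 (the insertion point), and searching for 8 returns 4, the array size.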

### Choosing a Loop Invariant: Version 2

In this variation, we choose left as in Version 1, but we choose right to indicate the processed item next to the boundary (the leftmost element already known to be larger than item); as before, we defer the choice of computation for middle until later.

#### Version 2: Initialization

With this choice of loop invariant, we initialize left to the left end of the array, which has not been processed, but we must initialize right to the position just past the right end of the array (index size), since no element has been processed yet. Again, we leave the computation of middle until later.

```c
left = 0;
right = size;
middle = ???  /* one of the computations above; does it matter? */
```

#### Version 2: Loop Guard

When we consider a guard for our loop, we need to decide when to continue and when to exit. To determine the right conditions, we extend our picture of the loop invariant to the point where the unprocessed area has shrunk to nothing. In this case, we want left, middle, and right all to come together just after the small elements, where they designate the first large element. Again, we look at this picture carefully:

• All elements to the left of a[left] are smaller than item, so left must be to the right of the boundary between the small and large items.
• a[right] designates the first element larger than item, so right must be to the right of the boundary.
• If left==right, all array elements will have been processed.
• At the end, we want middle to be the location of the first item larger than item if no match occurs. Thus, if we do not find the desired item, then middle == left == right.
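This picture can also be turned into a checking function. The differences from the Version 1 picture are that right may equal size and that a[right] itself belongs to the large region. As before, v2_invariant_holds is a hypothetical testing aid of ours, intended for use as assert(v2_invariant_holds(a, size, item, left, right)) inside the loop.

```c
#include <assert.h>  /* for assert(v2_invariant_holds(...)) in the search loop */

/* Returns 1 if the Version 2 picture holds: 0 <= left <= right <= size,
   every element before a[left] is smaller than item, and every element
   from a[right] onward is larger than item. */
int v2_invariant_holds(const int a[], int size, int item, int left, int right) {
    if (left < 0 || left > right || right > size)
        return 0;
    for (int i = 0; i < left; i++)        /* small region a[0..left-1] */
        if (!(a[i] < item))
            return 0;
    for (int i = right; i < size; i++)    /* large region a[right..size-1] */
        if (!(a[i] > item))
            return 0;
    return 1;
}
```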

Translating this picture into C code, we first identify the needed condition for continuing the loop. We only stop when right == left or when we have found the desired item, so the main loop should begin:

```c
while ((left < right) && (a[middle] != item)) {
```

Within the loop, we will compare a[middle] with item and update either left or right, but what should the update value be? To maintain the loop invariant, we need to move left to an unprocessed index, but we should move right to a processed one. In either case, we have already checked a[middle]. This gives rise to the following assignments:

```c
if (a[middle] < item)
    left = middle + 1;
else
    right = middle;
```

Finally, what about the computation of middle? We have already noted that at the end we want middle == left == right. Let's substitute left = right into the two computations above:

```
Rounding down:
   middle = (left + right) / 2
          = (right + right) / 2        /* substituting left = right */
          = (2*right) / 2
          = right                      /* exact: no rounding needed */

Rounding up:
   middle = (left + right + 1) / 2
          = (right + right + 1) / 2    /* substituting left = right */
          = (2*right + 1) / 2
          = right + 1/2
          = right                      /* integer division drops the 1/2 (right >= 0 here) */
```

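This shows that, once the loop has finished, we get the same result whether we round up or down. The end state is not the whole story, however: while the loop is running (left < right), we also need left <= middle < right. Rounding down guarantees this, so a[middle] is always an actual array element and each update shrinks the unprocessed region. Rounding up can give middle == right (for example, when right == left + 1). If right == size, a[middle] then lies past the end of the array; otherwise a[middle] is already known to be larger than item, so the assignment right = middle changes nothing and the loop never ends. For Version 2, therefore, we must round down.

Putting all the pieces together, we get the following code based on this loop invariant:

```c
/* Binary Search, Version 2 */
left = 0;
right = size;
middle = (left + right) / 2;  /* we must round down, so that left <= middle < right */
while ((left < right) && (a[middle] != item)) {
    if (a[middle] < item)
        left = middle + 1;
    else
        right = middle;
    middle = (left + right) / 2;
}
```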
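The Version 2 fragment can likewise be wrapped in a function and tested against a linear scan. In the sketch below, binary_search_v2, check_v2, and check_v2_rounding are our names, and the array is assumed to hold ints sorted in increasing order. check_v2_rounding confirms, for small segments, that rounding down keeps left <= middle < right whenever left < right, so a[middle] is a real element and every update shrinks the unprocessed region; rounding up lacks this property (left = 0, right = 1 gives middle == right).

```c
#include <assert.h>

/* Version 2 wrapped in a function (the name is ours). Returns middle:
   either a[middle] == item, or middle is the index of the first element
   larger than item (size if there is none). */
int binary_search_v2(const int a[], int size, int item) {
    int left = 0;
    int right = size;
    int middle = (left + right) / 2;   /* round down */
    /* Invariant: a[0..left-1] < item and a[right..size-1] > item. */
    while ((left < right) && (a[middle] != item)) {
        if (a[middle] < item)
            left = middle + 1;
        else
            right = middle;
        middle = (left + right) / 2;
    }
    return middle;
}

/* Compares binary_search_v2 with a linear scan on the sorted arrays
   {0, 2, 4, ...} of sizes 0 through max_size (at most 64). */
void check_v2(int max_size) {
    int a[64];
    assert(max_size <= 64);
    for (int size = 0; size <= max_size; size++) {
        for (int i = 0; i < size; i++)
            a[i] = 2 * i;
        for (int item = -1; item <= 2 * size; item++) {
            int expected = 0;              /* first index with a[i] >= item */
            while (expected < size && a[expected] < item)
                expected++;
            assert(binary_search_v2(a, size, item) == expected);
        }
    }
}

/* Rounding down keeps left <= middle < right for 0 <= left < right <= limit;
   rounding up does not. */
void check_v2_rounding(int limit) {
    for (int right = 1; right <= limit; right++)
        for (int left = 0; left < right; left++) {
            int middle = (left + right) / 2;
            assert(left <= middle && middle < right);
        }
    int left = 0, right = 1;
    assert((left + right + 1) / 2 == right);   /* rounding up: middle == right */
}
```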

### Final Notes

1. Both versions of the code developed in this reading are available in the program ~walker/c/examples/binary-searches.c. It is also worth noting that both binary search algorithms ran correctly the first time they were run.

2. We can follow a similar approach to develop binary search code based on the other two combinations of choices for left and right as well.

Such code development can be the basis for wonderful test questions.

This document is available on the World Wide Web as

```
http://www.walker.cs.grinnell.edu/courses/161.sp09/readings/reading-loop-inv-pic..shtml
```

Created 20 April 2008; last revised 5 October 2011. For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.