CSC 161 Grinnell College Spring, 2009
 
Imperative Problem Solving and Data Structures
 

Pictorial Loop Invariants

Goals

This laboratory exercise applies the concept of loop invariants to problems involving array structures, specifically to the binary search.

Introduction

The idea of a binary search is reasonably straightforward, but 'Binary search is one of the trickiest "simple" algorithms to program correctly.' [e.g., Wikipedia, article on Binary Search]. In this reading, we will see that loop invariants provide a helpful tool that allows us to actually get the code right.

The Binary Search Algorithm

Acknowledgement: The following four-paragraph description is a slightly edited version of Henry M. Walker, Computer Science 2: Principles of Software Engineering, Data Types, and Algorithms, Little, Brown, and Company, 1989, Section 10.1, p. 389, with programming examples translated from Pascal to C. This material is used with permission from the copyright holder.

High-level Description

The binary search involves looking for an item within an array that has already been sorted. We begin with an array of data a[0], ..., a[size-1], and we wish to search for a particular item. The approach is to look for the item in the middle of the array and make inferences about where to look next. In this way, each step of the binary search halves the amount of data still under consideration.

To understand how this is done, we consider how we might look up a name in a telephone book. We begin by opening the telephone book to the middle. If we are lucky, we see the name on the page in front of us. However, even if we are unlucky, we can tell which half of the book contains the name.

Once we know which half the name is in, we turn to the middle of that half. Again, we might be lucky and find the name immediately. Otherwise, we can restrict our attention to just that part where the name must be. (We are now looking at just one-quarter of the original book.)

As we proceed in subsequent steps, we continue looking at the middle page of the remaining section and dividing that section in half, until we find the name or run out of pages to look at.
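To get a feel for how quickly this halving shrinks the search, the following sketch counts how many times a range of n items can be cut in half, using C's integer division, before nothing remains. The function name halving_steps is our own illustration, not part of the original program. The count should bound the number of middle elements a binary search examines in a range of n items.

```c
/* Illustrative helper (hypothetical name): counts how many times a
   range of n items can be halved with integer division before no
   items remain. */
int halving_steps(int n)
{
    int steps = 0;
    while (n > 0) {
        steps++;      /* examine the middle of the current range */
        n = n / 2;    /* at most half of the range remains */
    }
    return steps;
}
```

For example, halving_steps(1000) returns 10, so even a 1000-page book needs at most about ten looks.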

Some Details of Specification

As we focus more closely on the binary search, we need to consider more carefully just what result we want when we are done. Here are two of the various possibilities:

Here, we ask for the second result: the index of item if it appears in the array, or otherwise the index where item could be inserted so that the array remains ordered. In practice, if the data occupy only the first part of a large array, then the returned index indicates where to insert a new item; we would just slide the larger elements one place to the right within the array and insert the new item in the gap.
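The insertion step just described can be sketched as follows. The function insert_at is a hypothetical helper, not part of the original program; it assumes the array has room for at least one more element and that pos is an index produced by a binary search, so 0 <= pos <= size.

```c
/* Hypothetical helper: insert item at index pos of a[0..size-1],
   sliding a[pos..size-1] one place to the right. Assumes the array
   has capacity for size + 1 elements and 0 <= pos <= size.
   Returns the new number of elements. */
int insert_at(int a[], int size, int pos, int item)
{
    for (int i = size; i > pos; i--)
        a[i] = a[i - 1];   /* slide a larger element one place right */
    a[pos] = item;         /* drop the new item into the gap */
    return size + 1;
}
```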

Toward a Precise Loop Invariant

To describe processing, we first translate the algorithm to a general picture:

[Figure: the idea of a binary search]

In this picture, array elements on the left of the array have been determined to be smaller than the desired item, and elements on the right have been determined to be larger. The variables left and right mark the boundaries of these checked regions, and middle marks the location halfway between left and right.

Although this high-level picture presents a useful vision for the algorithm, three details require clarification:

  1. Should left indicate the item just to the left or just to the right of the boundary of checked items? That is, has processing already checked a[left] and found a[left] < item, or has a[left] not yet been compared with item?
    [Figure: two possible left indices]
  2. Should right indicate the item just to the left or just to the right of the boundary of checked items? That is, has processing already checked a[right] and found a[right] > item, or has a[right] not yet been compared with item?
    [Figure: two possible right indices]
  3. If an odd number of items remain unchecked, then middle can indicate exactly the middle array element to be checked. However, if an even number remain, should middle be rounded up or down? In C, the two likely computations are:

       middle = (left + right) / 2;     /* C's integer division truncates; for nonnegative values this rounds down */
       middle = (left + right + 1) / 2; /* adding 1 first makes the result round up */
    

    For example, the following figure shows six unprocessed elements, so middle may be either the third or fourth element in the array segment.

    [Figure: two possible middle indices]
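The two computations in item 3 can be wrapped in small functions to check the six-element example; the names middle_round_down and middle_round_up are illustrative only. For the segment a[0], ..., a[5], rounding down should select index 2 (the third element) and rounding up index 3 (the fourth), while for an odd-length segment both agree.

```c
/* The two candidate midpoint computations (hypothetical names).
   With nonnegative left and right, C's truncating integer division
   rounds down. */
int middle_round_down(int left, int right)
{
    return (left + right) / 2;
}

int middle_round_up(int left, int right)
{
    return (left + right + 1) / 2;   /* adding 1 first rounds up */
}
```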

In coding the binary search, various combinations of the above choices can lead to correct code. Difficulties arise, however, when a programmer does not carefully plan which picture to follow. When the meaning of a variable shifts partway through the code, the code is likely to fail, at least in some cases, and fixing the identified errors often creates new ones.

Choosing a Loop Invariant: Version 1

To illustrate the use of pictorial loop invariants in developing code, we choose one variation of the choices above and develop the code. Then, to show that other choices can also work, we choose a different variation and develop code for that as well.

In this variation, we choose left and right to mark the unprocessed items just inside each boundary; we defer the choice of computation for middle until later.

[Figure: Variation 1 for the main loop]

Version 1: Initialization

With this choice of loop invariant, we initialize left and right to the two extreme ends of the array, neither of which has been processed yet:

   left = 0;
   right = size - 1;
   middle = ???  /* one of the computations above, does it matter? */

Version 1: Loop Guard

When we consider a guard for our loop, we need to decide when to continue and when to exit. To determine the right conditions, we extend our picture of the loop invariant to when the unprocessed area has shrunk to nothing:

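[Figure: Variation 1 termination]

At first, this diagram may seem peculiar: left and right have moved past each other. Let's examine it carefully. Every element before index left has been found smaller than item, every element after index right has been found larger, and nothing unprocessed remains between them; this can happen only when left == right + 1. At that point, left marks the first element larger than item (or equals size if there is none), which is exactly where item would be inserted, so at the end we want middle == left.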

Translating this picture into C code, we first identify the needed condition for continuing the loop. We only stop when right < left or when we have found the desired item, so the main loop should begin:

   while ((left <= right) && (a[middle] != item)) {

Within the loop, we compare a[middle] with item and update either left or right, but what should the new value be? To maintain the loop invariant, left or right must indicate an unprocessed element, and a[middle] has just been checked. Thus, we move one position up or down from middle in our assignment:

   if (a[middle] < item) 
      left = middle + 1;
   else
      right = middle - 1;
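One way to make the Version 1 picture concrete is to write it as a predicate that could be checked, for example with assert, before and after each pass through the loop. The sketch below is our own illustration; the name invariant_v1 is hypothetical, and it simply scans the "small" and "large" regions of the picture.

```c
#include <stdbool.h>

/* Version 1 picture as a checkable predicate (hypothetical name):
   a[0..left-1] < item, a[right+1..size-1] > item, and
   a[left..right] is the unprocessed region. */
bool invariant_v1(const int a[], int size, int left, int right, int item)
{
    if (left < 0 || right >= size || left > right + 1)
        return false;                  /* indices outside the picture */
    for (int i = 0; i < left; i++)
        if (!(a[i] < item))
            return false;              /* "small" region violated */
    for (int i = right + 1; i < size; i++)
        if (!(a[i] > item))
            return false;              /* "large" region violated */
    return true;
}
```

The scans make the check linear in size, so it is a debugging aid rather than something to leave in production code.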

Finally, what about the computation of middle? We have already noted that at the end we want middle == left. Also, from the picture, we know that at the end left == right + 1. Let's substitute right + 1 for left in the two computations above:

   Rounding down:
     middle = (left + right) / 2;
            = (right + 1 + right) / 2  /* substitution */
            = (2*right + 1) / 2
            = right + 1/2
            = right                    /* integer division drops the 1/2 (for right >= 0) */

   Rounding up:
     middle = (left      + right + 1) / 2;
            = (right + 1 + right + 1) / 2  /* substitution */
            = (2*right + 2) / 2
            = right + 2/2
            = right + 1   
            = left

This shows that if we round up, middle will have the needed value, but if we round down, our computation will be off by one.
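The substitution can also be checked by brute force. This sketch (the name check_v1_termination is hypothetical) tries every right from 0 up to a limit, sets left to right + 1, and confirms that rounding up yields left while rounding down yields right. It starts at right == 0 because C's integer division truncates toward zero: with right == -1, (0 + -1) / 2 evaluates to 0, not -1.

```c
#include <stdbool.h>

/* Brute-force check of the Version 1 termination algebra for
   right = 0 .. max_right, with left == right + 1. */
bool check_v1_termination(int max_right)
{
    for (int right = 0; right <= max_right; right++) {
        int left = right + 1;
        if ((left + right + 1) / 2 != left)   /* rounding up lands on left */
            return false;
        if ((left + right) / 2 != right)      /* rounding down is one short */
            return false;
    }
    return true;
}
```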

Putting all the pieces together, we get the following code based on this loop invariant:

   /* Binary Search, Version 1 */
   left = 0;
   right = size - 1;
   middle = (left + right + 1) / 2;  /* we must round up */
   while ((left <= right) && (a[middle] != item)) {
      if (a[middle] < item) 
         left = middle + 1;
      else
         right = middle - 1;
      middle = (left + right + 1) / 2;
   }
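Wrapped as a function, Version 1 might look like the sketch below. The name binary_search_v1 and the choice of int elements are our own assumptions for illustration; the loop itself is the code above, with the invariant recorded as a comment.

```c
/* Version 1 as a function (hypothetical name). Assumes a[0..size-1]
   is sorted in ascending order. Returns middle: either
   a[middle] == item, or middle is where item could be inserted. */
int binary_search_v1(const int a[], int size, int item)
{
    int left = 0;
    int right = size - 1;
    int middle = (left + right + 1) / 2;   /* we must round up */

    /* Invariant: a[0..left-1] < item, a[right+1..size-1] > item,
       and a[left..right] has not been processed. The && guard
       never reads a[middle] once left > right. */
    while ((left <= right) && (a[middle] != item)) {
        if (a[middle] < item)
            left = middle + 1;
        else
            right = middle - 1;
        middle = (left + right + 1) / 2;
    }
    return middle;
}
```

For the array {2, 4, 6, 8, 10}, searching for 6 returns 2, and searching for 5 also returns 2, the index where 5 would be inserted.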

When the loop finishes, either a[middle] == item, or middle is the index at which item could be inserted to keep the array elements ordered.
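This promised result can itself be written as a checker, which is one way to test either version of the binary search on many small arrays. The sketch assumes a sorted int array; the name search_result_ok is hypothetical.

```c
#include <stdbool.h>

/* Returns true when m is an acceptable answer for item in the
   sorted array a[0..size-1]: either a[m] == item, or every element
   before index m is smaller than item and every element from
   index m on is larger. */
bool search_result_ok(const int a[], int size, int item, int m)
{
    if (m < 0 || m > size)
        return false;
    if (m < size && a[m] == item)
        return true;                 /* item found at index m */
    for (int i = 0; i < m; i++)
        if (!(a[i] < item))
            return false;
    for (int i = m; i < size; i++)
        if (!(a[i] > item))
            return false;
    return true;                     /* m is where item belongs */
}
```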

Choosing a Loop Invariant: Version 2

In this variation, we choose left as in Version 1, but we choose right to mark the processed item just to the right of the boundary, that is, the leftmost element already found to be larger than item; as before, we defer the choice of computation for middle until later.

[Figure: Variation 2 for the main loop]

Version 2: Initialization

With this choice of loop invariant, we initialize left to the leftmost element of the array, which has not yet been processed, but we must initialize right to size, the position just past the right end of the array, since no element has been processed yet. Again, we leave the computation of middle until later.

   left = 0;
   right = size;
   middle = ???  /* one of the computations above, does it matter? */

Version 2: Loop Guard

When we consider a guard for our loop, we need to decide when to continue and when to exit. To determine the right conditions, we extend our picture of the loop invariant to when the unprocessed area has shrunk to nothing:

[Figure: Variation 2 termination]

In this case, we want left, middle, and right all to come together just after the small elements, where they designate the first large element. Again, we look at the diagram carefully.

Translating this picture into C code, we first identify the needed condition for continuing the loop. We only stop when right == left or when we have found the desired item, so the main loop should begin:

   while ((left < right) && (a[middle] != item)) {

Within the loop, we compare a[middle] with item and update either left or right, but what should the new value be? To maintain the loop invariant, we must set left to the index of an unprocessed element, but we must set right to the index of a processed one. In either case, a[middle] has just been checked. This gives rise to the following assignments:

   if (a[middle] < item) 
      left = middle + 1;
   else
      right = middle;
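As with Version 1, the Version 2 picture can be written as a predicate; the only differences are that right now indexes a processed "large" element and that left may equal right. The name invariant_v2 is hypothetical.

```c
#include <stdbool.h>

/* Version 2 picture as a checkable predicate (hypothetical name):
   a[0..left-1] < item, a[right..size-1] > item, and
   a[left..right-1] is the unprocessed region. */
bool invariant_v2(const int a[], int size, int left, int right, int item)
{
    if (left < 0 || right > size || left > right)
        return false;                  /* indices outside the picture */
    for (int i = 0; i < left; i++)
        if (!(a[i] < item))
            return false;              /* "small" region violated */
    for (int i = right; i < size; i++)
        if (!(a[i] > item))
            return false;              /* "large" region violated */
    return true;
}
```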

Finally, what about the computation of middle? We have already noted that at the end we want middle == left == right. Let's substitute right for left in the two computations above:

   Rounding down:
     middle = (left + right) / 2;
            = (right + right) / 2  /* substitution */
            = (2*right) / 2
            = right                /* exact: no rounding is needed */

   Rounding up:
     middle = (left  + right + 1) / 2;
            = (right + right + 1) / 2  /* substitution */
            = (2*right + 1) / 2
            = right + 1/2
            = right                 /* C's integer division rounds down */

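This shows that both computations give the same value of middle when the loop ends. The final value is not the whole story, however. While the loop runs, left < right, and rounding down guarantees left <= middle < right, so a[middle] is a genuine unprocessed element and either assignment shrinks the unprocessed region. Rounding up can give middle == right whenever right == left + 1; for example, with size == 1, the first computation yields middle == 1, which is past the end of the array, and the assignment right = middle would make no progress. For this version, then, we round down.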
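The termination check looks only at the final values of left and right. A brute-force sketch can also examine every state in which the Version 2 loop body runs (left < right), where a[middle] must be an unprocessed element, so we need left <= middle < right. The function middle_below_right is our own illustration. With round_up set to false it should report success; with round_up set to true it fails as soon as right == left + 1.

```c
#include <stdbool.h>

/* Tries every state 0 <= left < right <= max_size in which the
   Version 2 loop body can run and reports whether the chosen
   midpoint always satisfies left <= middle < right. */
bool middle_below_right(int max_size, bool round_up)
{
    for (int right = 1; right <= max_size; right++)
        for (int left = 0; left < right; left++) {
            int middle = round_up ? (left + right + 1) / 2
                                  : (left + right) / 2;
            if (middle < left || middle >= right)
                return false;   /* middle is not an unprocessed index */
        }
    return true;
}
```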

Putting all the pieces together, we get the following code based on this loop invariant:

   /* Binary Search, Version 2 */
   left = 0;
   right = size;
   middle = (left + right) / 2;  /* round down so that middle < right while the loop runs */
   while ((left < right) && (a[middle] != item)) {
      if (a[middle] < item) 
         left = middle + 1;
      else
         right = middle;
      middle = (left + right) / 2;
   }
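Wrapped as a function, Version 2 might look like this sketch; as with Version 1, the name binary_search_v2 and the use of int elements are assumptions for illustration. It should return the same answers as the Version 1 function on sorted arrays without duplicates.

```c
/* Version 2 as a function (hypothetical name). Assumes a[0..size-1]
   is sorted in ascending order. Returns middle: either
   a[middle] == item, or middle is where item could be inserted. */
int binary_search_v2(const int a[], int size, int item)
{
    int left = 0;
    int right = size;
    int middle = (left + right) / 2;   /* round down: middle < right */

    /* Invariant: a[0..left-1] < item, a[right..size-1] > item,
       and a[left..right-1] has not been processed. */
    while ((left < right) && (a[middle] != item)) {
        if (a[middle] < item)
            left = middle + 1;
        else
            right = middle;
        middle = (left + right) / 2;
    }
    return middle;
}
```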

Final Notes

  1. Both versions of code developed for this lab are available in program ~walker/c/examples/binary-searches.c. Also, it is useful to observe that both binary search algorithms ran correctly the first time they were run.

  2. We can follow a similar approach to develop binary search code from the other two loop invariants, that is, the two remaining combinations of choices for left and right.

    Such code development can be the basis for wonderful test questions.


This document is available on the World Wide Web as

http://www.walker.cs.grinnell.edu/courses/161.sp09/readings/reading-loop-inv-pic..shtml

created 20 April 2008
last revised 5 October 2011
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.