CSC 161 | Grinnell College | Spring, 2009
Imperative Problem Solving and Data Structures
Our experience with Scheme through the first part of this course indicates that lists can be a particularly helpful structure for the storage and processing of a wide variety of data. Lists provide a very flexible context for processing, and unlike arrays, we do not need to specify a maximum size of a list when we create it. This reading discusses how lists might be implemented in C, following an approach analogous to the lists we studied previously in Scheme.
Since Scheme incorporates lists as a built-in data structure, Scheme supplies several built-in operations (e.g., cons, car, and cdr) for list processing, and we could program in Scheme using lists without considering the mechanics of how lists were implemented. To process lists in C, however, we first need to consider some internal details of Scheme lists. We can then translate these details to C.
Lists in Scheme are implemented based on a graphical model, called a
box-and-pointer representation. The basic idea is to use a
rectangle, divided in half, to represent the result of the cons.
From the first half of the rectangle, we draw an arrow to the head of a
list; from the second half of the rectangle, we draw an arrow to the rest
of the list. For example, (cons 'a '()) would be represented as
follows:
Here, the line to a indicates that this is the head of the list.
The diagonal line through the right half of the rectangle indicates that
nothing comes later in this list. Since (cons 'a '()) gives
the list (a), this diagram represents (a) as well.
Now consider the list (cons 'b '(a)) or (b a). Here, we
draw another rectangle, where the head points to b and the tail
points to the representation of (a) that we already have seen.
The result is:
Similarly, the list (d c b a) is constructed as
(cons 'd (cons 'c (cons 'b (cons 'a '())))) and would be drawn as follows:
In computer science, this box-and-pointer representation is a primary mechanism used to describe lists — not just in Scheme, but in most contexts. An implementation of lists in C typically utilizes this graphical perspective and involves three main elements:
A typical listNode contains two elements — one for data and the other to identify the next node on a list. Since C requires that we declare the type of data fields, we must tailor the data to the application at hand. For the remainder of this lab, we assume the data will be a string. The following declarations capture these elements.
#define strMax 50   /* maximum size of an array */

typedef struct Node * listType;

typedef struct Node {
   char data [strMax];
   listType next;
} listNode;
To clarify these lines, a Node is a structure with data and a next field, and listNode is a synonym for struct Node. (It is simpler to write listNode than the two keywords struct Node, and conceptually it seems cumbersome to have to write struct for each declaration.) Similarly, listType stands for struct Node * — a pointer to a Node structure.
While the struct Node or listNode provides appropriate support to build lists that implement box-and-pointer representations, the design of a list of listType may combine these listNodes in one of several ways. For example, here are some basic issues:
In Scheme, list processing follows a functional perspective: procedures such as cons, car, cdr, null?, and length take lists as parameters and return new lists or data. In C, two main choices are possible:
When modifying a list, perhaps with cons or cdr, should there be a connection between the old list and the new one; that is,
To clarify this second point, consider the Scheme statements:
(define x '(b c))
(define y (cons 'a x))
Thus, we can consider y to be the list (a b c). The following figure shows two possible structures that could result:
In the first option, the nodes of the original list are copied, and thus are explicitly distinct from those in the new list. In the second option, a new node is created for the cons node, a new value is added within that node, but the next part of that list refers to the old list.
For the example shown, both options may be reasonable. However, suppose we now change the second element of x from c to d, using Scheme's set-cdr! operation. (That is, the new x is the list (b d).) In the first option, y is not affected, while in the second option y becomes the list (a b d). Since y refers to x when nodes are reused, any change to x also affects y. This may or may not be the desired result of changing x.
Overall, both approaches have advantages in certain cases. However, the first approach requires considerable overhead to duplicate nodes. Furthermore, in a purely functional context, lists are not altered during processing. In such a context, we can reuse nodes without fear of altering other lists unexpectedly, as old lists are never changed. These two observations explain why Scheme uses the second approach: reusing nodes when possible.
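The two options can be sketched in C as follows. This is only an illustration under stated assumptions, not code from scheme-lists.c: the names makeNode, copyList, consCopy, and consShare are hypothetical, and the sketch simply exits if memory runs out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define strMax 50   /* maximum size of an array */

typedef struct Node * listType;
typedef struct Node {
   char data [strMax];
   listType next;
} listNode;

/* hypothetical helper: allocate one node holding item, followed by rest */
listType makeNode (const char * item, listType rest) {
   listType newNode = (listType) malloc (sizeof (listNode));
   if (newNode == NULL) {
      fprintf (stderr, "out of memory\n");
      exit (EXIT_FAILURE);
   }
   strncpy (newNode->data, item, strMax - 1);
   newNode->data[strMax - 1] = '\0';   /* strncpy may not terminate */
   newNode->next = rest;
   return newNode;
}

/* hypothetical helper: build a node-by-node duplicate of list */
listType copyList (listType list) {
   if (list == NULL)
      return NULL;
   return makeNode (list->data, copyList (list->next));
}

/* first option: the new list gets its own copies of all old nodes */
listType consCopy (const char * item, listType rest) {
   return makeNode (item, copyList (rest));
}

/* second option: the new node simply points at the old list */
listType consShare (const char * item, listType rest) {
   return makeNode (item, rest);
}
```

If x is the list (b c), then after consCopy a later change to a node of x leaves the new list alone, while after consShare the same change is visible through both lists.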
To illustrate how to implement Scheme-style list operations in C, program ~walker/c/lists/scheme-lists.c shows implementations of the operations cons, car, and cdr. In addition, the program contains function listInit that initializes a list and listPrint that prints the elements of a list in Scheme format. Finally, as C requires programmers to handle all issues of memory allocation and deallocation, the program contains function listDelete that deallocates all nodes in a list and then sets the list variable to NULL.
Before considering specific details of the C functions, we review some elements of C syntax, based on the box-and-pointer representation of the Scheme list (a b c).
In this diagram, first is a variable that points to a listNode. C notates this type by adding an asterisk * to the declaration:
struct Node * first;
Alternatively, since we used a typedef statement to define listNode as struct Node, we could declare first as
listNode * first;
And, since we used a typedef to define listType as struct Node * (that is, as listNode *), we could declare first as
listType first;
With any of these declarations, first is a pointer to a listNode, and *first accesses the listNode itself. Within this listNode, (*first).data yields the data field within the Node, and (*first).next yields the next field. Alternatively, an arrow notation accomplishes the same result in a slightly cleaner form: first->data and first->next.
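As a small check of this notation, the following sketch confirms that the three declarations name the same pointer type and that the two access notations reach the same fields. The function notationsAgree is a hypothetical name introduced only for this illustration.

```c
#include <stddef.h>

#define strMax 50   /* maximum size of an array */

typedef struct Node * listType;
typedef struct Node {
   char data [strMax];
   listType next;
} listNode;

/* hypothetical demonstration: returns 1 when all notations agree */
int notationsAgree (listType first) {
   struct Node * p1 = first;   /* three equivalent declarations */
   listNode * p2 = first;
   listType p3 = first;

   return p1 == p2 && p2 == p3
       && (*first).data == first->data    /* same array within the node */
       && (*first).next == first->next;   /* same pointer to the next node */
}
```

The parentheses in (*first).data are required, since the field selector binds more tightly than the asterisk; the arrow form avoids that pitfall.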
With this notation, we now review various details of the C functions. Full details of these functions are in program ~walker/c/lists/scheme-lists.c.
Since a Node contains a string as data, the car function must return a pointer to a string (i.e., a char *) as its result. Altogether, we can access and return the car of a Node as:
return list->data;
The cdr operation returns the next — a pointer to a Node which has type listType. Accessing and returning this field follows the same approach as car.
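A minimal sketch of both operations follows. It is not necessarily the code in scheme-lists.c. In particular, treating car or cdr of an empty list as a programming error caught by assert is an assumption made here.

```c
#include <assert.h>
#include <stddef.h>

#define strMax 50   /* maximum size of an array */

typedef struct Node * listType;
typedef struct Node {
   char data [strMax];
   listType next;
} listNode;

/* return the string stored in the first node of a list */
char * car (listType list) {
   assert (list != NULL);   /* assumption: car of an empty list is an error */
   return list->data;       /* the array name yields a char * */
}

/* return the list that follows the first node */
listType cdr (listType list) {
   assert (list != NULL);   /* assumption: cdr of an empty list is an error */
   return list->next;
}
```

Note that car returns a pointer into the node itself, not a copy, so the string stays valid only as long as the node does.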
For the cons, C first requires that we allocate space explicitly. C's malloc function accomplishes this task when we give it the amount of space to allocate. Once the space is allocated, we cast the pointer that malloc returns to type listType, since it will point to a listNode. The relevant line is:
listType newNode = (listType)malloc(sizeof(listNode));
Once the space is allocated, we need to fill the data and next fields. Following the above discussion of node reuse, the next field points to the first node of the existing list. For the data field, we copy the new string into the node's array.
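Putting allocation and the two field assignments together gives a sketch like the one below. The parameter names, the truncation of long strings to strMax - 1 characters, and the choice to exit when malloc fails are assumptions made for this illustration, not details taken from scheme-lists.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define strMax 50   /* maximum size of an array */

typedef struct Node * listType;
typedef struct Node {
   char data [strMax];
   listType next;
} listNode;

/* return a new list whose head is item and whose tail is list */
listType cons (const char * item, listType list) {
   listType newNode = (listType) malloc (sizeof (listNode));
   if (newNode == NULL) {   /* assumption: give up when no space is left */
      fprintf (stderr, "cons: out of memory\n");
      exit (EXIT_FAILURE);
   }

   /* copy the string into the node, truncating if it is too long */
   strncpy (newNode->data, item, strMax - 1);
   newNode->data[strMax - 1] = '\0';

   /* reuse the nodes of the existing list as the tail */
   newNode->next = list;
   return newNode;
}
```

For example, cons ("b", cons ("a", NULL)) builds the list (b a).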
To print, we need a temporary variable listPtr that starts at the beginning of a list and then progresses node-by-node until the end. By convention in C, a pointer that does not specify any node is called NULL. Also, given one position in the list, the next node is obtained by looking in the next field. Putting these details together, the main structure of a printing loop is:
listType listPtr = list;
while (listPtr != NULL) {
   /* printing details go here */
   listPtr = listPtr->next;
}
If we are to print results in the format given by Scheme, we should enclose an entire list in parentheses and separate successive list elements by a space. These details require a little care.
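One way to handle the parentheses and spaces is sketched below. Writing to an arbitrary stream through a helper named listPrintTo is an assumption made here for flexibility; the listPrint in scheme-lists.c may be organized differently.

```c
#include <stdio.h>

#define strMax 50   /* maximum size of an array */

typedef struct Node * listType;
typedef struct Node {
   char data [strMax];
   listType next;
} listNode;

/* hypothetical helper: write a list to a stream in Scheme format */
void listPrintTo (FILE * out, listType list) {
   listType listPtr = list;

   fprintf (out, "(");
   while (listPtr != NULL) {
      fprintf (out, "%s", listPtr->data);
      if (listPtr->next != NULL)   /* a space only between elements */
         fprintf (out, " ");
      listPtr = listPtr->next;
   }
   fprintf (out, ")");
}

/* print a list on standard output, followed by a newline */
void listPrint (listType list) {
   listPrintTo (stdout, list);
   printf ("\n");
}
```

Printing the separator only when another node follows avoids a stray space before the closing parenthesis, and an empty list prints as ().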
Initialization requires some thought and care. One approach would be to assign
first = NULL;
in the main program.
Although this works fine, we might want to accomplish initialization in a procedure. In this case, passing first itself would give the listInit procedure only a copy of first, so assigning NULL to that copy would leave the original variable unchanged. Instead, we must pass the address of first, and within the function we must place NULL at that address.
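The difference between the two ways of passing can be sketched as follows. The function listInitBroken is a hypothetical name, included only to show the approach that does not work, and the parameter name listPtr is likewise an assumption.

```c
#include <stddef.h>

#define strMax 50   /* maximum size of an array */

typedef struct Node * listType;
typedef struct Node {
   char data [strMax];
   listType next;
} listNode;

/* hypothetical, for contrast: changes only the local copy of the pointer */
void listInitBroken (listType list) {
   list = NULL;
   (void) list;   /* silence warnings about the unused assignment */
}

/* receives the address of the list variable and stores NULL there */
void listInit (listType * listPtr) {
   *listPtr = NULL;
}
```

The main program would call listInit (&first); afterward first is NULL, whereas listInitBroken (first) would leave first exactly as it was.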
Deletion requires that we explicitly deallocate space for each node and then set the first pointer to NULL. Since the last step changes the first variable, we must pass the address of first to listDelete paralleling the approach used for listInit.
For completeness, we should first deallocate space for the rest of a list before deallocating the space for the first node. (If we proceeded in the other order, once we deallocated the first node, we could not be confident that the next field had valid data, so working down the list would be unreliable.) Proceeding recursively down the list handles subsequent nodes cleanly and easily.
Finally, the deallocation of memory uses the standard C function free.
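A recursive sketch of this approach appears below. It is not necessarily the code in scheme-lists.c, and the parameter name listPtr is an assumption. It also assumes that no other list shares these nodes; with node reuse, freeing one list would leave any list that shares its nodes pointing at deallocated space.

```c
#include <stdlib.h>

#define strMax 50   /* maximum size of an array */

typedef struct Node * listType;
typedef struct Node {
   char data [strMax];
   listType next;
} listNode;

/* deallocate every node of a list, then set the list variable to NULL */
void listDelete (listType * listPtr) {
   if (*listPtr != NULL) {
      /* first delete the rest of the list, while next is still valid */
      listDelete (&((*listPtr)->next));

      /* then deallocate this node and clear the caller's pointer */
      free (*listPtr);
      *listPtr = NULL;
   }
}
```

The main program would call listDelete (&first); calling it again on the now-empty list is harmless, since the function does nothing when the pointer is already NULL.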
These definitions and methods combine to give program ~walker/c/lists/scheme-lists.c.
This document is available on the World Wide Web as
http://www.walker.cs.grinnell.edu/courses/161.sp09/readings/reading-lists-c-1.shtml
created 4 May 2000; last revised 15 April 2009
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.