CSC 153: Computer Science Fundamentals | Grinnell College | Spring, 2005
Laboratory Exercise Reading
This laboratory exercise introduces two new concepts: hash tables and inheritance.
The new topics within this reading are motivated by these factors.
During our functional problem solving with Scheme, we stored data in lists and in vectors. Lists are flexible structures, but data retrieval often requires moving step-by-step through the list. Vectors are not very flexible (their size must be declared at the start), but retrieval by index is fast. However, even vectors require slow access if a linear search is required. In Java, arrays provide the same capabilities as Scheme vectors. Retrieval by index is fast, but locating an item in an array may require a [slow] linear search. The first part of this lab introduces hash tables as a mechanism to speed storage and retrieval of unordered data.
In writing our previous Java programs, we have taken advantage of previously-existing classes to handle various tasks. The second part of this lab carries this idea one step further: modifying or expanding existing classes to meet new needs. In writing these new classes, we will not have to write any code already done in the existing classes; we will just note the desired modifications or extensions. In the jargon of object-oriented problem-solving, we say that our new classes inherit properties (data and methods) from the previous ones.
In order to frame our discussion within an application, we revisit the directory problem from the end of the lab on searching. In that lab, we considered the basic problem of storing name and telephone information and retrieving the numbers by name.
Note: Structures with these basic storage and retrieval operations arise in many contexts. For historical reasons, the indexing value often is called a key or symbol, the associated material is called a value, and the corresponding storage structure is called a symbol table.
When the key is a non-negative integer, such storage can be achieved through an array, where value[i] gives the information associated with the key i. When the key is a string, the lab on searching showed how to use parallel arrays. The Java program ~walker/java/examples/directory/DirectoryShell.java illustrates this use of parallel arrays.
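As a reminder of the parallel-array approach, the following is a minimal sketch and not the course's DirectoryShell.java. The class name ParallelArrayLookup, the method lookup, and the telephone numbers are placeholders chosen for illustration. It shows why retrieval by name requires a linear search.

```java
// Illustrative sketch only; this is not the course's DirectoryShell.java.
class ParallelArrayLookup {
    // names[i] and numbers[i] describe the same person.
    // A linear search examines the names one at a time.
    static String lookup(String[] names, String[] numbers, String target) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(target)) {
                return numbers[i];
            }
        }
        return null; // target is not in the directory
    }

    public static void main(String[] args) {
        String[] names = {"Arnold Adelberg", "Henry Walker"};
        String[] numbers = {"555-0101", "555-0102"}; // placeholder numbers
        System.out.println(lookup(names, numbers, "Henry Walker"));
    }
}
```

In the worst case, the loop inspects every name before it can report that the target is missing.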
In this lab, we follow yet another approach.
A hash table is a specific type of structure which supports two primary operations: storing a value under a given key, and retrieving the value associated with a given key.
Conceptually, for a directory with names and telephone numbers, a hash table might store information in a large array-like structure, such as the following:
As suggested by the figure, the idea of a hash table is to spread the relevant directory entries throughout the array. The particular placement of an item is determined by some function, called a hash function. In practice, many such functions have been investigated.
(Aside: In the diagram, the function used computes the distance between the first letter of the first name and the first letter of the last name. To fit into the 13 spaces in the array, the distance then is taken modulo 13. For "Arnold Adelberg", the first and last names begin with the same letter, the distance between these letters is 0, and the entry for "Arnold Adelberg" is found by looking at position 0 in the array. For "Henry Walker", the letter H is 15 letters away from W. Taking this distance modulo 13 gives the remainder 2, so "Henry Walker" appears in the table by searching from position 2.)
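The hash function described in the aside might be sketched as follows. The class name FirstLetterHash and the method name hash are hypothetical, and the sketch assumes that "distance" means the absolute difference between letter positions and that each name is non-empty.

```java
// Sketch of the first-letter hash function from the aside (names are hypothetical).
class FirstLetterHash {
    static final int TABLE_SIZE = 13;

    // Distance between the first letter of the first name and the
    // first letter of the last name, taken modulo the table size.
    static int hash(String fullName) {
        String[] parts = fullName.trim().split("\\s+");
        char first = Character.toUpperCase(parts[0].charAt(0));
        char last = Character.toUpperCase(parts[parts.length - 1].charAt(0));
        return Math.abs(first - last) % TABLE_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(hash("Arnold Adelberg")); // prints 0
        System.out.println(hash("Henry Walker"));    // prints 2 (15 modulo 13)
    }
}
```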
Given a hash function, storage and retrieval from a hash table has two main steps. The hash function indicates where to start in the table. Searching begins from the specified place and continues until the item is found or the end of relevant data is reached. If the hash function spreads data out over a large array, then one can show that typical storage and retrieval operations are extremely fast and efficient.
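The two steps (hash to a starting place, then search from there) can be sketched with a small array-based table. This is only one possible arrangement, often called linear probing, and is not necessarily how Java's own classes work. The class ProbingDirectory, its methods, the wrap-around at the end of the array, and the placeholder telephone numbers are all assumptions made for illustration.

```java
// Illustrative sketch of a hash table with linear probing (hypothetical class).
class ProbingDirectory {
    private final String[] names = new String[13];
    private final String[] numbers = new String[13];

    // First-letter hash as in the aside; assumes a non-empty name.
    private int hash(String name) {
        String[] parts = name.trim().split("\\s+");
        char first = Character.toUpperCase(parts[0].charAt(0));
        char last = Character.toUpperCase(parts[parts.length - 1].charAt(0));
        return Math.abs(first - last) % names.length;
    }

    // Step 1: hash to a starting position.
    // Step 2: move forward (wrapping around) to the first free or matching slot.
    // Returns false only when the table is full.
    boolean put(String name, String number) {
        int start = hash(name);
        for (int step = 0; step < names.length; step++) {
            int i = (start + step) % names.length;
            if (names[i] == null || names[i].equals(name)) {
                names[i] = name;
                numbers[i] = number;
                return true;
            }
        }
        return false;
    }

    // Search from the hashed position until the name or an empty slot is found.
    String get(String name) {
        int start = hash(name);
        for (int step = 0; step < names.length; step++) {
            int i = (start + step) % names.length;
            if (names[i] == null) {
                return null; // reached the end of the relevant data
            }
            if (names[i].equals(name)) {
                return numbers[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ProbingDirectory d = new ProbingDirectory();
        d.put("Arnold Adelberg", "555-0101"); // placeholder numbers
        d.put("Henry Walker", "555-0102");
        System.out.println(d.get("Henry Walker"));
    }
}
```

When the entries are spread out, most searches stop after examining only one or two slots, which is the source of the speed claimed above.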
Specific details of hash tables require considerable analysis of potential hashing functions, use of arrays, and maintenance of structures based on those arrays. Given the limited time available, such work is beyond the scope of this course.
When the keys of a symbol table are objects (of any class with a hashing function hashCode and an equals method), Java provides a predefined class Hashtable, found in package java.util. Because this class already exists, we can take advantage of hash tables while writing little code.
Java's Hashtable has several helpful methods. Here are a few basic ones (beyond creating a new one):
To illustrate how Java's Hashtable class can be used, consider the Scheme-based lab on Abstract Data Types. In that lab, we created a directory of names and telephone numbers. In particular, that lab created a directory class and utilized methods show, lookup, and add.
Program ~walker/java/examples/directory/DirectoryMain1.java achieves similar operations using some methods of Java's Hashtable class. In this program, the add operation translates directly to put, and lookup translates to get. The show operation is related to keys, but is somewhat more complicated.
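The correspondence between add and put, and between lookup and get, might look like the following sketch. It is not the text of DirectoryMain1.java. The class name HashtableBasics, the helper sample, and the telephone numbers are placeholders, and the generic type parameters are a convenience of current Java versions that the original program may not use.

```java
import java.util.Hashtable;

// Illustrative sketch; not the course's DirectoryMain1.java.
class HashtableBasics {
    // Builds a small table with placeholder telephone numbers.
    static Hashtable<String, String> sample() {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("Arnold Adelberg", "555-0101"); // "add" becomes put
        table.put("Henry Walker", "555-0102");
        return table;
    }

    public static void main(String[] args) {
        Hashtable<String, String> table = sample();
        System.out.println(table.get("Henry Walker"));   // "lookup" becomes get
        System.out.println(table.get("No Such Person")); // get returns null for a missing key
        System.out.println(table.size());
    }
}
```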
Program DirectoryMain1.java also illustrates the concept of enumerations -- an idea common to many object-oriented programming languages. Conceptually, an enumeration is simply a sequence of information. Pragmatically, an enumeration is an object which allows one to cycle through a collection of objects. Program DirectoryMain1.java shows the main elements in Java -- specifically for the interface java.util.Enumeration. The relevant code is
    for (Enumeration e = table.elements(); e.hasMoreElements(); ) {
        out.println (" " + e.nextElement());
    }
Here, table.elements() is a method that generates the sequence of values stored in the table -- specifically giving an object that implements Java's Enumeration interface. Similarly, the code
Enumeration e = table.keys();
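creates an Enumeration variable e, and initializes it with the sequence of keys from our table. While enumerations are limited, they have two basic methods: hasMoreElements, which reports whether any elements remain in the sequence, and nextElement, which returns the next element in the sequence.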
Program DirectoryMain1.java illustrates the most common use of enumerations -- using an enumeration in a loop to cycle through all elements in a collection.
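A loop of this kind over the keys of a table might be sketched as follows. The class EnumerationDemo and the method collectKeys are hypothetical names, and the telephone numbers are placeholders. Note that Hashtable does not promise any particular order for the keys it enumerates.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.List;

// Illustrative sketch of cycling through a collection with an Enumeration.
class EnumerationDemo {
    // Visits every key in the table and gathers the keys into a list.
    static List<String> collectKeys(Hashtable<String, String> table) {
        List<String> keys = new ArrayList<>();
        for (Enumeration<String> e = table.keys(); e.hasMoreElements(); ) {
            keys.add(e.nextElement());
        }
        return keys;
    }

    public static void main(String[] args) {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("Arnold Adelberg", "555-0101"); // placeholder numbers
        table.put("Henry Walker", "555-0102");
        // Print each name with its number; the order is unspecified.
        for (String name : collectKeys(table)) {
            System.out.println(name + ": " + table.get(name));
        }
    }
}
```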
Now that we have seen how Java's Hashtable might be helpful for a directory, we use it to build a simple directory class SimpleDirectory. A shell for such a class is found at ~walker/java/examples/directory/SimpleDirectory.java . Program ~walker/java/examples/directory/DirectoryMain2.java uses this SimpleDirectory, following the same test cases seen previously for Hashtable.
In class SimpleDirectory, the instance variables table and out are listed as protected. The intention of this keyword is to limit the accessibility of table and out, so an application cannot tinker with these variables directly. Thus, conceptually, protected might be considered in a similar category as private. The details of protected access, however, are somewhat complex and thus are deferred to another lab.
While the SimpleDirectory class has some helpful capabilities, we might want to extend it by adding several methods:
Of course, one approach would be to redefine SimpleDirectory from scratch. However, most object-oriented languages, such as Java, provide a simpler way -- we simply extend the original class SimpleDirectory to get a new class BetterDirectory. This class is found in ~walker/java/examples/directory/BetterDirectory, with corresponding test program ~walker/java/examples/directory/DirectoryMain3.java.
As this example illustrates, we can extend a class in Java by defining a new class, based on the old, using an extends clause in the declaration of the new class. The new class then has access to all public and protected data of the old class. The body of the new class then contains only the different features.
When extending a class, the new class is called a subclass or derived class, and the old class is called a super class. Thus, in the example, BetterDirectory is a subclass of SimpleDirectory, and SimpleDirectory is a super class. We also say BetterDirectory inherits the variables and methods of its super class.
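The shape of such an extension might be sketched as follows. These are not the course's SimpleDirectory and BetterDirectory files; the class names SimpleDirectorySketch and BetterDirectorySketch, and the particular methods shown, are assumptions made for illustration.

```java
import java.util.Hashtable;

// Hypothetical stand-in for a simple directory class.
class SimpleDirectorySketch {
    // protected: visible to subclasses, but not intended for application code
    protected Hashtable<String, String> table = new Hashtable<>();

    public void add(String name, String number) {
        table.put(name, number);
    }

    public String lookup(String name) {
        return table.get(name);
    }
}

// The subclass lists only what is new; add and lookup are inherited.
class BetterDirectorySketch extends SimpleDirectorySketch {
    public int size() {
        return table.size();
    }

    public String remove(String name) {
        return table.remove(name);
    }
}

class InheritanceDemo {
    public static void main(String[] args) {
        BetterDirectorySketch directory = new BetterDirectorySketch();
        directory.add("Henry Walker", "555-0102"); // inherited method, placeholder number
        System.out.println(directory.lookup("Henry Walker"));
        System.out.println(directory.size());      // method added by the subclass
    }
}
```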
The SimpleDirectory and BetterDirectory classes contained a Hashtable as an internal variable. Specific public methods then were defined to provide desired operations: a constructor, add, lookup, printNames, remove, size, and PrintNumbers. Another approach derives a class AltDirectory directly from Hashtable. Since remove and size are already defined in Hashtable, these need not be redefined in AltDirectory.
Since AltDirectory is a subclass of Hashtable, all operations of Hashtable are available in AltDirectory. In contrast, BetterDirectory contains a Hashtable variable. While this variable can utilize methods of Hashtable, such methods cannot be applied directly to BetterDirectory.
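Deriving a directory directly from Hashtable might look like the sketch below. The class AltDirectorySketch is a hypothetical stand-in and is not the course's AltDirectory; the telephone number is a placeholder.

```java
import java.util.Hashtable;

// Hypothetical directory that is itself a Hashtable.
class AltDirectorySketch extends Hashtable<String, String> {
    public void add(String name, String number) {
        put(name, number); // put is inherited from Hashtable
    }

    public String lookup(String name) {
        return get(name);  // get is inherited from Hashtable
    }
}

class AltDirectoryDemo {
    public static void main(String[] args) {
        AltDirectorySketch directory = new AltDirectorySketch();
        directory.add("Henry Walker", "555-0102");
        // size and remove were never written in AltDirectorySketch;
        // they come from Hashtable and apply to the directory directly.
        System.out.println(directory.size());
        directory.remove("Henry Walker");
        System.out.println(directory.size());
    }
}
```

The trade-off is that every public Hashtable method becomes part of the directory's interface, whether or not it is wanted.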
Inheritance from a super class provides a collection of methods to a derived class. Suppose some of these methods are not desired in the subclass. Since the methods are already defined in the super class, methods by those names must be present in the subclass. One (inelegant) approach would be to redefine the method in the subclass to do nothing. Alternatively, one might try to make the method in the derived class private, so it could not be used by applications.
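The first (inelegant) approach might be sketched as follows, assuming a hypothetical subclass NoClearDirectory that should not allow its entries to be erased all at once. As for the second approach, Java does not permit an overriding method to have more restrictive access than the method it overrides, so declaring clear as private in the subclass would be rejected by the compiler.

```java
import java.util.Hashtable;

// Hypothetical subclass that disables an inherited method.
class NoClearDirectory extends Hashtable<String, String> {
    // Redefine clear so that it does nothing.
    // (Writing "private void clear()" here would not compile.)
    @Override
    public void clear() {
        // intentionally empty
    }
}

class NoClearDemo {
    public static void main(String[] args) {
        NoClearDirectory directory = new NoClearDirectory();
        directory.put("Henry Walker", "555-0102"); // placeholder number
        directory.clear();                         // has no effect
        System.out.println(directory.size());      // the entry is still present
    }
}
```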
This document is available on the World Wide Web as
http://www.walker.cs.grinnell.edu/courses/153.sp05/readings/reading-hashtables-inheritance.shtml
created April 16, 2000; last revised March 24, 2005
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.