CSC 223 Grinnell College Fall, 2006
 
Software Design
 

An Introduction to Dynamic Web Pages with Java

Abstract

This laboratory reviews a general framework for World-Wide-Web documents and provides experience with simple CGI programming, which allows a Web developer to tailor documents to an individual Web user.

Introduction

As a reader of this lab, you already have interacted with the World-Wide Web in something like the following sequence:

  1. Within a Web browser (e.g., Firefox, Netscape, Internet Explorer, or Mosaic), you type an address (a URL, or Uniform Resource Locator), such as http://www.walker.cs.grinnell.edu/courses/223.fa05/labs/html-and-cgi.shtml .
  2. Your browser sends a request to the Web server for that address.
  3. The server finds the file on a disk drive.
  4. The server retrieves the file from the disk.
  5. The server sends the file back to your browser.
  6. Your browser interprets the file and displays it on your screen.
This sequence of events is illustrated in the following diagram.

Client-server interaction for the World-Wide Web - 1

HTML Format

As a simple example, consider the document dynamic-pages.html. For reference, the original file dynamic-pages.html is shown below:



    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
        <title>Dynamic Web Pages</title>
        <link rev="made" href="mailto:walker@cs.grinnell.edu">
    </head>
    
    <body>
    
    <center>
    <h1>Dynamic Web Pages</h1>
    </center>
    
    <p>
    A dynamic document is produced by a program that can receive
    input from a user and modify the Web page based on that input.  A typical
    mechanism for communication between a user's browser and a Web server is
    called the <i>Common Gateway Interface (CGI)</i>.  Altogether, the Common
    Gateway Interface (CGI) is a set of conventions, supported by software,
    that facilitate the writing of programs that generate World Wide Web
    documents. CGI programs can be written in almost any programming language;
    naturally, though, we'll use Java.
    </p>
    
    <p>
    With CGI programming, the sequence of events for Web interaction has an
    extra step:
    </p>
    
    <ol>
    <li>
    Within a Web browser, you type a URL.
    <li>
    Your browser sends a request to the server for that address.
    <li>
    The server finds the file on a disk drive.
    <li>
    The server retrieves the file from the disk and notes that the file
    identifies a program to run.
    <li>
    The server runs the program, which produces an HTML document.
    <ul>
    <li>
    The first line of the HTML document tells both the Web server and your
    browser  that this will be a text-based html document, and thus clarifies
    how both the server and browser will communicate.
    <li>
    The rest of the HTML document contains formatting instructions and text for
    display.
    </ul>
    
    <li>
    The server sends the newly-produced HTML document back to your browser.
    <li>
    Your browser interprets the file and displays it on your screen.
    </ol>
    
    This sequence of events is illustrated in the following diagram.
    <p>
    <IMG SRC="www-2.gif" ALT="Client-server interaction for the World-Wide Web - 2"> 
    
    </body>
    </html>

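In this material, all formatting commands are listed in angle brackets: < > . The first line (the DOCTYPE declaration) and the fourth line (the <meta> tag) are special: the DOCTYPE line identifies the version of HTML that the document uses, and the <meta> line identifies the document's content type and character set.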

Beyond these two special lines, many commands apply to a section of the document. For example, <i> indicates the beginning of a section that should be displayed in an italic type face, and </i> indicates the end of the same section. As in this example, the formatting commands at the start and at the end of a section usually have the same name, but an extra slash / is added to the end marker.

The following table gives some main formatting commands, many of which are illustrated in this example:

Tag                         Meaning
<html>                      begin an HTML document
<head>                      begin the header section
<title>                     begin a title
<body>                      begin the body of the document
<h1>                        begin a level-1 heading (headings can be h1 through h6)
<p>                         begin a new paragraph
<br>                        break a line (begin a new line)
<b>                         begin bold type face
<i>                         begin italic type face
<hr>                        draw a horizontal line
<ol>                        begin an ordered [numbered] list
<ul>                        begin an unordered [bullet] list
<li>                        begin a new item within a list
<blockquote>                set off a section as an indented quotation
<img src="..." alt="...">   insert an image [src gives the file name; alt identifies text to display in case the file is not available]

[For more information about HTML, you might try the primer A beginner's guide to HTML, currently maintained by Marty Blase of the National Center for Supercomputing Applications.]

Creating and Editing an HTML Document

  1. Before anything in your MathLAN account can be accessed on the Web, you must make your home directory accessible. To do this, open a terminal window and give the command

    chmod 755 ~
    

    at the prompt. (The symbol ~ stands for your home directory.)

  2. Any materials related to the World Wide Web belong in a subdirectory of your home directory named public_html. If you have no such subdirectory, create one by giving the command

    mkdir ~/public_html
    

    in the terminal window. This directory, too, must be accessible; give the command

    chmod 755 ~/public_html
    

    to make it so.

  3. Copy the sample dynamic-pages.html page to your public_html directory in two steps: First move from your home directory to the public_html directory with the command:

     
    cd public_html 
    

    Then copy the file to your current directory (which is public_html) with the command:

    cp  ~walker/public_html/courses/223.fa05/labs/dynamic-pages.html  dynamic-pages.html
    

    The copy of the file will have the name dynamic-pages.html .

  4. Share your copy of dynamic-pages.html with the command:

    chmod 755 dynamic-pages.html
    
  5. Load this file into your Web browser by entering the URL:

    http://www.cs.grinnell.edu/~yourusername/dynamic-pages.html
    

    Note that when you specify a URL, the Web server automatically looks in your public_html directory, so you do not need to include that directory name in what you type.

  6. Edit this file, trying some variations of the wording and trying some of the formatting tags described above. At the very least, add your name and the date at the bottom of the page.

    After each modification, use the reload button on your browser to check your revised version of dynamic-pages.html .

  7. Edit the file further, leaving out the initial <html> tag. Reload and describe what happens. Then reinsert this tag, try omitting some closing tags, reload, and describe what happens.

  8. Change the <h1> to <h2> or <h3> or <h4>, and describe what happens in each case. Do you see any progression in style or format from <h1> to <h2> to <h3> to <h4> ?

Dynamic Web Documents

Both dynamic-pages.html and virtually all of the labs for this course are static documents. That is, each document was created once, with all of its information included at that time, and it does not adapt to user input.

In contrast, dynamic-pages.html describes pages that are produced under program control and can therefore change from one request to the next. Often, this approach utilizes the Common Gateway Interface (CGI).

To expand on this idea, consider the following Java program:



    /*A simple Java program that illustrates simple I/O for a browser form */
    
    public class sampleJavaProg {
      /**
       * Print a complete CGI response (an HTTP header line, a blank line,
       * and a small HTML page) to standard output.
       */
      public static void main(String[] args) 
             throws Exception {
    
          /********************** html/browser header ************************/ 
    
          System.out.println("Content-type: text/html");
          System.out.println();
          System.out.println("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">");
          System.out.println("<html>");
          System.out.println("<head>");
          System.out.println("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\">");
          System.out.println("<title>");
          System.out.println("Lab Example");
          System.out.println("</title>");
          System.out.println("</head>");
          System.out.println("<body>");
    
          /*******************************************************************/
          /* the main text for the page */
    
          System.out.println("Welcome to the world of Internet programming!");
          System.out.println("<br>");
          System.out.println("<i>This page is created by a Java program.</i>");
    
          /********************** html/browser footer ************************/ 
    
          System.out.println("</body>");
          System.out.println("</html>");
    
      } // main(String[])
    } // sampleJavaProg


When this program runs, it prints the following:


    Content-type: text/html
    
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <title>
    Lab Example
    </title>
    </head>
    <body>
    Welcome to the world of Internet programming!
    <br>
    <i>This page is created by a Java program.</i>
    </body>
    </html>
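
The header, main text, and footer sections of the program above follow a pattern that recurs in every CGI program in this lab. As a minimal sketch (not part of the original lab files), the header and footer could be factored into helper methods. The class name CgiPage and the method names header and footer are hypothetical. Note the blank line after the Content-type line: it separates the header information from the HTML document itself.

```java
/* A sketch of reusable header/footer helpers for a CGI page.
   The names CgiPage, header, and footer are hypothetical. */
class CgiPage {

    /** Returns the Content-type line, the blank line, and the opening HTML through <body>. */
    static String header(String title) {
        return "Content-type: text/html\n"
             + "\n"   // blank line: ends the header information
             + "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">\n"
             + "<html>\n"
             + "<head>\n"
             + "<title>" + title + "</title>\n"
             + "</head>\n"
             + "<body>\n";
    }

    /** Returns the closing tags that end the page. */
    static String footer() {
        return "</body>\n</html>\n";
    }

    public static void main(String[] args) {
        System.out.print(header("Lab Example"));
        System.out.println("Welcome to the world of Internet programming!");
        System.out.print(footer());
    }
}
```

With helpers like these, only the lines between the header and footer calls need to change from one page to the next.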

In order to have this program run in response to a Web request to a server, our instructions to the Web server must specify running Java with the designated program. This is accomplished in two main steps:

  1. We type the above program into a file sampleJavaProg.java and compile it to obtain the corresponding class file.

  2. We write a special program which will tell the Web server to run our program. This is done with the following file:
    
    #!/bin/bash
    
    export CLASSPATH=":/home/walker/public_html/cgi-bin"
    /opt/jdk1.5.0/bin/java sampleJavaProg
    
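    To understand this, we consider the pieces one-by-one. The first line, #!/bin/bash, tells the system to run the rest of the file with the Bash shell. The export CLASSPATH line tells Java where to look for compiled class files; here, that includes the cgi-bin directory that holds the compiled sampleJavaProg class. The last line runs the Java interpreter (/opt/jdk1.5.0/bin/java) on the class sampleJavaProg.

    Altogether, the above directions (beginning with #!/bin/bash ) are called a cgi script, which, in turn, runs a Java program ( sampleJavaProg ).

    Putting these pieces together, we run our Java program by typing the URL http://www.walker.cs.grinnell.edu/cgi-bin/sample-java-prog.cgi into our browser.

    By following this link, we ask the Web server to run the script (with the Bash shell), which in turn runs our program.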

Experimenting with CGI Scripts

By convention on Linux/Unix systems, cgi scripts are placed in a subdirectory cgi-bin of public_html. This public_html directory provides some level of privacy for your files. The Web server may access, display, and/or run files found within your public_html subdirectory, but it will not access files in other subdirectories of your log-in account.

  1. Create a subdirectory cgi-bin, and set its permissions to allow Web access.

  2. Set up copies of the sample cgi and Java programs to run in your account:

    1. Copy the cgi program ~walker/public_html/cgi-bin/sample-java-prog.cgi and the Java program ~walker/public_html/cgi-bin/sampleJavaProg.java to your cgi-bin directory.
    2. Edit sample-java-prog.cgi to reflect your copy of the Java program.
    3. Compile your copy of sampleJavaProg.java.
    4. Set the permission codes for both your cgi program and your compiled version of sampleJavaProg to allow Web access.
    5. Load your copy of sample-java-prog.cgi into your browser.
  3. Make a few changes in sampleJavaProg.java, recompile, and reload your browser to observe those changes.

HTML Forms and Query Strings

CGI also provides a mechanism for a browser to explicitly pass information to the Web server for use in a CGI script and program. As an example, consider the common Web application of looking up information on a designated topic within a directory. Such a capability is illustrated with the interface http://www.walker.cs.grinnell.edu/cgi-bin/fac-directory-java.html. This example allows you to retrieve information about a member of the 1998-1999 Mathematics and Computer Science Department at Grinnell College. (The 1998-1999 directory is used in this example because two people -- Nathaniel Borenstein and Pamela Ferguson -- shared an office that year, and two people -- Emily Moore and Thomas Moore -- had different offices but the same last name.) This information is stored in the file ~walker/public_html/cgi-bin/math-cs-faculty-98 .

In outline, this full interaction works in several steps:

  1. You fill in blanks in the html document http://www.walker.cs.grinnell.edu/cgi-bin/fac-directory-java.html and press the submit button. For example, you might enter "Henry" and "Walker" for the name of the instructor.
  2. The html form includes this information in a URL request in the format: http://www.walker.cs.grinnell.edu/cgi-bin/fac-directory-java.cgi?firstname=Henry&lastname=Walker. Note that at the end of this URL, following the file name fac-directory-java.cgi, there is a question mark ?, followed by the data: firstname=Henry&lastname=Walker.
  3. fac-directory-java.cgi obtains this special data as a query string, which is available through the shell variable $QUERY_STRING.
  4. The CGI script fac-directory-java.cgi calls the Java program facDirectoryJava and passes along the query string.
  5. Program facDirectoryJava extracts specific name information, and looks up data in a data file http://www.walker.cs.grinnell.edu/cgi-bin/math-cs-faculty-98 .
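
Steps 3 and 4 above can be sketched from the Java side. The lab's own cgi script is not reproduced here, so the following is only an illustration under stated assumptions: the script hands the query string to the Java program as its first command-line argument, and, as a fallback that is an assumption of this sketch and not something the lab describes, the program looks for QUERY_STRING in its own environment. The class and method names are hypothetical.

```java
import java.util.Map;

/* A sketch of how a Java CGI program might obtain the query string.
   QueryStringSource and queryString are hypothetical names. */
class QueryStringSource {

    /**
     * Returns args[0] when the script passed the query string as an argument;
     * otherwise the QUERY_STRING entry of env (an assumed fallback);
     * otherwise the empty string.
     */
    static String queryString(String[] args, Map<String, String> env) {
        if (args.length > 0) {
            return args[0];
        }
        String fromEnv = env.get("QUERY_STRING");
        return (fromEnv != null) ? fromEnv : "";
    }

    public static void main(String[] args) {
        // Plain-text response so that the raw query string is easy to inspect.
        System.out.println("Content-type: text/plain");
        System.out.println();
        System.out.println("Query string: " + queryString(args, System.getenv()));
    }
}
```

Passing the environment in as a parameter, rather than calling System.getenv() inside the method, makes the lookup easy to try out without a Web server.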

We now look at each of these steps in somewhat more detail. The html document /~walker/cgi-bin/fac-directory-java.html contains a special formatting element, called a form, which sets up the boxes for data input and the buttons for responses, and which specifies what action should follow when the user submits the data.


     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
     <html>
     <head>
     <title>Form test</title>
     <link rev="made" href="mailto:walker@math.grin.edu">
     </head>

     <body>
     <h1>A Directory Form </h1>
     <h1>for the Mathematics and Computer Science Department</h1>
     <h1>of Grinnell College</h1>
     <hr>

     Enter the person's first and last names below:


     <form action="fac-directory-java.cgi" method="GET">
     First Name: <input type="text" name="firstname" size=12>
     Last Name:  <input type="text" name="lastname"  size=12>

     <p>
     <input type="submit" value="Submit">
     <input type="reset" value="Reset">
     </form>

     <hr>
     <em>created November 2, 1998</em><br>
     <em>last revised March 7, 2005</em><p>

     <a href="mailto:walker@cs.grinnell.edu">(walker@cs.grinnell.edu)</a>
     </body>
     </html>

When Henry is entered into the First Name box and Walker is entered into the Last Name box, clicking the submit button generates a request to the Web server -- as shown in step 2 above. More precisely, the form generates a request to the Web server that includes both the URL and the information from the form:


http://www.walker.cs.grinnell.edu/cgi-bin/fac-directory-java.cgi?firstname=Henry&lastname=Walker

You would get the same result by typing this URL into your browser as by using the form. That is, the html form within fac-directory-java.html simply provides a convenient way to generate URL addresses with query string information added.
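
To make the format of such a URL concrete, here is a small sketch of how it could be assembled by hand. It is an illustration only, since the browser does this work for a real form. The class name QueryUrlBuilder and the method buildUrl are hypothetical. The standard URLEncoder class replaces characters that are not allowed in a query string (for example, a space becomes +); the Charset form of encode used here assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/* A sketch that builds the same kind of URL a GET form produces.
   QueryUrlBuilder and buildUrl are hypothetical names. */
class QueryUrlBuilder {

    /** Appends ?firstname=...&lastname=... to base, encoding each value. */
    static String buildUrl(String base, String firstName, String lastName) {
        return base
             + "?firstname=" + URLEncoder.encode(firstName, StandardCharsets.UTF_8)
             + "&lastname="  + URLEncoder.encode(lastName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "http://www.walker.cs.grinnell.edu/cgi-bin/fac-directory-java.cgi";
        System.out.println(buildUrl(base, "Henry", "Walker"));
    }
}
```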

Program facDirectoryJava begins with the String firstname=Henry&lastname=Walker assigned to the parameter args[0]. Processing then proceeds in a few main steps:

  1. The Web page header is printed.

  2. Information from the directory file is read into an array directoryArray.

  3. Information from the form is retrieved from the query string.

    In CGI programming, a query string usually consists of a sequence of equations separated by ampersands, with some attribute on the left-hand side of each equation and the value of that attribute on the right-hand side. For instance, in our form example, the query string had the form firstname=Henry&lastname=Walker .

    In this program, the StringTokenizer class provides a convenient mechanism to split the query string into the field-name=value pieces. Substring operations then identify the field-name and the value, and this information is stored in a hashtable for convenient later retrieval.

  4. A linear search locates the individual, if present, from the array.

  5. The Web page footer is printed.
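
Step 3 above can be sketched as follows. This is not the code from facDirectoryJava.java, only a minimal illustration of the approach just described: StringTokenizer splits the query string at each ampersand, substring operations separate each field name from its value, and the results go into a hashtable. The class name QueryStringParser and the method parse are hypothetical. The URLDecoder call, which turns + back into a space, is an addition of this sketch and assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Hashtable;
import java.util.StringTokenizer;

/* A sketch of query-string parsing with StringTokenizer and a hashtable.
   QueryStringParser and parse are hypothetical names. */
class QueryStringParser {

    /** Splits "name1=value1&name2=value2" into a table from names to values. */
    static Hashtable<String, String> parse(String query) {
        Hashtable<String, String> fields = new Hashtable<String, String>();
        StringTokenizer pairs = new StringTokenizer(query, "&");
        while (pairs.hasMoreTokens()) {
            String pair = pairs.nextToken();       // e.g., "firstname=Henry"
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue;                          // skip a piece with no '='
            }
            String name  = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String value = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            fields.put(name, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        Hashtable<String, String> fields = parse("firstname=Henry&lastname=Walker");
        System.out.println(fields.get("firstname") + " " + fields.get("lastname"));
    }
}
```

Once the table is built, the program can retrieve any field by name, for example fields.get("lastname"), without caring about the order in which the form sent the fields.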

Experiments with Forms and CGI Programming

  1. Copy ~walker/public_html/cgi-bin/fac-directory-java.html, ~walker/public_html/cgi-bin/fac-directory-java.cgi, and ~walker/public_html/cgi-bin/facDirectoryJava.java to your cgi-bin directory, compile the Java program, and set the permissions to allow these files to be accessed over the Web.

  2. Load your copy of fac-directory-java.html into your browser, to determine information about Henry Walker and about John Stone. (Also, look under Nathaniel Borenstein and Pamela Ferguson -- who shared an office during the 1998-1999 academic year.)

  3. Before you proceed further, review the html, cgi, and java files to be sure you know how each piece works.

  4. Modify the program facDirectoryJava to allow only the partial specification of a name. In particular, if a user enters both a first and last name (in fac-directory-java.html), then the program will respond in its current way. However, if the user enters only a last name, then the program will return all people who have the given last name.

  5. Modify the interface fac-directory-java.html and the program facDirectoryJava to retrieve all people with a given telephone number. That is, fac-directory-java.html should be revised so that it asks the user for a telephone number; program facDirectoryJava then should return all entries in the directory which match that number. (Note, this reverse lookup is very common in various Web-based directories.)

Comma-delimited Files

Data file math-cs-faculty-98 organizes faculty data by putting each data field on a separate line. In contrast, file math-cs-faculty-98-alt places all data for a faculty member on a single line, with fields separated by commas. This is called a comma-delimited file organization and is very common for many applications.
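
As a rough illustration of the comma-delimited idea, the sketch below splits one line into its fields with String.split. This is a different technique from the character-by-character processing you will examine in facDirJavaAlt.java, and it does not handle fields that themselves contain commas. The class name, the method name, and the sample record are all invented for this example; the record is not taken from the actual data file.

```java
/* A sketch of splitting one comma-delimited line into fields.
   CommaDelimitedLine and splitFields are hypothetical names. */
class CommaDelimitedLine {

    /** Returns the fields of line, with surrounding spaces removed from each. */
    static String[] splitFields(String line) {
        // The limit -1 keeps empty fields at the end of the line.
        String[] fields = line.split(",", -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample record; the real file's fields may differ.
        String[] fields = splitFields("Ada, Lovelace, Room 101, 555-0100");
        for (String field : fields) {
            System.out.println(field);
        }
    }
}
```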

Files ~walker/public_html/cgi-bin/fac-dir-java-alt.html, ~walker/public_html/cgi-bin/fac-dir-java-alt.cgi, and ~walker/public_html/cgi-bin/facDirJavaAlt.java support this alternative file structure. The html and cgi files are almost identical to their previous counterparts; only a file name is changed. However, program facDirJavaAlt.java differs in the section that reads the file.

  1. Copy files ~walker/public_html/cgi-bin/fac-dir-java-alt.html, ~walker/public_html/cgi-bin/fac-dir-java-alt.cgi, and ~walker/public_html/cgi-bin/facDirJavaAlt.java to your cgi-bin directory, compile the Java program, and set the permissions to allow Web access.
  2. Use the fac-dir-java-alt.html interface to check that this program works as expected.
  3. Review the section of facDirJavaAlt.java that reads the file, and explain in a paragraph or two how this code works. For example, what initialization is needed, what processing is repeated in the loop, and why is a final code segment needed after the loop for processing a line?

Work to Turn In


This document is available on the World Wide Web as

http://www.walker.cs.grinnell.edu/courses/223.fa05/labs/html-and-cgi.shtml

created 18 September 2005
last revised 31 October 2005
For more information, please contact Henry M. Walker at walker@cs.grinnell.edu.