Laboratory Exercises For Computer Science 153

Character Data

Goals: To gain experience working with character data in Scheme.

Reference: Throughout this lab, you may want to refer to the Revised(5) Report on the Algorithmic Language Scheme (R5RS) for information on the various procedures involving character data.

Character Data

Scheme differentiates between symbols and characters. A quoted letter such as 'a is treated as a symbol, not as character data:

    'a --> a
    (symbol? 'a) --> #t
    (char? 'a) --> #f

Character data must instead be written using the #\name notation, where name is either the character itself (as in #\a) or a standard name such as space (as in #\space):

    #\a --> #\a
    (symbol? #\a) --> #f
    (char? #\a) --> #t
    (char? #\space) --> #t
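To see the distinction side by side, the sketch below applies the type predicates to a symbol, a string, and a character that all involve the letter a. It assumes an R5RS-style Scheme; the string "a" is included only for contrast. #\space and #\newline are the two character names that R5RS defines.

```scheme
;; A symbol, a string, and a character: three different kinds of data.
(display (symbol? 'a))       ; #t: a quoted name is a symbol
(newline)
(display (string? "a"))      ; #t: double quotes make a string
(newline)
(display (char? "a"))        ; #f: a one-character string is not a character
(newline)
(display (char? #\a))        ; #t: #\ introduces a character
(newline)

;; Characters with no visible form are written by name.
(display (char? #\space))    ; #t
(newline)
(display (char? #\newline))  ; #t
(newline)
```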

Character Coding

For many purposes, we can use characters directly. Behind the scenes, however, characters are stored in a coded form, and that coding may vary from one machine to another. Scheme therefore provides the procedure char->integer, which converts a character into its corresponding integer code, and the procedure integer->char, which converts an integer code back into a character. For example, (char->integer #\a) gives the code for the lowercase letter a, (char->integer #\G) gives the code for the uppercase letter G, (char->integer #\;) gives the code for a semicolon, and (char->integer #\5) gives the code for the digit 5. (Note that the character #\5 is different from the number 5.)
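A quick way to see the two conversions at work is a round trip: converting a character to its code and back should return the same character, whatever coding the implementation uses. The sketch assumes an R5RS-style Scheme, and round-trip is a hypothetical name used only for illustration. The integers actually printed depend on the implementation (ASCII- and Unicode-based systems print 97 for #\a, for example).

```scheme
;; Hypothetical helper: character -> integer code -> character.
(define round-trip
  (lambda (ch)
    ;Pre-condition:  ch is a character
    ;Post-condition:  returns the character whose code equals the code of ch
    (integer->char (char->integer ch))))

(display (char->integer #\a))   ; the code for lowercase a on this machine
(newline)
(display (round-trip #\G))      ; displays G
(newline)

;; The character #\5 and the number 5 are different kinds of data.
(display (char? #\5))           ; #t
(newline)
(display (number? #\5))         ; #f
(newline)
```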

Steps for this Lab:

  1. Determine the character codes for various characters, including:
    
        a
        A
        b
        B
        G
        ;
        ]
        0
        1
        5 
        9
    
    1. Are the character codes for uppercase letters the same as those for the corresponding lowercase letters?
    2. How are the character codes for digits related?
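Typing each expression separately works, but for step 1 it can be handier to compute all the codes at once. The procedure char-code-table below is a hypothetical helper, not a built-in: it uses map to pair each character in a list with its code, and the loop then prints each pair. The codes shown will be whatever your implementation uses.

```scheme
;; Hypothetical helper (not part of Scheme): pair each character
;; with its integer code.
(define char-code-table
  (lambda (chars)
    ;Pre-condition:  chars is a list of characters
    ;Post-condition:  returns a list of (character . code) pairs
    (map (lambda (ch) (cons ch (char->integer ch)))
         chars)))

;; The characters listed in step 1.
(define step-1-chars
  (list #\a #\A #\b #\B #\G #\; #\] #\0 #\1 #\5 #\9))

;; Print one line per character: the character, then its code.
(for-each (lambda (entry)
            (write (car entry))
            (display "  ")
            (display (cdr entry))
            (newline))
          (char-code-table step-1-chars))
```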

Comparing Characters

Scheme allows characters to be compared in several ways. Normally, a comparison examines the underlying code. That is, one character is considered less than another if the code of the first is smaller than the code for the second. Some common comparison procedures are given in the following table:
Procedure   Comment
char=?      Are the two characters equal?
char<?      Does the first character come before the second?
char>?      Does the first character come after the second?
char<=?     Is the first character equal to the second, or does it come before the second?
char>=?     Is the first character equal to the second, or does it come after the second?
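Because these comparisons examine the underlying codes, a character test can be written in two equivalent ways. The sketch below checks that char<? agrees with < applied to the codes, and defines char-in-range?, a hypothetical helper (not built in) that uses char<=? to test whether a character lies between two others. Range tests like this assume the characters of interest are coded contiguously, which R5RS does not promise, so treat it as an illustration rather than a portable letter test.

```scheme
;; char<? orders characters by their underlying integer codes.
(define same-as-codes?
  (lambda (c1 c2)
    ;Pre-condition:  c1 and c2 are characters
    ;Post-condition:  #t when char<? gives the same answer as < on the codes
    (eq? (char<? c1 c2)
         (< (char->integer c1) (char->integer c2)))))

;; Hypothetical helper: is ch between lo and hi, inclusive?
(define char-in-range?
  (lambda (ch lo hi)
    (and (char<=? lo ch) (char<=? ch hi))))

(display (same-as-codes? #\a #\b))       ; #t
(newline)
(display (char-in-range? #\c #\a #\z))   ; #t
(newline)
(display (char-in-range? #\7 #\0 #\9))   ; #t
(newline)
```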

  2. Use the comparison

        (char<? #\a #\b)

    to check whether the character a comes before the character b.

    Next, use various comparison operations to determine whether an uppercase A comes before, comes after, or is equal to a lowercase a. Similarly, use these character comparison operations to confirm the relative orders of the characters from step 1.

Ignoring Capitalization

While it is sometimes convenient to distinguish between uppercase and lowercase letters, at other times one wants to ignore capitalization. For this purpose, Scheme also provides case-insensitive comparison procedures:

Procedure   Comment
char-ci=?   Same as char=?, but ignoring case
char-ci<?   Same as char<?, but ignoring case
char-ci>?   Same as char>?, but ignoring case
char-ci<=?  Same as char<=?, but ignoring case
char-ci>=?  Same as char>=?, but ignoring case
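One way to picture a case-insensitive test is to convert both characters to the same case before comparing them. The procedure my-char-ci=? below is a hypothetical sketch built from the R5RS procedures char-downcase and char=?; it imitates char-ci=? for equality only and makes no claim about how your implementation orders letters relative to other characters in char-ci<? and its relatives.

```scheme
;; Hypothetical sketch of a case-insensitive equality test:
;; convert both characters to lowercase, then compare them exactly.
(define my-char-ci=?
  (lambda (c1 c2)
    ;Pre-condition:  c1 and c2 are characters
    ;Post-condition:  #t if c1 and c2 are equal when case is ignored
    (char=? (char-downcase c1) (char-downcase c2))))

(display (my-char-ci=? #\A #\a))   ; #t
(newline)
(display (char-ci=? #\A #\a))      ; #t, the built-in agrees
(newline)
(display (my-char-ci=? #\a #\b))   ; #f
(newline)
(display (my-char-ci=? #\; #\;))   ; #t, non-letters are left unchanged
(newline)
```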

  3. Compare the case-insensitive comparison procedures on various combinations of characters, including uppercase letters, lowercase letters, punctuation, and digits. For example, what are the results of the following?
    
    (char-ci
    Try with several other pairs of characters. Conclude your experiments with the following:
    
    (char
    Describe and explain your observations.

To help process characters and strings of characters, Scheme provides several procedures for common tasks. For example, string->list converts a character string to a list of characters, and list->string converts a list of characters back to a string. These procedures provide a simple way to examine the characters of a string one at a time.

For example, the following procedure counts the number of times the letter A appears within a string of characters in either uppercase or lowercase form:


(define count-As
   (lambda (str)
   ;Pre-condition:  str is a character string
   ;Post-condition:  returns number of As in str
      (count-As-kernel (string->list str))
   )
)

(define count-As-kernel
   (lambda (ls)
   ;Pre-condition:  ls is a list of characters
   ;Post-condition:  returns number of As in ls, ignoring case differences
      (cond ((null? ls) 0)
            ((char-ci=? #\a (car ls)) (+ 1 (count-As-kernel (cdr ls))))
            (else (count-As-kernel (cdr ls)))
      )
   )
)
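The count-As example uses only string->list. As a second, smaller illustration of both conversions, the sketch below turns a string into a list, works on the list with the standard list procedure reverse, and turns the result back into a string. The name reverse-string is hypothetical; string->list, list->string, and reverse are standard R5RS procedures.

```scheme
;; string->list exposes the characters of a string as a list.
(write (string->list "abc"))          ; (#\a #\b #\c)
(newline)

;; Hypothetical example: reverse a string by way of a list.
(define reverse-string
  (lambda (str)
    ;Pre-condition:  str is a character string
    ;Post-condition:  returns a new string with the characters of str in reverse order
    (list->string (reverse (string->list str)))))

(display (reverse-string "Scheme"))   ; emehcS
(newline)
```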

  4. Check that count-As works by testing it with several strings, including
    (count-As "This is a string with 2 A's").

  5. Define a procedure vowel? which has a single parameter and which returns #t if the parameter's value is a vowel and #f otherwise.

  6. Modify count-As to obtain a procedure count-vowels which determines the number of vowels in a string.

  7. Modify count-As and count-As-kernel so that count-As-kernel is tail recursive.

  8. Refer to the Revised(5) Report to identify other character procedures that are built into Scheme. Then modify count-As to count the number of uppercase letters that appear in a string.

  9. (Optional, and a bit more of a challenge) Modify count-As to obtain a procedure count-punc which determines the number of punctuation marks within a string. [For this problem, you may first need to decide just what should be considered a punctuation mark.]


This document is available on the World Wide Web as

http://www.math.grin.edu/~walker/courses/153.sp00/lab-characters.html

created February 26, 1997
last revised February 9, 2000