Character Data
Goals: To gain experience working with character data in Scheme.
Reference: Throughout this lab, you may want to refer to
the Revised^5 Report on the Algorithmic Language Scheme (R5RS)
for information on various procedures involving character data.
Character Data
Scheme differentiates between symbols and characters.
A quoted letter such as 'a is treated as a symbol and is not
recognized as character data:
'a --> a
(symbol? 'a) --> #t
(char? 'a) --> #f
Character data in Scheme must be entered using the #\name notation
instead of quoting, where name is the name of a character:
#\a --> #\a
(symbol? #\a) --> #f
(char? #\a) --> #t
(char? #\space) --> #t
Character Coding
For many purposes, we can use characters directly. However, behind the
scenes, characters are stored in a coded form, and that form may vary from
one machine to another. Thus, Scheme provides the char->integer
procedure, which converts an individual character into its corresponding
integer representation. Similarly, the procedure integer->char
converts the integer back to a character. For example, (char->integer
#\a) gives the character coding for the lowercase letter a,
(char->integer #\G) gives the character coding for uppercase
letter G, (char->integer #\;) gives the character coding
for a semicolon, and (char->integer #\5) gives the character
coding for the digit 5. (Note that the character 5 is considered to be
different from the number 5.)
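As a small illustration of the two conversion procedures, the following
sketch checks that integer->char undoes char->integer, and that the
character 5 and the number 5 are different objects. The names code-of-a
and round-trip are introduced here only for illustration. The actual code
values depend on your Scheme system, so none are shown.

```scheme
; code-of-a and round-trip are hypothetical names used only in this sketch.
(define code-of-a (char->integer #\a))         ; an exact integer; its value depends on the system
(define round-trip (integer->char code-of-a))  ; converts the code back to a character

(integer? code-of-a)      ; --> #t
(char=? round-trip #\a)   ; --> #t
(char? #\5)               ; --> #t
(number? #\5)             ; --> #f
(eqv? #\5 5)              ; --> #f
```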
Steps for this Lab:
-
Determine the character codes for various characters, including:
a
A
b
B
G
;
]
0
1
5
9
-
Are the character codes for uppercase letters the same as those for
lowercase letters?
-
How are the character codes for digits related?
Comparing Characters
Scheme allows characters to be compared in several ways. Normally, a
comparison examines the underlying code. That is, one character is
considered less than another if the code of the first is smaller than the
code for the second. Some common comparison procedures are given in the
following table:
Procedure | Comment |
char=? | Are the two characters equal? |
char<? | Does the first character come before the second? |
char>? | Does the first character come after the second? |
char<=? | Are the characters equal, or does the first come before the second? |
char>=? | Are the characters equal, or does the first come after the second? |
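The link between comparison and coding can be sketched as follows. The
procedure code-before? is a hypothetical helper, not part of Scheme. It
compares the integer codes directly and should agree with the built-in
char<? on any pair of characters.

```scheme
; code-before? is a hypothetical helper written for this illustration.
(define code-before?
  (lambda (c1 c2)
    ;Pre-condition: c1 and c2 are characters
    ;Post-condition: returns #t if the code of c1 is smaller than the code of c2
    (< (char->integer c1) (char->integer c2))
  )
)

(code-before? #\x #\y)                           ; --> #t
(eqv? (code-before? #\x #\y) (char<? #\x #\y))   ; --> #t
(eqv? (code-before? #\y #\x) (char<? #\y #\x))   ; --> #t
```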
-
Use the comparison
(char<? #\a #\b)
to check whether the character a comes before the character
b.
Next, use various comparison operations to determine whether an uppercase
A comes before, comes after, or is equal to a lowercase
a. Similarly, use these character comparison operations to
confirm the relative orders of the characters from step 1.
Ignoring Capitalization
While it sometimes is convenient to distinguish between uppercase and
lowercase letters, at other times, one wants to ignore capitalization.
Thus, Scheme also provides character predicates which are
case-insensitive:
char-ci=? | Same as char=?, but ignoring case |
char-ci<? | Same as char<?, but considering uppercase and lowercase letters to be equivalent |
char-ci>? | Same as char>?, but ignoring case |
char-ci<=? | Same as char<=?, but ignoring case |
char-ci>=? | Same as char>=?, but ignoring case |
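One plausible way to think about case-insensitive equality is to convert
both characters to the same case before comparing them. The sketch below
assumes this model. The name same-letter? is hypothetical, and a Scheme
system's own char-ci=? need not be written this way.

```scheme
; same-letter? is a hypothetical helper; it models char-ci=? by
; converting both characters to lowercase before comparing.
(define same-letter?
  (lambda (c1 c2)
    ;Pre-condition: c1 and c2 are characters
    ;Post-condition: returns #t if c1 and c2 are equal once case is ignored
    (char=? (char-downcase c1) (char-downcase c2))
  )
)

(char=? #\g #\G)         ; --> #f
(char-ci=? #\g #\G)      ; --> #t
(same-letter? #\g #\G)   ; --> #t
(same-letter? #\g #\h)   ; --> #f
```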
-
Try the case-insensitive versions of the comparisons on various
combinations of characters, including uppercase letters, lowercase letters,
punctuation, and digits. For example, what are the results of the
following?
(char-ci<? #\0 #\A)
(char-ci<? #\0 #\a)
Try several other pairs of characters. Conclude your experiments
with the following:
(char<? #\] #\A)
(char<? #\] #\a)
(char-ci<? #\] #\A)
(char-ci<? #\] #\a)
Describe and explain your observations.
To help process characters and strings of characters, Scheme provides
several procedures to accomplish common tasks. For example,
string->list converts a character string to a list of characters.
Similarly, list->string converts a list of characters to a string.
Such procedures allow a simple mechanism to analyze characters
individually.
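A brief illustration of these two conversions follows. The string "Cat"
is an arbitrary example chosen for this sketch.

```scheme
(string->list "Cat")                  ; --> (#\C #\a #\t)
(list->string (list #\C #\a #\t))     ; --> "Cat"
(list->string (string->list "Cat"))   ; --> "Cat"
(car (string->list "Cat"))            ; --> #\C
(length (string->list "a b"))         ; --> 3
```

In the last example, the space between the letters counts as a character.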
For example, the following procedure counts the number of times the letter
A appears within a string of characters in either uppercase or
lowercase form:
(define count-As
  (lambda (str)
    ;Pre-condition: str is a character string
    ;Post-condition: returns number of As in str, ignoring case differences
    (count-As-kernel (string->list str))
  )
)
(define count-As-kernel
  (lambda (ls)
    ;Pre-condition: ls is a list of characters
    ;Post-condition: returns number of As in ls, ignoring case differences
    (cond ((null? ls) 0)
          ((char-ci=? #\a (car ls)) (+ 1 (count-As-kernel (cdr ls))))
          (else (count-As-kernel (cdr ls)))
    )
  )
)
-
Check that count-As works by testing it with several strings,
including
(count-As "This is a string with 2 A's").
-
Define a procedure vowel? which has a single parameter and which
returns true if the parameter's value is a vowel and false otherwise.
-
Modify count-As to obtain a procedure count-vowels which
determines the number of vowels in a string.
-
Modify count-As and count-As-kernel, so that
count-As-kernel is tail recursive.
-
Refer to the Revised^5 Report to identify other character procedures that
are built into Scheme. Then, modify count-As to count the number
of uppercase letters that appear in a string.
-
(Optional, and a bit more of a challenge) Modify count-As
to obtain a procedure count-punc which determines the number of
punctuation marks within a string. [For this problem, you first may need
to decide just what should be considered as a punctuation mark.]
This document is available on the World Wide Web as
http://www.math.grin.edu/~walker/courses/153.sp00/lab-characters.html
created February 26, 1997
last revised February 9, 2000