Laboratory Exercises For Computer Science 153

Strings

Goals: This laboratory builds upon background material on character data and discusses string processing within Scheme. Such processing includes string literals, zero-based indexing, string procedures, and string predicates.

Review of Character Processing:

  1. Write a predicate vowel? which determines whether a given character is a vowel. Thus, (vowel? #\e) and (vowel? #\E) should both return true (#t), while (vowel? #\r), (vowel? #\R), and (vowel? #\?) should return false (#f).

Literal Strings: A string is a sequence of characters. The external form of a string is the characters enclosed in double-quotes:

   "This is a string!"
Special characters, such as the double-quote itself, can be included in a string by escaping them with a backslash:
   "Type \"stop\" to quit."
Zero-based Indexing: While much work with strings does not require access to individual characters within a string, some procedures reference positions in a string. In such cases, Scheme numbers character positions within strings starting at position 0. For example, consider the string:

   "I am very excited by the Scheme programming language!!!"
Scheme regards the first character (I) as being in position 0, followed by a blank or space character in position 1. The letters 'a' and 'm' follow in positions 2 and 3, respectively.
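
The position numbering can be checked directly with string-ref and string-length (both appear in the table of string procedures in this lab). The following is a small sketch, not part of the original exercises; the name excited is introduced here only for illustration.

```scheme
(define excited "I am very excited by the Scheme programming language!!!")

(string-ref excited 0)      ; => #\I
(string-ref excited 1)      ; => #\space
(string-ref excited 2)      ; => #\a
(string-ref excited 3)      ; => #\m
(string-length excited)     ; => 55

; The last character is at position (length - 1), not at position length.
(string-ref excited (- (string-length excited) 1))   ; => #\!

; An escaped character occupies a single position.
(string-length "Type \"stop\" to quit.")   ; => 20
(string-ref "Type \"stop\" to quit." 5)    ; => #\"
```

Because positions start at 0, the largest valid position in a string is always one less than its length.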

Some String Procedures: Some common string procedures are shown in the following table:

Procedure       Sample Call                                    Result                         Comment
string?         (string? "sample string")                      #t (true)                      is the argument a string?
string-length   (string-length "sample string")                13                             number of characters in the string
string-append   (string-append "Big" "Small")                  "BigSmall"                     concatenate two strings
substring       (substring "sample string" 3 10)               "ple str"                      extract characters from the first designated position up to, but not including, the second
string-ref      (string-ref "sample string" 4)                 #\l                            return the character at the given position
string->list    (string->list "example")                       (#\e #\x #\a #\m #\p #\l #\e)  make a list of the characters in a string
list->string    (list->string '(#\e #\x #\a #\m #\p #\l #\e))  "example"                      make a string of the characters in a list
symbol->string  (symbol->string 'example)                      "example"                      convert a given symbol to a string
string->symbol  (string->symbol "example")                     example                        convert a given string to a symbol
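
These procedures are often combined. The following sketch (not part of the original lab; the name sample is chosen only for illustration) shows a few combinations, using only the standard procedures listed above.

```scheme
(define sample "sample string")

; Positions 0 through 5 hold the first word.
(substring sample 0 6)                         ; => "sample"

; string-length supplies the end position for "the rest of the string".
(substring sample 7 (string-length sample))    ; => "string"

; string-append accepts any number of strings, not only two.
(string-append (substring sample 7 13) " " (substring sample 0 6))
                                               ; => "string sample"

; Building a symbol from pieces of text.
(string->symbol (string-append "lab" "-" "strings"))   ; => lab-strings

; string->list and list->string undo each other.
(list->string (string->list "example"))        ; => "example"
```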

Some Comparisons of Strings: Scheme also provides various predicates to compare two strings:

Procedure   Comment
string=?    Are the two strings equal?
string<?    Does the first string come before the second?
string>?    Does the first string come after the second?
string<=?   Are the strings equal, or does the first come before the second?
string>=?   Are the strings equal, or does the first come after the second?

Scheme also provides string predicates which are case-insensitive:

string-ci=?    Same as string=?, but ignoring case
string-ci<?    Same as string<?, but considering uppercase and lowercase letters to be equivalent
string-ci>?    Same as string>?, but ignoring case
string-ci<=?   Same as string<=?, but ignoring case
string-ci>=?   Same as string>=?, but ignoring case
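
A few sample comparisons may help make the ordering concrete. This is a sketch rather than part of the original lab. The results for mixed-case comparisons assume an implementation whose character ordering follows ASCII or Unicode, where uppercase letters precede lowercase letters; the Scheme standard itself does not fix that relationship.

```scheme
(string=? "apple" "apple")       ; => #t
(string=? "apple" "Apple")       ; => #f  (case matters)
(string-ci=? "apple" "Apple")    ; => #t  (case ignored)

(string<? "apple" "banana")      ; => #t
(string<? "app" "apple")         ; => #t  (a proper prefix comes first)
(string<=? "apple" "apple")      ; => #t

; Assumes ASCII/Unicode ordering: #\Z comes before #\a.
(string<? "Zebra" "apple")       ; => #t
(string-ci<? "Zebra" "apple")    ; => #f  (compared as if both had the same case)
```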

Example: Consider the problem of counting the number of vowels within a string.

Approach 1: Convert the letters of the string to a list, and recursively count the vowels on the list. This might lead to the following code (which uses vowel? from earlier in this lab).


(define number-vowels
   (lambda (str)
   ;Pre-condition:  str is a character string
   ;Post-condition:  returns number of vowels in str
      (number-vowels-kernel (string->list str))
   )
)

(define number-vowels-kernel
   (lambda (ls)
   ;Pre-condition:  ls is a list of characters
   ;Post-condition:  returns number of vowels in ls
      (cond ((null? ls) 0)
            ((vowel? (car ls)) (+ 1 (number-vowels-kernel (cdr ls))))
            (else (number-vowels-kernel (cdr ls)))
      )
   )
)
  1. Why is the work divided into two procedures, number-vowels and number-vowels-kernel?

Approach 2: Examine each letter in the string, and increase your count (from 0) each time a vowel is encountered. This approach motivates the following code, which moves position by position from the start of the string to the end:

(define number-vowels
   (lambda (str)
   ;Pre-condition:  str is a character string
   ;Post-condition:  returns number of vowels in str
      (count-vowels-by-position str 0 0)
   )
)

(define count-vowels-by-position
   (lambda (str current-count current-position)
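   ;Pre-condition:  str is a character string;
   ;                current-count is the number of vowels found so far;
   ;                0 <= current-position <= (string-length str)
   ;Post-condition:  returns current-count plus the number of vowels
   ;                 in str at or after current-position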
      (cond ((= current-position (string-length str)) current-count)
            ((vowel? (string-ref str current-position))
                  (count-vowels-by-position str 
                              (+ 1 current-count)
                              (+ 1 current-position)))
            (else (count-vowels-by-position str current-count
                              (+ 1 current-position)))
      )
   )
)
  1. Write a paragraph describing (in English) how this program works.
  2. In this code, characters are examined by moving from the beginning of the string to the end of the string. Rewrite this code, so processing proceeds from the end of the string to the start.

Approach 3: Proceed with recursion directly. The base case involves the empty string, which contains zero vowels. For other cases, examine the first letter and add one, if necessary, to the result of applying the procedure to the substring consisting of all letters except the first.

  1. Write a procedure which solves this problem using this third approach.

Encryption

A common approach for encoding messages involves replacing one letter by another throughout the message. Such an encoding method is called monoalphabetic substitution. As an example, consider the following encoding scheme:


Plain alphabet:   ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher alphabet:  XDQTVBKRAUGMZHYWCJOSENILPF

Now consider the message "THIS IS A MESSAGE TO ENCODE." We encode each letter in the message by looking it up in the plain alphabet and replacing it with the corresponding letter in the cipher alphabet. Characters not in the plain alphabet (e.g., punctuation) are left unchanged. Thus, the letter T is replaced by the letter S, "THIS" becomes "SRAO", and the entire message is encoded as "SRAO AO X ZVOOXKV SY VHQYTV." Note that the space and period characters are not changed.
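
The correspondence between the two alphabets is by position, which can be checked by hand with string-ref. The sketch below is only an illustration of the lookup, not the lab's encoding procedure; the names plain and cipher are chosen here for convenience, and string is a standard Scheme procedure (not listed in the table above) that builds a string from its character arguments.

```scheme
(define plain  "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
(define cipher "XDQTVBKRAUGMZHYWCJOSENILPF")

; T sits at position 19 of the plain alphabet ...
(string-ref plain 19)    ; => #\T
; ... so it is replaced by the character at position 19 of the cipher alphabet.
(string-ref cipher 19)   ; => #\S

; The same lookup for T, H, I, and S (positions 19, 7, 8, and 18):
(string (string-ref cipher 19)
        (string-ref cipher 7)
        (string-ref cipher 8)
        (string-ref cipher 18))   ; => "SRAO"
```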

The following procedure encodes a letter following this approach:


(define encode-char
   (lambda (ch plain cipher)
   ;Pre-condition:  ???
   ;Post-condition: ???
      (encode-char-kernel ch plain cipher 0)
   )
)

(define encode-char-kernel
   (lambda (ch plain cipher position)
   ;Pre-condition:  ???
   ;Post-condition: ???
      (cond ((= position (string-length plain)) ch)
            ((char-ci=? ch (string-ref plain position))
                 (string-ref cipher position))
            (else (encode-char-kernel ch plain cipher (+ position 1))))
   )
)

Using these procedures, a message may be encoded as follows:


(define encode-message
   (lambda (str plain cipher)
   ;Pre-condition:  str is a character string
   ;                plain and cipher are as in encode-char
   ;Post-condition: returns transformation of str 
   ;                    using monoalphabetic substitution
      (list->string (encode-message-kernel (string->list str) plain cipher))
   )
)

(define encode-message-kernel
   (lambda (lst plain cipher)
   ;Pre-condition:  lst is a list of characters
   ;                plain and cipher are as in encode-char
   ;Post-condition: returns the list of characters obtained from lst
   ;                    using monoalphabetic substitution
      (if (null? lst) 
          '()
          (cons (encode-char (car lst) plain cipher)
                (encode-message-kernel (cdr lst) plain cipher))
      )
   )
)
  1. Check that encode-message works correctly with the data from the above example:

    
    (encode-message "THIS IS A MESSAGE TO ENCODE."
                    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                    "XDQTVBKRAUGMZHYWCJOSENILPF")
    
  2. Add pre- and post-conditions to the above procedures. Be careful to specify all assumptions regarding the plain and cipher alphabets.

  3. Write the corresponding procedure

    
    (define decode-message
       (lambda (str plain cipher)
       ;Pre-condition:  str is a character string
       ;                plain and cipher are as in encode-char
       ;Post-condition: returns a decoding of str 
       ;                    using monoalphabetic substitution
          '(--- your part replaces this line ---)
       )
    )
    

    Note this is VERY EASY. In particular, your solution should fit easily on a single line.

  4. Suppose upper case letters were to be replaced following the above cipher alphabet, but lower case letters were to be replaced as follows:
    
    Plain alphabet:   abcdefghijklmnopqrstuvwxyz
    Cipher alphabet:  rstlnejpaxkdzvqmhbyuofcgiw
    

    Thus, an upper case T would be replaced by S as before, but a lower case t would be replaced by the letter u.

    How would you change the code and/or the call to encode-message and decode-message to allow these different substitutions for upper and lower case letters? Be sure to run tests to check your conclusions!

  5. In the above procedures, the user must supply the plain and cipher alphabets. However, in all cases, one would expect that the plain alphabet would always be the same (i.e., the alphabet with both uppercase and lowercase letters in their usual order). Revise encode-message and decode-message so that only a cipher alphabet must be supplied by the user. Again, be sure to test the resulting code.

Further String Processing

  1. Write a procedure that reverses the letters in a string. Thus, (string-reverse "this is a string") should return "gnirts a si siht"

  2. A palindrome is a string which reads the same from front to back and from back to front. For example, "this is a palindromemordnilap a si siht" is a palindrome. Write a procedure palindrome that checks if a string is a palindrome. Write two versions, one for each of the following approaches:

    1. Convert the string to a list and analyze the list elements.
    2. Do NOT use any auxiliary data structures (e.g., lists) for your procedures. Rather, access the letters directly in the original string.


This document is available on the World Wide Web as

http://www.math.grin.edu/~walker/courses/153.sp00/lab-strings.html

created March 5, 1997
last revised January 10, 2000