Scheme from Scratch - Bootstrap v0.3 - Characters
Characters are implemented similarly to integers. We need to update the model, read, and print layers of the interpreter. Characters are self-evaluating so we still don’t need much of an eval layer yet.
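As a rough illustration of the model and print layers, here is a minimal sketch in C. The names `object`, `make_character`, `is_character`, and `format_character` are assumptions made for this example and may not match the actual Bootstrap Scheme source. The sketch formats into a buffer so it is easy to check, where a real interpreter would more likely print straight to stdout.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical object model: these names are illustrative only. */
typedef enum { FIXNUM, CHARACTER } object_type;

typedef struct object {
    object_type type;
    union {
        long fixnum;     /* valid when type == FIXNUM */
        char character;  /* valid when type == CHARACTER */
    } data;
} object;

/* Model layer: allocate a character object holding one ASCII char. */
object *make_character(char value) {
    object *obj = malloc(sizeof(object));
    if (obj == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    obj->type = CHARACTER;
    obj->data.character = value;
    return obj;
}

int is_character(const object *obj) {
    return obj->type == CHARACTER;
}

/* Print layer: write the Scheme literal for a character into buf. */
void format_character(const object *obj, char *buf, size_t size) {
    assert(is_character(obj));
    switch (obj->data.character) {
    case '\n': snprintf(buf, size, "#\\newline"); break;
    case ' ':  snprintf(buf, size, "#\\space");   break;
    default:   snprintf(buf, size, "#\\%c", obj->data.character); break;
    }
}
```

Because characters are self-evaluating, an eval layer built on this model could simply return the object it was given.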
Because Bootstrap Scheme is a quick and dirty interpreter, and a small, readable implementation is one of the goals, implementing ASCII characters is fine. We don’t need to enter the world of Unicode.
Character literals in Scheme use a prefix notation: #\a, #\9. The troublemakers for parsing are the special literals for newlines and spaces: #\newline, #\space. Here is a sample REPL session with characters:
$ ./scheme
Welcome to Bootstrap Scheme. Use ctrl-c to exit.
> #\a
#\a
> #\newline
#\newline
> #\
#\newline
> #\space
#\space
> #\
#\space
Note that the second newline example, where enter is pressed immediately after the backslash, may be considered bad. This is all part of the “dirty” aspect of a bootstrap interpreter. It is more important to have a small, readable implementation than to cover every single boundary case.
In the second space example above there is a space after the backslash before pressing enter.
It seems a bit odd that in R5RS there is no standard for a tab character. You can implement #\tab if you want.
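The read-layer behaviour in the session above, including the optional #\tab, can be sketched roughly as follows. This is only an illustration under assumptions: it reads from a string cursor instead of stdin, it expects the caller to have already consumed the `#\` prefix, and the names `read_character`, `named_characters`, and `is_delimiter` are made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Named literals. "tab" is an optional extension, not part of R5RS. */
static const struct {
    const char *name;
    char value;
} named_characters[] = {
    { "newline", '\n' },
    { "space",   ' '  },
    { "tab",     '\t' },
};

/* A name only counts if it is followed by a delimiter, so that
   #\n and #\s still read as the plain characters n and s. */
static int is_delimiter(char c) {
    return c == '\0' || isspace((unsigned char)c) ||
           c == '(' || c == ')' || c == '"' || c == ';';
}

/* Call after "#\" has been consumed. On success, stores the character
   in *out, advances *in past the literal, and returns 1.
   Returns 0 if the input ended right after the prefix. */
int read_character(const char **in, char *out) {
    size_t i;
    assert(in != NULL && *in != NULL && out != NULL);
    if (**in == '\0') {
        return 0;
    }
    for (i = 0; i < sizeof named_characters / sizeof named_characters[0]; i++) {
        size_t len = strlen(named_characters[i].name);
        if (strncmp(*in, named_characters[i].name, len) == 0 &&
            is_delimiter((*in)[len])) {
            *in += len;
            *out = named_characters[i].value;
            return 1;
        }
    }
    /* Anything else, including a raw newline or space, is taken literally.
       This is what makes "#\" followed by enter read as #\newline. */
    *out = **in;
    (*in)++;
    return 1;
}
```

Taking the next raw character literally keeps the function short, at the cost of accepting the questionable `#\` followed by enter case shown in the session.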
Implementing a language encourages examination of the language’s design decisions. I am not a big fan of character literals in Scheme. We write #\newline for a newline character literal but to write a newline in a Scheme string we write "hello, world\n". The lack of parallelism between special characters as character literals and in strings is a bit unfortunate.
I have always liked C’s character literals. Part of the reason is that in C the character literal for a newline, '\n', uses the same escape sequence as a newline in a string, "hello, world\n". The single quote character is not really available for this purpose in Scheme, because 'x already abbreviates (quote x). It could be used, but then characters would not have a prefix notation.
For a Scheme-like language of my own design, I would consider character literals halfway between Scheme’s and C’s: #'a', #'\n', #' '.
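To make the idea concrete, here is a small sketch in C of how such a literal might be parsed. The function name `parse_quoted_character` and the set of supported escapes are assumptions chosen for illustration. Nothing like this exists in Bootstrap Scheme.

```c
#include <assert.h>
#include <stddef.h>

/* Parses a hypothetical #'x' literal with C-style escapes.
   Returns 1 and stores the character in *out on success, 0 otherwise. */
int parse_quoted_character(const char *text, char *out) {
    const char *p;
    char value;

    assert(text != NULL && out != NULL);
    if (text[0] != '#' || text[1] != '\'') {
        return 0;               /* missing the #' prefix */
    }
    p = text + 2;
    if (*p == '\\') {
        p++;                    /* escape sequence, as in a string */
        switch (*p) {
        case 'n':  value = '\n'; break;
        case 't':  value = '\t'; break;
        case '\\': value = '\\'; break;
        case '\'': value = '\''; break;
        default:   return 0;    /* unknown or missing escape */
        }
    } else if (*p == '\0' || *p == '\'') {
        return 0;               /* empty literal such as #'' */
    } else {
        value = *p;             /* ordinary character, including space */
    }
    p++;
    if (*p != '\'') {
        return 0;               /* missing closing quote */
    }
    *out = value;
    return 1;
}
```

With this notation the same escape table could be shared between the character reader and the string reader, which is the parallelism that #\newline lacks.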
There is a v0.3 branch on GitHub for this version.
Comments
mzscheme has the same behaviour with newlines :)