Scheme from Scratch - Bootstrap v0.3 - Characters
Characters are implemented similarly to integers. We need to update the model, read, and print layers of the interpreter. Characters are self-evaluating so we still don’t need much of an eval layer yet.
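As a rough illustration of the model layer change, here is a minimal sketch assuming the interpreter stores values as a tagged union. The names `object`, `object_type`, `make_character`, and `is_character` are assumptions made for this example, not necessarily what the real source uses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed type tags: FIXNUM for the existing integers,
   CHARACTER as the new tag added in this version. */
typedef enum { FIXNUM, CHARACTER } object_type;

/* Assumed tagged-union layout for interpreter values. */
typedef struct object {
    object_type type;
    union {
        struct { long value; } fixnum;
        struct { char value; } character;
    } data;
} object;

/* Allocate a new character object; exit if memory runs out. */
object *make_character(char value)
{
    object *obj = malloc(sizeof(object));
    if (obj == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    obj->type = CHARACTER;
    obj->data.character.value = value;
    return obj;
}

/* Type predicate, mirroring what an integer predicate would look like. */
int is_character(const object *obj)
{
    return obj->type == CHARACTER;
}
```

A single `char` is enough here because only ASCII is supported.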
Because Bootstrap Scheme is a quick and dirty interpreter, and a small, readable implementation is one of its goals, implementing only ASCII characters is fine. We don’t need to enter the world of Unicode.
Character literals in Scheme use a prefix notation: #\a, #\9. The trouble makers for parsing are the special literals for newlines and spaces: #\newline, #\space. Here is a sample REPL session with characters:
$ ./scheme
Welcome to Bootstrap Scheme. Use ctrl-c to exit.
> #\a
#\a
> #\newline
#\newline
> #\
#\newline
> #\space
#\space
> #\
#\space
Note that the second newline example, where enter is pressed directly after the backslash, may be considered bad behaviour. This is all part of the “dirty” aspect of a bootstrap interpreter. It is more important to have a small, readable implementation than to cover every single boundary case.
In the second space example above there is a space after the backslash before pressing enter.
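The behaviour in the session above can be sketched as a small parsing routine. This is only an illustration under stated assumptions: it works on a string holding the text that follows `#\` rather than reading from stdin as the interpreter does, and the names `parse_character` and `is_delimiter` are made up for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true for characters that end a token. */
int is_delimiter(char c)
{
    return c == '\0' || isspace((unsigned char)c) ||
           c == '(' || c == ')' || c == '"' || c == ';';
}

/* Parse the text following "#\" and store the character in *out.
   Returns the number of input characters consumed, or 0 on error. */
size_t parse_character(const char *s, char *out)
{
    if (s[0] == '\0') {
        return 0;                      /* incomplete literal */
    }
    if (strncmp(s, "space", 5) == 0 && is_delimiter(s[5])) {
        *out = ' ';
        return 5;
    }
    if (strncmp(s, "newline", 7) == 0 && is_delimiter(s[7])) {
        *out = '\n';
        return 7;
    }
    /* Any other single character stands for itself. This includes a raw
       space or newline typed right after the backslash, which is the
       "dirty" behaviour shown in the session. */
    if (!is_delimiter(s[1])) {
        return 0;                      /* e.g. "#\ab" is rejected */
    }
    *out = s[0];
    return 1;
}
```

Checking for a delimiter after the name is what lets `#\s` and `#\n` still be read as the plain letters s and n.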
It seems a bit odd that in R5RS there is no standard for a tab character. You can implement #\tab if you want.
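On the print side, the special names need to be written back out. Here is a minimal sketch that formats into a buffer instead of printing directly, with the optional `#\tab` included as a non-R5RS extension. The function name `format_character` is an assumption made for this example; the reader would also need a matching case for `tab`.

```c
#include <stdio.h>

/* Hypothetical helper: write the external representation of c into buf.
   Returns the number of characters written, as snprintf does. */
int format_character(char c, char *buf, size_t size)
{
    switch (c) {
    case '\n':
        return snprintf(buf, size, "#\\newline");
    case ' ':
        return snprintf(buf, size, "#\\space");
    case '\t':
        /* Optional extension: R5RS defines no name for tab. */
        return snprintf(buf, size, "#\\tab");
    default:
        return snprintf(buf, size, "#\\%c", c);
    }
}
```

Without the special cases, a newline character would print as `#\` followed by an actual line break, which is hard to read at the REPL.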
Implementing a language encourages examination of the language’s design decisions. I am not a big fan of character literals in Scheme. We write #\newline for a newline character literal but to write a newline in a Scheme string we write "hello, world\n". The lack of parallelism between special characters as character literals and in strings is a bit unfortunate.
I have always liked C’s character literals. Part of the reason is that in C the character literal for a newline, '\n', uses the same escape sequence as a newline in a string, "hello, world\n". The single quote character is not really available for this purpose in Scheme. It could be used but then characters would not have a prefix notation.
For a Scheme-like language of my own design, I would consider character literals half way between Scheme’s and C’s: #'a', #'\n', #' '.
There is a v0.3 branch on GitHub for this version.
Comments
mzscheme has the same behaviour with newlines :)