In subsequent discussions, PScheme means this particular implementation of a Scheme-like interpreter (Perl-Scheme). The implementation lacks a number of the features of a complete implementation, and differs from the Scheme standard at a number of points. However it could be argued that it is close enough to call itself “Scheme-like”, it's certainly closer to a reference implementation of Scheme than it is to any other language in the Lisp family.
PScheme has a very simple syntax. A PScheme expression is either a number, a string, a symbol, or a list of expressions separated by spaces and enclosed in round brackets (where an expression is either a number, a string, a symbol, or a list of...). We can write this recursive definition in a special purpose notation for describing programming language grammars called Backus-Naur Format (BNF) as follows:
‹expression› | ::= | ‹number› |
| | ‹string› |
|
| | ‹symbol› |
|
| | '(' ‹expression› ... ')' |
Read “::=
” as “is a”, and
“|
” as “or”.
A PScheme number is a
sequence of digits, optionally preceded by a “+
” or a
“-
”. PScheme does not support floating point or other
more complex number types.3
A PScheme string is any sequence of characters enclosed by double quotes. Within a string, a double quote may be escaped with a backslash.
PScheme has a rather more relaxed idea of what constitutes
a symbol than most languages, essentially it's
anything that isn't an open or close brace and doesn't look like
a number or a string, up to the next whitespace character or round bracket. So
“x
”,
“this-is-a-symbol
”, “<
”,
“&foo
”,
and “$%*@!{}
” are all symbols.
PScheme reads an expression, then evaluates it, then prints the result. The rules for evaluation are also very simple:
PScheme is an interactive language, it presents a prompt, and the user types in expressions. The interpreter evaluates those expressions then prints the results:
> 2 2
The “>” is the PScheme prompt.
We gave the interpreter a 2
, and it replied with 2, because
2 is 2 is 2.
Let's try something a bit more adventurous:
> x Error: no binding for x in PScm::Env
We asked for the value of a symbol, x
, and because
the interpreter doesn't know what x
is, we get an error.
Here's something that does work:
> (* 2 2) 4
Now that might look strange at first, but remember the first subexpression in a list should evaluate to a function. The multiplication symbol “*” does indeed evaluate to the internal primitive definition of how to multiply; we told PScheme to multiply 2 by 2, and it replied 4. In detail what it has done is:
*
to get its value: the
multiplication function.2
to get 2;2
to get 2;4
.One important thing to note here is that PScheme makes no distinction between functions and operators, the operation always comes first. This has some advantages; because the operation always comes first, it can often apply to variable numbers of arguments:
> (* 2 2 2 2) 16
A more syntax-rich language would require something like 2 * 2 * 2 * 2
to get the same result.
Now for something just a little more complex:
> (* (- 8 3) 2) 10
Here we told the interpreter to subtract 3 from 8, then multiply the result by 2. It did it by:
*
to get the multiplication
function;(- 8 3)
to get 5, which
it did by:
-
” to get the subtraction
function;8
to get 8;3
to get 3;2
to get 2;10
.Hopefully it is obvious that the interpreter is following a very simple set of rules here, albeit recursively.
This incidentally
demonstrates another big simplification that PScheme makes: it is
impossible for there to be any ambiguity about operator precedence,
because the language forces the precedence to be explicit. In fact
there is no notion of operator precedence in PScheme.
In a more syntax-rich language, to achieve the above result
one would have to write
(8 - 3) * 2
because the equally legal
8 - 3 * 2
would be misinterpreted (a lovely expression)
as 8 - (3 * 2)
.
The keyword if introduces a conditional statement.
The general form of an if
expression is:
(if ‹test› ‹true-result› ‹false-result›)
This is simple enough, if
expects (in this implementation at
least) three arguments: a test, a consequent (true result)
and an alternative (false result). For example:
> (if 0 > 3 > (- 8 3)) 5
In this example since the test, 0
, is false (again, in this
implementation) the alternative (8 - 3 = 5
) is returned.
Even here we can start to see some of the power of the language:
> ((if 0 - *) 4 5) 20
In the author's opinion this is a beautiful example of “removing the weaknesses
and restrictions that make additional features appear necessary”;
because the language treats the operator position just like any
other expression, any expression that evaluates to an
operation is valid in that position. Furthermore because primitive operations
are represented by symbols just like anything else, they can be
treated just like any other variable: the if
with
a false (0
) test argument selects the value of “*
”
to return, rather than the value of “-
”.
So it's the multiplication
function that gets applied to the arguments 4 and 5.
However there is a slight complication, Consider this:
> (if 0 > (a-long-calculation) > (- 8 3)) 5
Were if
a normal function, the normal rules for
evaluation would apply: evaluate all the components of the
list, then apply the if
function to the evaluated
arguments. That would mean (a-long-calculation)
and
(- 8 3)
would both get evaluated, then if
would pick the result. Although the value of the whole if
expression is unaffected, provided (a-long-calculation)
doesn't have any side-effects, we still don't want to have that
calculation executed unnecessarily. Now remember it was said that
PScheme evaluates each component of the list in a list expression?
Well that's not entirely the case. It always evaluates the
first component of the list, and if the result is a simple
function like multiplication, it then goes on to evaluate the other
items on the list and passes the results to the function just as
has already been described. However if the first component is what
is called a special form, such as the definition of
if
, PScheme passes the un-evaluated arguments to
the special form and that special form can do what it likes with
them.
In the case of if
, if
evaluates its first
argument (the test) and if the result is true it evaluates and
returns its
second argument (the consequent), otherwise it evaluates and returns
its third argument (the alternative). We can demonstrate that
with a simple example:
> (if 1 > 10 > x) 10
Because the test result was true, the if
only evaluated the
consequent expression, there was no error from the undefined
symbol x
in the alternative.
define is the way we associate values with global variables in PScheme. It has the general form:
(define ‹symbol› ‹expression›)
where ‹symbol› is being bound to the value of ‹expression›.
For example:
> (define x 5) x > x 5
In the above example we defined x
to be 5
. Then
when we asked for the value of x
PScheme replied 5
. Note
again that the operation (define
in this case) always comes
first. Note also that define
must be a special form, because
we didn't get an error attempting to evaluate x
during the
definition. define
does however evaluate its second argument
so:
> (define a b) Error: no binding for b in PScm::Env
causes an immediate error attempting to evaluate the undefined symbol
b
before assigning the result to a
.
lambda, another special form, creates a function.
The general form of a lambda
expression is:
(lambda (‹symbol› ...) ‹expression›)
The (‹symbol› ...)
part is the
names of the arguments to the function, and the ‹expression›
is the body of the function.
Here's an example:
> (define square > (lambda (x) (* x x))) square
Now that may also look a bit strange at first, but simply put,
lambda
creates an anonymous function, and that is
separate from giving that function a name with define
.
The function being defined in this example takes one argument
x
and its function body is (* x x)
. The
function body will execute when the function is invoked. This is
more or less equivalent to this Perl snippet:
our $square = sub { my ($x) = @_; $x * $x; };
In fact, Perl's anonymous sub {...}
syntax can be considered
pretty much synonymous with PScheme's (lambda ...)
. The
big difference is that in PScheme that's the only way to create
a function4.
Having created a square
function, it can be called:
> (square 4) 16
Although square
was created by assignment,
when it is used it is syntactically indistinguishable from any
built-in function.
Anonymous functions can also be called directly without giving them a name first:
> ((lambda (x) (* x x)) 3) 9
Again this is much simpler than it might first appear. The first
term of the list expression, the lambda
expression, gets evaluated
resulting in a function which will square it's argument.
That function then immediately gets applied to
3
resulting in 9
. It is
possible to do something similar in perl, like this:
sub { my ($x) = @_; $x * $x }->(3);
but it's not a common idiom.
As an aside, you may be wondering what the eleventh letter of the Greek alphabet has to do with the creation of a function. The term comes from a branch of mathemetics called the lambda calculus which is concerned with describing and reasoning about the behaviour of mathematical functions in general. Even though the lambda calculus was devised before the creation of the first computer, it turns out that it provides a sound theoretical basis for the implementation of programming languages, and Lisp was the first programming language to exploit that fact. There is a good introduction to the lambda calculus in [cipl], and a more detailed and rigorous treatment in [tapl].
Moving on, how can PScheme create local variables limited
(lexically) to a given scope?
This is done with the let special
form. The general form of a let
expression is:
(let (‹binding› ...) ‹expression›)
where ‹binding› is:
(‹symbol› ‹expression›)
let
takes a list of bindings (symbol-value pairs) and a
body to execute with those bindings in effect.
For example:
> (let ((a 10) > (b (+ 10 10))) > (+ a b)) 30
That can be read aloud as “let a = 10
and b
= 10 + 10
in the expression a + b
”.
Symbol a
is given the value 10
and symbol b
the value 20 while the body is evaluated.
However if a later expression was to ask for the value of a
or
b
outside of the scope (the last closing brace) of the
let
, there would be an error (assuming there weren't global
bindings of a
and b
in effect.)
The careful reader will have noticed that these were described as lexically scoped variables, and yes, any functions defined in the scope of those variables are closures just like Perl closures and have access to those variables when executed even if executed outside of that scope. For example:
> (define times2 > (let ((n 2)) > (lambda (x) (* n x)))) times2 > (times2 4) 8
When reading this it's useful to remember that define
does evaluate its second argument. That means that this
expression defines times2
to be the result of evaluating
the let
expression. Now that let
expression binds
n
to 2, then returns the result of evaluating the
lambda
expression (creating a function) with that binding
in effect. It is that newly created function that gets bound to the
symbol times2
.
When times2
is later used, for example in (times2 4)
,
the body of the function (* n x)
can still “see”
the value of n
that was supplied by the let
,
even though the function is executed outside of that scope.
This is similar to the common Perl trick to
get a private static variable:
{ my $n = 2; sub times2 { my ($x) = @_; $n * $x; } }
but to be truthful it's closer to the more obtuse:
our $times2 = do { my $n = 2; sub { my ($x) = @_; $n * $x; } };
And that's pretty much all that is needed for now. Of course the final language has many other interesting features, but these will be introduced in later sections as the need arises. Let's take a look at our first cut at an interpreter5.
A full Scheme implementation supports a large range of numeric types, from arbitrarily large integers through floating point, precision-preserving fractions, and complex numbers.
There are examples of Scheme code that show things like:
(define (square x) (* x x))
This form of define
, where the
expression being defined is a list, is just syntactic sugar for the
underlying form. define
essentially re-writes it into the
simpler lambda
statement before evaluating it.
Since the definition here mimics the
intended usage of the function
it is certainly a little bit easier to read,
but personally I find that since I have
to use lambda
in some expressions anyway, it makes sense to
always use it. Plus the syntactic sugar tends to obscure what is really
going on. In any case PScheme does not support this alternative syntax for
function definition.