Chapter 2. An Introduction to PScheme

In subsequent discussions, PScheme means this particular implementation of a Scheme-like interpreter (Perl-Scheme). The implementation lacks a number of the features of a complete implementation, and differs from the Scheme standard at a number of points. However it could be argued that it is close enough to call itself “Scheme-like”, it's certainly closer to a reference implementation of Scheme than it is to any other language in the Lisp family.

2.1. PScheme Syntax

PScheme has a very simple syntax. A PScheme expression is either a number, a string, a symbol, or a list of expressions separated by spaces and enclosed in round brackets (where an expression is either a number, a string, a symbol, or a list of...). We can write this recursive definition in a special purpose notation for describing programming language grammars called Backus-Naur Format (BNF) as follows:

‹expression›::= ‹number›
| ‹string›
| ‹symbol›
| '(' ‹expression› ... ')'

Read “::=” as “is a”, and “|” as “or”.

A PScheme number is a sequence of digits, optionally preceded by a “+” or a “-”. PScheme does not support floating point or other more complex number types.3

A PScheme string is any sequence of characters enclosed by double quotes. Within a string, a double quote may be escaped with a backslash.

PScheme has a rather more relaxed idea of what constitutes a symbol than most languages, essentially it's anything that isn't an open or close brace and doesn't look like a number or a string, up to the next whitespace character or round bracket. So “x”, “this-is-a-symbol”, “<”, “&foo”, and “$%*@!{}” are all symbols.

PScheme reads an expression, then evaluates it, then prints the result. The rules for evaluation are also very simple:

2.2. Simple Expressions

PScheme is an interactive language, it presents a prompt, and the user types in expressions. The interpreter evaluates those expressions then prints the results:

> 2
2

The “>” is the PScheme prompt. We gave the interpreter a 2, and it replied with 2, because 2 is 2 is 2.

Let's try something a bit more adventurous:

> x
Error: no binding for x in PScm::Env

We asked for the value of a symbol, x, and because the interpreter doesn't know what x is, we get an error.

Here's something that does work:

> (* 2 2)
4

Now that might look strange at first, but remember the first subexpression in a list should evaluate to a function. The multiplication symbol “*” does indeed evaluate to the internal primitive definition of how to multiply; we told PScheme to multiply 2 by 2, and it replied 4. In detail what it has done is:

  1. evaluate the symbol * to get its value: the multiplication function.
  2. evaluate the first 2 to get 2;
  3. evaluate the second 2 to get 2;
  4. applied the multiplication function to arguments 2 and 2;
  5. printed the result: 4.

One important thing to note here is that PScheme makes no distinction between functions and operators, the operation always comes first. This has some advantages; because the operation always comes first, it can often apply to variable numbers of arguments:

> (* 2 2 2 2)
16

A more syntax-rich language would require something like 2 * 2 * 2 * 2 to get the same result.

Now for something just a little more complex:

> (* (- 8 3) 2)
10

Here we told the interpreter to subtract 3 from 8, then multiply the result by 2. It did it by:

  1. Evaluating the symbol * to get the multiplication function;
  2. Evaluating the expression (- 8 3) to get 5, which it did by:
    1. Evaluating the symbol “-” to get the subtraction function;
    2. Evaluating 8 to get 8;
    3. Evaluating 3 to get 3;
    4. Applying the subtraction function to arguments 8 and 3.
  3. Evaluating 2 to get 2;
  4. Applying the multiplication function to arguments 5 and 2;
  5. Printing the result: 10.

Hopefully it is obvious that the interpreter is following a very simple set of rules here, albeit recursively.

This incidentally demonstrates another big simplification that PScheme makes: it is impossible for there to be any ambiguity about operator precedence, because the language forces the precedence to be explicit. In fact there is no notion of operator precedence in PScheme. In a more syntax-rich language, to achieve the above result one would have to write (8 - 3) * 2 because the equally legal 8 - 3 * 2 would be misinterpreted (a lovely expression) as 8 - (3 * 2).

2.3. Conditionals

The keyword if introduces a conditional statement. The general form of an if expression is:

(if ‹test›
    ‹true-result›
    ‹false-result›)

This is simple enough, if expects (in this implementation at least) three arguments: a test, a consequent (true result) and an alternative (false result). For example:

> (if 0
>     3
>     (- 8 3))
5

In this example since the test, 0, is false (again, in this implementation) the alternative (8 - 3 = 5) is returned.

Even here we can start to see some of the power of the language:

> ((if 0 - *) 4 5)
20

In the author's opinion this is a beautiful example of “removing the weaknesses and restrictions that make additional features appear necessary”; because the language treats the operator position just like any other expression, any expression that evaluates to an operation is valid in that position. Furthermore because primitive operations are represented by symbols just like anything else, they can be treated just like any other variable: the if with a false (0) test argument selects the value of “*” to return, rather than the value of “-”. So it's the multiplication function that gets applied to the arguments 4 and 5.

However there is a slight complication, Consider this:

> (if 0
>     (a-long-calculation)
>     (- 8 3))
5

Were if a normal function, the normal rules for evaluation would apply: evaluate all the components of the list, then apply the if function to the evaluated arguments. That would mean (a-long-calculation) and (- 8 3) would both get evaluated, then if would pick the result. Although the value of the whole if expression is unaffected, provided (a-long-calculation) doesn't have any side-effects, we still don't want to have that calculation executed unnecessarily. Now remember it was said that PScheme evaluates each component of the list in a list expression? Well that's not entirely the case. It always evaluates the first component of the list, and if the result is a simple function like multiplication, it then goes on to evaluate the other items on the list and passes the results to the function just as has already been described. However if the first component is what is called a special form, such as the definition of if, PScheme passes the un-evaluated arguments to the special form and that special form can do what it likes with them.

In the case of if, if evaluates its first argument (the test) and if the result is true it evaluates and returns its second argument (the consequent), otherwise it evaluates and returns its third argument (the alternative). We can demonstrate that with a simple example:

> (if 1
>     10
>     x)
10

Because the test result was true, the if only evaluated the consequent expression, there was no error from the undefined symbol x in the alternative.

2.4. Global Variables

define is the way we associate values with global variables in PScheme. It has the general form:

(define ‹symbol› ‹expression›)

where ‹symbol› is being bound to the value of ‹expression›.

For example:

> (define x 5)
x
> x
5

In the above example we defined x to be 5. Then when we asked for the value of x PScheme replied 5. Note again that the operation (define in this case) always comes first. Note also that define must be a special form, because we didn't get an error attempting to evaluate x during the definition. define does however evaluate its second argument so:

> (define a b)
Error: no binding for b in PScm::Env

causes an immediate error attempting to evaluate the undefined symbol b before assigning the result to a.

2.5. Functions

lambda, another special form, creates a function. The general form of a lambda expression is:

(lambda (‹symbol› ...) ‹expression›)

The (‹symbol› ...) part is the names of the arguments to the function, and the ‹expression› is the body of the function.

Here's an example:

> (define square
>   (lambda (x) (* x x)))
square

Now that may also look a bit strange at first, but simply put, lambda creates an anonymous function, and that is separate from giving that function a name with define. The function being defined in this example takes one argument x and its function body is (* x x). The function body will execute when the function is invoked. This is more or less equivalent to this Perl snippet:

our $square = sub {
    my ($x) = @_;
    $x * $x;
};

In fact, Perl's anonymous sub {...} syntax can be considered pretty much synonymous with PScheme's (lambda ...). The big difference is that in PScheme that's the only way to create a function4.

Having created a square function, it can be called:

> (square 4)
16

Although square was created by assignment, when it is used it is syntactically indistinguishable from any built-in function.

Anonymous functions can also be called directly without giving them a name first:

> ((lambda (x) (* x x)) 3)
9

Again this is much simpler than it might first appear. The first term of the list expression, the lambda expression, gets evaluated resulting in a function which will square it's argument. That function then immediately gets applied to 3 resulting in 9. It is possible to do something similar in perl, like this:

sub { my ($x) = @_; $x * $x }->(3);

but it's not a common idiom.


As an aside, you may be wondering what the eleventh letter of the Greek alphabet has to do with the creation of a function. The term comes from a branch of mathemetics called the lambda calculus which is concerned with describing and reasoning about the behaviour of mathematical functions in general. Even though the lambda calculus was devised before the creation of the first computer, it turns out that it provides a sound theoretical basis for the implementation of programming languages, and Lisp was the first programming language to exploit that fact. There is a good introduction to the lambda calculus in [cipl], and a more detailed and rigorous treatment in [tapl].

2.6. Local Variables

Moving on, how can PScheme create local variables limited (lexically) to a given scope? This is done with the let special form. The general form of a let expression is:

(let (‹binding› ...) ‹expression›)

where ‹binding› is:

(‹symbol› ‹expression›)

let takes a list of bindings (symbol-value pairs) and a body to execute with those bindings in effect.

For example:

> (let ((a 10)
>       (b (+ 10 10)))
>   (+ a b))
30

That can be read aloud as “let a = 10 and b = 10 + 10 in the expression a + b”. Symbol a is given the value 10 and symbol b the value 20 while the body is evaluated. However if a later expression was to ask for the value of a or b outside of the scope (the last closing brace) of the let, there would be an error (assuming there weren't global bindings of a and b in effect.)

The careful reader will have noticed that these were described as lexically scoped variables, and yes, any functions defined in the scope of those variables are closures just like Perl closures and have access to those variables when executed even if executed outside of that scope. For example:

> (define times2
>   (let ((n 2))
>     (lambda (x) (* n x))))
times2
> (times2 4)
8

When reading this it's useful to remember that define does evaluate its second argument. That means that this expression defines times2 to be the result of evaluating the let expression. Now that let expression binds n to 2, then returns the result of evaluating the lambda expression (creating a function) with that binding in effect. It is that newly created function that gets bound to the symbol times2. When times2 is later used, for example in (times2 4), the body of the function (* n x) can still “see” the value of n that was supplied by the let, even though the function is executed outside of that scope. This is similar to the common Perl trick to get a private static variable:

{
    my $n = 2;
    sub times2 {
        my ($x) = @_;
        $n * $x;
    }
}

but to be truthful it's closer to the more obtuse:

our $times2 = do {
    my $n = 2;
    sub {
        my ($x) = @_;
        $n * $x;
    }
};

And that's pretty much all that is needed for now. Of course the final language has many other interesting features, but these will be introduced in later sections as the need arises. Let's take a look at our first cut at an interpreter5.

Last updated Sun Mar 14 10:43:08 2010 UST
3

A full Scheme implementation supports a large range of numeric types, from arbitrarily large integers through floating point, precision-preserving fractions, and complex numbers.

4

There are examples of Scheme code that show things like:

(define (square x)
        (* x x))

This form of define, where the expression being defined is a list, is just syntactic sugar for the underlying form. define essentially re-writes it into the simpler lambda statement before evaluating it. Since the definition here mimics the intended usage of the function it is certainly a little bit easier to read, but personally I find that since I have to use lambda in some expressions anyway, it makes sense to always use it. Plus the syntactic sugar tends to obscure what is really going on. In any case PScheme does not support this alternative syntax for function definition.

5

If you want more of an introduction to Scheme in general, you could do worse than look at [tls].