Interpreter Version 0.0.0

This preliminary version of the interpreter supports only three operations, namely multiplication (*), subtraction (-), and conditional evaluation (if). It does however lay the groundwork for more sophisticated interpreters later on.

Scheme lisp interpreters, being interactive, are based around what is called a “read eval print loop”: first read an expression, then evaluate it, then print the result, then loop. This long-winded term is often abbreviated to repl. In order for the repl to evaluate the expression, there must additionally be an environment in which symbols can be given values and in which values can be looked up. All this means that there are six principle components to such an interpreter.

The implementation we're about to discuss takes a fairly strict OO approach, with each of these components and pretty much everything else represented by classes of objects. As a consequence of this the Evaluator and the Print system are distributed throughout the Structure component. This means that for example to evaluate an expression you call its Eval method, and to print a result you call the Print method on the result object. There is a good deal of scope for polymorphism with this approach, since different types of object can respond differently to the same message.

3.1. The Read-Eval-Print Loop

The top-level read-eval-print loop (repl) for the PScheme interpreter is in the package PScm in Listing 1. All other packages inherit from this package, although that's mainly just a convenience.

Listing 1. PScm.pm

  1 package PScm;
  2 
  3 use strict;
  4 use warnings;
  5 use PScm::Read;
  6 use PScm::Env;
  7 use PScm::Primitive;
  8 use PScm::SpecialForm;
  9 use FileHandle;
 10 
 11 require Exporter;
 12 
 13 our @ISA    = qw(Exporter);
 14 our @EXPORT = qw(ReadEvalPrint);
 15 
 16 =head1 NAME
 17 
 18 PScm - Scheme-like interpreter written in Perl
 19 
 20 =head1 SYNOPSIS
 21 
 22   use PScm;
 23   ReadEvalPrint($in_filehandle[, $out_filehandle]);
 24 
 25 =head1 DESCRIPTION
 26 
 27 Just messing about, A toy lisp interpreter.
 28 
 29 =cut
 30 
 31 our $GlobalEnv = new PScm::Env(
 32     '*' => new PScm::Primitive::Multiply(),
 33     '-' => new PScm::Primitive::Subtract(),
 34     if  => new PScm::SpecialForm::If(),
 35 );
 36 
 37 sub ReadEvalPrint {
 38     my ($infh, $outfh) = @_;
 39 
 40     $outfh ||= new FileHandle(">-");
 41     my $reader = new PScm::Read($infh);
 42     while (defined(my $expr = $reader->Read)) {
 43         my $result = $expr->Eval();
 44         $result->Print($outfh);
 45     }
 46 }
 47 
 48 sub Print {
 49     my ($self, $outfh) = @_;
 50     print $outfh $self->as_string, "\n";
 51 }
 52 
 53 sub as_string { ref($_[0]); }
 54 
 55 sub new { bless {}, $_[0] }
 56 
 57 1;

Firstly, on Lines 31–35 a global environment, $PScm::GlobalEnv is initialised to a new PScm::Env object.

 31 our $GlobalEnv = new PScm::Env(
 32     '*' => new PScm::Primitive::Multiply(),
 33     '-' => new PScm::Primitive::Subtract(),
 34     if  => new PScm::SpecialForm::If(),
 35 );

There are only three things in that environment. They are the objects that will perform the primitive operations of multiplication, subtraction and conditional evaluation, and they're bound to “*”, “-” and “if” respectively. We'll see how they work presently.

ReadEvalPrint() on Lines 37–46 is the central control routine of the whole interpreter. It takes an input file handle and an output file handle as arguments. Starting on Line 40 it defaults the output file handle to stdout, then on Line 41 it creates a new PScm::Read object on the input file handle, and on Lines 42–45 it enters its main loop. The loop repeatedly collects an expression from the Reader, then evaluates the expression by calling its Eval() method, then prints the result by calling its Print() method:

 37 sub ReadEvalPrint {
 38     my ($infh, $outfh) = @_;
 39 
 40     $outfh ||= new FileHandle(">-");
 41     my $reader = new PScm::Read($infh);
 42     while (defined(my $expr = $reader->Read)) {
 43         my $result = $expr->Eval();
 44         $result->Print($outfh);
 45     }
 46 }

The basis of the print system can be seen in the Print() and as_string() methods in PScm.pm, but we're going to leave discussion of the print system until later on. In the next section we'll look at our first, very simple, implementation of an environment.

3.2. The Environment

All an environment has to do is to return the current value for an argument symbol. Perl hashes are ideal for this task, and our implementation uses them. Our environment is implemented by PScm::Env in Listing 2.

Listing 2. PScm/Env.pm

  1 package PScm::Env;
  2 
  3 use strict;
  4 use warnings;
  5 use base qw(PScm);
  6 
  7 sub new {
  8     my ($class, %bindings) = @_;
  9 
 10     bless { bindings => {%bindings}, }, $class;
 11 }
 12 
 13 sub LookUp {
 14     my ($self, $symbol) = @_;
 15 
 16     if (exists($self->{bindings}{ $symbol->value })) {
 17         return $self->{bindings}{ $symbol->value };
 18     } else {
 19         die "no binding for @{[$symbol->value]} ",
 20           "in @{[ref($self)]}\n";
 21     }
 22 }
 23 
 24 1;

It is no more than an object wrapper around a Perl hash. The new() method (Lines 7–11) creates an object with a set of bindings (name to value mappings) that were passed in as arguments:

  7 sub new {
  8     my ($class, %bindings) = @_;
  9 
 10     bless { bindings => {%bindings}, }, $class;
 11 }

The LookUp() method on Lines 13–22 looks up a symbol in the bindings, die-ing if the symbol does not have a binding:

 13 sub LookUp {
 14     my ($self, $symbol) = @_;
 15 
 16     if (exists($self->{bindings}{ $symbol->value })) {
 17         return $self->{bindings}{ $symbol->value };
 18     } else {
 19         die "no binding for @{[$symbol->value]} ",
 20           "in @{[ref($self)]}\n";
 21     }
 22 }

Note that the $symbol passed in is an object, and LookUp() must call the symbol's value() method to get a string suitable for a hash key. The value() method for a symbol just returns the name of the symbol as a perl string.

Because this first version of the interpreter has no support for local variables, this class doesn't provide any methods for adding values to the environment. That will come later.

And that's all there is to our environment class. Let's move on to look at the Reader.

3.3. The Reader

The job of the Reader is to take a stream of text and convert it into a structure that the evaluator can more easily work with. So for example we want to take an expression such as (foo ("bar" 10) baz) and convert it into an equivalent structure such as shown in Figure 3.1.

Figure 3.1. Example PScheme Structure for (foo ("bar" 10) baz)

In this figure, showing the result of parsing that expression, the top-level list object has three components. Reading left to right it contains the symbol object foo, another list object and the symbol object baz. The sub-list contains the string object "bar" and the number object 10. It is apparent that that the structure is a direct representation of the text, where each list corresponds to the contents of a matching pair of braces. It should also be obvious that these structures are practically identical to Perl list references. The scheme list (foo ("bar" 10) baz) corresponds directly to the nested perl listref [$foo, ["bar", 10], $baz]⁶.

To simplify the creation of such a structure from an input stream, it is often convenient to split the process into two parts:

A tokeniser: which recognises and returns the basic tokens of the text (braces, symbols, numbers and strings);
A builder or parser: which assembles those tokens into meaningful structures (lists).

That is the approach taken by the Reader described here.

It was mentioned earlier that Scheme was extremely easy to parse, well here's the proof. The code for the Reader, PScm::Read in Listing 3 is only 63 lines long.

Listing 3. PScm/Read.pm

  1 package PScm::Read;
  2 
  3 use strict;
  4 use warnings;
  5 use PScm::Expr;
  6 use PScm::Token;
  7 use base qw(PScm);
  8 
  9 sub new {
 10     my ($class, $fh) = @_;
 11     bless {
 12         FileHandle => $fh,
 13         Line       => '',
 14     }, $class;
 15 }
 16 
 17 sub Read {
 18     my ($self) = @_;
 19 
 20     my $token = $self->_next_token();
 21     return undef unless defined $token;
 22 
 23     return $token unless $token->is_open_token;
 24 
 25     my @res = ();
 26 
 27     while (1) {
 28         $token = $self->Read;
 29         die "unexpected EOF"
 30           if !defined $token;
 31         last if $token->is_close_token;
 32         push @res, $token;
 33     }
 34 
 35     return new PScm::Expr::List(@res);
 36 }
 37 
 38 sub _next_token {
 39     my ($self) = @_;
 40 
 41     while (!$self->{Line}) {
 42         $self->{Line} = $self->{FileHandle}->getline();
 43         return undef unless defined $self->{Line};
 44         $self->{Line} =~ s/^\s+//s;
 45     }
 46 
 47     for ($self->{Line}) {
 48         s/^\(\s*// && return PScm::Token::Open->new();
 49         s/^\)\s*// && return PScm::Token::Close->new();
 50         s/^([-+]?\d+)\s*//
 51           && return PScm::Expr::Number->new($1);
 52         s/^"((?:(?:\\.)|([^"]))*)"\s*// && do {
 53             my $string = $1;
 54             $string =~ s/\\//g;
 55             return PScm::Expr::String->new($string);
 56         };
 57         s/^([^\s\(\)]+)\s*//
 58           && return PScm::Expr::Symbol->new($1);
 59     }
 60     die "can't parse: $self->{Line}";
 61 }
 62 
 63 1;

As with the rest of the implementation, it uses an OO style, so the Reader is an object that is created with an argument FileHandle and behaves as an iterator returning the next parsed expression from the stream on each call to Read(). The new() method (Lines 9–15) simply stashes its input file handle argument along with an empty string representing the current line, and returns them in the new object.

  9 sub new {
 10     my ($class, $fh) = @_;
 11     bless {
 12         FileHandle => $fh,
 13         Line       => '',
 14     }, $class;
 15 }

Apart from new() the only other publicly available method is Read(), which returns the next complete expression, as a structure, from the input file. The Read() method calls the private _next_token() method (the tokeniser) for its tokens.

Skipping over the Read() method for now, _next_token() on Lines 38–61 simply chomps the next token off the input stream and returns it. It knows enough to skip whitespace and blank lines and to return undef at EOF (Lines 41–45). If there is a line left to tokenise, then a few simple regexes are tried in turn to strip the next token from it. As soon as a token of a particular type is recognised, it is returned to the caller.

 38 sub _next_token {
 39     my ($self) = @_;
 40 
 41     while (!$self->{Line}) {
 42         $self->{Line} = $self->{FileHandle}->getline();
 43         return undef unless defined $self->{Line};
 44         $self->{Line} =~ s/^\s+//s;
 45     }
 46 
 47     for ($self->{Line}) {
 48         s/^\(\s*// && return PScm::Token::Open->new();
 49         s/^\)\s*// && return PScm::Token::Close->new();
 50         s/^([-+]?\d+)\s*//
 51           && return PScm::Expr::Number->new($1);
 52         s/^"((?:(?:\\.)|([^"]))*)"\s*// && do {
 53             my $string = $1;
 54             $string =~ s/\\//g;
 55             return PScm::Expr::String->new($string);
 56         };
 57         s/^([^\s\(\)]+)\s*//
 58           && return PScm::Expr::Symbol->new($1);
 59     }
 60     die "can't parse: $self->{Line}";
 61 }

Lines 47–59 do the actual tokenisation. The tokeniser only needs to distinguish open and close braces, numbers, strings and symbols, where anything that doesn't look like an open or close brace, a number or a string must be a symbol. _next_token() returns its data in objects, which incidentally happens to be a very convenient way of tagging the type of token returned. The objects are of two basic types: PScm::Token; and PScm::Expr.

The PScm::Token types PScm::Token::Open and PScm::Token::Close represent an open and a close brace respectively, and contain no data. The three PScm::Expr types, PScm::Expr::Number, PScm::Expr::String and PScm::Expr::Symbol contain the relevant number, string or symbol.

Now that we know how _next_token() works, we can go back and take a look at Read().

The Read() method (Lines 17–36) has to return the next complete expression from the input stream. That could be a simple symbol, string or number, or an arbitrarily nested list. It starts by calling _next_token() at Line 20 and returning undef if _next_token() returned undef (signifying end of file).

 17 sub Read {
 18     my ($self) = @_;
 19 
 20     my $token = $self->_next_token();
 21     return undef unless defined $token;
 22 
 23     return $token unless $token->is_open_token;
 24 
 25     my @res = ();
 26 
 27     while (1) {
 28         $token = $self->Read;
 29         die "unexpected EOF"
 30           if !defined $token;
 31         last if $token->is_close_token;
 32         push @res, $token;
 33     }
 34 
 35     return new PScm::Expr::List(@res);
 36 }

Then, at Line 23 if the token is anything other than an open brace (determined by the call to is_open_token()⁷), Read() just returns it. Otherwise, the token just read is an open brace, so Read() initialises an empty result @res to hold the list it expects to accumulate then enters a loop calling itself recursively to collect the (possibly nested) components of the list. It is an error if it detects EOF while a list is unclosed, and if it detects a close brace (is_close_token()) it knows its work is done and it returns the accumulated list as a new PScm::Expr::List object.

The structure returned by Read() is completely composed of subtypes of PScm::Expr, since the PScm::Token types do not actually get entered into the structure. Let's work through the parsing of that simple expression (foo ("bar" 10) baz). In the following, the subscript number keeps track of which particular invocation of Read() we are talking about.

Read₁ calls _next_token() and gets a ( so it enters its loop.
Read₁ calls Read₂ from within its loop.
- Read₂ calls _next_token() and gets a foo, so it returns it.
Read₁ puts the foo at the start of its list: (foo.
Read₁ calls Read₃.
- Read₃ calls _next_token() and gets a ( so it enters its loop.
- Read₃ calls Read₄.
  - Read₄ calls _next_token() and gets a "bar" so it returns it.
- Read₃ puts the "bar" at the start of its list: ("bar".
- Read₃ calls Read₅.
  - Read₅ calls _next_token() and gets a 10 so it returns it.
- Read₃ adds the 10 to its growing list: ("bar" 10.
- Read₃ calls Read₆.
  - Read₆ calls _next_token() and gets a ) so it returns it.
- Read₃ gets the ) so it knows it has reached the end of its list and returns the result: ("bar" 10).
Read₁ adds the ("bar" 10) to the end of its own growing list: (foo ("bar" 10).
Read₁ calls Read₇.
- Read₇ calls _next_token() and gets a baz so it returns it.
Read₁ adds the baz to the end of its own growing list: (foo ("bar" 10) baz.
Read₁ calls Read₈.
- Read₈ calls _next_token() and gets a ) so it returns it.
Read₁ gets the ) so it knows it has reached the end of its list and returns the result: (foo ("bar" 10) baz).

So the Reader does indeed return the structure expected.

The PScm::Token and PScm::Expr classes are in their eponymous files. The PScm::Token classes in Listing 4 are purely parse-related. As mentioned earlier, they are returned by the tokeniser to indicate open and close braces. These tokens are used to guide the parser, but it does not actually include them in the result. PScm::Token::Open and PScm::Token::Close both inherit from PScm::Token. PScm::Token defines default implementations for is_open_token() and is_close_token(), which the two derived classes override appropriately. PScm::Token is just:

  1 package PScm::Token;
  2 
  3 use strict;
  4 use warnings;
  5 use base qw(PScm);
  6 
  7 sub is_open_token { 0 }
  8 sub is_close_token { 0 }

PScm::Token::Open overrides is_open_token():

 11 package PScm::Token::Open;
 12 
 13 use base qw(PScm::Token);
 14 
 15 sub is_open_token { 1 }

and PScm::Token::Close overrides is_close_token():

 18 package PScm::Token::Close;
 19 
 20 use base qw(PScm::Token);
 21 
 22 sub is_close_token { 1 }
 23 
 24 1;

PScm::Token inherits a stub new() method from the PScm class that just blesses an empty hash with the argument class.

Listing 4. PScm/Token.pm

  1 package PScm::Token;
  2 
  3 use strict;
  4 use warnings;
  5 use base qw(PScm);
  6 
  7 sub is_open_token { 0 }
  8 sub is_close_token { 0 }
  9 
 10 ##########################
 11 package PScm::Token::Open;
 12 
 13 use base qw(PScm::Token);
 14 
 15 sub is_open_token { 1 }
 16 
 17 ###########################
 18 package PScm::Token::Close;
 19 
 20 use base qw(PScm::Token);
 21 
 22 sub is_close_token { 1 }
 23 
 24 1;

As for the PScm::Expr objects that Read() accumulates and returns, as noted Read() has done all of the work in constructing a tree of them for us, so they are more properly discussed in the next section where we look at expressions.

3.4. PScheme Expressions

The various PScm::Expr objects are defined in PScm/Expr.pm. These objects represent the basic data types that are visible to the user: strings; numbers; symbols; and lists. They are the types returned by the Reader and printed by the print system. It would be premature to go into all the details of the PScm::Expr package right now, but it is worth pointing out a few salient features about it.

Firstly the classes arrange themselves in a Composite Pattern [gof pp163–173] according to the hierarchy of PScheme types as in Figure 3.2.

Figure 3.2. PScm::Expr classes

This figure is drawn using a standard set of conventions for diagramming the relationships between classes in an OO design, called “the Unified Modelling Language”, or UML. [umld]

For those who don't know UML, the triangular shape means “inherits from” or “is a subclass of”, and the black arrow and circle coming from the white diamond means “aggregates zero or more of”. The classes with names in italics are “abstract” classes. As far as Perl is concerned, calling a class “abstract” just means that we promise not to create any actual object instances of that particular class. The unterminated dotted line simply implies that we will be deriving other classes from PScm::Expr later on.

The root of the hierarchy is PScm::Expr, representing any and all expressions. That divides into lists (PScm::Expr::List) and atoms (PScm::Expr::Atom).

Lists are composed of expressions (the aggregation relationship.)

Atoms represent any data type that cannot be trivially taken apart, anything that's not a list in other words. Atoms are subclassed into literals (PScm::Expr::Literal) and symbols (PScm::Expr::Symbol), and literals are subclassed into strings (PScm::Expr::String) and numbers (PScm::Expr::Number).

We'll see a lot of this diagram in various guises as we progress. Here's the same diagram, in Figure 3.3 with the location of the new() and value() methods added.

Figure 3.3. PScm::Expr new() and value() methods

As you can see, there are three new() methods in the class structure. The PScm::Expr::Atom abstract class is the parent class for strings and numbers (via PScm::Expr::Literal) and for symbols. Since all of these types are simple scalars, the new() method in PScm::Expr::Atom does for most of them: it blesses a reference to the scalar into the appropriate class.

 23 sub new {
 24     my ($class, $value) = @_;
 25     bless \$value, $class;
 26 }

However the PScm::Expr::Number package supplies its own new() method, because we avail ourselves of the core Math::BigInt package for our integers. While it is nice to have arbitrary sized integers by default, the main reason for doing this is to avoid the embarrassment of Perl's automatic type conversion to floating point on integer overflow when implementing a language that is only supposed to support integer arithmetic.

 82 package PScm::Expr::Number;
 83 use base qw(PScm::Expr::Literal);
 84 
 85 use Math::BigInt;
 86 
 87 sub new {
 88     my ($class, $value) = @_;
 89     $value = new Math::BigInt($value) unless ref($value);
 90     $class->SUPER::new($value);
 91 }

The PScm::Expr::List class has the other new() method that simply bundles up its argument Perl list in a new object:

 36 sub new {
 37     my ($class, @list) = @_;
 38 
 39     $class = ref($class) || $class;
 40     bless [@list], $class;
 41 }

All three of these new() methods have already been seen in action in the Reader.

Alongside most of the new() methods is a value() method that does the exact reverse of new() and retrieves the underlying value from the object. In the case of atoms, it dereferences the scalar value:

 28 sub value { ${ $_[0] } }

and in the case of lists, it dereferences the list:

 43 sub value { @{ $_[0] } }

Even though PScm::Expr::Number has its own new() method, we don't need a separate value() method for numbers, we never need to retrieve the actual perl number from the Math::BigInt object so we just inherit value() from PScm::Expr::Atom. We do however provide a default value() method in PScm::Expr. This default method just returns $self.

 17 sub value { $_[0] }

This is solely for the benefit of those as-yet undescribed additional PScm::Expr subclasses, which will all stand for their own values.

We've seen that the various PScheme expression types (lists, numbers, strings and symbols) arrange themselves naturally into a hierachy of types and also form a recognised design pattern called “Composite”. Next we're going to look at how those expressions are evaluated.

3.5. Evaluation

To evaluate a PScm::Expr, as mentioned earlier, the top level ReadEvalPrint() loop just calls the expression's Eval() method. The Eval() methods of PScm::Expr are located in three of its subclasses as shown in Figure 3.4.

Figure 3.4. PScm::Expr Eval() Methods

The figure shows that there is a separate Eval() method for lists and for symbols, and a default method for all other PScm::Expr.

3.5.1. Evaluation of Literals

Let's look first at the default Eval() method in PScm::Expr which currently applies to literals. PScm::Expr::String and PScm::Expr::Number share this default Eval() method, which just returns $self:

 12 sub Eval {
 13     my ($self) = @_;
 14     return $self;
 15 }

This means that numbers and strings evaluate to themselves, as they should, and if we were to add other types of expression later on, they too would by default evaluate to themselves.

3.5.2. Evaluation of Symbols

Evaluation of a symbol is only slightly more complex. The Eval() method in PScm::Expr::Symbol looks up its value in the global environment $PScm::GlobalEnv:

 72 sub Eval {
 73     my ($self) = @_;
 74     return $PScm::GlobalEnv->LookUp($self);
 75 }

Remember that LookUp() from PScm::Env expects a symbol object as argument and calls its value() method to get a string that it can then use to retrieve the actual value from the hash representing the environment.

3.5.3. Evaluation of Lists

Before showing how PScm::Expr::List objects are evaluated, we need to consider a couple of support methods for lists: first() and rest().

The first() method of PScm::Expr::List just returns the first component of the list:

 45 sub first { $_[0][0] }

The rest() method of PScm::Expr::List returns all but the first component of the list as a new PScm::Expr::List object:

 47 sub rest {
 48     my ($self) = @_;
 49 
 50     my @value = $self->value;
 51     shift @value;
 52     return $self->new(@value);
 53 }

Now we can look at the evaluation of list expressions. Here's PScm::Expr::List::Eval():

 62 sub Eval {
 63     my ($self) = @_;
 64     my $op = $self->first()->Eval();
 65     return $op->Apply($self->rest);
 66 }

It's surprisingly simple. a PScm::Expr::List just evaluates its first element (Line 64). That should return one of PScm::Primitive::Multiply, PScm::Primitive::Subtract or PScm::SpecialForm::If, which gets assigned to $op. Of course because we're not doing any error checking, first() could return anything, so we're assuming valid input.

Because PScm::Expr::List's Eval() does not know or care whether the operation $op it derived on Line 64 is a simple primitive or a special form, on Line 65 it passes the rest of itself (the list of arguments) unevaluated to that operations Apply() method which applies itself to those arguments. Each individual operation's Apply() method will decide whether or not to evaluate its arguments, and what to do with them afterwards⁸.

So we've seen how PScm::Expr objects evaluate themselves. In particular we've seen how a list evaluates itself by evaluating its first component to get a primitive operation or special form, then calling that object's Apply() method with the rest of the list, unevaluated, as argument. Next we're going to look at one of those Apply() methods, the PScm::Primitive Apply() method.

3.6. Primitive Operations

The primitive built-in functions all live in PScm/Primitive.pm, shown in Listing 5.

Listing 5. PScm/Primitive.pm

  1 package PScm::Primitive;
  2 
  3 use strict;
  4 use warnings;
  5 use base qw(PScm::Expr);
  6 
  7 sub Apply {
  8     my ($self, $form) = @_;
  9 
 10     my @unevaluated_args = $form->value;
 11     my @evaluated_args = map { $_->Eval() } @unevaluated_args;
 12     return $self->_apply(@evaluated_args);
 13 }
 14 
 15 sub _check_type {
 16     my ($self, $thing, $type) = @_;
 17 
 18     die "wrong type argument(", ref($thing), ") to ", ref($self),
 19       "\n"
 20       unless $thing->isa($type);
 21 }
 22 
 23 ##################################
 24 package PScm::Primitive::Multiply;
 25 
 26 use base qw(PScm::Primitive);
 27 
 28 sub _apply {
 29     my ($self, @args) = @_;
 30 
 31     my $result = PScm::Expr::Number->new(1)->value();
 32 
 33     while (@args) {
 34         my $arg = shift @args;
 35         $self->_check_type($arg, 'PScm::Expr::Number');
 36         $result *= $arg->value;
 37     }
 38 
 39     return new PScm::Expr::Number($result);
 40 }
 41 
 42 ##################################
 43 package PScm::Primitive::Subtract;
 44 
 45 use base qw(PScm::Primitive);
 46 
 47 sub _apply {
 48     my ($self, @args) = @_;
 49 
 50     unshift @args, PScm::Expr::Number->new(0) if @args < 2;
 51 
 52     my $arg = shift @args;
 53     $self->_check_type($arg, 'PScm::Expr::Number');
 54 
 55     my $result = $arg->value;
 56 
 57     while (@args) {
 58         $arg = shift @args;
 59         $self->_check_type($arg, 'PScm::Expr::Number');
 60         $result -= $arg->value;
 61     }
 62 
 63     return new PScm::Expr::Number($result);
 64 }
 65 
 66 1;

This class holds all of the code for simple functions that can be passed already evaluated arguments. You can see that it in fact inherits from PScm::Expr rather than directly from PScm, which explains the dotted line in the various PScm::Expr figures.

This base PScm::Primitive class provides the Apply() method for all simple functions:

  7 sub Apply {
  8     my ($self, $form) = @_;
  9 
 10     my @unevaluated_args = $form->value;
 11     my @evaluated_args = map { $_->Eval() } @unevaluated_args;
 12     return $self->_apply(@evaluated_args);
 13 }

On Line 10 it extracts the arguments to the operation from the $form by calling the $form's value() method. $form is a PScm::Expr::List and we've already seen that the value() method for a list object dereferences and returns the underlying list. Then, on Line 11, Apply() evaluates each argument by mapping a call to each one's Eval() method. Finally, on Line 12, it passes the resulting list of evaluated arguments to a private _apply() method and returns the result.

_apply() is implemented differently by each primitive operation. So each primitive operation—each subclass of PScm::Primitive—only needs an _apply() method which will be called with a list of already evaluated arguments.

The _apply() in PScm::Primitive::Multiply is very straightforward. It simply multiplies its arguments together and returns the result as a new PScm::Expr::Number. Note that, somewhat accidentally, if only given one argument it will simply return it, and if given no arguments it will return 1.

 28 sub _apply {
 29     my ($self, @args) = @_;
 30 
 31     my $result = PScm::Expr::Number->new(1)->value();
 32 
 33     while (@args) {
 34         my $arg = shift @args;
 35         $self->_check_type($arg, 'PScm::Expr::Number');
 36         $result *= $arg->value;
 37     }
 38 
 39     return new PScm::Expr::Number($result);
 40 }

On Line 31 the rather convoluted trick to get an initial value will work whether or not the underlying implementation of PScm::Expr::Number uses Math::BigInt or not.

The _check_type() method in the base class just saves us some typing, since we are checking the type of argument to the primitive:

 15 sub _check_type {
 16     my ($self, $thing, $type) = @_;
 17 
 18     die "wrong type argument(", ref($thing), ") to ", ref($self),
 19       "\n"
 20       unless $thing->isa($type);
 21 }

PScm::Primitive::Subtract's _apply() method is more complicated only because it distinguishes between unary negation (- x) and subtraction. If it gets only one argument it returns its negation, otherwise it subtracts subsequent arguments from the first one. It will return 0 if called with no arguments.

 47 sub _apply {
 48     my ($self, @args) = @_;
 49 
 50     unshift @args, PScm::Expr::Number->new(0) if @args < 2;
 51 
 52     my $arg = shift @args;
 53     $self->_check_type($arg, 'PScm::Expr::Number');
 54 
 55     my $result = $arg->value;
 56 
 57     while (@args) {
 58         $arg = shift @args;
 59         $self->_check_type($arg, 'PScm::Expr::Number');
 60         $result -= $arg->value;
 61     }
 62 
 63     return new PScm::Expr::Number($result);
 64 }

That's all the primitive operations we support. There are a whole host of others that could trivially be added here and it might be entertaining to add them, but all the really interesting stuff is happening over in the special forms, discussed next.

3.7. Special Forms

All the code for special forms is in PScm/SpecialForm.pm in Listing 6. Like PScm::Primitive it descends from PScm::Expr.

Listing 6. PScm/SpecialForm.pm

  1 package PScm::SpecialForm;
  2 
  3 use strict;
  4 use warnings;
  5 use base qw(PScm::Expr);
  6 
  7 ##############################
  8 package PScm::SpecialForm::If;
  9 
 10 use base qw(PScm::SpecialForm);
 11 
 12 sub Apply {
 13     my ($self, $form) = @_;
 14 
 15     my ($condition, $true_branch, $false_branch) = $form->value;
 16 
 17     if ($condition->Eval()->isTrue) {
 18         return $true_branch->Eval();
 19     } else {
 20         return $false_branch->Eval();
 21     }
 22 }
 23 
 24 1;

At the moment there is only one special form, if, so the listing is short. It will get longer in subsequent versions though.

For special forms, the Apply() method is in the individual operation's class. On Line 15 PScm::SpecialForm::If's Apply() method extracts the condition, the expression to evaluate if the condition is true, and the expression to evaluate if the condition is false, from the argument $form. Then on Line 17 it evaluates the condition, and calls the result's isTrue() method to determine which branch to evaluate:

 12 sub Apply {
 13     my ($self, $form) = @_;
 14 
 15     my ($condition, $true_branch, $false_branch) = $form->value;
 16 
 17     if ($condition->Eval()->isTrue) {
 18         return $true_branch->Eval();
 19     } else {
 20         return $false_branch->Eval();
 21     }
 22 }

If the condition is true, PScm::SpecialForm::If::Apply() evaluates and returns the true branch (Line 18), otherwise it evaluates and returns the false branch (Line 20). The decision of what is true or false is delegated to an isTrue() method. The one and only isTrue() method is defined in PScm/Expr.pm right at the top of the data type hierarchy, in the PScm::Expr class as:

  7 sub isTrue {
  8     my ($self) = @_;
  9     scalar($self->value);
 10 }

Remembering that value() just dereferences the underlying list or scalar, isTrue() then pretty much agrees with Perl's idea of truth, namely that zero, the empty string, and the empty list are false, everything else is true⁹.

That really is all there is to evaluation. Next we're going to take a look at the print system.

3.8. Output

After Eval() returns the result to the repl, ReadEvalPrint() calls the result's Print() method with the output handle as argument. That method is defined in PScm.pm

 48 sub Print {
 49     my ($self, $outfh) = @_;
 50     print $outfh $self->as_string, "\n";
 51 }

All it does is print the string representation of the object obtained by calling its as_string() method. A fallback as_string() method is provided in this class at Line 53.

 53 sub as_string { ref($_[0]); }

It just returns the class name of the object. This is needed occasionally in the case where internals such as primitive operations might be returned by the evaluator, for example:

> *
PScm::Primitive::Multiply
[download]

But that is an unusual and usually unintentional situation. The main as_string() methods are strategically placed around the by now familiar PScm::Expr hierarchy, as shown in Figure 3.5.

Figure 3.5. PScm::Expr as_string() methods

The as_string() method in PScm::Expr::Atom is just a call to value():

 30 sub as_string { $_[0]->value }

That method works for both symbols and numbers.

PScm::Expr::List's as_string() method returns a string representation of the list by recursively calling as_string() on each of its components and concatenating the result, separated by spaces and wrapped in braces¹⁰.

 55 sub as_string {
 56     my ($self) = @_;
 57     return '('
 58       . join(' ', map { $_->as_string } $self->value)
 59       . ')';
 60 }

Finally, PScm::Expr::String's as_string() method at Lines 97–104 overrides the one in PScm::Expr::Atom because it needs to put back any backslashes that the parser took out, and wrap itself in double quotes.

 97 sub as_string {
 98     my ($self) = @_;
 99 
100     my $copy = $self->value;
101     $copy =~ s/\\/\\\\/sg;
102     $copy =~ s/"/\\"/sg;
103     return qq'"$copy"';
104 }

3.9. Summary

We're finally in a position to understand the whole of PScm::Expr as shown in Listing 7.

Listing 7. PScm/Expr.pm

  1 package PScm::Expr;
  2 
  3 use strict;
  4 use warnings;
  5 use base qw(PScm::Token);
  6 
  7 sub isTrue {
  8     my ($self) = @_;
  9     scalar($self->value);
 10 }
 11 
 12 sub Eval {
 13     my ($self) = @_;
 14     return $self;
 15 }
 16 
 17 sub value { $_[0] }
 18 
 19 #########################
 20 package PScm::Expr::Atom;
 21 use base qw(PScm::Expr);
 22 
 23 sub new {
 24     my ($class, $value) = @_;
 25     bless \$value, $class;
 26 }
 27 
 28 sub value { ${ $_[0] } }
 29 
 30 sub as_string { $_[0]->value }
 31 
 32 #########################
 33 package PScm::Expr::List;
 34 use base qw(PScm::Expr);
 35 
 36 sub new {
 37     my ($class, @list) = @_;
 38 
 39     $class = ref($class) || $class;
 40     bless [@list], $class;
 41 }
 42 
 43 sub value { @{ $_[0] } }
 44 
 45 sub first { $_[0][0] }
 46 
 47 sub rest {
 48     my ($self) = @_;
 49 
 50     my @value = $self->value;
 51     shift @value;
 52     return $self->new(@value);
 53 }
 54 
 55 sub as_string {
 56     my ($self) = @_;
 57     return '('
 58       . join(' ', map { $_->as_string } $self->value)
 59       . ')';
 60 }
 61 
 62 sub Eval {
 63     my ($self) = @_;
 64     my $op = $self->first()->Eval();
 65     return $op->Apply($self->rest);
 66 }
 67 
 68 ###########################
 69 package PScm::Expr::Symbol;
 70 use base qw(PScm::Expr::Atom);
 71 
 72 sub Eval {
 73     my ($self) = @_;
 74     return $PScm::GlobalEnv->LookUp($self);
 75 }
 76 
 77 ############################
 78 package PScm::Expr::Literal;
 79 use base qw(PScm::Expr::Atom);
 80 
 81 ###########################
 82 package PScm::Expr::Number;
 83 use base qw(PScm::Expr::Literal);
 84 
 85 use Math::BigInt;
 86 
 87 sub new {
 88     my ($class, $value) = @_;
 89     $value = new Math::BigInt($value) unless ref($value);
 90     $class->SUPER::new($value);
 91 }
 92 
 93 ###########################
 94 package PScm::Expr::String;
 95 use base qw(PScm::Expr::Literal);
 96 
 97 sub as_string {
 98     my ($self) = @_;
 99 
100     my $copy = $self->value;
101     $copy =~ s/\\/\\\\/sg;
102     $copy =~ s/"/\\"/sg;
103     return qq'"$copy"';
104 }
105 
106 1;

The final version of our diagram, with all of the methods from PScm::Expr in place is shown in Figure 3.6.

Figure 3.6. PScm::Expr methods

That may seem like a lot of code for what is effectively just a pocket calculator¹¹, but what has been done is to lay the groundwork for a much more powerful set of language constructs that will be added in subsequent chapters. Let's recap with an overview of the whole thing.

A global environment is set up in the PScm package, containing bindings for defined operations.
The top-level read-eval-print loop (repl) in the PScm package creates a PScm::Read object and repeatedly calls its Read() method.
That Read() method returns PScm::Expr objects which the repl evaluates. It evaluates them by calling their Eval() method.
- PScm::Expr::Number and PScm::Expr::String objects both share an Eval() method that just returns the object unevaluated.
- PScm::Expr::Symbol objects have an Eval() method that looks up the value of the symbol in the global environment.
- PScm::Expr::List objects have an Eval() method that evaluates the first component of the list, which should return a primitive operation or special form, then calls that operations Apply() method with the remaining unevaluated components of the list as argument. What happens next depends on the type of the operation.
  - PScm::Primitive objects share an Apply() method that evaluates each of the arguments and then passes them to the individual primitive's private _apply() method.
  - PScm::SpecialForm objects each have their own Apply() method that decides whether, and how, to evaluate the arguments.
The repl then takes the result of the evaluation and calls its Print() method, which is defined in the PScm base class.
- That Print() method just calls $self->as_string() and prints the result.
  - The PScm::Expr::Atom class has an as_string() method that returns the underlying scalar, but PScm::Expr::String provides an override that wraps the result in double quotes.
  - The PScm::Expr::List class has an as_string() method that recursively calls as_string() on its components and returns the result wrapped in braces.

At the heart of the whole interpreter is the dynamic between Eval() which evaluates expressions, and Apply() which applies operations to their arguments.

⁶

but looks a lot prettier.

⁷

It could have just said

return $token unless $token->isa('PScm::Token::Open');
[download]

but I always think it's a bit rude to peep into the implementation like that, much better to ask it what it thinks it is, not forcibly extract its data type.

⁸

Lisp purists might raise an eyebrow at this point, because Eval() is supposed to know what kind of form it is evaluating and decide whether or not to evaluate the arguments. But this is an object-oriented application, and it makes much more sense to leave that decision to the objects that need to know.

⁹

This differs from a true Scheme implementation where special boolean values #t and #f represent truth and falsehood, and everything else is true. The reason for having an isTrue() is to encapsulate the chosen behaviour. If we wanted to change the meaning of truth, we need only do so here.

¹⁰

We haven't seen anything yet that might, when evaluated, return a list for printing. That's for later.

¹¹

especially one where the + and / keys don't work.

¹²

Ok, I should have used IO::String, so sue me.

Chapter 3. Interpreter Version 0.0.0