This preliminary version of the interpreter supports only three
operations, namely multiplication (*
), subtraction (-
), and conditional
evaluation (if
). It does however lay the groundwork for more
sophisticated interpreters later on.
Scheme lisp interpreters, being interactive, are based around what is called a “read eval print loop”: first read an expression, then evaluate it, then print the result, then loop. This long-winded term is often abbreviated to repl. In order for the repl to evaluate the expression, there must additionally be an environment in which symbols can be given values and in which values can be looked up. All this means that there are six principle components to such an interpreter.
The implementation we're about to discuss takes a fairly strict OO approach, with each of these components and pretty much everything else represented by classes of objects. As a consequence of this the Evaluator and the Print system are distributed throughout the Structure component. This means that for example to evaluate an expression you call its Eval method, and to print a result you call the Print method on the result object. There is a good deal of scope for polymorphism with this approach, since different types of object can respond differently to the same message.
The top-level read-eval-print loop (repl) for the PScheme interpreter is in the package PScm in Listing 1. All other packages inherit from this package, although that's mainly just a convenience.
PScm.pm
1 package PScm; 2 3 use strict; 4 use warnings; 5 use PScm::Read; 6 use PScm::Env; 7 use PScm::Primitive; 8 use PScm::SpecialForm; 9 use FileHandle; 10 11 require Exporter; 12 13 our @ISA = qw(Exporter); 14 our @EXPORT = qw(ReadEvalPrint); 15 16 =head1 NAME 17 18 PScm - Scheme-like interpreter written in Perl 19 20 =head1 SYNOPSIS 21 22 use PScm; 23 ReadEvalPrint($in_filehandle[, $out_filehandle]); 24 25 =head1 DESCRIPTION 26 27 Just messing about, A toy lisp interpreter. 28 29 =cut 30 31 our $GlobalEnv = new PScm::Env( 32 '*' => new PScm::Primitive::Multiply(), 33 '-' => new PScm::Primitive::Subtract(), 34 if => new PScm::SpecialForm::If(), 35 ); 36 37 sub ReadEvalPrint { 38 my ($infh, $outfh) = @_; 39 40 $outfh ||= new FileHandle(">-"); 41 my $reader = new PScm::Read($infh); 42 while (defined(my $expr = $reader->Read)) { 43 my $result = $expr->Eval(); 44 $result->Print($outfh); 45 } 46 } 47 48 sub Print { 49 my ($self, $outfh) = @_; 50 print $outfh $self->as_string, "\n"; 51 } 52 53 sub as_string { ref($_[0]); } 54 55 sub new { bless {}, $_[0] } 56 57 1;
Firstly, on
Lines 31–35
a global environment, $PScm::GlobalEnv
is initialised
to a new PScm::Env object.
31 our $GlobalEnv = new PScm::Env( 32 '*' => new PScm::Primitive::Multiply(), 33 '-' => new PScm::Primitive::Subtract(), 34 if => new PScm::SpecialForm::If(), 35 );
There are only three things in that environment. They are the
objects that will perform the primitive operations of multiplication,
subtraction and conditional evaluation, and they're bound to “*
”,
“-
” and “if
” respectively. We'll see how they work presently.
ReadEvalPrint()
on
Lines 37–46
is the central control routine of the
whole interpreter. It takes an input file handle and an output file
handle as arguments.
Starting on
Line 40
it defaults the output file handle to stdout,
then on
Line 41
it creates a new PScm::Read object on the input file handle,
and on
Lines 42–45
it enters
its main loop. The loop repeatedly collects an expression from the
Reader, then evaluates the expression by calling its Eval()
method,
then prints the result by calling its Print()
method:
37 sub ReadEvalPrint { 38 my ($infh, $outfh) = @_; 39 40 $outfh ||= new FileHandle(">-"); 41 my $reader = new PScm::Read($infh); 42 while (defined(my $expr = $reader->Read)) { 43 my $result = $expr->Eval(); 44 $result->Print($outfh); 45 } 46 }
The basis of the print system can be seen in the Print()
and
as_string()
methods in PScm.pm
, but we're going to leave discussion
of the print system until later on. In the next section we'll look
at our first, very simple, implementation of an environment.
All an environment has to do is to return the current value for an argument symbol. Perl hashes are ideal for this task, and our implementation uses them. Our environment is implemented by PScm::Env in Listing 2.
PScm/Env.pm
1 package PScm::Env; 2 3 use strict; 4 use warnings; 5 use base qw(PScm); 6 7 sub new { 8 my ($class, %bindings) = @_; 9 10 bless { bindings => {%bindings}, }, $class; 11 } 12 13 sub LookUp { 14 my ($self, $symbol) = @_; 15 16 if (exists($self->{bindings}{ $symbol->value })) { 17 return $self->{bindings}{ $symbol->value }; 18 } else { 19 die "no binding for @{[$symbol->value]} ", 20 "in @{[ref($self)]}\n"; 21 } 22 } 23 24 1;
It is no more than an
object wrapper around a Perl hash. The new()
method
(Lines 7–11)
creates an object with a set of bindings (name to value mappings)
that were passed in as arguments:
7 sub new { 8 my ($class, %bindings) = @_; 9 10 bless { bindings => {%bindings}, }, $class; 11 }
The LookUp()
method on
Lines 13–22
looks up a symbol in the
bindings, die
-ing if the symbol does not have a binding:
13 sub LookUp { 14 my ($self, $symbol) = @_; 15 16 if (exists($self->{bindings}{ $symbol->value })) { 17 return $self->{bindings}{ $symbol->value }; 18 } else { 19 die "no binding for @{[$symbol->value]} ", 20 "in @{[ref($self)]}\n"; 21 } 22 }
Note that the $symbol
passed in is an object,
and LookUp()
must call the symbol's value()
method to get
a string suitable for a hash key. The value()
method for a symbol
just returns the name of the symbol as a perl string.
Because this first version of the interpreter has no support for local variables, this class doesn't provide any methods for adding values to the environment. That will come later.
And that's all there is to our environment class. Let's move on to look at the Reader.
The job of the Reader is to take a stream of text and convert
it into a structure that the evaluator can more easily work with.
So for example we want to take an expression such as (foo
("bar" 10) baz)
and convert it into an equivalent structure such
as shown in Figure 3.1.
(foo ("bar" 10) baz)
In this figure, showing the result of parsing that expression,
the top-level list object has three components. Reading left to right
it contains the symbol object foo
, another list object
and the symbol object baz
.
The sub-list contains the string object "bar"
and the
number object 10
. It is apparent that that the structure
is a direct representation of the text, where each list corresponds
to the contents of a matching pair of braces. It should also be
obvious that these structures are practically identical to Perl
list references. The scheme list (foo ("bar" 10) baz)
corresponds
directly to the nested perl listref [$foo, ["bar",
10], $baz]
6.
To simplify the creation of such a structure from an input stream, it is often convenient to split the process into two parts:
That is the approach taken by the Reader described here.
It was mentioned earlier that Scheme was extremely easy to parse, well here's the proof. The code for the Reader, PScm::Read in Listing 3 is only 63 lines long.
PScm/Read.pm
1 package PScm::Read; 2 3 use strict; 4 use warnings; 5 use PScm::Expr; 6 use PScm::Token; 7 use base qw(PScm); 8 9 sub new { 10 my ($class, $fh) = @_; 11 bless { 12 FileHandle => $fh, 13 Line => '', 14 }, $class; 15 } 16 17 sub Read { 18 my ($self) = @_; 19 20 my $token = $self->_next_token(); 21 return undef unless defined $token; 22 23 return $token unless $token->is_open_token; 24 25 my @res = (); 26 27 while (1) { 28 $token = $self->Read; 29 die "unexpected EOF" 30 if !defined $token; 31 last if $token->is_close_token; 32 push @res, $token; 33 } 34 35 return new PScm::Expr::List(@res); 36 } 37 38 sub _next_token { 39 my ($self) = @_; 40 41 while (!$self->{Line}) { 42 $self->{Line} = $self->{FileHandle}->getline(); 43 return undef unless defined $self->{Line}; 44 $self->{Line} =~ s/^\s+//s; 45 } 46 47 for ($self->{Line}) { 48 s/^\(\s*// && return PScm::Token::Open->new(); 49 s/^\)\s*// && return PScm::Token::Close->new(); 50 s/^([-+]?\d+)\s*// 51 && return PScm::Expr::Number->new($1); 52 s/^"((?:(?:\\.)|([^"]))*)"\s*// && do { 53 my $string = $1; 54 $string =~ s/\\//g; 55 return PScm::Expr::String->new($string); 56 }; 57 s/^([^\s\(\)]+)\s*// 58 && return PScm::Expr::Symbol->new($1); 59 } 60 die "can't parse: $self->{Line}"; 61 } 62 63 1;
As with the rest of the
implementation, it uses an OO style, so the Reader
is an object that is created with an argument FileHandle
and behaves
as an iterator returning the next parsed expression from the stream
on each call to Read()
. The new()
method
(Lines 9–15)
simply stashes its input
file handle argument along with an empty string representing
the current line, and returns them in the new object.
9 sub new { 10 my ($class, $fh) = @_; 11 bless { 12 FileHandle => $fh, 13 Line => '', 14 }, $class; 15 }
Apart from new()
the only other publicly available method is
Read()
, which returns the next complete expression, as a structure,
from the input file.
The Read()
method calls the private _next_token()
method (the
tokeniser) for its tokens.
Skipping over the Read()
method for now,
_next_token()
on
Lines 38–61
simply chomps the next token off the
input stream and returns it. It knows enough to skip whitespace and
blank lines and to return undef at EOF
(Lines 41–45).
If there is
a line left to tokenise, then a few simple regexes are tried in
turn to strip the next token from it. As soon as a token of a
particular type is recognised, it is returned to the caller.
38 sub _next_token { 39 my ($self) = @_; 40 41 while (!$self->{Line}) { 42 $self->{Line} = $self->{FileHandle}->getline(); 43 return undef unless defined $self->{Line}; 44 $self->{Line} =~ s/^\s+//s; 45 } 46 47 for ($self->{Line}) { 48 s/^\(\s*// && return PScm::Token::Open->new(); 49 s/^\)\s*// && return PScm::Token::Close->new(); 50 s/^([-+]?\d+)\s*// 51 && return PScm::Expr::Number->new($1); 52 s/^"((?:(?:\\.)|([^"]))*)"\s*// && do { 53 my $string = $1; 54 $string =~ s/\\//g; 55 return PScm::Expr::String->new($string); 56 }; 57 s/^([^\s\(\)]+)\s*// 58 && return PScm::Expr::Symbol->new($1); 59 } 60 die "can't parse: $self->{Line}"; 61 }
Lines 47–59
do the actual tokenisation. The tokeniser only
needs to distinguish open and close braces, numbers, strings and
symbols, where anything that doesn't look like an open or close
brace, a number or a string must be a symbol.
_next_token()
returns its data in objects, which incidentally happens
to be a very convenient way of tagging the type of token returned.
The objects are of two basic types: PScm::Token; and PScm::Expr.
The PScm::Token types PScm::Token::Open and PScm::Token::Close represent an open and a close brace respectively, and contain no data. The three PScm::Expr types, PScm::Expr::Number, PScm::Expr::String and PScm::Expr::Symbol contain the relevant number, string or symbol.
Now that we know how _next_token()
works, we can go
back and take a look at Read()
.
The Read()
method
(Lines 17–36)
has to return the next
complete expression from the input stream. That could be a simple
symbol, string or number, or an arbitrarily nested list. It starts by calling _next_token()
at
Line 20
and returning undef if _next_token()
returned undef
(signifying end of file).
17 sub Read { 18 my ($self) = @_; 19 20 my $token = $self->_next_token(); 21 return undef unless defined $token; 22 23 return $token unless $token->is_open_token; 24 25 my @res = (); 26 27 while (1) { 28 $token = $self->Read; 29 die "unexpected EOF" 30 if !defined $token; 31 last if $token->is_close_token; 32 push @res, $token; 33 } 34 35 return new PScm::Expr::List(@res); 36 }
Then, at
Line 23
if the token is anything other than an open brace (determined by the call
to is_open_token()
7), Read()
just returns it.
Otherwise, the token just read is an open brace, so Read()
initialises an empty result @res
to hold the list it expects
to accumulate then enters a loop calling itself recursively to
collect the (possibly nested) components of the list. It is an
error if it detects EOF while a list is unclosed, and if
it detects a close brace (is_close_token()
) it knows its work
is done and it returns
the accumulated list as a new PScm::Expr::List object.
The structure returned by Read()
is completely composed
of subtypes of PScm::Expr, since the PScm::Token
types do not actually get entered into the structure. Let's work
through the parsing of that simple expression (foo ("bar" 10)
baz)
. In the following, the subscript number keeps track of
which particular invocation of Read()
we are talking about.
Read
1 calls _next_token()
and gets a (
so it enters its loop.Read
1 calls Read
2 from within its loop.
Read
2 calls _next_token()
and gets a foo
, so it returns it.Read
1 puts the foo
at the start of its list: (foo
.Read
1 calls Read
3.
Read
3 calls _next_token()
and gets a (
so it enters its loop.Read
3 calls Read
4.
Read
4 calls _next_token()
and gets a "bar"
so it returns it.Read
3 puts the "bar"
at the start of its list: ("bar"
.Read
3 calls Read
5.
Read
5 calls _next_token()
and gets a 10
so it returns it.Read
3 adds the 10
to its growing list: ("bar" 10
.Read
3 calls Read
6.
Read
6 calls _next_token()
and gets a )
so it returns it.Read
3 gets the )
so it knows it has reached the end of its list
and returns the result: ("bar" 10)
.Read
1 adds the ("bar" 10)
to the end of its own growing list:
(foo ("bar" 10)
.Read
1 calls Read
7.
Read
7 calls _next_token()
and gets a baz
so it returns it.Read
1 adds the baz
to the end of its own growing list:
(foo ("bar" 10) baz
.Read
1 calls Read
8.
Read
8 calls _next_token()
and gets a )
so it returns it.Read
1 gets the )
so it knows it has reached the end of its list
and returns the result: (foo ("bar" 10) baz)
.So the Reader does indeed return the structure expected.
The PScm::Token and PScm::Expr classes are
in their eponymous files. The PScm::Token classes in
Listing 4 are purely parse-related. As mentioned earlier,
they are returned by the tokeniser to indicate open and close
braces. These tokens are used to guide the parser, but it does
not actually include them in the result.
PScm::Token::Open and PScm::Token::Close
both inherit from PScm::Token.
PScm::Token defines default implementations for
is_open_token()
and is_close_token()
, which
the two derived classes override appropriately. PScm::Token
is just:
1 package PScm::Token; 2 3 use strict; 4 use warnings; 5 use base qw(PScm); 6 7 sub is_open_token { 0 } 8 sub is_close_token { 0 }
PScm::Token::Open overrides is_open_token()
:
11 package PScm::Token::Open; 12 13 use base qw(PScm::Token); 14 15 sub is_open_token { 1 }
and PScm::Token::Close overrides is_close_token()
:
18 package PScm::Token::Close; 19 20 use base qw(PScm::Token); 21 22 sub is_close_token { 1 } 23 24 1;
PScm::Token inherits a stub
new()
method from the PScm class that just blesses an empty
hash with the argument class.
PScm/Token.pm
1 package PScm::Token; 2 3 use strict; 4 use warnings; 5 use base qw(PScm); 6 7 sub is_open_token { 0 } 8 sub is_close_token { 0 } 9 10 ########################## 11 package PScm::Token::Open; 12 13 use base qw(PScm::Token); 14 15 sub is_open_token { 1 } 16 17 ########################### 18 package PScm::Token::Close; 19 20 use base qw(PScm::Token); 21 22 sub is_close_token { 1 } 23 24 1;
As for the PScm::Expr objects that Read()
accumulates and returns, as noted Read()
has done all of the
work in constructing a tree of them for us, so they are more properly
discussed in the next section where we look at expressions.
The various PScm::Expr objects
are defined in PScm/Expr.pm
. These objects represent
the basic data types that are visible to the user: strings; numbers;
symbols; and lists. They are the types returned by the
Reader and printed by the print system. It would be premature to go into
all the details of the PScm::Expr package right now,
but it is worth pointing out a few salient features about it.
Firstly the classes arrange themselves in a Composite Pattern [gof pp163–173] according to the hierarchy of PScheme types as in Figure 3.2.
This figure is drawn using a standard set of conventions for diagramming the relationships between classes in an OO design, called “the Unified Modelling Language”, or UML. [umld]
For those who don't know UML, the triangular shape means “inherits from” or “is a subclass of”, and the black arrow and circle coming from the white diamond means “aggregates zero or more of”. The classes with names in italics are “abstract” classes. As far as Perl is concerned, calling a class “abstract” just means that we promise not to create any actual object instances of that particular class. The unterminated dotted line simply implies that we will be deriving other classes from PScm::Expr later on.
The root of the hierarchy is PScm::Expr, representing any and all expressions. That divides into lists (PScm::Expr::List) and atoms (PScm::Expr::Atom).
Lists are composed of expressions (the aggregation relationship.)
Atoms represent any data type that cannot be trivially taken apart, anything that's not a list in other words. Atoms are subclassed into literals (PScm::Expr::Literal) and symbols (PScm::Expr::Symbol), and literals are subclassed into strings (PScm::Expr::String) and numbers (PScm::Expr::Number).
We'll see a lot of this diagram in various guises
as we progress. Here's the same diagram,
in Figure 3.3
with the location of the
new()
and value()
methods added.
new()
and value()
methodsAs you can see, there are
three new()
methods in the class structure.
The PScm::Expr::Atom abstract class is the parent class
for strings and numbers (via PScm::Expr::Literal) and for
symbols. Since all of these types are simple scalars, the new()
method in PScm::Expr::Atom does for most of them: it
blesses a reference to the scalar into the appropriate class.
23 sub new { 24 my ($class, $value) = @_; 25 bless \$value, $class; 26 }
However the PScm::Expr::Number package supplies its own
new()
method, because we avail ourselves of the core
Math::BigInt package for our integers. While it is nice
to have arbitrary sized integers by default, the main reason for
doing this is to avoid the embarrassment of Perl's automatic type conversion to floating
point on integer overflow when implementing a language that is only supposed to
support integer arithmetic.
82 package PScm::Expr::Number; 83 use base qw(PScm::Expr::Literal); 84 85 use Math::BigInt; 86 87 sub new { 88 my ($class, $value) = @_; 89 $value = new Math::BigInt($value) unless ref($value); 90 $class->SUPER::new($value); 91 }
The PScm::Expr::List class has the other new()
method that simply bundles up its argument Perl list in a new object:
36 sub new { 37 my ($class, @list) = @_; 38 39 $class = ref($class) || $class; 40 bless [@list], $class; 41 }
All three of these new()
methods have already been seen in action in the Reader.
Alongside most of the new()
methods is a value()
method that does the exact reverse of new()
and retrieves the
underlying value from the object. In the case of atoms, it dereferences
the scalar value:
28 sub value { ${ $_[0] } }
and in the case of lists, it dereferences the list:
43 sub value { @{ $_[0] } }
Even though PScm::Expr::Number has its own new()
method,
we don't need a separate value()
method for numbers, we
never need to retrieve the actual perl number from the
Math::BigInt object so we just inherit value()
from PScm::Expr::Atom. We do however provide a default
value()
method in PScm::Expr. This default
method just returns $self
.
17 sub value { $_[0] }
This is solely for the benefit of those as-yet undescribed additional PScm::Expr subclasses, which will all stand for their own values.
We've seen that the various PScheme expression types (lists, numbers, strings and symbols) arrange themselves naturally into a hierachy of types and also form a recognised design pattern called “Composite”. Next we're going to look at how those expressions are evaluated.
To evaluate a PScm::Expr, as mentioned earlier,
the top level ReadEvalPrint()
loop just calls the expression's
Eval()
method. The Eval()
methods of PScm::Expr
are located in
three of its subclasses as shown in Figure 3.4.
Eval()
MethodsThe figure shows that there is a separate Eval()
method
for lists and for symbols, and a default method for all other PScm::Expr.
Let's look
first at the default Eval()
method in PScm::Expr
which currently applies to literals.
PScm::Expr::String and PScm::Expr::Number share this
default Eval()
method, which just returns $self
:
12 sub Eval { 13 my ($self) = @_; 14 return $self; 15 }
This means that numbers and strings evaluate to themselves, as they should, and if we were to add other types of expression later on, they too would by default evaluate to themselves.
Evaluation of a symbol is only slightly more complex. The
Eval()
method in PScm::Expr::Symbol looks up its
value in the global environment $PScm::GlobalEnv
:
72 sub Eval { 73 my ($self) = @_; 74 return $PScm::GlobalEnv->LookUp($self); 75 }
Remember that LookUp()
from PScm::Env
expects a symbol object as
argument and calls its value()
method to get a string that
it can then use to retrieve the actual value from the hash
representing the environment.
Before showing how PScm::Expr::List objects are evaluated,
we need to consider a couple of support methods for lists:
first()
and rest()
.
The first()
method of PScm::Expr::List just
returns the first component of the list:
45 sub first { $_[0][0] }
The rest()
method of PScm::Expr::List returns all
but the first component of the list as a new PScm::Expr::List
object:
47 sub rest { 48 my ($self) = @_; 49 50 my @value = $self->value; 51 shift @value; 52 return $self->new(@value); 53 }
Now we can look at the evaluation of list expressions. Here's
PScm::Expr::List::Eval()
:
62 sub Eval { 63 my ($self) = @_; 64 my $op = $self->first()->Eval(); 65 return $op->Apply($self->rest); 66 }
It's surprisingly simple. a PScm::Expr::List just evaluates
its first element
(Line 64).
That should return one of
PScm::Primitive::Multiply, PScm::Primitive::Subtract
or PScm::SpecialForm::If, which gets assigned to $op
.
Of course because we're not doing any error checking,
first()
could return anything, so we're assuming
valid input.
Because PScm::Expr::List's Eval()
does not know or care whether the operation
$op
it derived on Line 64
is a simple primitive or a special form, on
Line 65
it passes the rest of itself (the list of arguments)
unevaluated to that operations Apply()
method
which applies itself to those arguments.
Each individual operation's Apply()
method will decide whether
or not to evaluate its arguments, and what
to do with them afterwards8.
So we've seen how PScm::Expr objects evaluate themselves.
In particular we've seen how a list evaluates itself by evaluating its
first component to get a primitive operation or special form, then calling
that object's Apply()
method with the rest of the list, unevaluated,
as argument. Next we're going to look at one of those Apply()
methods,
the PScm::Primitive Apply()
method.
The primitive built-in functions all live in
PScm/Primitive.pm
, shown in Listing 5.
PScm/Primitive.pm
1 package PScm::Primitive; 2 3 use strict; 4 use warnings; 5 use base qw(PScm::Expr); 6 7 sub Apply { 8 my ($self, $form) = @_; 9 10 my @unevaluated_args = $form->value; 11 my @evaluated_args = map { $_->Eval() } @unevaluated_args; 12 return $self->_apply(@evaluated_args); 13 } 14 15 sub _check_type { 16 my ($self, $thing, $type) = @_; 17 18 die "wrong type argument(", ref($thing), ") to ", ref($self), 19 "\n" 20 unless $thing->isa($type); 21 } 22 23 ################################## 24 package PScm::Primitive::Multiply; 25 26 use base qw(PScm::Primitive); 27 28 sub _apply { 29 my ($self, @args) = @_; 30 31 my $result = PScm::Expr::Number->new(1)->value(); 32 33 while (@args) { 34 my $arg = shift @args; 35 $self->_check_type($arg, 'PScm::Expr::Number'); 36 $result *= $arg->value; 37 } 38 39 return new PScm::Expr::Number($result); 40 } 41 42 ################################## 43 package PScm::Primitive::Subtract; 44 45 use base qw(PScm::Primitive); 46 47 sub _apply { 48 my ($self, @args) = @_; 49 50 unshift @args, PScm::Expr::Number->new(0) if @args < 2; 51 52 my $arg = shift @args; 53 $self->_check_type($arg, 'PScm::Expr::Number'); 54 55 my $result = $arg->value; 56 57 while (@args) { 58 $arg = shift @args; 59 $self->_check_type($arg, 'PScm::Expr::Number'); 60 $result -= $arg->value; 61 } 62 63 return new PScm::Expr::Number($result); 64 } 65 66 1;
This class holds all of the code for simple functions that can be passed already evaluated arguments. You can see that it in fact inherits from PScm::Expr rather than directly from PScm, which explains the dotted line in the various PScm::Expr figures.
This base PScm::Primitive class provides the Apply()
method for all simple functions:
7 sub Apply { 8 my ($self, $form) = @_; 9 10 my @unevaluated_args = $form->value; 11 my @evaluated_args = map { $_->Eval() } @unevaluated_args; 12 return $self->_apply(@evaluated_args); 13 }
On
Line 10
it extracts the arguments to the operation from the $form
by calling the $form
's value()
method.
$form
is a PScm::Expr::List and we've already seen
that the value()
method for a list object dereferences and
returns the underlying list.
Then, on
Line 11,
Apply()
evaluates each argument by mapping a call to
each one's Eval()
method.
Finally, on
Line 12,
it passes the resulting
list of evaluated arguments to a private _apply()
method
and returns the result.
_apply()
is
implemented differently by each primitive operation. So each primitive
operation—each subclass
of PScm::Primitive—only needs an _apply()
method
which will be called with a list of already evaluated arguments.
The _apply()
in PScm::Primitive::Multiply is
very straightforward. It simply multiplies its arguments together and
returns the result as a new PScm::Expr::Number. Note that,
somewhat accidentally, if only given one argument it will simply return
it, and if given no arguments it will return 1.
28 sub _apply { 29 my ($self, @args) = @_; 30 31 my $result = PScm::Expr::Number->new(1)->value(); 32 33 while (@args) { 34 my $arg = shift @args; 35 $self->_check_type($arg, 'PScm::Expr::Number'); 36 $result *= $arg->value; 37 } 38 39 return new PScm::Expr::Number($result); 40 }
On Line 31 the rather convoluted trick to get an initial value will work whether or not the underlying implementation of PScm::Expr::Number uses Math::BigInt or not.
The _check_type()
method in the base class just saves us some
typing, since we are checking the type of argument to the primitive:
15 sub _check_type { 16 my ($self, $thing, $type) = @_; 17 18 die "wrong type argument(", ref($thing), ") to ", ref($self), 19 "\n" 20 unless $thing->isa($type); 21 }
PScm::Primitive::Subtract's _apply()
method is more complicated
only because it distinguishes between unary negation (- x)
and
subtraction. If it gets only one argument it returns its negation,
otherwise it subtracts subsequent arguments from the first one.
It will return 0 if called with no arguments.
47 sub _apply { 48 my ($self, @args) = @_; 49 50 unshift @args, PScm::Expr::Number->new(0) if @args < 2; 51 52 my $arg = shift @args; 53 $self->_check_type($arg, 'PScm::Expr::Number'); 54 55 my $result = $arg->value; 56 57 while (@args) { 58 $arg = shift @args; 59 $self->_check_type($arg, 'PScm::Expr::Number'); 60 $result -= $arg->value; 61 } 62 63 return new PScm::Expr::Number($result); 64 }
That's all the primitive operations we support. There are a whole host of others that could trivially be added here and it might be entertaining to add them, but all the really interesting stuff is happening over in the special forms, discussed next.
All the code for special forms is in PScm/SpecialForm.pm
in Listing 6. Like PScm::Primitive
it descends from PScm::Expr.
PScm/SpecialForm.pm
1 package PScm::SpecialForm; 2 3 use strict; 4 use warnings; 5 use base qw(PScm::Expr); 6 7 ############################## 8 package PScm::SpecialForm::If; 9 10 use base qw(PScm::SpecialForm); 11 12 sub Apply { 13 my ($self, $form) = @_; 14 15 my ($condition, $true_branch, $false_branch) = $form->value; 16 17 if ($condition->Eval()->isTrue) { 18 return $true_branch->Eval(); 19 } else { 20 return $false_branch->Eval(); 21 } 22 } 23 24 1;
At the moment there is only one special form, if
, so
the listing is short. It will get longer in subsequent versions
though.
For special forms, the Apply()
method is in the individual
operation's class. On
Line 15
PScm::SpecialForm::If's Apply()
method extracts the condition, the expression to evaluate if the condition
is true, and the expression to evaluate if the condition is false,
from the argument $form
.
Then on
Line 17
it evaluates the condition, and calls
the result's isTrue()
method to determine which branch to
evaluate:
12 sub Apply { 13 my ($self, $form) = @_; 14 15 my ($condition, $true_branch, $false_branch) = $form->value; 16 17 if ($condition->Eval()->isTrue) { 18 return $true_branch->Eval(); 19 } else { 20 return $false_branch->Eval(); 21 } 22 }
If the condition is true, PScm::SpecialForm::If::Apply()
evaluates and returns the true branch
(Line 18),
otherwise it evaluates and
returns the false branch
(Line 20).
The decision of what is true or false is
delegated to an isTrue()
method. The one and only
isTrue()
method is defined in
PScm/Expr.pm
right at the top of the data type hierarchy,
in the PScm::Expr class as:
7 sub isTrue { 8 my ($self) = @_; 9 scalar($self->value); 10 }
Remembering that value()
just dereferences the underlying
list or scalar, isTrue()
then pretty much agrees with Perl's idea
of truth, namely that
zero, the empty string, and the empty list are false, everything
else is true9.
That really is all there is to evaluation. Next we're going to take a look at the print system.
After Eval()
returns the result to the repl,
ReadEvalPrint()
calls the result's Print()
method with
the output handle as argument. That method is defined in
PScm.pm
48 sub Print { 49 my ($self, $outfh) = @_; 50 print $outfh $self->as_string, "\n"; 51 }
All it does is print the string representation of the
object obtained by calling its as_string()
method. A fallback
as_string()
method is provided in this class at Line 53.
53 sub as_string { ref($_[0]); }
It just returns the class name of the object. This is needed occasionally in the case where internals such as primitive operations might be returned by the evaluator, for example:
> * PScm::Primitive::Multiply
But that is an unusual and usually unintentional situation. The
main as_string()
methods are strategically placed around the
by now familiar PScm::Expr hierarchy, as shown in Figure 3.5.
as_string()
methodsThe as_string()
method in PScm::Expr::Atom is
just a call to value()
:
30 sub as_string { $_[0]->value }
That method works for both symbols and numbers.
PScm::Expr::List's as_string()
method
returns a string representation of the list by recursively calling
as_string()
on each of its components and concatenating the
result, separated by spaces and wrapped in braces10.
55 sub as_string { 56 my ($self) = @_; 57 return '(' 58 . join(' ', map { $_->as_string } $self->value) 59 . ')'; 60 }
Finally, PScm::Expr::String's as_string()
method at
Lines 97–104
overrides the one in PScm::Expr::Atom because it needs to put back
any backslashes that the parser took out, and wrap itself in double
quotes.
97 sub as_string { 98 my ($self) = @_; 99 100 my $copy = $self->value; 101 $copy =~ s/\\/\\\\/sg; 102 $copy =~ s/"/\\"/sg; 103 return qq'"$copy"'; 104 }
We're finally in a position to understand the whole of PScm::Expr as shown in Listing 7.
PScm/Expr.pm
1 package PScm::Expr; 2 3 use strict; 4 use warnings; 5 use base qw(PScm::Token); 6 7 sub isTrue { 8 my ($self) = @_; 9 scalar($self->value); 10 } 11 12 sub Eval { 13 my ($self) = @_; 14 return $self; 15 } 16 17 sub value { $_[0] } 18 19 ######################### 20 package PScm::Expr::Atom; 21 use base qw(PScm::Expr); 22 23 sub new { 24 my ($class, $value) = @_; 25 bless \$value, $class; 26 } 27 28 sub value { ${ $_[0] } } 29 30 sub as_string { $_[0]->value } 31 32 ######################### 33 package PScm::Expr::List; 34 use base qw(PScm::Expr); 35 36 sub new { 37 my ($class, @list) = @_; 38 39 $class = ref($class) || $class; 40 bless [@list], $class; 41 } 42 43 sub value { @{ $_[0] } } 44 45 sub first { $_[0][0] } 46 47 sub rest { 48 my ($self) = @_; 49 50 my @value = $self->value; 51 shift @value; 52 return $self->new(@value); 53 } 54 55 sub as_string { 56 my ($self) = @_; 57 return '(' 58 . join(' ', map { $_->as_string } $self->value) 59 . ')'; 60 } 61 62 sub Eval { 63 my ($self) = @_; 64 my $op = $self->first()->Eval(); 65 return $op->Apply($self->rest); 66 } 67 68 ########################### 69 package PScm::Expr::Symbol; 70 use base qw(PScm::Expr::Atom); 71 72 sub Eval { 73 my ($self) = @_; 74 return $PScm::GlobalEnv->LookUp($self); 75 } 76 77 ############################ 78 package PScm::Expr::Literal; 79 use base qw(PScm::Expr::Atom); 80 81 ########################### 82 package PScm::Expr::Number; 83 use base qw(PScm::Expr::Literal); 84 85 use Math::BigInt; 86 87 sub new { 88 my ($class, $value) = @_; 89 $value = new Math::BigInt($value) unless ref($value); 90 $class->SUPER::new($value); 91 } 92 93 ########################### 94 package PScm::Expr::String; 95 use base qw(PScm::Expr::Literal); 96 97 sub as_string { 98 my ($self) = @_; 99 100 my $copy = $self->value; 101 $copy =~ s/\\/\\\\/sg; 102 $copy =~ s/"/\\"/sg; 103 return qq'"$copy"'; 104 } 105 106 1;
The final version of our diagram, with all of the methods from PScm::Expr in place is shown in Figure 3.6.
That may seem like a lot of code for what is effectively just a pocket calculator11, but what has been done is to lay the groundwork for a much more powerful set of language constructs that will be added in subsequent chapters. Let's recap with an overview of the whole thing.
Read()
method.Read()
method returns PScm::Expr objects which
the repl evaluates. It evaluates them by calling their Eval()
method.
Eval()
method that just returns the object
unevaluated.Eval()
method
that looks up the value of the symbol in the global environment.Eval()
method
that evaluates the first component of the list, which should return a
primitive operation or special form, then calls that operations Apply()
method
with the remaining unevaluated components of the list as argument. What
happens next depends on the type of the operation.
Apply()
method
that evaluates each of the arguments and then passes them to the
individual primitive's private _apply()
method.Apply()
method that decides whether, and how, to evaluate the
arguments.Print()
method, which is defined in the PScm base
class.
Print()
method just calls $self->as_string()
and prints the result.
as_string()
method that returns the underlying scalar,
but PScm::Expr::String provides an override that wraps the
result in double quotes.as_string()
method that recursively calls as_string()
on its components and returns the result wrapped in braces.At the heart of the whole interpreter is the dynamic between
Eval()
which evaluates expressions, and Apply()
which applies operations to their arguments.
t/PScm.t
1 use strict; 2 use warnings; 3 use Test::More; 4 use lib './t/lib'; 5 use PScm::Test tests => 10; 6 7 BEGIN { use_ok('PScm') } 8 9 eval_ok('1', '1', 'numbers'); 10 eval_ok('+1', '1', 'explicit positive numbers'); 11 eval_ok('-1', '-1', 'negative numbers'); 12 eval_ok('"hello"', '"hello"', 'strings'); 13 eval_ok('(* 2 3 4)', '24', 'multiplication'); 14 eval_ok('(- 10 2 3)', '5', 'subtraction'); 15 eval_ok('(- 10)', '-10', 'negation'); 16 eval_ok('(if (* 0 1) 10 20)', '20', 'simple conditional'); 17 eval_ok(<<EOT, <<EOR, 'no overflow'); 18 (* 1234567890987654321 1234567890987654321) 19 EOT 20 1524157877457704723228166437789971041 21 EOR 22 23 # vim: ft=perl
t/lib/PScm/Test.pm
1 package PScm::Test; 2 use strict; 3 use warnings; 4 use FileHandle; 5 require Exporter; 6 7 our @ISA = qw(Exporter); 8 our @EXPORT = qw(eval_ok evaluate); 9 10 my $Test = Test::Builder->new; 11 12 sub import { 13 my ($self) = shift; 14 my $pack = caller; 15 $Test->exported_to($pack); 16 $Test->plan(@_); 17 18 $self->export_to_level(1, $self, 'eval_ok'); 19 $self->export_to_level(1, $self, 'evaluate'); 20 } 21 22 sub eval_ok { 23 my ($expr, $expected, $name) = @_; 24 my $result = evaluate($expr); 25 $result .= "\n" if $expected =~ /\n/; 26 $Test->is_eq($result, $expected, $name); 27 } 28 29 sub evaluate { 30 my ($expression) = @_; 31 32 my $fh = new FileHandle("> junk"); 33 $fh->print($expression); 34 $fh = new FileHandle('< junk'); 35 my $outfh = new FileHandle("> junk2"); 36 PScm::ReadEvalPrint($fh, $outfh); 37 $fh = 0; 38 $outfh = 0; 39 my $res = `cat junk2`; 40 chomp $res; 41 unlink('junk'); 42 unlink('junk2'); 43 44 # warn "# [$res]\n"; 45 return $res; 46 } 47 48 1;
The test module for our first version of the interpreter is in
Listing 8. The PScm::Test package
shown in Listing 9
provides an eval_ok()
sub which takes a
string expression, writes it out to a file, and calls ReadEvalPrint()
on it, with the output redirected to another file. It then reads
that output back in and compares it to its second argument12. The various simple
tests just exercise the system.
To allow users to play a little
more with the interpreter, there's a tiny interactive shell that
requires Term::ReadLine::Gnu and the libreadline
library. It's in
t/interactive
and can be run, without installing the interpreter,
by doing:
$ perl -Ilib ./t/interactive
from the root of any version of the distribution. It's short enough to show here in its entirety, in Listing 10.
t/interactive
1 use PScm; 2 3 package GetLine; 4 5 use Term::ReadLine; 6 7 sub new { 8 my ($class) = @_; 9 bless { 10 term => new Term::ReadLine('PScheme'), 11 }, $class; 12 } 13 14 sub getline { 15 my ($self) = @_; 16 $self->{term}->readline('> '); 17 } 18 19 package main; 20 21 my $in = new GetLine(); 22 23 ReadEvalPrint($in); 24 25 # vim: ft=perl
Full source code for this version of the interpreter is available athttp://billhails.net/Book/releases/PScm-0.0.0.tgz
but looks a lot prettier.
It could have just said
return $token unless $token->isa('PScm::Token::Open');
but I always think it's a bit rude to peep into the implementation like that, much better to ask it what it thinks it is, not forcibly extract its data type.
Lisp purists might raise an
eyebrow at this point, because Eval()
is supposed
to know what kind of form it is evaluating and decide whether or
not to evaluate the arguments. But this is an object-oriented
application, and it makes much more sense to leave that decision
to the objects that need to know.
This differs from a true Scheme implementation
where special boolean values #t
and #f
represent truth and falsehood, and everything else is
true. The reason for having an isTrue()
is to encapsulate
the chosen behaviour. If we wanted to change the meaning of truth,
we need only do so here.
We haven't seen anything yet that might, when evaluated, return a list for printing. That's for later.
especially one where the +
and /
keys don't work.
Ok, I should have used IO::String, so sue me.