It was mentioned in Section 1.2 that one of the great strengths of the Lisp family of languages is their ability to treat programs as data: to manipulate the same list structures that their expressions are composed of. So far we haven't seen any of that functionality implemented in our interpreter.
Those list structures are the ones constructed by the Reader, and the Reader can be considered a general purpose data input package: all it does is categorise and collect input into strings, numbers, symbols and lists. That's a very useful structure for organizing any kind of information, not just PScheme programs. The read-eval-print loop will, however, attempt to evaluate any such structure read in, so we need a way of stopping that.
quote
The appropriate form is called quote and is a PScm::SpecialForm. It takes a single argument and returns it unevaluated:
> (quote a) a > (quote (+ 1 2)) (+ 1 2)
The implementation is rather trivial:
Quote is used to turn off evaluation.
Since special forms don't have their arguments evaluated for them, all
that the Apply()
method in PScm::SpecialForm::Quote
need do is to return its first argument, still unevaluated.
Here's PScm::SpecialForm::Quote.
106 package PScm::SpecialForm::Quote; 107 108 use base qw(PScm::SpecialForm); 109 110 sub Apply { 111 my ($self, $form, $env) = @_; 112 return $form->first; 113 } 114 115 1;
list
Another useful operation is called list. It takes a list of arguments and constructs a new list from them. It is just a PScm::Primitive and so its arguments are evaluated:
> (list (- 8 1) "hello") (7 "hello")
It does nothing itself but return the list of its evaluated
arguments to the caller as a new PScm::Expr::List, so
it's also trivial to implement. To recap, all PScm::Primitive
classes share a common Apply()
method that evaluates the
arguments then calls the class-specific _apply()
method.
So all we have to do is to subclass PScm::Primitive to
PScm::Primitive::List and give that new subclass an
appropriate _apply()
method.
66 package PScm::Primitive::List; 67 68 use base qw(PScm::Primitive); 69 70 sub _apply { 71 my ($self, @args) = @_; 72 73 return new PScm::Expr::List(@args); 74 }
As has been said, it's trivial because it just returns its arguments as a new PScm::Expr::List.
car
and cdr
So we can create lists, but what can we do with them? We already
have two primitive internal operations on lists, namely the
PScm::Expr::List::first()
and rest()
methods.
They are something like the complement of the list
primitive
in that they take
apart a list into its components. Bowing to historical precedent
however, Scheme, and hence PScheme, doesn't call them “first” and
“rest”, instead they are called car and
cdr19.
Again their implementation is simple, we add a new
subclass of PScm::Primitive for each of them, and give
each new class an _apply()
method that calls the relevant
internal method. Here's PScm::Primitive::Car.
77 package PScm::Primitive::Car; 78 79 use base qw(PScm::Primitive); 80 81 sub _apply { 82 my ($self, $arg) = @_; 83 84 $self->_check_type($arg, 'PScm::Expr::List'); 85 return $arg->first; 86 }
It uses _check_type()
to verify that its argument is a list,
then calls its first()
method, returning the result.
Here's the equivalent PScm::Primitive::Cdr class.
89 package PScm::Primitive::Cdr; 90 91 use base qw(PScm::Primitive); 92 93 sub _apply { 94 my ($self, $arg) = @_; 95 96 $self->_check_type($arg, 'PScm::Expr::List'); 97 return $arg->rest; 98 }
cons
We're only missing one piece from our set of basic list operations
now, but before adding that it is necessary to explain and rectify
a significant
deviation that PScheme has so far made from other Lisp and Scheme
implementations. In our PScheme implementation lists have
been implemented as object wrappers around Perl lists. This had
the advantage that the Perl implementation was as simple as it could be. However
real Lisp systems implement lists as chains of what
are called cons cells, or more commonly
pairs. A cons cell, is a structure with
two components, both references to other data types.
For a true
list, the first component points at the current list element
and the second component points at the rest of the list. The first
component is called the car
and the second component the
cdr
, hence the eponymous functions that access those
components. So for example the lisp expression (foo ("bar"
10) baz)
Is not properly implemented as in
Figure 3.1,
but as shown in Figure 8.1.
(foo ("bar" 10) baz)
The unfilled cdr
pointers in the figure represent null
pointers and terminate the list structure.
This means that a true Lisp list is in fact a linked list. A
primary advantage of this is that the internal first()
and
rest()
(car
and cdr
) operations are
equally efficient: there is no need for rest()
to construct
a new list object, it just returns it's rest
component.
A second advantage is that cons cells are more
flexible than lists. A true list is a chain of cons cells linked
by their cdr component, ending in a cons cell with an empty cdr.
In general the cdr need not point to another cons cell, it could
equally well point to any other data type.
A cons cell is constructed with the primitive operator
cons.
Since we can now build structures that cannot be represented
in simple “(a b c)
” list notation, we need to extend that
notation to cope. The extension is thankfully very simple: if the
last component of a list is prepended by a period (separated by a space)
for example (a . b)
it is taken
to be the cdr
of the list.
This is called dot notation.
See Figure 8.2.
(a . b)
Dot notation will be supported for both input and output.
Dot notation is not limited to just dotted pairs, for example in Figure 8.3 you can see that more complex structures can also be represented.
(a b . c)
Dot notation is actually capable of representing any structure we can
envisage20.
In fact it is reasonable to think of the normal list
notation we have been using so far as merely a convenient shorthand
for dot notation. For example the list (a)
is the same
as the pair (a . ())
(because ()
is the empty
list.) Likewise the list (a b c)
can be represented
as (a . (b . (c . ())))
. Obviously this unwieldy notation
is to be avoided unless necessary, but you should at least be aware
of it.
Dot notation has a number of uses. Most importantly it
allows us to easily specify variable numbers
of arguments to our lambda expressions: if the formal arguments
in a lambda
expression are dotted, then that can be taken
to mean the dotted symbol is to be bound to a list of the remaining
actual arguments. For Example:
> (let ((test > (lambda (a b . c) > (list a b c)))) > (test 1 2 3 4 5)) (1 2 (3 4 5))
The a
and b
are bound to 1
and 2
as always, but the c
, because it occupies
the entirity of the rest of the formal argument list gets bound to the remaining
list of additional arguments.
Interestingly this also allows entirely arbitrary lists of arguments.
If you think about it the dotted pair notation ( . a)
can be
made perfectly legal for input, and is equivalent to the symbol
a
: the opening brace implies a list, but the first symbol
encountered is the cdr of a list that has not started to form yet,
so the result is just that symbol. Since we must accept lambda expressions
with such an argument declaration, we must also accept lambda expressions
with a single symbol instead of a list of arguments. For example we
could define our list
primitive in PScheme like:
> (let ((list (lambda args args))) > (list 1 2 3 4 5)) (1 2 3 4 5)
list
takes any number of arguments as a single
list args
and returns them.
Let's make the change to use that alternative implementation.
Since the PScm::Expr::List class hides its internal
structure and provides accessor methods, technically that should be the only
package that needs to change. However it is worthwhile making use
of the new list structure in other parts of the PScheme system.
The language is highly recursive, and this new linked list structure
lends itself to recursion much more naturally than a plain
perl @list
does.
To get us started, here's the new implementation of PScm::Expr::List:
59 package PScm::Expr::List; 60 use base qw(PScm::Expr); 61 62 sub new { 63 my ($class, @list) = @_; 64 65 $class = ref($class) || $class; 66 if (@list) { 67 my $first = shift @list; 68 $class->Cons($first, $class->new(@list)); 69 } else { 70 new PScm::Expr::List::Null(); 71 } 72 } 73 74 sub Cons { 75 my ($class, $first, $rest) = @_; 76 return PScm::Expr::List::Pair->new($first, $rest); 77 } 78 79 sub as_string { 80 my ($self) = @_; 81 return '(' . join(' ', $self->strings) . ')'; 82 }
The new()
method on Lines 62–72 is
a little more complicated than it was, because it has to recurse
on its argument list building a linked list. If the list is not
empty then on Line 68 it
calls an ancilliary method Cons()
(defined on Lines 74–77) to actually construct a new
PScm::Expr::List::Pair node (a cons cell). So the
PScm::Expr::List class is now in fact abstract. Although
it has a new()
method, that method actually returns instances
of either PScm::Expr::List::Pair or another new object,
PScm::Expr::List::Null, which represents the empty list.
if you remember the old implementation of new()
just wrapped its argument
list:
36 sub new { 37 my ($class, @list) = @_; 38 39 $class = ref($class) || $class; 40 bless [@list], $class; 41 }
The as_string()
method of PScm::Expr::List
has changed too. Rather than just mapping as_string()
over the components of the list, it calls a separate strings()
method that will return an array of strings, and joins and wraps that.
The main reason for that is to cope with dotted pair notation. We'll
see how strings()
works soon.
Much of the functionality that was in PScm::Expr::List has been moved out into PScm::Expr::List::Pair, shown next:
85 package PScm::Expr::List::Pair; 86 use base qw(PScm::Expr::List); 87 88 use constant { 89 FIRST => 0, 90 REST => 1, 91 }; 92 93 sub new { 94 my ($class, $first, $rest) = @_; 95 bless [$first, $rest], $class; 96 } 97 98 sub value { 99 my ($self) = @_; 100 my @value = ($self->[FIRST], $self->[REST]->value); 101 return @value; 102 } 103 104 sub first { $_[0][FIRST] } 105 106 sub rest { $_[0][REST] } 107 108 sub strings { 109 my ($self) = @_; 110 return ($self->[FIRST]->as_string, 111 $self->[REST]->strings); 112 } 113 114 sub Eval { 115 my ($self, $env) = @_; 116 my $op = $self->[FIRST]->Eval($env); 117 return $op->Apply($self->[REST], $env); 118 } 119 120 sub map_eval { 121 my ($self, $env) = @_; 122 return $self->Cons($self->[FIRST]->Eval($env), 123 $self->[REST]->map_eval($env)); 124 } 125 126 sub is_pair { 1 }
As a minor optimization,
PScm::Expr::List::Pair will store its first
and rest
components in an array ref rather than a hash.
For that reason it declares two constants FIRST
and
REST
to act as indexes into that structure.
PScm::Expr::List::Pair has its own new()
method
which we've already seen being called by Cons()
. It just collects
its two arguments into a new object.
The other methods in PScm::Expr::List::Pair are fairly straightforward.
value()
method on Lines 98–102
converts the linked list back into a Perl list21.
Because the structure is no longer necessarily a true list
we must supply an alternative value()
method in
PScm::Expr which just returns $self
(ignoring the dot notation). There is another value()
method in PScm::Expr::List::Null that returns
the empty (Perl) list and likewise terminates the recursion
of PScm::Expr::List::value()
.first()
and rest()
methods are simplified,
they are now just
accessors to their equivalent fields.as_string()
method from
PScm::Expr::List has to deal with dot notation,
so cannot simply map
an as_string()
over the list's
value()
. Instead it calls a helper method strings()
.strings()
, on
Lines 108–112,
returns a perl list of the string representation of
the first item on the list, plus the result of calling itself
on the rest of the list. There is an implementation of strings()
in PScm::Expr::List::Null that just returns the
empty (Perl) list:
141 sub strings { (); }and another at the root of the heirarchy in PScm::Expr which catches the situation where a type other than a list or null is the
cdr
of a list:
17 sub strings { 18 my ($self) = @_; 19 return ('.', $self->as_string); 20 }It returns a list of the string
'.'
plus the result
of calling as_string()
on itself. Since it knows it must be
terminating a list, it need not recurse.Eval()
method
on
Lines 114–118
is functionally
unchanged from the previous implementation, but it directly
accesses the FIRST
and REST
fields rather
than using the first()
and rest()
method calls
for a slight performance improvement.map_eval()
method on
Lines 120–124
will come in very handy later. It takes an environment as argument
and builds a copy of itself with each component replaced by the
result of evaluating that component in the argument environment.
Because, like strings()
it must deal with the possibility
that the structure is not a true list, an additional map_eval()
method is provided in the base PScm::Expr class that just
returns the result of calling Eval()
on $self
.is_pair()
method is defined to be
true in this class only. It is defined false by default in
PScm::Expr.Another part of our alternative implementation of lists is that new
PScm::Expr::List::Null class. It represents the PScheme empty
list, and also quite reasonably descends from the list class. It provides
a simple new()
method with no arguments, and overrides the
value()
and strings()
methods to just return an empty perl list.
129 package PScm::Expr::List::Null; 130 use base qw(PScm::Expr::List); 131 132 sub new { 133 my ($class) = @_; 134 135 $class = ref($class) || $class; 136 bless {}, $class; 137 } 138 139 sub value { (); } 140 141 sub strings { (); } 142 143 sub first { $_[0] } 144 145 sub rest { $_[0] } 146 147 sub is_null { 1 }
Interestingly, it also overrides first()
and rest()
to return $self
, so the car
or cdr
of the
empty list is the empty list, and it overrides Eval()
to just return
$self
too, so an empty list evaluates to itself22.
Back to our cons
function. Scheme implementations
have a cons
operation that takes two arguments and creates
a cons cell with its car referencing the first argument, and its
cdr referencing the second.
> (cons 10 (list 20 30)) (10 20 30)
Thus the car
and cdr
operations are the
precise complement of cons
: cons
constructs
a cell, and car
and cdr
take the cell apart.
> (car (cons 10 (list 20 30))) 10 > (cdr (cons 10 (list 20 30))) (20 30)
Provided the second argument to cons
is a list, the result will
also be a list, but there is no requirement for the second argument to be a
list. That makes cons
a second way to create dotted pairs,
other than inputting them directly:
> (quote (a . b)) (a . b) > (cons (quote a) (quote b)) (a . b) > (cons (quote a) (quote (b))) (a b)
cons
is implemented
in the normal way, by subclassing PScm::Primitive and
giving the new class, PScm::Primitive::Cons in this case,
an _apply()
method. Here's that method.
101 package PScm::Primitive::Cons; 102 103 use base qw(PScm::Primitive); 104 105 sub _apply { 106 my ($self, $car, $cdr) = @_; 107 108 return PScm::Expr::List->Cons($car, $cdr); 109 } 110 111 1;
It can be seen that all it does is to call
PScm::Expr::List's Cons()
method.
As I've already said, It will pay to make use of this new list
structure wherever possible throughout the PScheme implementation.
The first place we shall do so is in PScm::Primitive::Apply()
,
which can now make use of that new map_eval()
method:
7 sub Apply { 8 my ($self, $form, $env) = @_; 9 10 my @evaluated_args = $form->map_eval($env)->value(); 11 return $self->_apply(@evaluated_args); 12 }
The question arises as to why I'm then just calling value()
on the result and passing an ordinary perl list to the individual
primitives, after I just said that it was worthwhile passing around
the new linked list structures. It's just a personal choice, but I
feel that the individual primitives should present an
“abstraction layer” such that anything below that layer is
pure Perl and does not depend on the details of the PScheme
implementation above that layer. Anyway, that's how I see it.
Special forms, on the other hand, are very much part of the PScheme implementation and do make full use of the new linked lists.
First up is PScm::SpecialForm::Let. If you remember
let
and friends make use of a shared UnPack()
method which used to return a list of two references to perl
lists, one for the symbols and one for the values, plus the
body of the let
expression. Now it will
return PScheme lists instead, and those lists will be passed to
the various Extend*()
methods in the environment implementation,
which will also have to change.
anyway here's the new PScm::SpecialForm::Let::Apply()
:
12 sub Apply { 13 my ($self, $form, $env) = @_; 14 15 my ($symbols, $values, $body) = $self->UnPack($form); 16 17 return $body->Eval($env->Extend($symbols, $values)); 18 }
Only the variable names have changed: $symbols
and
$values
used to be called $ra_symbols
and
$ra_values
to indicate that they were references to arrays.
This is no longer the case.
The UnPack()
still just calls value()
on the $form
to get the bindings and the body, but now it makes use of an additional
unpack_bindings()
method to build the new PScheme lists:
20 sub UnPack { 21 my ($self, $form) = @_; 22 23 my ($bindings, $body) = $form->value; 24 my ($symbols, $values) = $self->unpack_bindings($bindings); 25 return ($symbols, $values, $body); 26 }
unpack_bindings()
itself is where the real work gets done:
28 sub unpack_bindings { 29 my ($self, $bindings) = @_; 30 if ($bindings->is_null) { 31 my $null = new PScm::Expr::List::Null(); 32 return ($null, $null); 33 } else { 34 my ($symbols, $values) = 35 $self->unpack_bindings($bindings->rest); 36 return ( 37 PScm::Expr::List->Cons($bindings->first->first, 38 $symbols), 39 PScm::Expr::List->Cons($bindings->first->rest->first, 40 $values) 41 ); 42 } 43 }
If it has reached the end of the bindings it creates a new null object and returns two of it. otherwise it calls itself on the rest of the bindings, collects the results, then prepends the symbol from the current binding onto the first list and the value from the current binding onto the second list. Finally it returns those two new lists. Essentially it recurses to the end of the bindings then builds a pair of lists on the way back up. This guarantees that the symbols and values are in the same order in the results as they were in the bindings.
Given the new UnPack()
method, the Apply()
methods for
PScm::SpecialForm::LetRec:
50 sub Apply { 51 my ($self, $form, $env) = @_; 52 53 my ($symbols, $values, $body) = $self->UnPack($form); 54 55 return $body->Eval( 56 $env->ExtendRecursively($symbols, $values)); 57 }
and PScm::SpecialForm::LetStar:
64 sub Apply { 65 my ($self, $form, $env) = @_; 66 67 my ($symbols, $values, $body) = $self->UnPack($form); 68 69 return $body->Eval( 70 $env->ExtendIteratively($symbols, $values)); 71 }
are similarily unchanged except for variable renaming.
I've taken the opportunity to make a small but significant change to PScm::SpecialForm::If:
78 sub Apply { 79 my ($self, $form, $env) = @_; 80 81 my $condition = $form->first; 82 my $true_branch = $form->rest->first; 83 my $false_branch = $form->rest->rest->first; 84 85 if ($condition->Eval($env)->isTrue) { 86 return $true_branch->Eval($env); 87 } else { 88 return $false_branch->Eval($env); 89 } 90 }
Because it extracts the condition, true and false branches from the
$form
by using combinations of first()
and rest()
rather than just using value()
, and because the first()
and rest()
of the empty list is the empty list, and because
the empty list evaluates to itself, our new if
no longer
requires a third argument. If the test fails and a false branch is
not supplied the result will be the empty list.
PScm::SpecialForm::Lambda::Apply()
is unchanged, but the
closure that it constructs will make use of the new lists when it
interacts with the changed environment implementation.
The only remaining special form is the new PScm::SpecialForm::Quote, but we've seen that already.
In fact there is very little change in this package.
The differences are that
the specific PScm::Closure::Function::Apply()
method
(which applies a lambda
closure to its arguments)
now uses map_eval()
to construct a PScheme list of evaluated
arguments and pass that
to the shared _apply()
rather than a perl list:
39 sub Apply { 40 my ($self, $form, $env) = @_; 41 42 my $evaluated_args = $form->map_eval($env); 43 return $self->_apply($evaluated_args); 44 }
The shared _apply()
method
recieves a PScheme list of actual arguments
rather than a reference to an array
and it passes that, plus its PScheme list of formal arguments
directly to the environment's ExtendUnevaluated()
method:
17 sub _apply { 18 my ($self, $args) = @_; 19 20 my $extended_env = 21 $self->{env}->ExtendUnevaluated($self->{args}, $args); 22 return $self->{body}->Eval($extended_env); 23 }
The various Extend*()
methods of PScm::Env
now take PScm::Expr::List objects as arguments
rather than perl listrefs. This makes them the right place to
implement the new variable arguments functionality of lambda
.
First of all Extend()
itself:
14 sub Extend { 15 my ($self, $symbols, $values) = @_; 16 17 return $self->ExtendUnevaluated($symbols, 18 $values->map_eval($self)); 19 }
Variable names have changed to reflect the fact that they are
no longer references to arrays, and Extend()
uses
map_eval()
to evaluate the list of values before
passing them to ExtendUnevaluated()
.
ExtendUnevaluated()
is similarily changed:
21 sub ExtendUnevaluated { 22 my ($self, $symbols, $values) = @_; 23 24 my %bindings; 25 $self->_populate_bindings(\%bindings, $symbols, $values); 26 my $newenv = $self->new(%bindings); 27 $newenv->{parent} = $self; 28 return $newenv; 29 }
It uses a new private _populate_bindings()
method
to populate a hash of bindings from the $symbols
and $values
lists. After that it does what it
always did, creating a new PScm::Env and
setting its parent
field to $self
before returning it.
The private _populate_bindings()
method actually
does the “parsing” of the argument list of symbols
and their binding to equivalent values.
31 sub _populate_bindings { 32 my ($self, $bindings, $symbols, $values) = @_; 33 if ($symbols->is_null) { 34 die "too many arguments\n" unless $values->is_null; 35 } elsif ($symbols->is_pair) { 36 if ($values->is_pair) { 37 my $symbol = $symbols->first; 38 if ($symbol->is_symbol) { 39 $bindings->{$symbol->value} = $values->first; 40 $self->_populate_bindings($bindings, 41 $symbols->rest, 42 $values->rest); 43 } else { 44 die "bad formal arguments[1]:", 45 $symbol->as_string(), "\n"; 46 } 47 } else { 48 die "not enough arguments\n"; 49 } 50 } elsif ($symbols->is_symbol) { 51 $bindings->{$symbols->value} = $values; 52 } else { 53 die "bad formal arguments[2]:", 54 $symbols->as_string(), "\n"; 55 } 56 }
On Line 33 it checks to see if it has reached the end of the list of symbols. If it has, then it throws an error if it has not also reached the end of the list of values (too many arguments).
If $symbols
is not null, then on
Line 35
it checks to see if it is a pair. If it is, then it gets the
current symbol, checks that it is a symbol, binds
it to the equivalent value, and recurses on the rest of
the two lists.
If $symbols
is not a pair, then on
Line 38
it checks to see if it is itself a symbol. This would
correspond to either a single args
in a
lambda
statement like (lambda args ...)
, or a dotted
pair terminating a list of arguments. In either case if
$symbols
is a symbol, it binds it to the entirity
of the rest of the $values
list and terminates the
recursion.
If $symbols
is none of the empty list, a pair or a
symbol then something is seriously wrong and _populate_bindings()
throws an exception.
Next up is ExtendRecursively()
. It is unchanged except
that, as in other cases, its arguments have been renamed because they are
no longer array references:
58 sub ExtendRecursively { 59 my ($self, $symbols, $values) = @_; 60 61 my $newenv = $self->ExtendUnevaluated($symbols, $values); 62 $newenv->_eval_values(); 63 return $newenv; 64 }
Finally, ExtendIteratively()
benefits from the new lists
too. It returns $self
if the list of symbols is empty, otherwise
it calls Extend()
on the first of each list then calls itself
on the extended result with the rest of the list:
66 sub ExtendIteratively { 67 my ($self, $symbols, $values) = @_; 68 69 if ($symbols->is_null) { 70 return $self; 71 } else { 72 return $self->Extend($symbols->first, $values->first) 73 ->ExtendIteratively($symbols->rest, 74 $values->rest); 75 } 76 }
Note that
ExtendIteratively()
is only used by let*
, and
so does not need to deal with non-lists.
The changes in thr Reader support the input of dot notation,
and they make direct use of PScm::Expr::List::Cons()
to facilitate
this. First of all Read()
itself has been somewhat simplified:
17 sub Read { 18 my ($self) = @_; 19 20 my $token = $self->_next_token(); 21 return undef unless defined $token; 22 23 if ($token->is_open_token) { 24 return $self->read_list(); 25 } else { 26 return $token; 27 } 28 }
It may look very different but all that has really happened
is that the old looping code which collects a list
has been moved out into a separate
method, and that method, read_list()
,
is now recursive instead of iterative:
30 sub read_list { 31 my ($self) = @_; 32 my $token = $self->read_list_element(); 33 if ($token->is_close_token) { 34 return new PScm::Expr::List::Null(); 35 } elsif ($token->is_dot_token) { 36 $token = $self->read_list_element(); 37 die "syntax error" unless $token->is_expr; 38 my $close = $self->read_list_element(); 39 die "syntax error" unless $close->is_close_token; 40 return $token; 41 } else { 42 return PScm::Expr::List->Cons($token, $self->read_list()); 43 } 44 }
read_list()
collects its components using another new
method read_list_element()
. On
Line 33
if it detects a closing brace it returns the empty list.
If the token is not a closing brace, then on
Line 35
if the token is a dot, it reads the next element, checks that
it is an expression, checks that
the element after that is a closing brace, and returns the
element just read as the entire list. In the final case,
if the token is neither a closing brace or a dot, it uses
Cons()
to construct a list with the current token as the
first element and the result of calling read_list()
as the rest. Hopefully you can convince yourself that this
will deal correctly with both ordinary lists and dotted pair
notation.
read_list_element()
just centralises the otherwise
tedious and repetitive checking for EOF which is a syntax error
(unterminated list) while collecting list elements:
46 sub read_list_element { 47 my ($self) = @_; 48 my $token = $self->Read(); 49 die "unexpected EOF" 50 if !defined $token; 51 return $token; 52 }
Finally _next_token()
has an extra clause to detect a standalone
period (dot) and if it does it returns an instance of a new
token class PScm::Token::Dot:
54 sub _next_token { 55 my ($self) = @_; 56 57 while (!$self->{Line}) { 58 $self->{Line} = $self->{FileHandle}->getline(); 59 return undef unless defined $self->{Line}; 60 $self->{Line} =~ s/^\s+//s; 61 } 62 63 for ($self->{Line}) { 64 s/^\(\s*// && return PScm::Token::Open->new(); 65 s/^\)\s*// && return PScm::Token::Close->new(); 66 s/^\.\s*// && return PScm::Token::Dot->new(); 67 s/^([-+]?\d+)\s*// 68 && return PScm::Expr::Number->new($1); 69 s/^"((?:(?:\\.)|([^"]))*)"\s*// && do { 70 my $string = $1; 71 $string =~ s/\\//g; 72 return PScm::Expr::String->new($string); 73 }; 74 s/^([^\s\(\)]+)\s*// 75 && return PScm::Expr::Symbol->new($1); 76 } 77 die "can't parse: $self->{Line}"; 78 }
The new class is in PScm/Token.pm
alongside
the others:
27 package PScm::Token::Dot; 28 29 use base qw(PScm::Token); 30 31 sub is_dot_token { 1 } 32 33 1;
There is a default is_dot_token()
method in
PScm::Token that returns false.
In interpreter 0.0.5 five related operations for creating and
manipulating list structures have been added. We'll put those to
good use in the next version of the interpreter when we look at
macros.
In the process of implementing one of those operations, cons
,
the basic list implementation was changed to be closer to a “standard”
scheme implementation, and out of that we won the ability to construct
dotted pairs, and from that we got variable arguments to lambda
expressions.
Just for the sake of completeness here's the changes to
our top-level PScm::ReadEvalPrint()
method, where we add
the new bindings for these functions in the initial environment.
31 sub ReadEvalPrint { 32 my ($infh, $outfh) = @_; 33 34 $outfh ||= new FileHandle(">-"); 35 my $reader = new PScm::Read($infh); 36 while (defined(my $expr = $reader->Read)) { 37 my $result = $expr->Eval( 38 new PScm::Env( 39 let => new PScm::SpecialForm::Let(), 40 '*' => new PScm::Primitive::Multiply(), 41 '-' => new PScm::Primitive::Subtract(), 42 if => new PScm::SpecialForm::If(), 43 lambda => new PScm::SpecialForm::Lambda(), 44 list => new PScm::Primitive::List(), 45 car => new PScm::Primitive::Car(), 46 cdr => new PScm::Primitive::Cdr(), 47 cons => new PScm::Primitive::Cons(), 48 letrec => new PScm::SpecialForm::LetRec(), 49 'let*' => new PScm::SpecialForm::LetStar(), 50 quote => new PScm::SpecialForm::Quote(), 51 ) 52 ); 53 $result->Print($outfh); 54 } 55 }
t/PScm_List.t
1 use strict; 2 use warnings; 3 use Test::More; 4 use lib 't/lib'; 5 use PScm::Test tests => 9; 6 7 BEGIN { use_ok('PScm') } 8 9 eval_ok(<<EOF, '(1 2 3)', 'list primitive'); 10 (let ((a 1) 11 (b 2) 12 (c 3)) 13 (list a b c)) 14 EOF 15 16 eval_ok(<<EOF, '1', 'car primitive'); 17 (let ((a (list 1 2 3))) 18 (car a)) 19 EOF 20 21 eval_ok(<<EOF, '(2 3)', 'cdr primitive'); 22 (let ((a (list 1 2 3))) 23 (cdr a)) 24 EOF 25 26 eval_ok(<<EOF, '(1 2 3)', 'cons primitive'); 27 (cons 1 (list 2 3)) 28 EOF 29 30 eval_ok('(quote (list 1 2 3))', '(list 1 2 3)', 31 'quote special form'); 32 33 eval_ok('(quote (1 2 . 3))', '(1 2 . 3)', 'dot in'); 34 35 eval_ok('(quote (1 2 . (3)))', '(1 2 3)', 'dot out'); 36 37 eval_ok('(quote (1 . (2 . (3 .()))))', 38 '(1 2 3)', 'complex dot out'); 39 40 # vim: ft=perl
Tests for version 0.0.5 of the interpreter are in t/PScm_List.t
and t/PScm_dot.t
,
which you can see in Listing 16 and Listing 17.
The tests in t/PScm_List.t
exercise our new list functions.
The first test shows that list
evaluates its arguments
then returns a list of them.
The second test proves that the car
of the list (1 2 3)
is the value 1
.
The third test proves that the cdr
of the list (1 2 3)
is the list (2 3)
.
The fourth test proves that (cons 1 (list 2 3))
is
(1 2 3)
.
The fifth test proves that quote
protects its argument
from evaluation: (quote (list 1 2 3))
is (list 1 2 3)
.
Lastly, three tests verify that dot notation on input and output
behaves as expected.
t/PScm_Dot.t
1 use strict; 2 use warnings; 3 use Test::More; 4 use lib './t/lib'; 5 use PScm::Test tests => 3; 6 use FileHandle; 7 8 BEGIN { use_ok('PScm') } 9 10 eval_ok(<<EOF, <<EOR, 'lambda args'); 11 (let ((test (lambda args args))) 12 (test 1 2 3)) 13 EOF 14 (1 2 3) 15 EOR 16 17 eval_ok(<<EOF, <<EOR, 'lambda args'); 18 (let ((test (lambda (a b . rest) 19 (list a b rest)))) 20 (test 1 2 3 4)) 21 EOF 22 (1 2 (3 4)) 23 EOR 24 25 # vim: ft=perl
The tests in Listing 17 verify the new variable arguments to lambda expressions, much as our previous examples demonstrated.
Full source code for this version of the interpreter is available athttp://billhails.net/Book/releases/PScm-0.0.5.tgz
Obligatory footnote. CAR stands for “the Contents of the Address part of the Register” and CDR stands for “the Contents of the Decrement part of the Register.” This is a reference to the original hardware on which the first Lisp system was implemented, and means nothing now but the names have stuck.
Well, actually it is possible to imagine circular
structures that would defeat any notation. A true Scheme will even allow
the creation of such circular lists, with such dubious expressions as
(set-cdr! x x)
, but that's a world of pain
that we will stay well away from.
The reason for
the temporary @value
variable is not redundant clarification
of the
code, it is a workaround for a rather obscure piece of Perl behaviour.
If the code had simply said:
return ($self->[FIRST], $self->[REST]->value);
then the scalar context imposed by the isTrue()
method in
PScm::Expr would cause Perl to treat the comma as
the comma operator,
throwing away the left hand side and returning only the right, recursively,
so all lists would end up being treated as false.
This interesting trick of having an object to represent the absence of another object is a well-known design pattern called the Null Object Pattern.