List Processing

It was mentioned in Section 1.2 that one of the great strengths of the Lisp family of languages is their ability to treat programs as data: to manipulate the same list structures that their expressions are composed of. So far we haven't seen any of that functionality implemented in our interpreter.

Those list structures are the ones constructed by the Reader, and the Reader can be considered a general purpose data input package: all it does is categorise and collect input into strings, numbers, symbols and lists. That's a very useful structure for organizing any kind of information, not just PScheme programs. The read-eval-print loop will, however, attempt to evaluate any such structure read in, so we need a way of stopping that.

8.1. `quote`

The appropriate form is called quote and is a PScm::SpecialForm. It takes a single argument and returns it unevaluated:

> (quote a)
a
> (quote (+ 1 2))
(+ 1 2)
[download]

The implementation is rather trivial: Quote is used to turn off evaluation. Since special forms don't have their arguments evaluated for them, all that the Apply() method in PScm::SpecialForm::Quote need do is to return its first argument, still unevaluated.

Here's PScm::SpecialForm::Quote.

106 package PScm::SpecialForm::Quote;
107 
108 use base qw(PScm::SpecialForm);
109 
110 sub Apply {
111     my ($self, $form, $env) = @_;
112     return $form->first;
113 }
114 
115 1;

8.2. `list`

Another useful operation is called list. It takes a list of arguments and constructs a new list from them. It is just a PScm::Primitive and so its arguments are evaluated:

> (list (- 8 1) "hello")
(7 "hello")
[download]

It does nothing itself but return the list of its evaluated arguments to the caller as a new PScm::Expr::List, so it's also trivial to implement. To recap, all PScm::Primitive classes share a common Apply() method that evaluates the arguments then calls the class-specific _apply() method. So all we have to do is to subclass PScm::Primitive to PScm::Primitive::List and give that new subclass an appropriate _apply() method.

 66 package PScm::Primitive::List;
 67 
 68 use base qw(PScm::Primitive);
 69 
 70 sub _apply {
 71     my ($self, @args) = @_;
 72 
 73     return new PScm::Expr::List(@args);
 74 }

As has been said, it's trivial because it just returns its arguments as a new PScm::Expr::List.

8.3. `car` and `cdr`

So we can create lists, but what can we do with them? We already have two primitive internal operations on lists, namely the PScm::Expr::List::first() and rest() methods. They are something like the complement of the list primitive in that they take apart a list into its components. Bowing to historical precedent however, Scheme, and hence PScheme, doesn't call them “first” and “rest”, instead they are called car and cdr¹⁹.

Again their implementation is simple, we add a new subclass of PScm::Primitive for each of them, and give each new class an _apply() method that calls the relevant internal method. Here's PScm::Primitive::Car.

 77 package PScm::Primitive::Car;
 78 
 79 use base qw(PScm::Primitive);
 80 
 81 sub _apply {
 82     my ($self, $arg) = @_;
 83 
 84     $self->_check_type($arg, 'PScm::Expr::List');
 85     return $arg->first;
 86 }

It uses _check_type() to verify that its argument is a list, then calls its first() method, returning the result.

Here's the equivalent PScm::Primitive::Cdr class.

 89 package PScm::Primitive::Cdr;
 90 
 91 use base qw(PScm::Primitive);
 92 
 93 sub _apply {
 94     my ($self, $arg) = @_;
 95 
 96     $self->_check_type($arg, 'PScm::Expr::List');
 97     return $arg->rest;
 98 }

8.4. `cons`

We're only missing one piece from our set of basic list operations now, but before adding that it is necessary to explain and rectify a significant deviation that PScheme has so far made from other Lisp and Scheme implementations. In our PScheme implementation lists have been implemented as object wrappers around Perl lists. This had the advantage that the Perl implementation was as simple as it could be. However real Lisp systems implement lists as chains of what are called cons cells, or more commonly pairs. A cons cell, is a structure with two components, both references to other data types. For a true list, the first component points at the current list element and the second component points at the rest of the list. The first component is called the car and the second component the cdr, hence the eponymous functions that access those components. So for example the lisp expression (foo ("bar" 10) baz) Is not properly implemented as in Figure 3.1, but as shown in Figure 8.1.

Figure 8.1. Cons Cell Representation of a nested list (foo ("bar" 10) baz)

The unfilled cdr pointers in the figure represent null pointers and terminate the list structure.

This means that a true Lisp list is in fact a linked list. A primary advantage of this is that the internal first() and rest() (car and cdr) operations are equally efficient: there is no need for rest() to construct a new list object, it just returns it's rest component. A second advantage is that cons cells are more flexible than lists. A true list is a chain of cons cells linked by their cdr component, ending in a cons cell with an empty cdr. In general the cdr need not point to another cons cell, it could equally well point to any other data type. A cons cell is constructed with the primitive operator cons.

8.4.1. Dot Notation

Since we can now build structures that cannot be represented in simple “(a b c)” list notation, we need to extend that notation to cope. The extension is thankfully very simple: if the last component of a list is prepended by a period (separated by a space) for example (a . b) it is taken to be the cdr of the list. This is called dot notation. See Figure 8.2.

Figure 8.2. The pair (a . b)

Dot notation will be supported for both input and output.

Dot notation is not limited to just dotted pairs, for example in Figure 8.3 you can see that more complex structures can also be represented.

Figure 8.3. The structure (a b . c)

Dot notation is actually capable of representing any structure we can envisage²⁰. In fact it is reasonable to think of the normal list notation we have been using so far as merely a convenient shorthand for dot notation. For example the list (a) is the same as the pair (a . ()) (because () is the empty list.) Likewise the list (a b c) can be represented as (a . (b . (c . ()))). Obviously this unwieldy notation is to be avoided unless necessary, but you should at least be aware of it.

Dot notation has a number of uses. Most importantly it allows us to easily specify variable numbers of arguments to our lambda expressions: if the formal arguments in a lambda expression are dotted, then that can be taken to mean the dotted symbol is to be bound to a list of the remaining actual arguments. For Example:

> (let ((test
>         (lambda (a b . c)
>           (list a b c))))
>   (test 1 2 3 4 5))
(1 2 (3 4 5))
[download]

The a and b are bound to 1 and 2 as always, but the c, because it occupies the entirity of the rest of the formal argument list gets bound to the remaining list of additional arguments.

Interestingly this also allows entirely arbitrary lists of arguments. If you think about it the dotted pair notation ( . a) can be made perfectly legal for input, and is equivalent to the symbol a: the opening brace implies a list, but the first symbol encountered is the cdr of a list that has not started to form yet, so the result is just that symbol. Since we must accept lambda expressions with such an argument declaration, we must also accept lambda expressions with a single symbol instead of a list of arguments. For example we could define our list primitive in PScheme like:

> (let ((list (lambda args args)))
>   (list 1 2 3 4 5))
(1 2 3 4 5)
[download]

list takes any number of arguments as a single list args and returns them.

8.5. Implementation

Let's make the change to use that alternative implementation. Since the PScm::Expr::List class hides its internal structure and provides accessor methods, technically that should be the only package that needs to change. However it is worthwhile making use of the new list structure in other parts of the PScheme system. The language is highly recursive, and this new linked list structure lends itself to recursion much more naturally than a plain perl @list does.

8.5.1. Changes to Expressions

To get us started, here's the new implementation of PScm::Expr::List:

 59 package PScm::Expr::List;
 60 use base qw(PScm::Expr);
 61 
 62 sub new {
 63     my ($class, @list) = @_;
 64 
 65     $class = ref($class) || $class;
 66     if (@list) {
 67         my $first = shift @list;
 68         $class->Cons($first, $class->new(@list));
 69     } else {
 70         new PScm::Expr::List::Null();
 71     }
 72 }
 73 
 74 sub Cons {
 75     my ($class, $first, $rest) = @_;
 76     return PScm::Expr::List::Pair->new($first, $rest);
 77 }
 78 
 79 sub as_string {
 80     my ($self) = @_;
 81     return '(' . join(' ', $self->strings) . ')';
 82 }

The new() method on Lines 62–72 is a little more complicated than it was, because it has to recurse on its argument list building a linked list. If the list is not empty then on Line 68 it calls an ancilliary method Cons() (defined on Lines 74–77) to actually construct a new PScm::Expr::List::Pair node (a cons cell). So the PScm::Expr::List class is now in fact abstract. Although it has a new() method, that method actually returns instances of either PScm::Expr::List::Pair or another new object, PScm::Expr::List::Null, which represents the empty list.

if you remember the old implementation of new() just wrapped its argument list:

 36 sub new {
 37     my ($class, @list) = @_;
 38 
 39     $class = ref($class) || $class;
 40     bless [@list], $class;
 41 }

The as_string() method of PScm::Expr::List has changed too. Rather than just mapping as_string() over the components of the list, it calls a separate strings() method that will return an array of strings, and joins and wraps that. The main reason for that is to cope with dotted pair notation. We'll see how strings() works soon.

Much of the functionality that was in PScm::Expr::List has been moved out into PScm::Expr::List::Pair, shown next:

 85 package PScm::Expr::List::Pair;
 86 use base qw(PScm::Expr::List);
 87 
 88 use constant {
 89     FIRST => 0,
 90     REST => 1,
 91 };
 92 
 93 sub new {
 94     my ($class, $first, $rest) = @_;
 95     bless [$first, $rest], $class;
 96 }
 97 
 98 sub value {
 99     my ($self) = @_;
100     my @value = ($self->[FIRST], $self->[REST]->value);
101     return @value;
102 }
103 
104 sub first { $_[0][FIRST] }
105 
106 sub rest { $_[0][REST] }
107 
108 sub strings {
109     my ($self) = @_;
110     return ($self->[FIRST]->as_string,
111             $self->[REST]->strings);
112 }
113 
114 sub Eval {
115     my ($self, $env) = @_;
116     my $op = $self->[FIRST]->Eval($env);
117     return $op->Apply($self->[REST], $env);
118 }
119 
120 sub map_eval {
121     my ($self, $env) = @_;
122     return $self->Cons($self->[FIRST]->Eval($env),
123                        $self->[REST]->map_eval($env));
124 }
125 
126 sub is_pair { 1 }

As a minor optimization, PScm::Expr::List::Pair will store its first and rest components in an array ref rather than a hash. For that reason it declares two constants FIRST and REST to act as indexes into that structure. PScm::Expr::List::Pair has its own new() method which we've already seen being called by Cons(). It just collects its two arguments into a new object.

The other methods in PScm::Expr::List::Pair are fairly straightforward.

The value() method on Lines 98–102 converts the linked list back into a Perl list²¹. Because the structure is no longer necessarily a true list we must supply an alternative value() method in PScm::Expr which just returns $self (ignoring the dot notation). There is another value() method in PScm::Expr::List::Null that returns the empty (Perl) list and likewise terminates the recursion of PScm::Expr::List::value().
The first() and rest() methods are simplified, they are now just accessors to their equivalent fields.
As mentioned above, the as_string() method from PScm::Expr::List has to deal with dot notation, so cannot simply map an as_string() over the list's value(). Instead it calls a helper method strings().
strings(), on Lines 108–112, returns a perl list of the string representation of the first item on the list, plus the result of calling itself on the rest of the list. There is an implementation of strings() in PScm::Expr::List::Null that just returns the empty (Perl) list:
```
141 sub strings { (); }
```
and another at the root of the heirarchy in PScm::Expr which catches the situation where a type other than a list or null is the cdr of a list:
```
 17 sub strings {
 18     my ($self) = @_;
 19     return ('.', $self->as_string);
 20 }
```
It returns a list of the string '.' plus the result of calling as_string() on itself. Since it knows it must be terminating a list, it need not recurse.
Back to PScm::Expr::List::Pair. The Eval() method on Lines 114–118 is functionally unchanged from the previous implementation, but it directly accesses the FIRST and REST fields rather than using the first() and rest() method calls for a slight performance improvement.
A new map_eval() method on Lines 120–124 will come in very handy later. It takes an environment as argument and builds a copy of itself with each component replaced by the result of evaluating that component in the argument environment. Because, like strings() it must deal with the possibility that the structure is not a true list, an additional map_eval() method is provided in the base PScm::Expr class that just returns the result of calling Eval() on $self.
Finally, an identifying is_pair() method is defined to be true in this class only. It is defined false by default in PScm::Expr.

Another part of our alternative implementation of lists is that new PScm::Expr::List::Null class. It represents the PScheme empty list, and also quite reasonably descends from the list class. It provides a simple new() method with no arguments, and overrides the value() and strings() methods to just return an empty perl list.

129 package PScm::Expr::List::Null;
130 use base qw(PScm::Expr::List);
131 
132 sub new {
133     my ($class) = @_;
134 
135     $class = ref($class) || $class;
136     bless {}, $class;
137 }
138 
139 sub value { (); }
140 
141 sub strings { (); }
142 
143 sub first { $_[0] }
144 
145 sub rest { $_[0] }
146 
147 sub is_null { 1 }

Interestingly, it also overrides first() and rest() to return $self, so the car or cdr of the empty list is the empty list, and it overrides Eval() to just return $self too, so an empty list evaluates to itself²².

Back to our cons function. Scheme implementations have a cons operation that takes two arguments and creates a cons cell with its car referencing the first argument, and its cdr referencing the second.

> (cons 10 (list 20 30))
(10 20 30)
[download]

Thus the car and cdr operations are the precise complement of cons: cons constructs a cell, and car and cdr take the cell apart.

> (car (cons 10 (list 20 30)))
10
> (cdr (cons 10 (list 20 30)))
(20 30)
[download]

Provided the second argument to cons is a list, the result will also be a list, but there is no requirement for the second argument to be a list. That makes cons a second way to create dotted pairs, other than inputting them directly:

> (quote (a . b))
(a . b)
> (cons (quote a) (quote b))
(a . b)
> (cons (quote a) (quote (b)))
(a b)
[download]

cons is implemented in the normal way, by subclassing PScm::Primitive and giving the new class, PScm::Primitive::Cons in this case, an _apply() method. Here's that method.

101 package PScm::Primitive::Cons;
102 
103 use base qw(PScm::Primitive);
104 
105 sub _apply {
106     my ($self, $car, $cdr) = @_;
107 
108     return PScm::Expr::List->Cons($car, $cdr);
109 }
110 
111 1;

It can be seen that all it does is to call PScm::Expr::List's Cons() method.

8.5.2. Changes to Primitives and Special Forms

As I've already said, It will pay to make use of this new list structure wherever possible throughout the PScheme implementation. The first place we shall do so is in PScm::Primitive::Apply(), which can now make use of that new map_eval() method:

  7 sub Apply {
  8     my ($self, $form, $env) = @_;
  9 
 10     my @evaluated_args = $form->map_eval($env)->value();
 11     return $self->_apply(@evaluated_args);
 12 }

The question arises as to why I'm then just calling value() on the result and passing an ordinary perl list to the individual primitives, after I just said that it was worthwhile passing around the new linked list structures. It's just a personal choice, but I feel that the individual primitives should present an “abstraction layer” such that anything below that layer is pure Perl and does not depend on the details of the PScheme implementation above that layer. Anyway, that's how I see it.

Special forms, on the other hand, are very much part of the PScheme implementation and do make full use of the new linked lists.

First up is PScm::SpecialForm::Let. If you remember let and friends make use of a shared UnPack() method which used to return a list of two references to perl lists, one for the symbols and one for the values, plus the body of the let expression. Now it will return PScheme lists instead, and those lists will be passed to the various Extend*() methods in the environment implementation, which will also have to change.

anyway here's the new PScm::SpecialForm::Let::Apply():

 12 sub Apply {
 13     my ($self, $form, $env) = @_;
 14 
 15     my ($symbols, $values, $body) = $self->UnPack($form);
 16 
 17     return $body->Eval($env->Extend($symbols, $values));
 18 }

Only the variable names have changed: $symbols and $values used to be called $ra_symbols and $ra_values to indicate that they were references to arrays. This is no longer the case.

The UnPack() still just calls value() on the $form to get the bindings and the body, but now it makes use of an additional unpack_bindings() method to build the new PScheme lists:

 20 sub UnPack {
 21     my ($self, $form) = @_;
 22 
 23     my ($bindings, $body) = $form->value;
 24     my ($symbols, $values) = $self->unpack_bindings($bindings);
 25     return ($symbols, $values, $body);
 26 }

unpack_bindings() itself is where the real work gets done:

 28 sub unpack_bindings {
 29     my ($self, $bindings) = @_;
 30     if ($bindings->is_null) {
 31         my $null = new PScm::Expr::List::Null();
 32         return ($null, $null);
 33     } else {
 34         my ($symbols, $values) =
 35             $self->unpack_bindings($bindings->rest);
 36         return (
 37             PScm::Expr::List->Cons($bindings->first->first,
 38                                    $symbols),
 39             PScm::Expr::List->Cons($bindings->first->rest->first,
 40                                    $values)
 41         );
 42     }
 43 }

If it has reached the end of the bindings it creates a new null object and returns two of it. otherwise it calls itself on the rest of the bindings, collects the results, then prepends the symbol from the current binding onto the first list and the value from the current binding onto the second list. Finally it returns those two new lists. Essentially it recurses to the end of the bindings then builds a pair of lists on the way back up. This guarantees that the symbols and values are in the same order in the results as they were in the bindings.

Given the new UnPack() method, the Apply() methods for PScm::SpecialForm::LetRec:

 50 sub Apply {
 51     my ($self, $form, $env) = @_;
 52 
 53     my ($symbols, $values, $body) = $self->UnPack($form);
 54 
 55     return $body->Eval(
 56         $env->ExtendRecursively($symbols, $values));
 57 }

and PScm::SpecialForm::LetStar:

 64 sub Apply {
 65     my ($self, $form, $env) = @_;
 66 
 67     my ($symbols, $values, $body) = $self->UnPack($form);
 68 
 69     return $body->Eval(
 70         $env->ExtendIteratively($symbols, $values));
 71 }

are similarily unchanged except for variable renaming.

I've taken the opportunity to make a small but significant change to PScm::SpecialForm::If:

 78 sub Apply {
 79     my ($self, $form, $env) = @_;
 80 
 81     my $condition = $form->first;
 82     my $true_branch = $form->rest->first;
 83     my $false_branch = $form->rest->rest->first;
 84 
 85     if ($condition->Eval($env)->isTrue) {
 86         return $true_branch->Eval($env);
 87     } else {
 88         return $false_branch->Eval($env);
 89     }
 90 }

Because it extracts the condition, true and false branches from the $form by using combinations of first() and rest() rather than just using value(), and because the first() and rest() of the empty list is the empty list, and because the empty list evaluates to itself, our new if no longer requires a third argument. If the test fails and a false branch is not supplied the result will be the empty list.

PScm::SpecialForm::Lambda::Apply() is unchanged, but the closure that it constructs will make use of the new lists when it interacts with the changed environment implementation.

The only remaining special form is the new PScm::SpecialForm::Quote, but we've seen that already.

8.5.3. Changes to Closures

In fact there is very little change in this package. The differences are that the specific PScm::Closure::Function::Apply() method (which applies a lambda closure to its arguments) now uses map_eval() to construct a PScheme list of evaluated arguments and pass that to the shared _apply() rather than a perl list:

 39 sub Apply {
 40     my ($self, $form, $env) = @_;
 41 
 42     my $evaluated_args = $form->map_eval($env);
 43     return $self->_apply($evaluated_args);
 44 }

The shared _apply() method recieves a PScheme list of actual arguments rather than a reference to an array and it passes that, plus its PScheme list of formal arguments directly to the environment's ExtendUnevaluated() method:

 17 sub _apply {
 18     my ($self, $args) = @_;
 19 
 20     my $extended_env =
 21       $self->{env}->ExtendUnevaluated($self->{args}, $args);
 22     return $self->{body}->Eval($extended_env);
 23 }

8.5.4. Changes to the Environment

The various Extend*() methods of PScm::Env now take PScm::Expr::List objects as arguments rather than perl listrefs. This makes them the right place to implement the new variable arguments functionality of lambda. First of all Extend() itself:

 14 sub Extend {
 15     my ($self, $symbols, $values) = @_;
 16 
 17     return $self->ExtendUnevaluated($symbols,
 18                                     $values->map_eval($self));
 19 }

Variable names have changed to reflect the fact that they are no longer references to arrays, and Extend() uses map_eval() to evaluate the list of values before passing them to ExtendUnevaluated().

ExtendUnevaluated() is similarily changed:

 21 sub ExtendUnevaluated {
 22     my ($self, $symbols, $values) = @_;
 23 
 24     my %bindings;
 25     $self->_populate_bindings(\%bindings, $symbols, $values);
 26     my $newenv = $self->new(%bindings);
 27     $newenv->{parent} = $self;
 28     return $newenv;
 29 }

It uses a new private _populate_bindings() method to populate a hash of bindings from the $symbols and $values lists. After that it does what it always did, creating a new PScm::Env and setting its parent field to $self before returning it.

The private _populate_bindings() method actually does the “parsing” of the argument list of symbols and their binding to equivalent values.

 31 sub _populate_bindings {
 32     my ($self, $bindings, $symbols, $values) = @_;
 33     if ($symbols->is_null) {
 34         die "too many arguments\n" unless $values->is_null;
 35     } elsif ($symbols->is_pair) {
 36         if ($values->is_pair) {
 37             my $symbol = $symbols->first;
 38             if ($symbol->is_symbol) {
 39                 $bindings->{$symbol->value} = $values->first;
 40                 $self->_populate_bindings($bindings,
 41                                           $symbols->rest,
 42                                           $values->rest);
 43             } else {
 44                 die "bad formal arguments[1]:",
 45                     $symbol->as_string(), "\n";
 46             }
 47         } else {
 48             die "not enough arguments\n";
 49         }
 50     } elsif ($symbols->is_symbol) {
 51         $bindings->{$symbols->value} = $values;
 52     } else {
 53         die "bad formal arguments[2]:",
 54             $symbols->as_string(), "\n";
 55     }
 56 }

On Line 33 it checks to see if it has reached the end of the list of symbols. If it has, then it throws an error if it has not also reached the end of the list of values (too many arguments).

If $symbols is not null, then on Line 35 it checks to see if it is a pair. If it is, then it gets the current symbol, checks that it is a symbol, binds it to the equivalent value, and recurses on the rest of the two lists.

If $symbols is not a pair, then on Line 38 it checks to see if it is itself a symbol. This would correspond to either a single args in a lambda statement like (lambda args ...), or a dotted pair terminating a list of arguments. In either case if $symbols is a symbol, it binds it to the entirity of the rest of the $values list and terminates the recursion.

If $symbols is none of the empty list, a pair or a symbol then something is seriously wrong and _populate_bindings() throws an exception.

Next up is ExtendRecursively(). It is unchanged except that, as in other cases, its arguments have been renamed because they are no longer array references:

 58 sub ExtendRecursively {
 59     my ($self, $symbols, $values) = @_;
 60 
 61     my $newenv = $self->ExtendUnevaluated($symbols, $values);
 62     $newenv->_eval_values();
 63     return $newenv;
 64 }

Finally, ExtendIteratively() benefits from the new lists too. It returns $self if the list of symbols is empty, otherwise it calls Extend() on the first of each list then calls itself on the extended result with the rest of the list:

 66 sub ExtendIteratively {
 67     my ($self, $symbols, $values) = @_;
 68 
 69     if ($symbols->is_null) {
 70         return $self;
 71     } else {
 72         return $self->Extend($symbols->first, $values->first)
 73                     ->ExtendIteratively($symbols->rest,
 74                                         $values->rest);
 75     }
 76 }

Note that ExtendIteratively() is only used by let*, and so does not need to deal with non-lists.

8.5.5. Changes to the Reader

The changes in thr Reader support the input of dot notation, and they make direct use of PScm::Expr::List::Cons() to facilitate this. First of all Read() itself has been somewhat simplified:

 17 sub Read {
 18     my ($self) = @_;
 19 
 20     my $token = $self->_next_token();
 21     return undef unless defined $token;
 22 
 23     if ($token->is_open_token) {
 24         return $self->read_list();
 25     } else {
 26         return $token;
 27     }
 28 }

It may look very different but all that has really happened is that the old looping code which collects a list has been moved out into a separate method, and that method, read_list(), is now recursive instead of iterative:

 30 sub read_list {
 31     my ($self) = @_;
 32     my $token = $self->read_list_element();
 33     if ($token->is_close_token) {
 34         return new PScm::Expr::List::Null();
 35     } elsif ($token->is_dot_token) {
 36         $token = $self->read_list_element();
 37         die "syntax error" unless $token->is_expr;
 38         my $close = $self->read_list_element();
 39         die "syntax error" unless $close->is_close_token;
 40         return $token;
 41     } else {
 42         return PScm::Expr::List->Cons($token, $self->read_list());
 43     }
 44 }

read_list() collects its components using another new method read_list_element(). On Line 33 if it detects a closing brace it returns the empty list. If the token is not a closing brace, then on Line 35 if the token is a dot, it reads the next element, checks that it is an expression, checks that the element after that is a closing brace, and returns the element just read as the entire list. In the final case, if the token is neither a closing brace or a dot, it uses Cons() to construct a list with the current token as the first element and the result of calling read_list() as the rest. Hopefully you can convince yourself that this will deal correctly with both ordinary lists and dotted pair notation.

read_list_element() just centralises the otherwise tedious and repetitive checking for EOF which is a syntax error (unterminated list) while collecting list elements:

 46 sub read_list_element {
 47     my ($self) = @_;
 48     my $token = $self->Read();
 49     die "unexpected EOF"
 50       if !defined $token;
 51     return $token;
 52 }

Finally _next_token() has an extra clause to detect a standalone period (dot) and if it does it returns an instance of a new token class PScm::Token::Dot:

 54 sub _next_token {
 55     my ($self) = @_;
 56 
 57     while (!$self->{Line}) {
 58         $self->{Line} = $self->{FileHandle}->getline();
 59         return undef unless defined $self->{Line};
 60         $self->{Line} =~ s/^\s+//s;
 61     }
 62 
 63     for ($self->{Line}) {
 64         s/^\(\s*// && return PScm::Token::Open->new();
 65         s/^\)\s*// && return PScm::Token::Close->new();
 66         s/^\.\s*// && return PScm::Token::Dot->new();
 67         s/^([-+]?\d+)\s*//
 68           && return PScm::Expr::Number->new($1);
 69         s/^"((?:(?:\\.)|([^"]))*)"\s*// && do {
 70             my $string = $1;
 71             $string =~ s/\\//g;
 72             return PScm::Expr::String->new($string);
 73         };
 74         s/^([^\s\(\)]+)\s*//
 75           && return PScm::Expr::Symbol->new($1);
 76     }
 77     die "can't parse: $self->{Line}";
 78 }

The new class is in PScm/Token.pm alongside the others:

 27 package PScm::Token::Dot;
 28 
 29 use base qw(PScm::Token);
 30 
 31 sub is_dot_token { 1 }
 32 
 33 1;

There is a default is_dot_token() method in PScm::Token that returns false.

¹⁹

Obligatory footnote. CAR stands for “the Contents of the Address part of the Register” and CDR stands for “the Contents of the Decrement part of the Register.” This is a reference to the original hardware on which the first Lisp system was implemented, and means nothing now but the names have stuck.

²⁰

Well, actually it is possible to imagine circular structures that would defeat any notation. A true Scheme will even allow the creation of such circular lists, with such dubious expressions as (set-cdr! x x), but that's a world of pain that we will stay well away from.

²¹

The reason for the temporary @value variable is not redundant clarification of the code, it is a workaround for a rather obscure piece of Perl behaviour. If the code had simply said:

return ($self->[FIRST], $self->[REST]->value);
[download]

then the scalar context imposed by the isTrue() method in PScm::Expr would cause Perl to treat the comma as the comma operator, throwing away the left hand side and returning only the right, recursively, so all lists would end up being treated as false.

²²

This interesting trick of having an object to represent the absence of another object is a well-known design pattern called the Null Object Pattern.

Chapter 8. List Processing

8.1. `quote`

8.2. `list`

8.3. `car` and `cdr`

8.4. `cons`

8.4.1. Dot Notation

8.5. Implementation

8.5.1. Changes to Expressions

8.5.2. Changes to Primitives and Special Forms

8.5.3. Changes to Closures

8.5.4. Changes to the Environment

8.5.5. Changes to the Reader

8.6. Summary

8.7. Tests

Chapter 8. List Processing

8.1. quote

8.2. list

8.3. car and cdr

8.4. cons

8.4.1. Dot Notation

8.5. Implementation

8.5.1. Changes to Expressions

8.5.2. Changes to Primitives and Special Forms

8.5.3. Changes to Closures

8.5.4. Changes to the Environment

8.5.5. Changes to the Reader

8.6. Summary

8.7. Tests

8.1. `quote`

8.2. `list`

8.3. `car` and `cdr`

8.4. `cons`