The question arises as to why the implementation of the
define
special form,
described back in
Section 2.4
has beed deferred for so long,
when it would have
made so much of the previous discussion easier, particularily the
Scheme examples. Well there are good reasons. Consider
what the language so far has got.
So far, the language is fairly complete. It is however a purely functional language, in the sense that its expressions are like mathematical functions: the value of a variable is fixed for the duration of the expression, and it makes no difference in what order the arguments to a function are evaluated. There is no sequential execution (though there is recursion) and most importantly there are no side effects. That means that one part of a program cannot affect the execution of another part that it does not contain, except by returning values that get passed in as arguments to that other part.
While this might seem to be a limitation of the language, it does in fact have some very important advantages. Essentially it makes large-scale expressions in the language very simple to analyse: since one part of an expression cannot affect another independant part, different components of a program (including the current environment) could be fed to separate processes, perhaps even separate machines on a network.
For example suppose we had a networked version of the
language. There would be a pool of processes available to do work, and
each process could delegate sub-parts of their work to other processes. A
simple rule might be that for parallel binding, e.g. let
bindings
and the arguments to closures and primitive functions, the evaluation
of the arguments could be performed in parallel by “outsourcing”
the evaluation of
each subexpression to a different process. For example consider the
following fragment:
... (let ((a (func1 x)) (b (func2 y)) (c (func3 z))) (func4 (func5 a) (func6 b) (func7 c))) ...
The let
could elect to send the subexpression (func1 x)
(plus the environment where func1
and x
have
a value) to one process, and the subexpression (func2 y)
to another. While those two expressions are being evaluated the let
could get on with evaluating (func3 z)
itself. Then when it
had finished that evaluation it would collect the result of the other two
evaluations and proceed to evaluate its body. Similarily the evaluation
of the body (the call to func4
) could outsource
the evaluation of the arguments (func5 a)
and (func6
b)
and get on with evaluating (func7 c)
, collecting
the other results when it finished, and then proceeding to evaluate
the func4
call with those evaluated arguments. It probably
shouldn't outsource something a simple as variable lookup.
The implementation of this networked programming language is left to
you. Some sort of central ticketing server would be needed, with
a queue where requestors could post their requests in exchange for
a ticket, and a pool where
clients could post their results so that they need not wait for the
requestor to get back to them, and there'd have to be some way
of telling the server that a particular ticket depends
on other tickets, so that a client would never block waiting
for the server to return an outstanding ticket... quite an
interesting project. The real point though is that both
let
and lambda
not only conceptually evaluate and
bind their arguments in parallel, they could actually
do so without disturbing the sense of the program.
Beyond this point that kind of application becomes nearly impossible because we are about to introduce side effects, in particular variable assignment and, to make that useful, sequences of expressions. That's just the proper name for something we're all very familiar with—one statement following another.
In fact, the primary difference between a statement and an
expression is that a statement is executed purely for
its side effects. There are only a couple of different side effects that
we need to consider. The first is variable assignment, that is
to say the changing of the existing value of a variable binding. The second
is definition, the creation of new entries in the current
environment. Note that this is very different from what the various
let
special forms do. They always create a new environment, they never
modify an existing one. Even letrec
which may appear to be
modifying an existing environment by changing bindings is not really
doing so: nothing actually gets to use that environment until
after letrec
has finished building it, the creation of that
new environment is atomic as far as the PScheme language is concerned.
Just to reiterate before we move on. In a functional language it makes no difference in what order the arguments to a function are evaluated, but in a language with side-effects, if those arguments cause side-effects during their evaluation, then the order of evaluation is significant and must be taken into account when designing a program.
This version of the interpreter will only add variable assignment (and sequences,) definition is left for later. Variable assignment should be a special form, so that the variable being assigned to does not get evaluated. In Scheme, and PScheme, the symbol bound to the variable assignment special form is set!, the exclaimation point signifying that this operation has a side effect that might upset an otherwise purely functional program28. The syntax is simple, but there is a gotcha:
(set! a 10) Error: no binding for a in PScm::Env
This may sound unnecessarily picky, but for variable assignment
to work, there must already be a binding in place that the assignment
can affect. The reasoning is that the variable being assigned to
need not be in the current environment frame: it must be searched
for, and if we allow set!
to create new variables then
they would probably be installed in the current top frame. So the
scope of a variable that was set!
would depend on
whether the variable already existed or not, which is inconsistent
at best.
Anyway, this works:
> (let* ((a 5) > (dummy (set! a 10)) > a)) 10
This is a bit contrived, we use a dummy variable to allow the
set!
expression to be evaluated before the body of the
let*
returns the new value of a
.
However it should be clear from the example that the set!
did
take effect.
So how do we go about implementing set!
? Well as luck
would have it, we already have a method for changing existing bindings in
an environment, we have the Assign()
method that was developed
for the letrec
special form in a previous incarnation. It has
precisely the right semantics too (what a co-incidence!) It searches
through the chain of environment frames until it finds an appropriate
binding, assigning the value if it does and die()
-ing if it
doesn't. Here it is again:
59 sub Assign { 60 my ($self, $symbol, $value) = @_; 61 62 if (defined(my $ref = $self->_lookup_ref($symbol))) { 63 $$ref = $value; 64 } else { 65 die "no binding for @{[$symbol->value]}", 66 " in @{[ref($self)]}\n"; 67 } 68 }
So all we have to do is wire it up. The process of creating new
special forms should be familiar by now. Firstly we subclass
PScm::SpecialForm and give our new class an Apply()
method. The
new class in this case is PScm::SpecialForm::Set. Its
Apply()
method as usual takes a form and the current environment
as arguments. In this case the form should contain a symbol and a value
expression. This Apply()
will evaluate the expression and call the
environment's Assign()
method with the symbol and resulting
value as arguments:
138 package PScm::SpecialForm::Set; 139 140 use base qw(PScm::SpecialForm); 141 142 sub Apply { 143 my ($self, $form, $env) = @_; 144 my ($symbol, $expr) = $form->value; 145 $env->Assign($symbol, $expr->Eval($env)); 146 }
And that's all there is to it, apart from adding a
PScm::SpecialForm::Set object to the initial
environment, bound to the symbol set!
.
Sequences are another fairly trivial addition to the language. Rather than a single expression, a sequence contains zero or more expressions. Each expression is evaluated in order, and the value of the last expression is the value of the sequence. The value of an empty sequence is null, the empty list. This is such a common thing in many other languages (such as Perl) that it goes without notice, but thinking about it, sequences only become relevant and useful in the presence of side effects. Since only the value of the last expression is returned, preceding expressions can only affect the computation if they have side effects.
The keyword introducing a sequence in PScm is begin.
begin
takes a list of zero or more expressions, evaluates
each one, and returns the value of the last expression, or null if there
are no expressions to evaluate. It functions
something like a block in Perl, except that a Perl block also encloses
a variable scope like let
does, but begin
does not imply
any scope.
(begin <expression1> <expression2> ...)
With begin
, we can write that rather awkward
set!
example from the previous section a lot more
clearly:
> (let ((a 5)) > (begin > (set! a 10) > a)) 10
begin
could, in fact, be implemented as a function,
provided that functions are guaranteed to evaluate their arguments in
left-to-right order, as this implementation does. However it is
safer not to make that assumption (remember the networked language
where evaluation was envisaged in parallel,) so begin
is better implemented as a special form which can guarantee that
left to right behaviour.
As might be imagined the code is quite trivial: evaluate each
component of the form and return the last value. We pick as an
initial value the empty list, so that if the body of the begin
is empty, that is the result. As usual we subclass PScm::SpecialForm,
in this case to PScm::SpecialForm::Begin, and give the new class an
Apply()
method:
149 package PScm::SpecialForm::Begin; 150 151 use base qw(PScm::SpecialForm); 152 153 sub Apply { 154 my ($self, $form, $env) = @_; 155 my (@expressions) = $form->value; 156 157 my $ret = new PScm::Expr::List(); 158 159 while (@expressions) { 160 $ret = shift(@expressions)->Eval($env); 161 } 162 163 return $ret; 164 } 165 166 1;
On
Line 155 it extracts the
expressions from the argument $form
.
Then on
Line 157
it initialises the return
value $ret
to an initial value. On
Lines 159–161
it loops over each expression, evaluating it in the
current environment, and assigning the result to $ret
,
replacing the previous value. Lastly on
Line 163
it returns the final value.
In this section we've seen variable assignment added to the language, but also taken some time to consider some drawbacks of that feature. We've also looked at how sequences become useful in the presence of variable assignment. In fact sequences serve no purpose without side-effects and side-effects are difficult to do without sequences.
As usual we need to add our new forms to the interpreter by
binding them in the initial environment. Here's ReadEvalPrint()
with the additional bindings.
31 sub ReadEvalPrint { 32 my ($infh, $outfh) = @_; 33 34 $outfh ||= new FileHandle(">-"); 35 my $reader = new PScm::Read($infh); 36 while (defined(my $expr = $reader->Read)) { 37 my $result = $expr->Eval( 38 new PScm::Env( 39 let => new PScm::SpecialForm::Let(), 40 '*' => new PScm::Primitive::Multiply(), 41 '-' => new PScm::Primitive::Subtract(), 42 if => new PScm::SpecialForm::If(), 43 lambda => new PScm::SpecialForm::Lambda(), 44 list => new PScm::Primitive::List(), 45 car => new PScm::Primitive::Car(), 46 cdr => new PScm::Primitive::Cdr(), 47 cons => new PScm::Primitive::Cons(), 48 letrec => new PScm::SpecialForm::LetRec(), 49 'let*' => new PScm::SpecialForm::LetStar(), 50 eval => new PScm::SpecialForm::Eval(), 51 macro => new PScm::SpecialForm::Macro(), 52 quote => new PScm::SpecialForm::Quote(), 53 'set!' => new PScm::SpecialForm::Set(), 54 begin => new PScm::SpecialForm::Begin(), 55 ) 56 ); 57 $result->Print($outfh); 58 } 59 }
It's worth pointing out a major difference between PScheme and standard
scheme implementations here. In standard scheme, function
bodies and the bodies of let
expressions and their variants
are all implicit sequences.
So in fact in a real scheme implementation
you could just write:
> (let ((a 5)) > (set! a 10) > a) 10
In PScheme function and let
bodies are single statements and require an explicit begin
to create a sequence,
because I wanted to keep the
distinction between single expressions and sequences clear to the
reader.
t/PScm_SideEffects.t
1 use strict; 2 use warnings; 3 use Test::More; 4 use lib 't/lib'; 5 use PScm::Test tests => 2; 6 7 BEGIN { use_ok('PScm') } 8 9 eval_ok(<<EOF, '2', 'begin and assignment'); 10 (let ((a 1)) 11 (begin (set! a 2) 12 a)) 13 EOF 14 15 # vim: ft=perl
A test of set!
and begin
can be seen in
Listing 20. The test binds a
to 1
in a let
, then in the body of the let
it performs a begin
which set!
s
a
to 2
then returns the new value of
a
.
Full source code for this version of the interpreter is available athttp://billhails.net/Book/releases/PScm-0.0.7.tgz
The exclaimation
point is used to suffix all such side-effecting operations, with the
exception of define
, for some reason.