Chapter 10. Side Effects

The question arises as to why the implementation of the define special form, described back in Section 2.4 has beed deferred for so long, when it would have made so much of the previous discussion easier, particularily the Scheme examples. Well there are good reasons. Consider what the language so far has got.

10.1. The Beauty of Functional Languages

So far, the language is fairly complete. It is however a purely functional language, in the sense that its expressions are like mathematical functions: the value of a variable is fixed for the duration of the expression, and it makes no difference in what order the arguments to a function are evaluated. There is no sequential execution (though there is recursion) and most importantly there are no side effects. That means that one part of a program cannot affect the execution of another part that it does not contain, except by returning values that get passed in as arguments to that other part.

While this might seem to be a limitation of the language, it does in fact have some very important advantages. Essentially it makes large-scale expressions in the language very simple to analyse: since one part of an expression cannot affect another independant part, different components of a program (including the current environment) could be fed to separate processes, perhaps even separate machines on a network.

For example suppose we had a networked version of the language. There would be a pool of processes available to do work, and each process could delegate sub-parts of their work to other processes. A simple rule might be that for parallel binding, e.g. let bindings and the arguments to closures and primitive functions, the evaluation of the arguments could be performed in parallel by “outsourcing” the evaluation of each subexpression to a different process. For example consider the following fragment:

...
(let ((a (func1 x))
      (b (func2 y))
      (c (func3 z)))
  (func4 (func5 a) (func6 b) (func7 c)))
...

The let could elect to send the subexpression (func1 x) (plus the environment where func1 and x have a value) to one process, and the subexpression (func2 y) to another. While those two expressions are being evaluated the let could get on with evaluating (func3 z) itself. Then when it had finished that evaluation it would collect the result of the other two evaluations and proceed to evaluate its body. Similarily the evaluation of the body (the call to func4) could outsource the evaluation of the arguments (func5 a) and (func6 b) and get on with evaluating (func7 c), collecting the other results when it finished, and then proceeding to evaluate the func4 call with those evaluated arguments. It probably shouldn't outsource something a simple as variable lookup.

The implementation of this networked programming language is left to you. Some sort of central ticketing server would be needed, with a queue where requestors could post their requests in exchange for a ticket, and a pool where clients could post their results so that they need not wait for the requestor to get back to them, and there'd have to be some way of telling the server that a particular ticket depends on other tickets, so that a client would never block waiting for the server to return an outstanding ticket... quite an interesting project. The real point though is that both let and lambda not only conceptually evaluate and bind their arguments in parallel, they could actually do so without disturbing the sense of the program.

Beyond this point that kind of application becomes nearly impossible because we are about to introduce side effects, in particular variable assignment and, to make that useful, sequences of expressions. That's just the proper name for something we're all very familiar with—one statement following another.

In fact, the primary difference between a statement and an expression is that a statement is executed purely for its side effects. There are only a couple of different side effects that we need to consider. The first is variable assignment, that is to say the changing of the existing value of a variable binding. The second is definition, the creation of new entries in the current environment. Note that this is very different from what the various let special forms do. They always create a new environment, they never modify an existing one. Even letrec which may appear to be modifying an existing environment by changing bindings is not really doing so: nothing actually gets to use that environment until after letrec has finished building it, the creation of that new environment is atomic as far as the PScheme language is concerned.

Just to reiterate before we move on. In a functional language it makes no difference in what order the arguments to a function are evaluated, but in a language with side-effects, if those arguments cause side-effects during their evaluation, then the order of evaluation is significant and must be taken into account when designing a program.

10.2. Variable Assignment

This version of the interpreter will only add variable assignment (and sequences,) definition is left for later. Variable assignment should be a special form, so that the variable being assigned to does not get evaluated. In Scheme, and PScheme, the symbol bound to the variable assignment special form is set!, the exclaimation point signifying that this operation has a side effect that might upset an otherwise purely functional program28. The syntax is simple, but there is a gotcha:

(set! a 10)
Error: no binding for a in PScm::Env

This may sound unnecessarily picky, but for variable assignment to work, there must already be a binding in place that the assignment can affect. The reasoning is that the variable being assigned to need not be in the current environment frame: it must be searched for, and if we allow set! to create new variables then they would probably be installed in the current top frame. So the scope of a variable that was set! would depend on whether the variable already existed or not, which is inconsistent at best.

Anyway, this works:

> (let* ((a 5)
>        (dummy (set! a 10))
>   a))
10

This is a bit contrived, we use a dummy variable to allow the set! expression to be evaluated before the body of the let* returns the new value of a. However it should be clear from the example that the set! did take effect.

So how do we go about implementing set!? Well as luck would have it, we already have a method for changing existing bindings in an environment, we have the Assign() method that was developed for the letrec special form in a previous incarnation. It has precisely the right semantics too (what a co-incidence!) It searches through the chain of environment frames until it finds an appropriate binding, assigning the value if it does and die()-ing if it doesn't. Here it is again:

 59 sub Assign {
 60     my ($self, $symbol, $value) = @_;
 61 
 62     if (defined(my $ref = $self->_lookup_ref($symbol))) {
 63         $$ref = $value;
 64     } else {
 65         die "no binding for @{[$symbol->value]}",
 66           " in @{[ref($self)]}\n";
 67     }
 68 }

So all we have to do is wire it up. The process of creating new special forms should be familiar by now. Firstly we subclass PScm::SpecialForm and give our new class an Apply() method. The new class in this case is PScm::SpecialForm::Set. Its Apply() method as usual takes a form and the current environment as arguments. In this case the form should contain a symbol and a value expression. This Apply() will evaluate the expression and call the environment's Assign() method with the symbol and resulting value as arguments:

138 package PScm::SpecialForm::Set;
139 
140 use base qw(PScm::SpecialForm);
141 
142 sub Apply {
143     my ($self, $form, $env) = @_;
144     my ($symbol, $expr) = $form->value;
145     $env->Assign($symbol, $expr->Eval($env));
146 }

And that's all there is to it, apart from adding a PScm::SpecialForm::Set object to the initial environment, bound to the symbol set!.

10.3. Sequences

Sequences are another fairly trivial addition to the language. Rather than a single expression, a sequence contains zero or more expressions. Each expression is evaluated in order, and the value of the last expression is the value of the sequence. The value of an empty sequence is null, the empty list. This is such a common thing in many other languages (such as Perl) that it goes without notice, but thinking about it, sequences only become relevant and useful in the presence of side effects. Since only the value of the last expression is returned, preceding expressions can only affect the computation if they have side effects.

The keyword introducing a sequence in PScm is begin. begin takes a list of zero or more expressions, evaluates each one, and returns the value of the last expression, or null if there are no expressions to evaluate. It functions something like a block in Perl, except that a Perl block also encloses a variable scope like let does, but begin does not imply any scope.

(begin <expression1> <expression2> ...)

With begin, we can write that rather awkward set! example from the previous section a lot more clearly:

> (let ((a 5))
>      (begin
>          (set! a 10)
>          a))
10

begin could, in fact, be implemented as a function, provided that functions are guaranteed to evaluate their arguments in left-to-right order, as this implementation does. However it is safer not to make that assumption (remember the networked language where evaluation was envisaged in parallel,) so begin is better implemented as a special form which can guarantee that left to right behaviour.

As might be imagined the code is quite trivial: evaluate each component of the form and return the last value. We pick as an initial value the empty list, so that if the body of the begin is empty, that is the result. As usual we subclass PScm::SpecialForm, in this case to PScm::SpecialForm::Begin, and give the new class an Apply() method:

149 package PScm::SpecialForm::Begin;
150 
151 use base qw(PScm::SpecialForm);
152 
153 sub Apply {
154     my ($self, $form, $env) = @_;
155     my (@expressions) = $form->value;
156 
157     my $ret = new PScm::Expr::List();
158 
159     while (@expressions) {
160         $ret = shift(@expressions)->Eval($env);
161     }
162 
163     return $ret;
164 }
165 
166 1;

On Line 155 it extracts the expressions from the argument $form. Then on Line 157 it initialises the return value $ret to an initial value. On Lines 159–161 it loops over each expression, evaluating it in the current environment, and assigning the result to $ret, replacing the previous value. Lastly on Line 163 it returns the final value.

10.4. Summary

In this section we've seen variable assignment added to the language, but also taken some time to consider some drawbacks of that feature. We've also looked at how sequences become useful in the presence of variable assignment. In fact sequences serve no purpose without side-effects and side-effects are difficult to do without sequences.

As usual we need to add our new forms to the interpreter by binding them in the initial environment. Here's ReadEvalPrint() with the additional bindings.

 31 sub ReadEvalPrint {
 32     my ($infh, $outfh) = @_;
 33 
 34     $outfh ||= new FileHandle(">-");
 35     my $reader = new PScm::Read($infh);
 36     while (defined(my $expr = $reader->Read)) {
 37         my $result = $expr->Eval(
 38             new PScm::Env(
 39                 let    => new PScm::SpecialForm::Let(),
 40                 '*'    => new PScm::Primitive::Multiply(),
 41                 '-'    => new PScm::Primitive::Subtract(),
 42                 if     => new PScm::SpecialForm::If(),
 43                 lambda => new PScm::SpecialForm::Lambda(),
 44                 list   => new PScm::Primitive::List(),
 45                 car    => new PScm::Primitive::Car(),
 46                 cdr    => new PScm::Primitive::Cdr(),
 47                 cons   => new PScm::Primitive::Cons(),
 48                 letrec => new PScm::SpecialForm::LetRec(),
 49                 'let*' => new PScm::SpecialForm::LetStar(),
 50                 eval   => new PScm::SpecialForm::Eval(),
 51                 macro  => new PScm::SpecialForm::Macro(),
 52                 quote  => new PScm::SpecialForm::Quote(),
 53                 'set!' => new PScm::SpecialForm::Set(),
 54                 begin  => new PScm::SpecialForm::Begin(),
 55             )
 56         );
 57         $result->Print($outfh);
 58     }
 59 }

It's worth pointing out a major difference between PScheme and standard scheme implementations here. In standard scheme, function bodies and the bodies of let expressions and their variants are all implicit sequences. So in fact in a real scheme implementation you could just write:

> (let ((a 5))
>      (set! a 10)
>      a)
10

In PScheme function and let bodies are single statements and require an explicit begin to create a sequence, because I wanted to keep the distinction between single expressions and sequences clear to the reader.

10.5. Tests

Listing 20. t/PScm_SideEffects.t
  1 use strict;
  2 use warnings;
  3 use Test::More;
  4 use lib 't/lib';
  5 use PScm::Test tests => 2;
  6 
  7 BEGIN { use_ok('PScm') }
  8 
  9 eval_ok(<<EOF, '2', 'begin and assignment');
 10 (let ((a 1))
 11     (begin (set! a 2)
 12            a))
 13 EOF
 14 
 15 # vim: ft=perl

A test of set! and begin can be seen in Listing 20. The test binds a to 1 in a let, then in the body of the let it performs a begin which set!s a to 2 then returns the new value of a.

Full source code for this version of the interpreter is available at
http://billhails.net/Book/releases/PScm-0.0.7.tgz
Last updated Sun Mar 14 10:43:08 2010 UST
28

The exclaimation point is used to suffix all such side-effecting operations, with the exception of define, for some reason.