Chapter 9. Macros

What is a macro? People familiar with the C programming language will probably think of macros as being purely a textual substitution mechanism done in some sort of preprocessing step before the compiler proper gets to look at the code. However that's a somewhat limited perspective, perfectly adequate for languages like C but constraining from our point of view. A better definition of a macro is any sort of substitution or replacement that can happen before the final code is executed.

The real importance of macros is their potential to allow syntactic extensions to their language. In the case of PScheme, each special form is a syntactic extension to the language, and so our working definition of a PScheme macro could be something that allows us to define our own special forms within the language itself. Here's an example. Suppose the language lacked the let special form. As was mentioned in Chapter 5, let shares a good deal in common with lambda. In fact any let expression, say

(let ((a 10)
      (b 20))
     (- b a))
[download]

has an equivalent lambda expression, in this case

((lambda (a b) (- b a)) 10 20)
[download]

The body of the let is the same as the body of the lambda, and the bindings of the let are split between the formal and actual arguments to the lambda expression. In general any let expression:

(let ((‹var1› ‹val1›)
      (‹var2› ‹val2›)
      ...)
     ‹expression›)
[download]

has an equivalent lambda form:

((lambda (‹var1› ‹var2› ...)
         ‹expression›)
    ‹val1› ‹val2› ...)
[download]

Of course internally let doesn't make use of closures, but in the case of the lambda equivalent to let, the lambda expression is evaluated immediately in the same environment as it was defined, so closure is immaterial. All that our purported let macro need do then, is to rewrite its arguments into an equivalent lambda form and have that executed in its place. We developed all of the list manipulation tools we will need to do that in the 0.0.5 version of the interpreter from Chapter 8 (remember that code and data are the same list objects so list functions can operate on both). All we need to do now is to think of a way to allow us to define macros.

Macros will obviously share a great deal in common with functions. They will have a separate declaration and use. They will also take arguments, and have a body that is evaluated in some way. In fact the first part of their implementation, that of parsing their declaration will be virtually identical to that of lambda expressions, except that the lambda keyword is already taken. We'll use “macro” in its place.

9.1. macro

As before then, we subclass PScm::SpecialForm and give the new class an Apply() method. The new class is called PScm::SpecialForm::Macro after its eponymous symbol. Here's the Apply() method for PScm::SpecialForm::Macro.

121 sub Apply {
122     my ($self, $form, $env) = @_;
123     my ($args, $body) = $form->value;
124     return PScm::Closure::Macro->new($args, $body, $env);
125 }

It's virtually identical to PScm::SpecialForm::Lambda except that it creates a new PScm::Closure::Macro instead of a PScm::Closure::Function. So we've left the problem of how to make a macro actually work until last, in the PScm::Closure::Macro's Apply() method.

9.2. Evaluating Macros

Consider how PScm::Closure::Function::Apply() works. It evaluates its arguments in the passed-in environment then gives the results to its parent PScm::Closure::_apply() method. That _apply() method extends the environment that was captured when the closure was created with bindings of those actual arguments to its formal arguments. Then it evaluates its body in that extended environment and returns the result. Here again is PScm::Closure::Function's Apply() method:

 39 sub Apply {
 40     my ($self, $form, $env) = @_;
 41 
 42     my $evaluated_args = $form->map_eval($env);
 43     return $self->_apply($evaluated_args);
 44 }

And here again is the private _apply() method in the base PScm::Closure class:

 17 sub _apply {
 18     my ($self, $args) = @_;
 19 
 20     my $extended_env =
 21       $self->{env}->ExtendUnevaluated($self->{args}, $args);
 22     return $self->{body}->Eval($extended_env);
 23 }

Any implementation of macros will share something in common with this implementation of functions, but there will be differences. Obviously a macro should be passed its arguments unevaluated. That way it can perform whatever (list) operations it likes on that structure. Then when it returns a new form, it is that form that gets evaluated.

In fact it's as simple as that, and here's the Apply() method for PScm::Closure::Macro:

 55 sub Apply {
 56     my ($self, $form, $env) = @_;
 57 
 58     my $new_form = $self->_apply($form);
 59     return $new_form->Eval($env);
 60 }

Compare that with the Apply() method from PScm::Closure::Function above.

Functions evaluate their arguments, then evaluate their body with those arguments bound. Macros don't evaluate their arguments, they evaluate their body with their unevaluated arguments bound, then they re-evaluate the result. This is quite subtle. Macros perform substitutions on their arguments, but the result of those substitutions must be subsequently evaluated for the macro to have had the desired effect.

To finish off this part of the implementation, we must remember that in Section 5.3 we made closures printable, and since macros are a new kind of closure, we must supply the supporting _symbol() method in PScm::Closure::Macro for the PScm::Closure::as_string() method to find. This _symbol() method returns the symbol macro so that if a macro is printed it will display as (macro ‹args› ‹body›). Here's the new PScm::Closure::Macro::_symbol() method.

 62 sub _symbol {
 63     PScm::Expr::Symbol->new('macro');
 64 }

9.2.1. Trying it out

Just for fun, let's take a look at how we might attack the problem which introduced this section: implementing let in terms of lambda. Remember that any let expression has an equivalent lambda form, so here's a use of macro that translates one into the other:

(let* ((mylet
         (macro (bindings body)
                (let* ((names (cars bindings))
                       (values (cadrs bindings)))
                  (cons (list (quote lambda) names body) values)))))
  (mylet ((a 1)
          (b 2))
         (list a b)))

This code uses let* (remember we're pretending that we don't have let) to bind mylet to a macro definition, then it uses mylet in the body of the let*. It makes use of some supporting functions that we'll define presently, but first let's try to get a feel for what it is doing. As stated above, the symbol macro introduces a macro definition. The arguments to mylet will be the same as those to let, namely a list of bindings and a body to execute with those bindings in place. It has to separate the bindings (symbol-value pairs) into two lists, one of the symbols and one of the values. It might be useful in the following discussion to refer to Figure 9.1 which shows the internal structure of the mylet form that we'll be rearranging.

Figure 9.1. Example mylet internal structure
figure

The mylet macro uses a function cars to extract the car of each binding (the symbol) into the list called names.

Here's the definition of cars:

(letrec (...
          (cars
            (lambda (lst)
              (map car lst)))
          ...)
  ...)

It uses another yet to be defined function map, which does the same as Perl's built in map: it applies a function to each element of a list and returns a new list of the results23. map is surprisingly easy to implement in PScheme:

(letrec ((map
           (lambda (op lst)
             (if lst
               (cons (op (car lst))
                     (map op (cdr lst)))
               ())))
         ...)

It's a recursive function, hence the need for letrec to bind it. Passed a function and list of zero or more bindings, if the list is empty it returns the empty list, otherwise it cons-es the result of calling the function on the car of the list with to the result of calling itself on the rest (cdr) of the list. So for example if lst is ((a 1) (b 2)), then (map car lst) would return the list (a b), and that is exactly what the cars function does.

cadrs24 is very similar. It walks the list collecting the second component of each sublist (the values of the bindings). So for example given the list ((a 1) (b 2)), cadrs will return the list (1 2).

(letrec (...
          (cadrs
            (lambda (lst)
              (map (lambda (x) (car (cdr x))) lst)))
          ...)
  ...)

Again it makes use of map this time passing it an anonymous function that will take the car of the cdr of its argument. This is very much in the style of real Scheme programming now: constructing lambda expressions on the fly and passing them to other functions as arguments, I hope you are aquiring a taste for it. Anyway here's the whole mylet definition plus some code that calls it.

(let* ((mylet
        (letrec ((map
                   (lambda (op lst)
                     (if lst
                       (cons (op (car lst))
                             (map op (cdr lst)))
                       ())))
                 (cars
                   (lambda (lst)
                     (map car lst)))
                 (cadrs
                   (lambda (lst)
                     (map (lambda (x) (car (cdr x))) lst))))
              (macro (bindings body)
                      (let* ((names (cars bindings))
                             (values (cadrs bindings)))
                        (cons (list (quote lambda) names body)
                              values))))))
    (mylet ((a 1)
            (b 2))
           (list a b)))

After collecting the names into one list and the values into another, the mylet macro builds:

((lambda (‹names›) (‹body›)) ‹values›)

Where ‹names›, ‹body› and ‹values› are expanded using the appropriate magic:

(cons (list (quote lambda) names body) values)

A point worth noting is that the constructed mylet macro is a true closure, since it has captured the definitions of the cars and cadrs functions and executes in an outer environment (the let*) where those functions are not visible.

9.2.2. An Improvement

The macro substitution system demonstrated so far is pretty crude, after all it requires the programmer to directly manipulate low-level list structures, rather than just supplying an “example” of how the transformation is to be performed. In fact the topic of macro expansion as provided by a full Scheme implementation is deserving of a book to itself. Apart from the templating ability, there are also issues of avoiding variable collision (so-called hygenic macros) so that full Scheme macros are much closer to the idea of C++'s inline functions than they are to C's #define25.

However there is one simple addition that we can make, which will greatly improve the usefulness of macros, and that involves an extension to the quote special form that we introduced in Section 8.1. If you remember quote just returns its argument, preventing unwanted evaluation. This already has proved useful in the construction of macros, as we have seen above.

Now one perfect use of quote would be to provide templates for macros, if we could arrange that parts of the quoted template could be substituted before the quoted template is returned. To that purpose we introduce a keyword unquote which marks a section of a quoted form for evaluation. Perhaps an example might make this clear:

> (let ((x "rain")
>       (y "spain")
>       (z "plain"))
>      (quote
>         (the (unquote x)
>          in (unquote y)
>          falls mainly on the (unquote z))))
(the "rain" in "spain" falls mainly on the "plain")

The let bindings bind x to the string "rain" etc. That is not the important part. The important part is the body of the let where the use of the unquote keyword allows evaluation of the contained expressions (x etc.) despite their being inside a quote.

How can this help us with macro definitions? Well in a big way! consider this macro definition of a while loop:

(define while
  (macro (test body)
         (quote (letrec
                  ((loop
                     (lambda ()
                       (if (unquote test)
                         (begin
                           (unquote body)
                           (loop))
                         ()))))
                  (loop)))))

It uses a few features that aren't available yet, like define and begin (which just executes one expression after another), and it would seem to be in danger of running out of stack, but I hope you can see that essentially the quote and unquote are doing all of the work building the body of the macro. The quoted result is shown in bold, with the internal substitutions unbolded again.

Implementing unquote is easy, but it's a little different from the normal special forms and primitives we've seen up to now. I've been careful to only refer to it as a “keyword”, because it means nothing special outside of a quoted expression.

We'll obviously have to change the way quote works to make this happen, so lets start by looking at the changed PScm::SpecialForm::Quote::Apply().

132 sub Apply {
133     my ($self, $form, $env) = @_;
134     return $form->first->Quote($env);
135 }

Rather than just returning its first argument, it now calls a new method Quote() on it, passing Quote() the current environment. Quote() essentially just makes a copy of the expressions concerned, but it keeps an eye out for unquote symbols. Now this method will be implemented in the PScm::Expr classes as follows:

The default Quote() in PScm::Expr just returns $self:

 41 sub Quote { $_[0] }

The Quote() in PScm::Expr::List::Pair is where most of the decision making happens.

133 sub Quote {
134     my ($self, $env) = @_;
135     if ($self->[FIRST]->is_unquote) {
136         return $self->[REST]->first->Eval($env);
137     } else {
138         return $self->quote_rest($env);
139     }
140 }

On Line 135 it checks to see if the first element of the list is the symbol unquote (is_unquote.) If it is then it evaluates the second element in the current environment and returns it. If the first element is not unquote then it hands over control to a helper routine quote_rest().

Here's quote_rest().

142 sub quote_rest {
143     my ($self, $env) = @_;
144     return $self->Cons(
145         $self->[FIRST]->Quote($env),
146         $self->[REST]->quote_rest($env)
147     );
148 }

It just walks the list, recursively, constructing a copy as it goes by calling Quote() on each element and calling Cons() on the quoted subexpression and the result of the recursive call26.

The PScm::Expr::List::Null package inherits Quote() from PScm::Expr, which just returns $self, and PScm::Expr also has a quote_rest() method which also just returns $self and usefully terminates the recursion of the non-empty PScm::Expr::List quote_rest() method.

 43 sub quote_rest { $_[0] }

That just leaves that is_unquote() method. Well since only a symbol could possibly be unquote, we can put a default is_unquote() method at the top of the expression type hierachy, in PScm::Expr, which just returns false:

 12 sub is_unquote { 0 }

Then for PScm::Expr::Symbol only, we override that with a method that checks to see if its value() is the string "unquote":

182 sub is_unquote {
183     my ($self) = @_;
184     return $self->value eq "unquote";
185 }

That completes our re-implementation of quote to allow the recognition of the unquote keyword, but we're not quite done yet.

quote and unquote turn out to be so useful in the definition of macros that PScheme provides shorthand syntactic sugar for these forms. The construct '‹expression› (note the single quote) gets expanded to (quote ‹expression›), and similarily the construct ,‹expression› with a leading comma gets expanded to (unquote ‹expression›). This is fairly easy to do, so let's see what changes we need to make to the reader to make this happen.

First here's the changes to PScm::Read::_next_token().

 66 sub _next_token {
 67     my ($self) = @_;
 68 
 69     while (!$self->{Line}) {
 70         $self->{Line} = $self->{FileHandle}->getline();
 71         return undef unless defined $self->{Line};
 72         $self->{Line} =~ s/^\s+//s;
 73     }
 74 
 75     for ($self->{Line}) {
 76         s/^\(\s*// && return PScm::Token::Open->new();
 77         s/^\)\s*// && return PScm::Token::Close->new();
 78         s/^\'\s*// && return PScm::Token::Quote->new();
 79         s/^\,\s*// && return PScm::Token::Unquote->new();
 80         s/^\.\s*// && return PScm::Token::Dot->new();
 81         s/^([-+]?\d+)\s*//
 82           && return PScm::Expr::Number->new($1);
 83         s/^"((?:(?:\\.)|([^"]))*)"\s*// && do {
 84             my $string = $1;
 85             $string =~ s/\\//g;
 86             return PScm::Expr::String->new($string);
 87         };
 88         s/^([^\s\(\)]+)\s*//
 89           && return PScm::Expr::Symbol->new($1);
 90     }
 91     die "can't parse: $self->{Line}";
 92 }

The change is very simple. You can see that on Lines 78–79 if it strips a leading quote or comma, it returns an equivalent token object. Those new token types are both in PScm/Token.pm, here's PScm::Token::Quote.

 29 package PScm::Token::Quote;
 30 
 31 use base qw(PScm::Token);
 32 
 33 sub is_quote_token { 1 }

It inherits from PScm::Token and a default is_quote_token() there returns false.

The equivalent PScm::Token::Unquote deliberately inherits from PScm::Token::Quote rather than just PScm::Token so it gets the overridden is_quote_token() method, and supplies an additional is_unquote_token() method returning true. Again a default is_unquote_token() in PScm::Token returns false.

 36 package PScm::Token::Unquote;
 37 
 38 use base qw(PScm::Token::Quote);
 39 
 40 sub is_unquote_token { 1 }

The upshot of this is that both PScm::Token::Quote and PScm::Token::Unquote return true for is_quote_token(), but only PScm::Token::Unquote returns true for is_unquote_token(). Finally, let's see how the reader PScm::Read::Read() makes use of these new token objects.

 17 sub Read {
 18     my ($self) = @_;
 19 
 20     my $token = $self->_next_token();
 21     return undef unless defined $token;
 22 
 23     if ($token->is_quote_token) {
 24         my $expr = $self->Read;
 25         die "syntax Error"
 26             unless defined($expr) && $expr->is_expr;
 27         return new PScm::Expr::List(
 28             $token->is_unquote_token
 29                 ? new PScm::Expr::Symbol('unquote')
 30                 : new PScm::Expr::Symbol('quote'),
 31             $expr
 32         );
 33     }
 34 
 35     if ($token->is_open_token) {
 36         return $self->read_list();
 37     } else {
 38         return $token;
 39     }
 40 }

The additional code on Lines 23–33 checks to see if the token is a quote or unquote token, and if so reads the next expression, checks that it is valid and returns a new PScm::Expr::List containing the appropriate quote or unquote symbol and the expression read afterwards. The is_expr() method is defined to be true in PScm::Expr and false in PScm::Token, and its use here stops dubious constructs like “')”.

So we now have a convenient shorthand for quote and unquote. To demonstrate it in action, here's that while macro again, this time using the new tokens.

(define while
  (macro (test body)
         '(letrec
            ((loop
               (lambda ()
                 (if ,test
                   (begin
                     ,body
                     (loop))
                   ()))))
            (loop))))

We'll be making significant use of macro, quote and unquote in subsequent chapters, so it's worth familiarizing yourself now with this new idiom27.

9.2.3. One Last Addition

Sometimes given a quoted expression, you'd really just like to have it evaluated. It may have been passed as an argument, or it may have been constructed in some other way. What you need is another round of evaluation. This is supplied by the special form eval. eval really does do just that. For example:

> (eval '(* 2 2))
4

The quote stopped the first round of evaluation, but eval got another try at it. Here's another example:

> (eval (list '* 4 4))
16

eval is quite simple. It is a special form because it needs access to an environment in which to perform the evaluation (remember primitives have their arguments evaluated for them and so don't need an environment.) It evaluates its first argument in the current environment (special forms don't have their arguments evaluated for them,) then it evaluates the result a second time, this time in the top-level environment. Here's PScm::SpecialForm::Eval:

 93 package PScm::SpecialForm::Eval;
 94 
 95 use base qw(PScm::SpecialForm);
 96 
 97 sub Apply {
 98     my ($self, $form, $env) = @_;
 99     $form->first()->Eval($env)->Eval($env->top);
100 }

You can see that the second round of evaluation is done in the context of the top-level environment obtained by calling a new method top() on the current environment. That top() method is also very simple:

 14 sub top {
 15     my ($self) = @_;
 16     if ($self->{parent}) {
 17         return $self->{parent}->top;
 18     } else {
 19         return $self;
 20     }
 21 }

It just checks to see if it has a parent, calling top() recursively on that if it has, and returning itself if it hasn't.

One thing to watch out for with eval: the code that is evaluated is not a closure. Any variables in that code will be looked up in the top-level environment, not the one where the expression was constructed, nor the one that is current when eval is called. For example:

> (let ((* -))
>   (* 3 3))
0
> (let ((* -))
>   (eval '(* 3 3)))
9

Nonetheless eval is a useful tool in your kit, we'll see it in action in later chapters.

9.3. Summary

Here's the additions to ReadEvalPrint() which bind our new macro feature and eval in the initial environment. The quote binding was already there, and as shown above, unquote is only a keyword and does not need a binding:

 31 sub ReadEvalPrint {
 32     my ($infh, $outfh) = @_;
 33 
 34     $outfh ||= new FileHandle(">-");
 35     my $reader = new PScm::Read($infh);
 36     while (defined(my $expr = $reader->Read)) {
 37         my $result = $expr->Eval(
 38             new PScm::Env(
 39                 let    => new PScm::SpecialForm::Let(),
 40                 '*'    => new PScm::Primitive::Multiply(),
 41                 '-'    => new PScm::Primitive::Subtract(),
 42                 if     => new PScm::SpecialForm::If(),
 43                 lambda => new PScm::SpecialForm::Lambda(),
 44                 list   => new PScm::Primitive::List(),
 45                 car    => new PScm::Primitive::Car(),
 46                 cdr    => new PScm::Primitive::Cdr(),
 47                 cons   => new PScm::Primitive::Cons(),
 48                 letrec => new PScm::SpecialForm::LetRec(),
 49                 'let*' => new PScm::SpecialForm::LetStar(),
 50                 eval   => new PScm::SpecialForm::Eval(),
 51                 macro  => new PScm::SpecialForm::Macro(),
 52                 quote  => new PScm::SpecialForm::Quote(),
 53             )
 54         );
 55         $result->Print($outfh);
 56     }
 57 }

9.4. Tests

The tests for macro and unquote are in Listing 18.

Listing 18. t/PScm_Macro.t
  1 use strict;
  2 use warnings;
  3 use Test::More;
  4 use lib 't/lib';
  5 use PScm::Test tests => 5;
  6 
  7 BEGIN { use_ok('PScm') }
  8 
  9 eval_ok(<<EOF, '(1 2)', 'macros');
 10 (let* ((mylet
 11         (letrec ((map
 12                    (lambda (op lst)
 13                      (if lst
 14                        (cons (op (car lst))
 15                              (map op (cdr lst)))
 16                        ())))
 17                  (cars
 18                    (lambda (lst)
 19                      (map car lst)))
 20                  (cadrs
 21                    (lambda (lst)
 22                      (map (lambda (x) (car (cdr x))) lst))))
 23               (macro (bindings body)
 24                       (let* ((names (cars bindings))
 25                              (values (cadrs bindings)))
 26                         (cons (list (quote lambda) names body)
 27                               values))))))
 28     (mylet ((a 1)
 29             (b 2))
 30            (list a b)))
 31 EOF
 32 
 33 eval_ok(<<EOF, <<EOR, 'unquote');
 34 (let ((x (quote rain))
 35       (y (quote spain))
 36       (z (quote plain)))
 37      (quote (the (unquote x)
 38              in (unquote y)
 39              falls mainly on the
 40              (unquote z))))
 41 EOF
 42 (the rain in spain falls mainly on the plain)
 43 EOR
 44 
 45 eval_ok(<<EOF, <<EOR, 'quote and unquote syntactic sugar');
 46 (let ((x 'rain)
 47       (y 'spain)
 48       (z 'plain))
 49      '(the ,x
 50            in ,y
 51            falls mainly on the
 52            ,z))
 53 EOF
 54 (the rain in spain falls mainly on the plain)
 55 EOR
 56 
 57 eval_ok(<<EOF, <<EOR, 'macro to string');
 58 (macro (x)
 59    '(a ,x))
 60 EOF
 61 (macro (x) (quote (a (unquote x))))
 62 EOR
 63 
 64 # vim: ft=perl

The first test just implements and tests the mylet example we worked through in the text, and the second test shows unquote in action with a variation on another example we've already seen. The third test exercises the syntax extensions in the reader, and the fourth test demonstrates that macros, like closures, produce a textual representation of themselves when printed.

The tests for eval are in Listing 19. This just does a simple evaluation of a quoted form.

Listing 19. t/PScm_Eval.t
  1 use strict;
  2 use warnings;
  3 use Test::More;
  4 use lib 't/lib';
  5 use PScm::Test tests => 3;
  6 
  7 BEGIN { use_ok('PScm') }
  8 
  9 eval_ok(<<EOT, <<EOR, 'eval');
 10 (eval '(* 2 2))
 11 EOT
 12 4
 13 EOR
 14 
 15 eval_ok(<<EOT, <<EOR, 'eval operates in the top-level environment');
 16 (let ((* -))
 17   (eval '(* 3 3)))
 18 EOT
 19 9
 20 EOR
 21 
 22 # vim: ft=perl
Full source code for this version of the interpreter is available at
http://billhails.net/Book/releases/PScm-0.0.6.tgz
Last updated Sun Mar 14 10:43:08 2010 UST
23

Perl actually borrows its map function from Lisp, which has had one for many years.

24

The term cadr is a contraction of “car of the cdr” e.g. (cadr x) == (car (cdr x)). this sort of contraction is often seen in scheme code, sometimes nested as much as four or five levels deep, i.e. cadadr.

25

A full scheme implementation provides an extend-syntax special form. Using extend-syntax, defining mylet is as simple as:

(extend-syntax (mylet)
    (mylet ((var val) ...) body)
    ((lambda (var ...) body) val ...))
26

Note the similarity between this method and the definition of map in Pscheme above.

27

The quote and unquote described here are done differently in true Scheme. A true Scheme implementation distinguishes between a simple quote which does not recognize unquote, and an alternative quasiquote which does. This means quote is as efficient as our original implementation, but we still have access to an unquote mechanism. The quote form still has the “'” syntactic sugar, and quasiquote uses the alternative “`” (backtick) shorthand. Additionally a full Scheme provides an unquote-splicing (“,@”) which expects a list and splices it into the existing form at that point.