What is a macro? People familiar with the C programming language will probably think of macros as being purely a textual substitution mechanism done in some sort of preprocessing step before the compiler proper gets to look at the code. However that's a somewhat limited perspective, perfectly adequate for languages like C but constraining from our point of view. A better definition of a macro is any sort of substitution or replacement that can happen before the final code is executed.
The real importance of macros is their potential to allow syntactic
extensions to their language. In the case of PScheme, each special
form is a syntactic extension to the language, and so our working
definition of a PScheme macro could be something that allows us to
define our own special forms within the language itself. Here's
an example. Suppose the language lacked the let special form.
As was mentioned in Chapter 5, let shares a
good deal in common with lambda. In fact any let expression,
say
(let ((a 10)
(b 20))
(- b a))
has an equivalent lambda expression, in this case
((lambda (a b) (- b a)) 10 20)
The body of the let is the same as the body of the
lambda, and the bindings of the let are split
between the formal and actual arguments to the lambda
expression. In general any let expression:
(let ((‹var1› ‹val1›)
(‹var2› ‹val2›)
...)
‹expression›)
has an equivalent lambda form:
((lambda (‹var1› ‹var2› ...)
‹expression›)
‹val1› ‹val2› ...)
Of course internally let doesn't make use of closures, but
in the case of the lambda equivalent to let, the lambda
expression is evaluated immediately in the same environment as it
was defined, so closure is immaterial. All that our purported
let macro need do then, is to rewrite its arguments into an
equivalent lambda form and have that executed in its place. We
developed all of the list manipulation tools we will need to do that in the 0.0.5 version
of the
interpreter from Chapter 8 (remember that code and data
are the same list objects so list functions can operate on both). All we need to do now is to think
of a way to allow us to define macros.
Macros will obviously share a great deal in common with functions. They
will have a separate declaration and use. They will also take arguments,
and have a body that is evaluated in some way. In fact the first part of
their implementation, that of parsing their declaration will be virtually
identical to that of lambda expressions, except that the
lambda keyword is already taken. We'll use
“macro” in its place.
macroAs before then, we subclass PScm::SpecialForm and give
the new class an Apply() method. The new class is called
PScm::SpecialForm::Macro after its eponymous symbol. Here's
the Apply() method for PScm::SpecialForm::Macro.
121 sub Apply { 122 my ($self, $form, $env) = @_; 123 my ($args, $body) = $form->value; 124 return PScm::Closure::Macro->new($args, $body, $env); 125 }
It's virtually identical to PScm::SpecialForm::Lambda except
that it creates a new PScm::Closure::Macro
instead of a
PScm::Closure::Function. So we've left the problem of how to make
a macro actually work until last, in the PScm::Closure::Macro's
Apply() method.
Consider how PScm::Closure::Function::Apply() works. It
evaluates its arguments in the passed-in environment then gives
the results to its parent PScm::Closure::_apply() method.
That _apply() method extends the
environment that was captured when the closure was created with bindings
of those actual arguments to its formal arguments. Then it evaluates
its body in that extended environment and returns the result.
Here again is PScm::Closure::Function's Apply()
method:
39 sub Apply { 40 my ($self, $form, $env) = @_; 41 42 my $evaluated_args = $form->map_eval($env); 43 return $self->_apply($evaluated_args); 44 }
And here again is the private _apply() method in the base
PScm::Closure class:
17 sub _apply { 18 my ($self, $args) = @_; 19 20 my $extended_env = 21 $self->{env}->ExtendUnevaluated($self->{args}, $args); 22 return $self->{body}->Eval($extended_env); 23 }
Any implementation of macros will share something in common with this implementation of functions, but there will be differences. Obviously a macro should be passed its arguments unevaluated. That way it can perform whatever (list) operations it likes on that structure. Then when it returns a new form, it is that form that gets evaluated.
In fact it's as simple as that, and here's
the Apply() method for PScm::Closure::Macro:
55 sub Apply { 56 my ($self, $form, $env) = @_; 57 58 my $new_form = $self->_apply($form); 59 return $new_form->Eval($env); 60 }
Compare that with the Apply() method from
PScm::Closure::Function above.
Functions evaluate their arguments, then evaluate their body with those arguments bound. Macros don't evaluate their arguments, they evaluate their body with their unevaluated arguments bound, then they re-evaluate the result. This is quite subtle. Macros perform substitutions on their arguments, but the result of those substitutions must be subsequently evaluated for the macro to have had the desired effect.
To finish off this part of the implementation, we must remember
that in Section 5.3 we made closures printable,
and since macros are a new kind of closure, we must supply the supporting
_symbol() method in PScm::Closure::Macro for the
PScm::Closure::as_string() method to find. This _symbol() method
returns the symbol macro so that if a macro is printed
it will display as
(macro ‹args› ‹body›). Here's the
new PScm::Closure::Macro::_symbol() method.
62 sub _symbol { 63 PScm::Expr::Symbol->new('macro'); 64 }
Just for fun, let's take a look at how we might attack the problem
which introduced this section: implementing let in terms
of lambda. Remember that any let expression has an equivalent
lambda form, so here's a use of macro that translates one into
the other:
(let* ((mylet
(macro (bindings body)
(let* ((names (cars bindings))
(values (cadrs bindings)))
(cons (list (quote lambda) names body) values)))))
(mylet ((a 1)
(b 2))
(list a b)))
This code uses let* (remember we're pretending that we
don't have let) to bind mylet to a macro definition, then
it uses mylet in the body of the let*. It makes use of some
supporting functions that we'll define presently, but first let's try
to get a feel for what it is doing. As stated above, the symbol macro
introduces a macro definition. The arguments to mylet will be the
same as those to let, namely a list of bindings and a body to
execute with those bindings in place. It has to separate the bindings
(symbol-value pairs) into two lists, one of the symbols and one of the
values. It might be useful in the following discussion to refer
to Figure 9.1 which shows the internal structure
of the mylet form that we'll be rearranging.
mylet internal structure
The mylet macro uses a function cars to extract the car of each
binding (the symbol) into the list called names.
Here's the definition of cars:
(letrec (...
(cars
(lambda (lst)
(map car lst)))
...)
...)
It uses another yet to be defined function map, which
does the same as Perl's built in map: it applies a function
to each element of a list and returns a new list of the results23. map is surprisingly easy to implement
in PScheme:
(letrec ((map
(lambda (op lst)
(if lst
(cons (op (car lst))
(map op (cdr lst)))
())))
...)
It's a recursive function, hence the need for letrec to
bind it. Passed a function and list of zero or more bindings,
if the list is empty it returns the empty list, otherwise it cons-es
the result of calling the function on the car of the list with
to the result of calling itself on the rest (cdr) of the list.
So for example if lst is
((a 1) (b 2)), then (map car lst) would return the list
(a b), and that is exactly what the cars function does.
cadrs24 is very similar. It walks the
list collecting the second component of each sublist (the values of
the bindings). So for example given the list ((a 1) (b 2)),
cadrs will return the list (1 2).
(letrec (...
(cadrs
(lambda (lst)
(map (lambda (x) (car (cdr x))) lst)))
...)
...)
Again it makes use of map this time passing it an anonymous
function that will take the car of the cdr of its argument. This is
very much in the style of real Scheme programming now: constructing
lambda expressions on the fly and passing them to other functions as arguments,
I hope you are aquiring a taste for it. Anyway here's the whole mylet
definition plus some code that calls it.
(let* ((mylet
(letrec ((map
(lambda (op lst)
(if lst
(cons (op (car lst))
(map op (cdr lst)))
())))
(cars
(lambda (lst)
(map car lst)))
(cadrs
(lambda (lst)
(map (lambda (x) (car (cdr x))) lst))))
(macro (bindings body)
(let* ((names (cars bindings))
(values (cadrs bindings)))
(cons (list (quote lambda) names body)
values))))))
(mylet ((a 1)
(b 2))
(list a b)))
After collecting the names into one list and the values into
another, the mylet macro builds:
((lambda (‹names›) (‹body›)) ‹values›)
Where ‹names›, ‹body› and ‹values› are expanded using the appropriate magic:
(cons (list (quote lambda) names body) values)
A point worth noting is that the constructed mylet
macro is a true closure, since it has captured the definitions of
the cars and cadrs functions and executes in an outer
environment (the let*) where those functions are not visible.
The macro substitution system demonstrated so far is pretty crude,
after all it requires the programmer to directly manipulate low-level
list structures, rather than just supplying an “example” of how
the transformation is to be performed. In fact the topic of macro
expansion as provided by a full Scheme implementation is deserving of
a book to itself. Apart from the templating ability, there are also issues
of avoiding variable collision (so-called hygenic macros)
so that full Scheme macros are much closer to the idea of
C++'s inline functions than they are to C's
#define25.
However there is one simple addition that we can make, which will
greatly improve the usefulness of macros, and that involves an
extension to the quote special form that we introduced in
Section 8.1. If you remember quote
just returns its argument, preventing unwanted evaluation.
This already has proved useful in the construction of macros, as we have
seen above.
Now one perfect use of quote would be to provide
templates for macros, if we could arrange that parts
of the quoted template could be substituted before the quoted
template is returned. To that purpose we introduce a keyword
unquote which marks a section of a quoted form for
evaluation. Perhaps an example might make this clear:
> (let ((x "rain") > (y "spain") > (z "plain")) > (quote > (the (unquote x) > in (unquote y) > falls mainly on the (unquote z)))) (the "rain" in "spain" falls mainly on the "plain")
The let bindings bind x to the string "rain"
etc. That is not the important part. The important part is the body of the
let where the use of the unquote keyword allows evaluation
of the contained expressions (x etc.) despite their being inside
a quote.
How can this help us with macro definitions? Well in a big way!
consider this macro definition of a while loop:
(define while
(macro (test body)
(quote (letrec
((loop
(lambda ()
(if (unquote test)
(begin
(unquote body)
(loop))
()))))
(loop)))))
It uses a few features that aren't available yet, like
define and begin (which just executes one
expression after another), and it would seem to be in danger
of running out of stack, but I hope you can see that
essentially the quote and unquote
are doing all of the work building the body of the macro.
The quoted result is shown in bold, with the internal
substitutions unbolded again.
Implementing unquote is easy, but it's
a little different from the normal special forms and
primitives we've seen up to now. I've been careful to
only refer to it as a “keyword”, because it means
nothing special outside of a quoted expression.
We'll obviously have to change the way quote
works to make this happen, so lets start by looking at
the changed PScm::SpecialForm::Quote::Apply().
132 sub Apply { 133 my ($self, $form, $env) = @_; 134 return $form->first->Quote($env); 135 }
Rather than just returning its first argument, it now calls
a new method Quote() on it, passing Quote()
the current environment. Quote()
essentially just makes a copy of the expressions concerned,
but it keeps an eye out for unquote symbols.
Now this method
will be implemented in the PScm::Expr classes as follows:
The default Quote() in PScm::Expr
just returns $self:
41 sub Quote { $_[0] }
The Quote() in PScm::Expr::List::Pair
is where most of the decision making happens.
133 sub Quote { 134 my ($self, $env) = @_; 135 if ($self->[FIRST]->is_unquote) { 136 return $self->[REST]->first->Eval($env); 137 } else { 138 return $self->quote_rest($env); 139 } 140 }
On
Line 135
it checks to see if the first element of the list is
the symbol unquote (is_unquote.)
If it is then it evaluates the second element
in the current environment and returns it. If the first
element
is
not
unquote
then it hands over control to a helper routine
quote_rest().
Here's quote_rest().
142 sub quote_rest { 143 my ($self, $env) = @_; 144 return $self->Cons( 145 $self->[FIRST]->Quote($env), 146 $self->[REST]->quote_rest($env) 147 ); 148 }
It just walks the list, recursively, constructing a copy
as it goes by calling Quote() on each
element and calling Cons() on the quoted
subexpression and the result of the recursive call26.
The PScm::Expr::List::Null package
inherits Quote() from PScm::Expr,
which just returns $self,
and PScm::Expr also has a
quote_rest() method which also just returns
$self and usefully terminates the recursion
of the non-empty PScm::Expr::List quote_rest()
method.
43 sub quote_rest { $_[0] }
That just leaves that is_unquote()
method. Well since only a symbol could possibly be
unquote, we can put a default is_unquote()
method at the top of the expression type hierachy, in
PScm::Expr, which just returns false:
12 sub is_unquote { 0 }
Then for PScm::Expr::Symbol only, we override that
with a method that checks to see if its value()
is the string "unquote":
182 sub is_unquote { 183 my ($self) = @_; 184 return $self->value eq "unquote"; 185 }
That completes our re-implementation of quote
to allow the recognition of the unquote keyword,
but we're not quite done yet.
quote and unquote turn out to be so useful in
the definition of macros that PScheme provides shorthand
syntactic sugar for these forms. The construct
'‹expression› (note the single quote) gets expanded to
(quote ‹expression›), and similarily
the construct ,‹expression› with a leading comma
gets expanded to (unquote ‹expression›).
This is fairly easy to do, so let's see what changes we
need to make to the reader to make this happen.
First here's the changes to PScm::Read::_next_token().
66 sub _next_token { 67 my ($self) = @_; 68 69 while (!$self->{Line}) { 70 $self->{Line} = $self->{FileHandle}->getline(); 71 return undef unless defined $self->{Line}; 72 $self->{Line} =~ s/^\s+//s; 73 } 74 75 for ($self->{Line}) { 76 s/^\(\s*// && return PScm::Token::Open->new(); 77 s/^\)\s*// && return PScm::Token::Close->new(); 78 s/^\'\s*// && return PScm::Token::Quote->new(); 79 s/^\,\s*// && return PScm::Token::Unquote->new(); 80 s/^\.\s*// && return PScm::Token::Dot->new(); 81 s/^([-+]?\d+)\s*// 82 && return PScm::Expr::Number->new($1); 83 s/^"((?:(?:\\.)|([^"]))*)"\s*// && do { 84 my $string = $1; 85 $string =~ s/\\//g; 86 return PScm::Expr::String->new($string); 87 }; 88 s/^([^\s\(\)]+)\s*// 89 && return PScm::Expr::Symbol->new($1); 90 } 91 die "can't parse: $self->{Line}"; 92 }
The change is very simple. You can see that on
Lines 78–79
if it strips a leading quote or comma, it returns an equivalent token object.
Those new token types are both in PScm/Token.pm, here's
PScm::Token::Quote.
29 package PScm::Token::Quote; 30 31 use base qw(PScm::Token); 32 33 sub is_quote_token { 1 }
It inherits from PScm::Token and a default
is_quote_token() there returns false.
The equivalent PScm::Token::Unquote deliberately
inherits from PScm::Token::Quote rather than just
PScm::Token so it gets the overridden is_quote_token()
method, and supplies an additional is_unquote_token() method
returning true. Again a default is_unquote_token() in
PScm::Token returns false.
36 package PScm::Token::Unquote; 37 38 use base qw(PScm::Token::Quote); 39 40 sub is_unquote_token { 1 }
The upshot of this is that both PScm::Token::Quote
and PScm::Token::Unquote return true for is_quote_token(),
but only PScm::Token::Unquote returns true for
is_unquote_token(). Finally, let's see how the reader
PScm::Read::Read() makes use of these new token objects.
17 sub Read { 18 my ($self) = @_; 19 20 my $token = $self->_next_token(); 21 return undef unless defined $token; 22 23 if ($token->is_quote_token) { 24 my $expr = $self->Read; 25 die "syntax Error" 26 unless defined($expr) && $expr->is_expr; 27 return new PScm::Expr::List( 28 $token->is_unquote_token 29 ? new PScm::Expr::Symbol('unquote') 30 : new PScm::Expr::Symbol('quote'), 31 $expr 32 ); 33 } 34 35 if ($token->is_open_token) { 36 return $self->read_list(); 37 } else { 38 return $token; 39 } 40 }
The additional code on
Lines 23–33
checks to see if the token is a quote or unquote token, and if so
reads the next expression, checks that it is valid and returns a new
PScm::Expr::List containing the appropriate quote
or unquote symbol and the expression read afterwards.
The is_expr() method is defined to be true in PScm::Expr
and false in PScm::Token, and its use here stops dubious
constructs like “')”.
So we now have a convenient shorthand for quote and
unquote. To demonstrate it in action, here's that while
macro again, this time using the new tokens.
(define while
(macro (test body)
'(letrec
((loop
(lambda ()
(if ,test
(begin
,body
(loop))
()))))
(loop))))
We'll be making significant use of macro,
quote and unquote in subsequent chapters,
so it's worth familiarizing yourself now with this new
idiom27.
Sometimes given a quoted expression, you'd really just like to have
it evaluated. It may have been passed as an argument, or it may have
been constructed in some other way. What you need is another round of
evaluation. This is supplied by the special form eval.
eval really does do just that. For example:
> (eval '(* 2 2)) 4
The quote stopped the first round of evaluation, but eval
got another try at it. Here's another example:
> (eval (list '* 4 4)) 16
eval is quite simple. It is a special form because it needs
access to an environment in which to perform the evaluation (remember
primitives have their arguments evaluated for them and so don't need an
environment.) It evaluates its first argument in the current environment
(special forms don't have their arguments evaluated for them,)
then it evaluates the result a second time, this time in the top-level
environment. Here's PScm::SpecialForm::Eval:
93 package PScm::SpecialForm::Eval; 94 95 use base qw(PScm::SpecialForm); 96 97 sub Apply { 98 my ($self, $form, $env) = @_; 99 $form->first()->Eval($env)->Eval($env->top); 100 }
You can see that the second round of evaluation is done in the
context of the top-level environment obtained by calling a new
method top() on the current environment. That top()
method is also very simple:
14 sub top { 15 my ($self) = @_; 16 if ($self->{parent}) { 17 return $self->{parent}->top; 18 } else { 19 return $self; 20 } 21 }
It just checks to see if it has a parent, calling top()
recursively on that if it has, and returning itself if it hasn't.
One thing to watch out for with eval: the code that
is evaluated is not a closure. Any variables in that code
will be looked up in the top-level environment, not the one where
the expression was constructed, nor the one that is current
when eval is called. For example:
> (let ((* -)) > (* 3 3)) 0 > (let ((* -)) > (eval '(* 3 3))) 9
Nonetheless eval is a useful tool in your kit, we'll
see it in action in later chapters.
Here's the additions to ReadEvalPrint()
which bind our new macro feature and eval in the
initial environment.
The quote binding was already there, and as shown
above, unquote is only a keyword and does not need
a binding:
31 sub ReadEvalPrint { 32 my ($infh, $outfh) = @_; 33 34 $outfh ||= new FileHandle(">-"); 35 my $reader = new PScm::Read($infh); 36 while (defined(my $expr = $reader->Read)) { 37 my $result = $expr->Eval( 38 new PScm::Env( 39 let => new PScm::SpecialForm::Let(), 40 '*' => new PScm::Primitive::Multiply(), 41 '-' => new PScm::Primitive::Subtract(), 42 if => new PScm::SpecialForm::If(), 43 lambda => new PScm::SpecialForm::Lambda(), 44 list => new PScm::Primitive::List(), 45 car => new PScm::Primitive::Car(), 46 cdr => new PScm::Primitive::Cdr(), 47 cons => new PScm::Primitive::Cons(), 48 letrec => new PScm::SpecialForm::LetRec(), 49 'let*' => new PScm::SpecialForm::LetStar(), 50 eval => new PScm::SpecialForm::Eval(), 51 macro => new PScm::SpecialForm::Macro(), 52 quote => new PScm::SpecialForm::Quote(), 53 ) 54 ); 55 $result->Print($outfh); 56 } 57 }
The tests for macro and unquote are in
Listing 18.
t/PScm_Macro.t1 use strict; 2 use warnings; 3 use Test::More; 4 use lib 't/lib'; 5 use PScm::Test tests => 5; 6 7 BEGIN { use_ok('PScm') } 8 9 eval_ok(<<EOF, '(1 2)', 'macros'); 10 (let* ((mylet 11 (letrec ((map 12 (lambda (op lst) 13 (if lst 14 (cons (op (car lst)) 15 (map op (cdr lst))) 16 ()))) 17 (cars 18 (lambda (lst) 19 (map car lst))) 20 (cadrs 21 (lambda (lst) 22 (map (lambda (x) (car (cdr x))) lst)))) 23 (macro (bindings body) 24 (let* ((names (cars bindings)) 25 (values (cadrs bindings))) 26 (cons (list (quote lambda) names body) 27 values)))))) 28 (mylet ((a 1) 29 (b 2)) 30 (list a b))) 31 EOF 32 33 eval_ok(<<EOF, <<EOR, 'unquote'); 34 (let ((x (quote rain)) 35 (y (quote spain)) 36 (z (quote plain))) 37 (quote (the (unquote x) 38 in (unquote y) 39 falls mainly on the 40 (unquote z)))) 41 EOF 42 (the rain in spain falls mainly on the plain) 43 EOR 44 45 eval_ok(<<EOF, <<EOR, 'quote and unquote syntactic sugar'); 46 (let ((x 'rain) 47 (y 'spain) 48 (z 'plain)) 49 '(the ,x 50 in ,y 51 falls mainly on the 52 ,z)) 53 EOF 54 (the rain in spain falls mainly on the plain) 55 EOR 56 57 eval_ok(<<EOF, <<EOR, 'macro to string'); 58 (macro (x) 59 '(a ,x)) 60 EOF 61 (macro (x) (quote (a (unquote x)))) 62 EOR 63 64 # vim: ft=perl
The first test just implements and tests the mylet example
we worked through in the text, and the second test shows unquote
in action with a variation on another example we've already seen.
The third test exercises the syntax extensions in the reader, and the
fourth test demonstrates that macros, like closures, produce a textual
representation of themselves when printed.
The tests for eval are in
Listing 19.
This just does a simple evaluation of a quoted form.
t/PScm_Eval.t1 use strict; 2 use warnings; 3 use Test::More; 4 use lib 't/lib'; 5 use PScm::Test tests => 3; 6 7 BEGIN { use_ok('PScm') } 8 9 eval_ok(<<EOT, <<EOR, 'eval'); 10 (eval '(* 2 2)) 11 EOT 12 4 13 EOR 14 15 eval_ok(<<EOT, <<EOR, 'eval operates in the top-level environment'); 16 (let ((* -)) 17 (eval '(* 3 3))) 18 EOT 19 9 20 EOR 21 22 # vim: ft=perl
Full source code for this version of the interpreter is available athttp://billhails.net/Book/releases/PScm-0.0.6.tgz
Perl actually borrows its map function from Lisp, which has had one for many years.
The term cadr is a contraction
of “car of the cdr” e.g. (cadr x) ==
(car (cdr x)). this sort of contraction is often seen in
scheme code, sometimes nested as much as four or five levels deep,
i.e. cadadr.
A full scheme implementation provides an extend-syntax
special form. Using extend-syntax, defining mylet
is as simple as:
(extend-syntax (mylet)
(mylet ((var val) ...) body)
((lambda (var ...) body) val ...))
Note the similarity between this method and the definition
of map in Pscheme above.
The quote and unquote described here are done
differently in true Scheme. A true Scheme implementation
distinguishes between a simple quote which does
not recognize unquote, and an alternative
quasiquote which does. This means quote
is as efficient as our original implementation, but we still
have access to an unquote mechanism. The quote
form still has the “'” syntactic sugar, and
quasiquote uses the alternative “`”
(backtick) shorthand. Additionally a full Scheme provides
an unquote-splicing (“,@”) which
expects a list and splices it into the existing form
at that point.