What is a macro? People familiar with the C programming language will probably think of macros as being purely a textual substitution mechanism done in some sort of preprocessing step before the compiler proper gets to look at the code. However that's a somewhat limited perspective, perfectly adequate for languages like C but constraining from our point of view. A better definition of a macro is any sort of substitution or replacement that can happen before the final code is executed.
The real importance of macros is their potential to allow syntactic
extensions to their language. In the case of PScheme, each special
form is a syntactic extension to the language, and so our working
definition of a PScheme macro could be something that allows us to
define our own special forms within the language itself. Here's
an example. Suppose the language lacked the let
special form.
As was mentioned in Chapter 5, let
shares a
good deal in common with lambda
. In fact any let
expression,
say
(let ((a 10) (b 20)) (- b a))
has an equivalent lambda
expression, in this case
((lambda (a b) (- b a)) 10 20)
The body of the let
is the same as the body of the
lambda
, and the bindings of the let
are split
between the formal and actual arguments to the lambda
expression. In general any let
expression:
(let ((‹var1› ‹val1›) (‹var2› ‹val2›) ...) ‹expression›)
has an equivalent lambda
form:
((lambda (‹var1› ‹var2› ...) ‹expression›) ‹val1› ‹val2› ...)
Of course internally let
doesn't make use of closures, but
in the case of the lambda
equivalent to let
, the lambda
expression is evaluated immediately in the same environment as it
was defined, so closure is immaterial. All that our purported
let
macro need do then, is to rewrite its arguments into an
equivalent lambda
form and have that executed in its place. We
developed all of the list manipulation tools we will need to do that in the 0.0.5 version
of the
interpreter from Chapter 8 (remember that code and data
are the same list objects so list functions can operate on both). All we need to do now is to think
of a way to allow us to define macros.
Macros will obviously share a great deal in common with functions. They
will have a separate declaration and use. They will also take arguments,
and have a body that is evaluated in some way. In fact the first part of
their implementation, that of parsing their declaration will be virtually
identical to that of lambda
expressions, except that the
lambda
keyword is already taken. We'll use
“macro” in its place.
macro
As before then, we subclass PScm::SpecialForm and give
the new class an Apply()
method. The new class is called
PScm::SpecialForm::Macro after its eponymous symbol. Here's
the Apply()
method for PScm::SpecialForm::Macro.
121 sub Apply { 122 my ($self, $form, $env) = @_; 123 my ($args, $body) = $form->value; 124 return PScm::Closure::Macro->new($args, $body, $env); 125 }
It's virtually identical to PScm::SpecialForm::Lambda except
that it creates a new PScm::Closure::Macro
instead of a
PScm::Closure::Function. So we've left the problem of how to make
a macro actually work until last, in the PScm::Closure::Macro's
Apply()
method.
Consider how PScm::Closure::Function::Apply()
works. It
evaluates its arguments in the passed-in environment then gives
the results to its parent PScm::Closure::_apply()
method.
That _apply()
method extends the
environment that was captured when the closure was created with bindings
of those actual arguments to its formal arguments. Then it evaluates
its body in that extended environment and returns the result.
Here again is PScm::Closure::Function's Apply()
method:
39 sub Apply { 40 my ($self, $form, $env) = @_; 41 42 my $evaluated_args = $form->map_eval($env); 43 return $self->_apply($evaluated_args); 44 }
And here again is the private _apply()
method in the base
PScm::Closure class:
17 sub _apply { 18 my ($self, $args) = @_; 19 20 my $extended_env = 21 $self->{env}->ExtendUnevaluated($self->{args}, $args); 22 return $self->{body}->Eval($extended_env); 23 }
Any implementation of macros will share something in common with this implementation of functions, but there will be differences. Obviously a macro should be passed its arguments unevaluated. That way it can perform whatever (list) operations it likes on that structure. Then when it returns a new form, it is that form that gets evaluated.
In fact it's as simple as that, and here's
the Apply()
method for PScm::Closure::Macro:
55 sub Apply { 56 my ($self, $form, $env) = @_; 57 58 my $new_form = $self->_apply($form); 59 return $new_form->Eval($env); 60 }
Compare that with the Apply()
method from
PScm::Closure::Function above.
Functions evaluate their arguments, then evaluate their body with those arguments bound. Macros don't evaluate their arguments, they evaluate their body with their unevaluated arguments bound, then they re-evaluate the result. This is quite subtle. Macros perform substitutions on their arguments, but the result of those substitutions must be subsequently evaluated for the macro to have had the desired effect.
To finish off this part of the implementation, we must remember
that in Section 5.3 we made closures printable,
and since macros are a new kind of closure, we must supply the supporting
_symbol()
method in PScm::Closure::Macro for the
PScm::Closure::as_string()
method to find. This _symbol()
method
returns the symbol macro
so that if a macro is printed
it will display as
(macro ‹args› ‹body›)
. Here's the
new PScm::Closure::Macro::_symbol()
method.
62 sub _symbol { 63 PScm::Expr::Symbol->new('macro'); 64 }
Just for fun, let's take a look at how we might attack the problem
which introduced this section: implementing let
in terms
of lambda
. Remember that any let
expression has an equivalent
lambda
form, so here's a use of macro
that translates one into
the other:
(let* ((mylet (macro (bindings body) (let* ((names (cars bindings)) (values (cadrs bindings))) (cons (list (quote lambda) names body) values))))) (mylet ((a 1) (b 2)) (list a b)))
This code uses let*
(remember we're pretending that we
don't have let
) to bind mylet
to a macro
definition, then
it uses mylet
in the body of the let*
. It makes use of some
supporting functions that we'll define presently, but first let's try
to get a feel for what it is doing. As stated above, the symbol macro
introduces a macro definition. The arguments to mylet
will be the
same as those to let
, namely a list of bindings and a body to
execute with those bindings in place. It has to separate the bindings
(symbol-value pairs) into two lists, one of the symbols and one of the
values. It might be useful in the following discussion to refer
to Figure 9.1 which shows the internal structure
of the mylet
form that we'll be rearranging.
mylet
internal structureThe mylet
macro uses a function cars
to extract the car of each
binding (the symbol) into the list called names.
Here's the definition of cars
:
(letrec (... (cars (lambda (lst) (map car lst))) ...) ...)
It uses another yet to be defined function map
, which
does the same as Perl's built in map
: it applies a function
to each element of a list and returns a new list of the results23. map
is surprisingly easy to implement
in PScheme:
(letrec ((map (lambda (op lst) (if lst (cons (op (car lst)) (map op (cdr lst))) ()))) ...)
It's a recursive function, hence the need for letrec
to
bind it. Passed a function and list of zero or more bindings,
if the list is empty it returns the empty list, otherwise it cons
-es
the result of calling the function on the car
of the list with
to the result of calling itself on the rest (cdr
) of the list.
So for example if lst
is
((a 1) (b 2))
, then (map car lst)
would return the list
(a b)
, and that is exactly what the cars
function does.
cadrs
24 is very similar. It walks the
list collecting the second component of each sublist (the values of
the bindings). So for example given the list ((a 1) (b 2))
,
cadrs
will return the list (1 2)
.
(letrec (... (cadrs (lambda (lst) (map (lambda (x) (car (cdr x))) lst))) ...) ...)
Again it makes use of map
this time passing it an anonymous
function that will take the car
of the cdr
of its argument. This is
very much in the style of real Scheme programming now: constructing
lambda expressions on the fly and passing them to other functions as arguments,
I hope you are aquiring a taste for it. Anyway here's the whole mylet
definition plus some code that calls it.
(let* ((mylet (letrec ((map (lambda (op lst) (if lst (cons (op (car lst)) (map op (cdr lst))) ()))) (cars (lambda (lst) (map car lst))) (cadrs (lambda (lst) (map (lambda (x) (car (cdr x))) lst)))) (macro (bindings body) (let* ((names (cars bindings)) (values (cadrs bindings))) (cons (list (quote lambda) names body) values)))))) (mylet ((a 1) (b 2)) (list a b)))
After collecting the names into one list and the values into
another, the mylet
macro builds:
((lambda (‹names›) (‹body›)) ‹values›)
Where ‹names›, ‹body› and ‹values› are expanded using the appropriate magic:
(cons (list (quote lambda) names body) values)
A point worth noting is that the constructed mylet
macro is a true closure, since it has captured the definitions of
the cars
and cadrs
functions and executes in an outer
environment (the let*
) where those functions are not visible.
The macro substitution system demonstrated so far is pretty crude,
after all it requires the programmer to directly manipulate low-level
list structures, rather than just supplying an “example” of how
the transformation is to be performed. In fact the topic of macro
expansion as provided by a full Scheme implementation is deserving of
a book to itself. Apart from the templating ability, there are also issues
of avoiding variable collision (so-called hygenic macros)
so that full Scheme macros are much closer to the idea of
C++'s inline
functions than they are to C's
#define
25.
However there is one simple addition that we can make, which will
greatly improve the usefulness of macros, and that involves an
extension to the quote
special form that we introduced in
Section 8.1. If you remember quote
just returns its argument, preventing unwanted evaluation.
This already has proved useful in the construction of macros, as we have
seen above.
Now one perfect use of quote
would be to provide
templates for macros, if we could arrange that parts
of the quoted template could be substituted before the quoted
template is returned. To that purpose we introduce a keyword
unquote which marks a section of a quoted form for
evaluation. Perhaps an example might make this clear:
> (let ((x "rain") > (y "spain") > (z "plain")) > (quote > (the (unquote x) > in (unquote y) > falls mainly on the (unquote z)))) (the "rain" in "spain" falls mainly on the "plain")
The let
bindings bind x
to the string "rain"
etc. That is not the important part. The important part is the body of the
let
where the use of the unquote
keyword allows evaluation
of the contained expressions (x
etc.) despite their being inside
a quote
.
How can this help us with macro definitions? Well in a big way!
consider this macro definition of a while
loop:
(define while (macro (test body) (quote (letrec ((loop (lambda () (if (unquote test) (begin (unquote body) (loop)) ())))) (loop)))))
It uses a few features that aren't available yet, like
define
and begin
(which just executes one
expression after another), and it would seem to be in danger
of running out of stack, but I hope you can see that
essentially the quote
and unquote
are doing all of the work building the body of the macro.
The quoted result is shown in bold, with the internal
substitutions unbolded again.
Implementing unquote
is easy, but it's
a little different from the normal special forms and
primitives we've seen up to now. I've been careful to
only refer to it as a “keyword”, because it means
nothing special outside of a quoted expression.
We'll obviously have to change the way quote
works to make this happen, so lets start by looking at
the changed PScm::SpecialForm::Quote::Apply()
.
132 sub Apply { 133 my ($self, $form, $env) = @_; 134 return $form->first->Quote($env); 135 }
Rather than just returning its first argument, it now calls
a new method Quote()
on it, passing Quote()
the current environment. Quote()
essentially just makes a copy of the expressions concerned,
but it keeps an eye out for unquote
symbols.
Now this method
will be implemented in the PScm::Expr classes as follows:
The default Quote()
in PScm::Expr
just returns $self
:
41 sub Quote { $_[0] }
The Quote()
in PScm::Expr::List::Pair
is where most of the decision making happens.
133 sub Quote { 134 my ($self, $env) = @_; 135 if ($self->[FIRST]->is_unquote) { 136 return $self->[REST]->first->Eval($env); 137 } else { 138 return $self->quote_rest($env); 139 } 140 }
On
Line 135
it checks to see if the first element of the list is
the symbol unquote
(is_unquote
.)
If it is then it evaluates the second element
in the current environment and returns it. If the first
element
is
not
unquote
then it hands over control to a helper routine
quote_rest()
.
Here's quote_rest()
.
142 sub quote_rest { 143 my ($self, $env) = @_; 144 return $self->Cons( 145 $self->[FIRST]->Quote($env), 146 $self->[REST]->quote_rest($env) 147 ); 148 }
It just walks the list, recursively, constructing a copy
as it goes by calling Quote()
on each
element and calling Cons()
on the quoted
subexpression and the result of the recursive call26.
The PScm::Expr::List::Null package
inherits Quote()
from PScm::Expr,
which just returns $self
,
and PScm::Expr also has a
quote_rest()
method which also just returns
$self
and usefully terminates the recursion
of the non-empty PScm::Expr::List quote_rest()
method.
43 sub quote_rest { $_[0] }
That just leaves that is_unquote()
method. Well since only a symbol could possibly be
unquote
, we can put a default is_unquote()
method at the top of the expression type hierachy, in
PScm::Expr, which just returns false:
12 sub is_unquote { 0 }
Then for PScm::Expr::Symbol only, we override that
with a method that checks to see if its value()
is the string "unquote"
:
182 sub is_unquote { 183 my ($self) = @_; 184 return $self->value eq "unquote"; 185 }
That completes our re-implementation of quote
to allow the recognition of the unquote
keyword,
but we're not quite done yet.
quote
and unquote
turn out to be so useful in
the definition of macros that PScheme provides shorthand
syntactic sugar for these forms. The construct
'‹expression›
(note the single quote) gets expanded to
(quote ‹expression›)
, and similarily
the construct ,‹expression›
with a leading comma
gets expanded to (unquote ‹expression›)
.
This is fairly easy to do, so let's see what changes we
need to make to the reader to make this happen.
First here's the changes to PScm::Read::_next_token()
.
66 sub _next_token { 67 my ($self) = @_; 68 69 while (!$self->{Line}) { 70 $self->{Line} = $self->{FileHandle}->getline(); 71 return undef unless defined $self->{Line}; 72 $self->{Line} =~ s/^\s+//s; 73 } 74 75 for ($self->{Line}) { 76 s/^\(\s*// && return PScm::Token::Open->new(); 77 s/^\)\s*// && return PScm::Token::Close->new(); 78 s/^\'\s*// && return PScm::Token::Quote->new(); 79 s/^\,\s*// && return PScm::Token::Unquote->new(); 80 s/^\.\s*// && return PScm::Token::Dot->new(); 81 s/^([-+]?\d+)\s*// 82 && return PScm::Expr::Number->new($1); 83 s/^"((?:(?:\\.)|([^"]))*)"\s*// && do { 84 my $string = $1; 85 $string =~ s/\\//g; 86 return PScm::Expr::String->new($string); 87 }; 88 s/^([^\s\(\)]+)\s*// 89 && return PScm::Expr::Symbol->new($1); 90 } 91 die "can't parse: $self->{Line}"; 92 }
The change is very simple. You can see that on
Lines 78–79
if it strips a leading quote or comma, it returns an equivalent token object.
Those new token types are both in PScm/Token.pm
, here's
PScm::Token::Quote.
29 package PScm::Token::Quote; 30 31 use base qw(PScm::Token); 32 33 sub is_quote_token { 1 }
It inherits from PScm::Token and a default
is_quote_token()
there returns false.
The equivalent PScm::Token::Unquote deliberately
inherits from PScm::Token::Quote rather than just
PScm::Token so it gets the overridden is_quote_token()
method, and supplies an additional is_unquote_token()
method
returning true. Again a default is_unquote_token()
in
PScm::Token returns false.
36 package PScm::Token::Unquote; 37 38 use base qw(PScm::Token::Quote); 39 40 sub is_unquote_token { 1 }
The upshot of this is that both PScm::Token::Quote
and PScm::Token::Unquote return true for is_quote_token()
,
but only PScm::Token::Unquote returns true for
is_unquote_token()
. Finally, let's see how the reader
PScm::Read::Read()
makes use of these new token objects.
17 sub Read { 18 my ($self) = @_; 19 20 my $token = $self->_next_token(); 21 return undef unless defined $token; 22 23 if ($token->is_quote_token) { 24 my $expr = $self->Read; 25 die "syntax Error" 26 unless defined($expr) && $expr->is_expr; 27 return new PScm::Expr::List( 28 $token->is_unquote_token 29 ? new PScm::Expr::Symbol('unquote') 30 : new PScm::Expr::Symbol('quote'), 31 $expr 32 ); 33 } 34 35 if ($token->is_open_token) { 36 return $self->read_list(); 37 } else { 38 return $token; 39 } 40 }
The additional code on
Lines 23–33
checks to see if the token is a quote or unquote token, and if so
reads the next expression, checks that it is valid and returns a new
PScm::Expr::List containing the appropriate quote
or unquote
symbol and the expression read afterwards.
The is_expr()
method is defined to be true in PScm::Expr
and false in PScm::Token, and its use here stops dubious
constructs like “')
”.
So we now have a convenient shorthand for quote
and
unquote
. To demonstrate it in action, here's that while
macro again, this time using the new tokens.
(define while (macro (test body) '(letrec ((loop (lambda () (if ,test (begin ,body (loop)) ())))) (loop))))
We'll be making significant use of macro
,
quote
and unquote
in subsequent chapters,
so it's worth familiarizing yourself now with this new
idiom27.
Sometimes given a quoted expression, you'd really just like to have
it evaluated. It may have been passed as an argument, or it may have
been constructed in some other way. What you need is another round of
evaluation. This is supplied by the special form eval.
eval
really does do just that. For example:
> (eval '(* 2 2)) 4
The quote stopped the first round of evaluation, but eval
got another try at it. Here's another example:
> (eval (list '* 4 4)) 16
eval
is quite simple. It is a special form because it needs
access to an environment in which to perform the evaluation (remember
primitives have their arguments evaluated for them and so don't need an
environment.) It evaluates its first argument in the current environment
(special forms don't have their arguments evaluated for them,)
then it evaluates the result a second time, this time in the top-level
environment. Here's PScm::SpecialForm::Eval:
93 package PScm::SpecialForm::Eval; 94 95 use base qw(PScm::SpecialForm); 96 97 sub Apply { 98 my ($self, $form, $env) = @_; 99 $form->first()->Eval($env)->Eval($env->top); 100 }
You can see that the second round of evaluation is done in the
context of the top-level environment obtained by calling a new
method top()
on the current environment. That top()
method is also very simple:
14 sub top { 15 my ($self) = @_; 16 if ($self->{parent}) { 17 return $self->{parent}->top; 18 } else { 19 return $self; 20 } 21 }
It just checks to see if it has a parent, calling top()
recursively on that if it has, and returning itself if it hasn't.
One thing to watch out for with eval
: the code that
is evaluated is not a closure. Any variables in that code
will be looked up in the top-level environment, not the one where
the expression was constructed, nor the one that is current
when eval
is called. For example:
> (let ((* -)) > (* 3 3)) 0 > (let ((* -)) > (eval '(* 3 3))) 9
Nonetheless eval
is a useful tool in your kit, we'll
see it in action in later chapters.
Here's the additions to ReadEvalPrint()
which bind our new macro feature and eval
in the
initial environment.
The quote
binding was already there, and as shown
above, unquote
is only a keyword and does not need
a binding:
31 sub ReadEvalPrint { 32 my ($infh, $outfh) = @_; 33 34 $outfh ||= new FileHandle(">-"); 35 my $reader = new PScm::Read($infh); 36 while (defined(my $expr = $reader->Read)) { 37 my $result = $expr->Eval( 38 new PScm::Env( 39 let => new PScm::SpecialForm::Let(), 40 '*' => new PScm::Primitive::Multiply(), 41 '-' => new PScm::Primitive::Subtract(), 42 if => new PScm::SpecialForm::If(), 43 lambda => new PScm::SpecialForm::Lambda(), 44 list => new PScm::Primitive::List(), 45 car => new PScm::Primitive::Car(), 46 cdr => new PScm::Primitive::Cdr(), 47 cons => new PScm::Primitive::Cons(), 48 letrec => new PScm::SpecialForm::LetRec(), 49 'let*' => new PScm::SpecialForm::LetStar(), 50 eval => new PScm::SpecialForm::Eval(), 51 macro => new PScm::SpecialForm::Macro(), 52 quote => new PScm::SpecialForm::Quote(), 53 ) 54 ); 55 $result->Print($outfh); 56 } 57 }
The tests for macro
and unquote
are in
Listing 18.
t/PScm_Macro.t
1 use strict; 2 use warnings; 3 use Test::More; 4 use lib 't/lib'; 5 use PScm::Test tests => 5; 6 7 BEGIN { use_ok('PScm') } 8 9 eval_ok(<<EOF, '(1 2)', 'macros'); 10 (let* ((mylet 11 (letrec ((map 12 (lambda (op lst) 13 (if lst 14 (cons (op (car lst)) 15 (map op (cdr lst))) 16 ()))) 17 (cars 18 (lambda (lst) 19 (map car lst))) 20 (cadrs 21 (lambda (lst) 22 (map (lambda (x) (car (cdr x))) lst)))) 23 (macro (bindings body) 24 (let* ((names (cars bindings)) 25 (values (cadrs bindings))) 26 (cons (list (quote lambda) names body) 27 values)))))) 28 (mylet ((a 1) 29 (b 2)) 30 (list a b))) 31 EOF 32 33 eval_ok(<<EOF, <<EOR, 'unquote'); 34 (let ((x (quote rain)) 35 (y (quote spain)) 36 (z (quote plain))) 37 (quote (the (unquote x) 38 in (unquote y) 39 falls mainly on the 40 (unquote z)))) 41 EOF 42 (the rain in spain falls mainly on the plain) 43 EOR 44 45 eval_ok(<<EOF, <<EOR, 'quote and unquote syntactic sugar'); 46 (let ((x 'rain) 47 (y 'spain) 48 (z 'plain)) 49 '(the ,x 50 in ,y 51 falls mainly on the 52 ,z)) 53 EOF 54 (the rain in spain falls mainly on the plain) 55 EOR 56 57 eval_ok(<<EOF, <<EOR, 'macro to string'); 58 (macro (x) 59 '(a ,x)) 60 EOF 61 (macro (x) (quote (a (unquote x)))) 62 EOR 63 64 # vim: ft=perl
The first test just implements and tests the mylet
example
we worked through in the text, and the second test shows unquote
in action with a variation on another example we've already seen.
The third test exercises the syntax extensions in the reader, and the
fourth test demonstrates that macros, like closures, produce a textual
representation of themselves when printed.
The tests for eval
are in
Listing 19.
This just does a simple evaluation of a quoted form.
t/PScm_Eval.t
1 use strict; 2 use warnings; 3 use Test::More; 4 use lib 't/lib'; 5 use PScm::Test tests => 3; 6 7 BEGIN { use_ok('PScm') } 8 9 eval_ok(<<EOT, <<EOR, 'eval'); 10 (eval '(* 2 2)) 11 EOT 12 4 13 EOR 14 15 eval_ok(<<EOT, <<EOR, 'eval operates in the top-level environment'); 16 (let ((* -)) 17 (eval '(* 3 3))) 18 EOT 19 9 20 EOR 21 22 # vim: ft=perl
Full source code for this version of the interpreter is available athttp://billhails.net/Book/releases/PScm-0.0.6.tgz
Perl actually borrows its map function from Lisp, which has had one for many years.
The term cadr
is a contraction
of “car
of the cdr
” e.g. (cadr x) ==
(car (cdr x))
. this sort of contraction is often seen in
scheme code, sometimes nested as much as four or five levels deep,
i.e. cadadr
.
A full scheme implementation provides an extend-syntax
special form. Using extend-syntax
, defining mylet
is as simple as:
(extend-syntax (mylet) (mylet ((var val) ...) body) ((lambda (var ...) body) val ...))
Note the similarity between this method and the definition
of map
in Pscheme above.
The quote and unquote described here are done
differently in true Scheme. A true Scheme implementation
distinguishes between a simple quote
which does
not recognize unquote
, and an alternative
quasiquote
which does. This means quote
is as efficient as our original implementation, but we still
have access to an unquote
mechanism. The quote
form still has the “'
” syntactic sugar, and
quasiquote
uses the alternative “`
”
(backtick) shorthand. Additionally a full Scheme provides
an unquote-splicing
(“,@
”) which
expects a list and splices it into the existing form
at that point.