I had these goals while writing the Arc macro expander: the expander should run ahead of the compiler, fully expanding all the macros in an expression; and the only forms it should need special knowledge of are Arc’s primitives (fn, if, quote, assign, and function calls).
And 28 days of debugging later... :-)
In Arc3.1, the macro expander is written in Scheme as part of the compiler, which means that it needs to worry about the difference between Arc lists and Scheme lists:
; macros return Arc lists, ending with NIL.
; but the Arc compiler expects Scheme lists, ending with '().
; what to do with (is x nil . nil) ?
; the first nil ought to be replaced with 'NIL
; the second with '()
; so the rule is: NIL in the car -> 'NIL, NIL in the cdr -> '().
; NIL by itself -> NIL
It would be simpler if the macro expander could run first and completely expand all the macros in the source code expression, and then pass the fully expanded code to the compiler:
Arc source ---> macro expander ---> compiler
For the macro expander to be able to fully expand all the macros in an expression, it would, like the compiler, need to know about the syntax of Arc’s primitive forms. For example, in
(fn (foo x) (bar y))
the macro expander would need to know that it should macro expand (bar y) but not (foo x). Likewise, with optional arguments:
(fn ((o foo (bar))) ...)
o and foo shouldn’t be macro expanded, but (bar) should.
So we’ve got some redundancy here between the macro expander and the compiler, where both need to know about optional arguments. Can we make the code simpler by removing this redundancy somehow?
Suppose fn was itself a macro which implemented optional arguments and argument destructuring, expanding into some more primitive “$fn” which only knows about plain arguments:
(mac fn (args . body)
  (if (fn-complex-args? args)
       (fn-complex-fn args (fn-body body))
       `($fn ,args ,@(fn-body body))))
The implementation of fn-complex-fn turns out to be a straightforward translation of ac-complex-fn in ac.scm to Arc.
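Its companion predicate fn-complex-args? is just as direct. As a sketch (my rendering of ac-complex-args?, not necessarily the exact code used), args count as complex whenever anything other than a plain symbol shows up:

(def fn-complex-args? (args)
  (if (no args)        nil   ; empty arg list
      (isa args 'sym)  nil   ; rest parameter, e.g. (fn args ...)
      (and (acons args) (isa (car args) 'sym))
       (fn-complex-args? (cdr args))
      t))                    ; destructuring or (o ...) argument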
At this point, not only does the compiler not need to know about complex function arguments, neither does the macro expander! The macro expander does need to know not to expand function arguments in the primitive $fn form; but since fn is a plain macro, the macro expander doesn’t need to do anything special with it, just as it doesn’t need to do anything special to avoid expanding the variables in a with form:
(with (foo 0 bar 14) ...)
A surprising result of implementing complex args as a macro is that argument destructuring and optional arguments can now be used together:
(fn ((o (a b) '(1 2))) ...)
I didn’t do anything to make this happen; the Arc3.1 implementation used Scheme’s let*, and the straightforward translation of that was to use Arc’s withs... and since that in turn expands into fn, it all just worked.
Implementing complex args as a macro does make bootstrapping Arc a bit more cumbersome. The functions used by the macro can’t themselves use complex arguments, since those haven’t been implemented yet. For example, in Arc3.1 pair looks like:
(def pair (xs (o f list)) ...)
which we now need to do by hand:
(def pair (xs . rs)
  ((fn (f)
     ...)
   (if (is rs nil) list (car rs))))
Similarly, quasiquotation and the expansion of square brackets can also be implemented as macros, with the same tradeoff: it simplifies the macro expander and the compiler, at the cost of making bootstrapping arc.arc more cumbersome, since those features can’t be used until enough of Arc has been defined to implement them.
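As a sketch of the square-bracket case, suppose the reader turns [foo _] into (square-bracket foo _); the name square-bracket here is an assumption (Anarki, for instance, uses make-br-fn for the same job). The whole feature then reduces to a one-line macro:

; a sketch: assumes the reader produces (square-bracket foo _) for [foo _]
(mac square-bracket body
  `(fn (_) ,body))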
I prefer lexical variables to override macros of the same name. For example, if I write:
(def report (out)
  (out "Say, you know what?")
  (out "Rizzledink and pugliquat!"))
I prefer my local usage of out to override Arc’s out macro, in the same way as it would override a global function with that name.
For the macro expander to be able to do this, it needs to keep track of which variables in a scope are lexical variables, like the Arc compiler does by passing env around in ac and keeping track of which variables have been defined in a fn:
(define (ac-fn args body env)
  ...
  (append (ac-arglist args) env)
  ...)
It turns out that once the macro expander is keeping track of which variables are lexical, the compiler no longer needs to.
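To make that concrete, here’s a sketch of the relevant branch of a macro expander in Arc, with hypothetical names: expand walks an expression, macro-value looks up a global macro, and env is the list of lexical variables in scope:

(def expand-call (op args env)
  (let m (and (isa op 'sym)
              (no (mem op env))        ; lexical variables shadow macros
              (macro-value op))
    (if m
        (expand (apply (rep m) args) env)   ; expand the macro's output too
        (cons (expand op env)
              (map [expand _ env] args)))))

The (no (mem op env)) check is what lets the lexical out in report shadow Arc’s out macro.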
There are actually a couple of options here.
One option is to implement global variables in Arc in an Arc table. To do this with the macro expander, I have references to globals expand into an invocation of a global macro. For example, a global variable reference like the prn in
(prn "Hi there")
expands to
(global prn)
The global macro in turn can be implemented as:
(mac global (var)
  `(,globals ',var))
so that
(prn "Hi there")
becomes:
((globals 'prn) "Hi there")
which is the same as (globals!prn "Hi there").
Naturally globals itself can’t be a global variable (or we’d get into an infinite loop expanding global variable references), so the macro expander needs to recognize globals as a special form and inject a reference to the Arc table being used to store global variables into the macro expansion.
When I first noticed that global variables could be stored in a plain table I wondered how terrible it would be not to get an error when referencing an undefined global variable.
In Arc3.1 storing nil in a table as a value removes the key, and fetching the value of a key which doesn’t exist returns nil, so there’s no way to distinguish between a global variable that’s been set to nil (such as flags can be) vs. a global variable that hasn’t been defined at all. Referencing an “undefined” global variable simply yields nil. How bad would that be?
Pretty terrible it turns out :-) I was surprised how often I mistyped function names, and having such a bug result in nothing more informative than “Function call on inappropriate object nil” led to an irritating debugging session tracking down which function I had misspelled.
However, if we change Arc tables to store nil values, and add a function such as “has” to tell us whether a table stores a particular key or not, we can extend our global macro to tell us when we’re referencing a global variable that hasn’t been defined:
(mac global (var)
  `(,do (,unless (,bound ',var)
          (,err "global var not defined:" ',var))
        (,globals ',var)))
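The bound check used in that expansion can itself be a thin wrapper over has; a sketch, assuming has takes the table and then the key:

(def bound (var)
  (has globals var))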
But you don’t have to go so far if you don’t want to. We can still avoid having the compiler keep track of which variables are lexical, even without making a global macro (or storing global variables in an Arc table).
Since the macro expander already knows which variables are global and which are lexical, it could tag them for the compiler. For example, it could take
(def hi (name)
  (prn "Hello " name))
and expand the body into something like
(($global-ref prn) "Hello " ($lex-ref name))
where the compiler would know what to do with $global-ref and $lex-ref. (In Arc3.1, by comparison, lexical variables expand as themselves and global variables have an underscore prefixed to their name.)
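That tagging is a two-line affair in the expander; a sketch with hypothetical names, env again being the list of lexical variables in scope:

(def expand-var (v env)
  (if (mem v env)
      `($lex-ref ,v)        ; lexical variable
      `($global-ref ,v)))   ; otherwise a global reference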
So what language can we write the macro expander in? Ideally, of course, Arc :-)
Since the macro expander lives entirely in the Arc universe, translating Arc lists to Arc lists, writing it in Arc is the simplest way to do that. Writing it in another language (such as the target language that we’re compiling Arc into) would complicate the implementation, because it would need both to implement the macro expansion algorithm and to navigate whatever representation Arc lists have in the particular target runtime.
Of course writing the macro expander in Arc has the catch-22 that we need a working version of Arc (including a macro expander) to implement the macro expander!
That’s not a problem if you have a working version of Arc (Arc3.1, Anarki, Arc/Nu, etc.), and you want to be running your code in the same runtime. But if you want to compile to a different target language or to have a different internal representation of Arc lists in your runtime, then you’ll either need to cross-compile the macro expander or translate it by hand.
There are a couple of challenges with cross-compilation. The output of the macro expander includes generated symbols (produced by uniq in Arc) in macro expansions, which need to be unique: different from any variables the user might be using.
In Lisp the usual way to do this is to have “uninterned” symbols, symbols which aren’t equal to each other (as tested by eq? in Scheme or is in Arc), even if they happen to have the same name.
But, if I’m cross-compiling, I’m (presumably) generating a source code file in some representation of the target language, and when that gets loaded I’d need references to uniq symbols to be properly uninterned. E.g. perhaps I’d have a special form like ($gensym foo), and all references to ($gensym foo) in a particular source code file would expand into the same uninterned symbol, different from other symbols named foo loaded elsewhere.
It occurred to me though, if we want to have unique symbols, why not simply generate random symbols, using enough bits of random data so that they’ll be universally unique?
(def uniq ()
  (coerce (rand-string 12) 'sym))
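Assuming rand-string draws from the 62 alphanumeric characters, 12 characters give 62^12, roughly 3 × 10^21, possible names, so an accidental collision is vanishingly unlikely.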
This answered another question of mine: what to name the primitives such as $fn. My hope was that the Expanding Macros with Values approach would allow a programmer to choose their own names for things without worry of conflict (and to do that without having to layer on namespaces). Whatever name I give the primitive fn, whether $fn or something else, gets hard-coded into the output of the macro expander, and then you can’t use that name for your own purposes.
Ah, but what if the primitives are given random names, globally unique names? Now there’s no chance of accidental conflict. You’ll only write code containing the name of the primitive if you’re deliberately copying its name.
Thus in my implementation of the macro expander the primitives are given names such as $fn--xVrP8JItk2Ot. I use the double dash to indicate that the first part is intended to be the readable name and the second part the random id which makes it globally unique.
It’s ugly, but it’s also something I never need to look at when programming. Only if I want to get into the internals of the macro expander and compiler do I need to see it.
Using the approach of expanding macros with values, I could implement do with:
(mac do body
  `((,fn () ,@body)))
By injecting the value of fn (the macro) using ,fn rather than simply writing fn (which would expand into the plain symbol fn), I can both use do in my own code and define fn to be something different without interfering with the operation of do.
The primitive $fn--xVrP8JItk2Ot is different. It wouldn’t work to say:
(mac do body
  `((,$fn--xVrP8JItk2Ot () ,@body)))
because $fn--xVrP8JItk2Ot is not a global variable (unlike fn, which is a global variable named fn whose value is a macro), and so it doesn’t have a value to be injected into the expansion.
The right way to use the primitive $fn is to expand into the symbol itself:
(mac do body
`(($fn--xVrP8JItk2Ot () ,@body)))
Another challenge with cross-compilation is how to serialize injected values. If we’re injecting arbitrary values such as functions (and macros are simply functions tagged with 'mac) into the output code, then we’d need to be able to serialize functions in order to save the output of the cross-compilation to load into the target runtime.
However, we only need to cross-compile enough of arc.arc to run the macro expander and the compiler in the target runtime, and even when following the approach of injecting macro values, the values that do get injected in arc.arc are always the values of global variables.
For example, in the definition of unless:
(mac unless (test . body)
  `(,if (,no ,test) (,do ,@body)))
The values that get injected could have been arbitrary functions and macros, but are in fact the values of existing global variables (if, no, and do).
This suggests a trick. When serializing the output of the cross-compiler, when we come across a function value we can look for that value in the globals table and find out what its name is. Then in the serialization output we can have a special form to refer to the value of a global variable: ($ global if).
In the target runtime we read and execute the compiled output, form by form. Since what most of the forms do is set the values of global variables (with def, mac, etc.), by the time we’re deserializing a reference to a value such as ($ global if), if will be defined as a global variable and we can inject its value.
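On the reading side, the deserializer might look something like this sketch (deserialize is a hypothetical name):

(def deserialize (form)
  (if (and (acons form) (is (car form) '$))
       ; ($ global if) -> the current value of the global if
       (let (kind name) (cdr form)
         (globals name))
      (acons form)
       (cons (deserialize (car form))
             (deserialize (cdr form)))
      form))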
This gives us three levels of language implementation. The first level, level 0, is the host language: here Arc3.1 (though it could be any working implementation of Arc).
Since the macro expander macro.arc is written in Arc, we can load macro.arc into the level 0 implementation. The level 0 implementation doesn’t need to have the features implemented by the macro expander, such as storing globals in an Arc table, or allowing values to be injected by macros; but it does need to allow injected values to pass through the compiler.
--- a/ac.scm	2009-11-01 14:02:06.000000000 -0500
+++ b/ac.scm	2009-11-01 14:02:42.000000000 -0500
@@ -34,7 +34,7 @@
          (ac (list 'no (cons (cadar s) (cdr s))) env))
         ((eq? (xcar (xcar s)) 'andf) (ac-andf s env))
         ((pair? s) (ac-call (car s) (cdr s) env))
-        (#t (err "Bad object in expression" s))))
+        (#t s)))
At level 0, with the macro expander loaded, I can take some source code and run it through the macro expander. I could eval the result, except that the macro expander produces primitive forms such as $fn--xVrP8JItk2Ot which the Arc3.1 compiler doesn’t understand. These are only needed to allow me to redefine fn if I wanted to, so I can convert them back to a plain fn etc.:
Arc ---> macro    ---> unprimitive ---> Arc
(L1)     expander                       (L0)
This fully expands all the macros in the source, producing Arc code without any macros, which can then be eval’ed by the level 0 language.
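An “unprimitive” pass can be a simple tree walk; a sketch, where the table of names is an assumption (only $fn--xVrP8JItk2Ot appears in this post, and each primitive would have its own random id):

(= unprimitives* (obj $fn--xVrP8JItk2Ot 'fn))  ; one entry per primitive

(def unprimitive (x)
  (if (isa x 'sym)
       (or (unprimitives* x) x)   ; rename primitives, keep other symbols
      (acons x)
       (cons (unprimitive (car x))
             (unprimitive (cdr x)))
      x))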
In Arc Global Variables in an Arc Table, I modified Arc3.1 to store global variables in an Arc table instead of in a Racket namespace. The level 1 language implemented by the translation process above also has global variables stored in an Arc table, but without having to modify the host language. By the time the host language at level 0 is eval’ing the code, references to “global variables” at level 1 have been expanded into table references, and the level 0 language neither knows nor cares how global variables are implemented in the level 1 language.
To explore how a cross-compiler would work I wrote a simple target runtime in Racket (ar.rkt), and a corresponding compiler (ac-racket.arc).
The runtime leaves out functionality not needed for the demo, such as sockets and atomic, and it’s slow because I haven’t implemented the optimizations that ac.scm in Arc3.1 has, such as the special cases of ar-apply.
To cross-compile, the translation flow looks like:
Arc ---> macro  ---> ac-racket ---> serialize
(L1)     expand
The output of ac-racket is Racket source code (with lambda and so on), represented in Arc lists and including injected values. The serializer serializes the injected values so that the output can be written out to a file.
By running arc.arc, the macro expander, and the compiler through the translation process, we end up with a file of Racket source code (serialized, with embedded values): an implementation in Racket of the Arc compiler, the macro expander, and enough of Arc to be able to load and compile additional Arc code.
We can’t just run the source files through this translation process by themselves, though: the macro expansion of forms in arc.arc depends on the macros defined earlier. So we also need to eval the forms as we’re reading them so that we can macro expand the later ones.
The solution is to run both translations together in lockstep. We read a source form, macro expand it, and then both serialize it and eval it. That way later forms can use the macros defined earlier.
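As a sketch of that lockstep loop, with hypothetical names (macex fully expands one form, ac-racket compiles it to Racket code in Arc lists, and serialize handles the injected values):

(def cross-compile (infile outfile)
  (w/infile in infile
    (w/outfile out outfile
      (whilet form (read in)
        (let expanded (macex form)
          ; write the compiled form for the target runtime...
          (write (serialize (ac-racket expanded)) out)
          ; ...and eval it here, so later forms can use the
          ; macros this form defines
          (eval expanded))))))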