Arc Macro Expander

I had these goals while writing the Arc macro expander:

The macro expander runs as a separate phase before the Arc compiler, fully expanding all the macros in the source form. The Arc compiler receives code that has been fully macro expanded, and so only needs to implement compiling primitive forms such as fn, if, quote, assign, and function calls.
The macro expander tracks which variables are lexical variables and which are globals. References to global variables are expanded into macros, which in turn can chose how to implement global variables: whether as global variables in a Racket namespace like Arc_3.1 does, or by storing global variables in an Arc table. This further simplifies the Arc compiler because it also no longer needs to keep track of lexical variables.
The macro expander also supports injecting values (such as function values and macro values) into the result of a macro expansion. In Expanding Macros with Values Instead of Names I describe how this appears to support importing and exporting macros from modules without having to use namespaces.
To explore the feasibility of writing a cross-compiler when macro expansions can contain injected values, I wrote a demo Arc to Racket cross-compiler and target runtime.

And 28 days of debugging later... :-)

Rationale

In Arc_3.1, the macro expander is written in Scheme as part of the compiler, which means that it needs to worry about the difference between Arc lists and Scheme lists:

  ; macros return Arc lists, ending with NIL.
  ; but the Arc compiler expects Scheme lists, ending with '().
  ; what to do with (is x nil . nil) ?
  ;   the first nil ought to be replaced with 'NIL
  ;   the second with '()
  ; so the rule is: NIL in the car -> 'NIL, NIL in the cdr -> '().
  ;   NIL by itself -> NIL

If would be simpler if the macro expander could run first and completely expand all the macros in the source code expression, and then pass the fully expanded code to the compiler:

The macro expander could live entirely in the Arc universe, expanding Arc lists to Arc lists, without worrying about the underlying representation of Arc lists vs. the host language.
The compiler might also be able to be simpler, if macro expansion wasn’t one if the tasks it need to do.
Maybe new ideas in macro expansion, such as my Expanding Macros into Values, could be experimented with by just changing the macro expansion code without needing to get into the compiler.

For the macro expander to be able to fully expand all the macros in an expression, like the compiler it would need to know about the syntax of Arc’s primitive forms. For example, in

  (fn (foo x)
    (bar y))

the macro expander would need to know that it should macro expand (bar y) but not (foo x). Likewise, with optional arguments:

  (fn ((o foo (bar))) ...)

o and foo shouldn’t be macro expanded but (bar) should.

So we’ve got some redundancy here between the macro expander and the compiler, where both need to know about optional arguments. Can we make the code simpler by removing this redundancy somehow?

Suppose fn was itself a macro which implemented optional arguments and argument destructuring, expanding into some more primitive “$fn” which only knows about plain arguments:

  (mac fn (args . body)
    (if (fn-complex-args? args)
         (fn-complex-fn args (fn-body body))
         `($fn ,args ,@(fn-body body))))

The implementation of fn-complex-fn turns out to be straightforward translation of ac-complex-fn in ac.scm to Arc.

At this point, not only does the compiler now not need to know about complex function arguments, neither does the macro expander! The macro expander does need to know to not expand function arguments in the primitive $fn form, but since fn is a plain macro the macro expander doesn’t need to do anything special with it, just as the macro expander doesn’t need to do anything special to avoid expanding the variables in a with form.

  (with (foo 0 bar 14)
    ...)

A surprising result of implementing complex args as a macro is that argument destructuring and optional arguments can now be used together:

  (fn ((o (a b) '(1 2)))
    ...)

I didn’t do anything to make this happen; the Arc_3.1 implementation used Scheme’s let* and the straightforward translation of that was to use Arc’s withs... and since that in turn expands into fn it all just worked.

Implementing complex args as a macro does make bootstrapping Arc a bit more cumbersome. The functions used by the macro can’t themselves use complex arguments since they haven’t been implemented yet. For example, in Arc_3.1 pair looks like:

  (def pair (xs (o f list))
    ...)

which we now need to do by hand:

  (def pair (xs . rs)
    ((fn (f)
       ...)
     (if (is rs nil) list (car rs))))

Similarly quasiquotation and the expansion of square brackets can also be implemented as macros, with the same tradeoff: it simplifies the macro expander and the compiler, at the cost of making bootstrapping arc.arc more cumbersome since those features can’t be used until enough of Arc has been defined to implement them.

Lexical Variables

I prefer lexical variables to override macros of the same name. For example, if I write:

  (def report (out)
    (out "Say, you know what?")
    (out "Rizzledink and pugliquat!"))

I prefer my local usage of out to override Arc’s out macro, in the same way as it would override a global function with that name.

For the macro expander to be able to do this it needs to keep track of which variables in a scope are lexical variables, like the Arc compiler does passing env around in ac and keeping track of which variables have been defined in a fn:

  (define (ac-fn args body env)
    ... (append (ac-arglist args) env) ...)

It turns out that once the macro expander is keeping track of which variables are lexical, the compiler no longer needs to.

There are actually a couple options here.

With Global Variables in a Table

One option is to implement global variables in Arc in an Arc table. To do this with the macro expander I have references to globals expand into a invocation of a global macro. For example, a global variable reference like the prn in

  (prn "Hi there")

expands to

  (global pre)

The global macro in turn can implemented such as:

  (mac global (var)
   `(,globals ',var))

so that (prn "Hi there") becomes:

  ((globals 'prn) "Hi there" )

Which is the same as (globals!prn "Hi there").

Naturally globals itself can’t itself be a global variable (or we’d get into a infinite loop expanding global variable references), so the macro expander needs to recognize globals as a special form and inject a reference to the Arc table being used to store global variables in the macro expansion.

When I first noticed that global variables could be stored in a plain table I wondered how terrible it would be not to get an error when referencing an undefined global variable.

In Arc_3.1 storing nil in a table as a value removes the key, and fetching the value of a key which doesn’t exist returns nil, so there’s no way to distinguish between a global variable that’s been set to nil (such as flags can be) vs. a global variable that hasn’t been defined at all. Referencing an “undefined” global variable simply yields nil. How bad would that be?

Pretty terrible it turns out :-) I was surprised how often I mistyped function names, and having such a bug result in nothing more informative than “Function call on inappropriate object nil” led to an irritating debugging session tracking down which function I had misspelled.

However, if we change Arc tables to store nil values, and add a function such as “has” to tell us whether a table stores a particular key or not, we can extend our global macro to tell us when we’re referencing a global variable that hasn’t been defined:

  (mac global (var)
    `(,do (,unless (,bound ',var)
            (,err "global var not defined:" ',var))
          (,globals ',var)))

Or, Plain Global Variables

But you don’t have to go so far if you don’t want to. We can still avoid having the compiler need to keep track of which variables are lexical even without making a global macro (or storing global variables in an Arc table).

Since the macro expander already knows which variables are global variables which which are lexical, it could tag them for the compiler. For example, it could take

  (def hi (name)
    (prn "Hello " name))

and expand the body into something like

  (($global-ref prn) "Hello " ($lex-ref name))

where the compiler would know what to do with $global-ref and $lex-ref. I.e., in Arc_3.1 lexical variables expand as themselves and global variables have an underline prefixed to their name.

Implementing the Macro Expander

So what language can we write the macro expander in? Ideally, of course, Arc :-)

Since the macro expander lives entirely in the Arc universe, it translates Arc lists to Arc lists, and writing in Arc is the simplest way to do that. Writing it in another language (such as the target laguage that we’re compiling Arc into) would complicate the implementation because it would need to be both implementing the macro expansion algorithm and navigating whatever representation Arc lists had in the particular target runtime.

Of course writing the macro expander in Arc has the catch-22 that we need a working version of Arc (including a macro expander) to implement the macro expander!

That’s not a problem if you have a working version of Arc (Arc_3.1, Anarki, Arc/Nu, etc.), and you want to be running your code in the same runtime. But if you want to compile to a different target language or to have a different internal representation of Arc lists in your runtime, then you’ll either need to cross-compile the macro expander or translate it by hand.

There’s a couple of challenges with cross-compilation. The output of the macro expander includes generated symbols (produced by uniq in Arc) in macro expansion, which need to be unique, different than any variables the user might be using.

In Lisp the usual way to do this is to have “uninterned” symbols, symbols which aren’t equal to each other (as tested by eq? in Scheme or is in Arc), even if they happen to have the same name.

But, if I’m cross-compiling, I’m (presumably) generating a source code file in some representation of the target language, and when that gets loaded I’d need references to uniq symbols be properly uninterned. E.g. perhaps I’d a have a special form like ($gensym foo), and all references to ($gensym foo) in a particular source code file would expand into the same uninterned symbol, different than other symbols named foo loaded elsewhere.

It occurred to me though, if we want to have unique symbols, why not simply generate random symbols, using enough bits of random data so that they’ll be universally unique?

  (def uniq ()
    (coerce (rand-string 12) 'sym))

This answered another question of mine, what to name the primitives such as $fn. My hope was that the Expanding Macros with Values approach would allow a programmer to choose their own names for things without worry of conflict (and to do that without having to layer on namespaces).

Whatever name I give the primitive fn, whether $fn or something else, that name then gets hard-coded into the output of the macro expander and you can’t use that name for your own purposes.

Ah, but what if the primitives are given random names, globally unique names? Now there’s no chance of accidental conflict. You’ll only write code containing the name of the primitive if you’re deliberately copying its name.

Thus in my implementation of the macro expander the primivites are given names such as $fn--xVrP8JItk2Ot. I use the double dash to indicate that the first part is intended to be the readable name and the second part the random id which makes it globally unique.

It’s ugly, but it’s also something I never need to look at when programming. Only if I want to get into the internals of the macro expander and compiler do I need to see it.

fn vs. $fn

Using the approach of expanding macros with values, I could implement do with:

  (mac do body
    `((,fn () ,@body)))

By injecting the value of fn (the macro) by using ,fn rather than simply using fn (which would expand into the plain symbol fn), I can both use do in my own code and define fn to be something different with interfering with operation of do.

The primitive $fn--xVrP8JItk2Ot is different. It wouldn’t work to say:

  (mac do body
    `((,$fn--xVrP8JItk2Ot () ,@body)))

because $fn--xVrP8JItk2Ot is not a global variable (like how fn is a global variable named fn which has as its value a macro), and it doesn’t have a value to be injected into the expansion.

The right way to use the primitive $fn is to expand into the symbol itself:


  (mac do body
    `(($fn--xVrP8JItk2Ot () ,@body)))

Serializing Injected Values

Another challenge with cross-compilation is how to serialize injected values. If we’re injecting arbitrary values such as functions (and macros are simply functions tagged with 'mac) into the output code, then we’d need to be able to serialize functions in order to save the output of the cross-compilation to load into the target runtime.

However we only need to be able to cross-compile enough of arc.arc so that we can run the the macro expander and the compiler in the target runtime, and even when following the approach of injecting macro values, the value that do get injected in arc.arc are always the values of global variables.

For example, in the definition of unless:

  (mac unless (test . body)
    `(,if (,no ,test) (,do ,@body)))

The values that get injected could have been arbitrary function and macro definitions, but are in fact the values of existing global values (if, no, and do).

This suggests a trick. When serializing the output of the cross-compiler, when we come across a function value, we can look for that value in the globals table and find out what its name is. Then in the serialization output we can have a secial form to refer to the value of a global variable, ($ global if).

In the target runtime we read and execute the compiled output, form by form. Since what most of the forms do is set the values of global variables (with def, mac, etc), by the time we’re deserializing a reference to a value such as ($ global if), if will be defined as a global variable and we can inject its value.

Demo

This gives us three levels of language implementation. The first level, level 0, is the host language: here Arc_3.1 (though it could be any working implementation of Arc).

Since the macro expander macro.arc is written in Arc, we can load macro.arc into the level 0 implementation. The level 0 implementation doesn’t need to have the features implemented by the macro expander, such as storing globals in an Arc table, or allowing values to be injected by macros; but it does need to allow injected values to pass through the compiler.

  --- a/ac.scm	2009-11-01 14:02:06.000000000 -0500
  +++ b/ac.scm	2009-11-01 14:02:42.000000000 -0500
  @@ -34,7 +34,7 @@
            (ac (list 'no (cons (cadar s) (cdr s))) env))
           ((eq? (xcar (xcar s)) 'andf) (ac-andf s env))
           ((pair? s) (ac-call (car s) (cdr s) env))
  -        (#t (err "Bad object in expression" s))))
  +        (#t s)))

At level 0, with the macro expander loaded, I can take some source code and run it through the macro expander. I could eval the result, except that the macro expander produces primitive forms such as $fn--xVrP8JItk2Ot which the Arc_3.1 compiler doesn’t understand. These are only needed though to allow me to redefine fn if I wanted to, so I can convert those back to a plain fn etc.

   Arc  --->   macro    --->  unprimitive  --->  Arc
   (L1)       expander                           (L0)

This fully expands all the macros in the source, producing Arc code without any macros, which can then be eval’ed by the level 0 language.

In Arc Global Variables in an Arc Table, I modified Arc_3.1 to store global variables in an Arc table instead of in a Racket namespace. The level 1 language implemented by the translation process above also has global variables stored in an Arc table, but without having to modify the host language. By the time the host language at level 0 is eval’ing the code, references to “global varaiables” at level 1 have been expand into table references and the level 0 language neither knows nor cares how global variables are implemented in the level 1 language.

To explore how a cross-compiler would work I wrote a simple target runtime in Racket (ar.rkt), and a corresponding compiler (ac-racket.arc).

The runtime leaves out functionality such as sockets and atomic not needed for the demo, and it’s slow because I haven’t implemented the optimizations that ac.scm in Arc_3.1 does such as the special cases of ar-apply.

To cross-compile, the translation flow looks like:

  Arc   --->  macro   --->  ac-racket  --->  serialize
  (L1)        expand

The output of ac-racket is Racket source code (with lambda and so on), represented in Arc lists, and including injected values. The serializer serializes the injected values so that the output can be written out to a file.

By running arc.arc, the macro expander, and the compiler through the translation process, we end up with a file containing Racket source code (serialized with embedded values), an implementation in Racket of the Arc compiler, macro expander, and enough of Arc to be able to load and compile additional Arc code.

We can’t just run the source files through this translation process by itself though. The macro expansion of forms in arc.arc depend on the macros defined earlier. So we also need to eval the forms as we’re reading them so that we can macro expand the later ones.

The solution is to run both translations together in lockstep. We read a source form, macro expand it, and then both serialize it and eval it. That way later forms can use the macros defined earlier.

Home
@awwx
github/awwx
andrew.wilcox@gmail.com