It’s been six and a half years since my last installment in my series on Scheme macros, and I thought it was high time for another one.
If you’re new here (or, you know, you’ve only been around for the last six years and haven’t looked at the backlog), in this series we implement things usually reserved for language features as Scheme macros.
I haven’t written a Scheme macro in years. The last big one I’ve written is probably for my contribution to the paper “Efficient module-level dynamic analysis for dynamic languages with module recontextualization”, and that was four years ago and using Racket. Time to combat atrophy, and do something cool! I dusted off zepto, the Scheme I built a decade or so ago, and got to work (to my surprise, it even compiled after some minor GHC-mandated updates).
I decided it is time to implement algebraic data types (ADTs, also known as sum types), i.e. types with multiple constructors, and a pattern matching API to go with it, all in about 27 lines of code. But be warned, it’s pretty dense!
An API
First, let’s look at an API we want to implement. We want to be able to define a type like so:
(defadt Expr
(Lit n)
(Add x y)
(Mul x y))
This defines a type Expr that knows how to construct
literals, add, and multiply. An expression calculator could then look
like this:
(define (eval-expr e)
(match e
((Lit n) n)
((Add x y) (+ (eval-expr x) (eval-expr y)))
((Mul x y) (* (eval-expr x) (eval-expr y)))
(else (error "bad Expr"))))
It pattern-matches on the expression type and binds into variables. As an extra challenge, I also want to be able to write guard expressions:
(match (Lit 5)
((Lit n) :when (negative? n) 'neg)
((Lit _) 'pos)
(else 'not-a-lit))
These are arbitrary expressions that I can use for the components of my ADT.
And that’s about it! Let’s crack our knuckles and get to work!
An implementation
Let’s start with the ADT. For the purposes of this blog post, we will ignore all other affordances Scheme might have for us (record types and the like), and we will conjure data types out of nothing but lists.
Defining ADTs
(define-syntax defadt
(syntax-rules ()
((_ Name) (begin))
((_ Name (Ctor field ...) rest ...)
; what do we do here?
))))
First we define a based case for our macro, since it will be
recursive. Then we define the head clause of our main rule, where we
take one of the constructors. What do we do with it, though? We need to
at least define a function named after the constructor that constructs
the data type, and probably also a function that checks for the data
type. Taking our example from above and the Lit expression
that takes a literal, we would define (Lit <n>) and
(Lit? <to-check>), then go to our next constructor.
So, let’s do just that:
(define-syntax defadt
(syntax-rules ()
((_ Name) (begin))
((_ Name (Ctor field ...) rest ...)
(begin
(define (Ctor field ...) (cons 'Ctor (list field ...)))
(eval (macro-expand
`(define (,(string->symbol (++ (symbol->string 'Ctor) "?")) v)
(and (list? v) (eq? (car v) ','Ctor))))
(current-env))
(defadt Name rest ...)))))
This is a bit ugly, mostly due to me forgetting most of how I do
things in zepto. The first definition is easy, it literally just makes a
function that takes all the fields and puts them in a list that we then
tag with the name of the type, such that (Lit 5) would end
up being encoded as '(Lit 5).
So far, so good. The second definition is what I am not quite happy
with, since it calls macro-expand and eval
manually, a sign that we don’t know how to do something. My problem here
was that I needed to construct the symbol
ConstructorName + ? and splice that in the definition, and
I forgot how to do it. In the end, the check is literally just a check
whether we have a list and whether the constructor name is the first
argument. Good enough for us.
We then recurse, removing the first constructor, such that we eventually reach the empty base case.
Matching ADTs
The meat of our definition lies in the matcher, however. Let’s build a skeleton as before.
(define-syntax match
(syntax-rules (:when else)
((_ e (else body ...)) (begin body ...))
((_ e ((Ctor vars ...) :when g body ...) rest ...)
; match with guard
)
((_ e ((Ctor vars ...) body ...) rest ...)
; match without guard
)
((_ e) (error "match: no clause matched"))))
We define a match macro with the special keywords
else and :when. The else clause
is easy, it just unconditionally executes the body. Likewise, a matcher
without anything will just error. The complicated branches are the ones
that actually do the matching, with or without guard.
We will need recursion once more, and we will start with the slightly simpler matcher without guard expression:
(define-syntax match
(syntax-rules (:when else)
((_ e (else body ...)) (begin body ...))
((_ e ((Ctor vars ...) :when g body ...) rest ...)
; with guard
)
((_ e ((Ctor vars ...) body ...) rest ...)
(let ((tmp e))
(if (and (list? tmp) (eq? (car tmp) 'Ctor))
(apply (lambda (vars ...) body ...)
(cdr tmp))
(match tmp rest ...))))
((_ e) (error "match: no clause matched"))))
Oof, that’s a handful. So first we create a temporary variable for our matchable expression (this makes sure that it gets evaluated once, not multiple times). We then check if it quacks like a duck (it is a correctly tagged list), and if so, we apply the value to an anonymous function that’s made up of the match variables and body.
Let’s go through an illustrative example:
(match (Lit 5)
((Lit x) (* x 5)))
; expands to (partial expansion before let etc)
(let ((tmp (Lit 5)))
(if (and (list? tmp) (eq? (car tmp) 'Lit))
(apply (lambda (x) (* x 5)) '(5))
(error "match: no clause matched")))
You can also play around with it using macro-expand on
your own time, but be warned: let and and are
also macros, and the expanded expression is actually a bit more
complicated!
The guarded clause is only slightly more complicated.
(define-syntax match
(syntax-rules (:when else)
((_ e (else body ...)) (begin body ...))
((_ e ((Ctor vars ...) :when g body ...) rest ...)
(let ((tmp e))
(if (and (list? tmp) (eq? (car tmp) 'Ctor))
(apply (lambda (vars ...)
(if g
(begin body ...)
(match tmp rest ...)))
(cdr tmp))
(match tmp rest ...))))
((_ e ((Ctor vars ...) body ...) rest ...)
; without guard
((_ e) (error "match: no clause matched"))))
It essentially does the same thing, but has an if clause
inside the lambda that will go to the rest of the match if
the clause doesn’t hold.
Why there and not in the outside if? We need access to
the bound variables, and we only have that inside of the
lambda expression. The generated code will be quite ugly,
but we do not really care, since noone should ever need to see it.
And that’s it! We’ve implemented a full ADT and pattern matcher.
Next steps and exercises
A few potential next steps could be:
- Add a function that checks for the main algebraic data type
(i.e.
Expr?). This requires fiddling a bit with the recursion scheme in the macro. - Unify the guarded and unguarded macro clauses. This requires
rewriting the macro form to funnel one style into the other (probably
unguarded into guarded using a
:when trueform). - Add exhaustiveness checks in the
matchmacro. - Add nested destructuring for
match.
There are even more things you could spend time on, like
let-style destructuring and the like. The possibilities are
endless!
Fin
In this blog post, we implemented full ADT support and pattern matching in two macros and under 30 lines of code. Not too shabby, right?
Let me know if you want me to revive this series, I have a few more idea for funky macros up my sleeve!