Welcome back to another round of Simple Sudoku Solvers! If you are unfamiliar with the series, I suggest you start with the first post or by perusing the backlog, otherwise hold on tight, because today we’re stepping into the desert.
Forth is a language that feels like arriving at a ghost town. The streets are empty, the signs are weathered, and the few inhabitants look at you funny. But once you learn the local customs, you discover that everything you need is right there, and has been for decades, you just have to hold it right.
Today we’ll be using GNU Forth (gforth) as our runtime, which is the most widely available Forth implementation and the one I’d recommend for following along. As always, the code is on GitHub.
Why Forth?
Forth is not popular (well, maybe for a certain crowd maybe it is). It is most certainly not trendy. All it has is words, and it turns out that’s enough!
Here’s why I enjoyed work with it:
- Radical minimalism. Forth gives you a stack, a dictionary of “words” (think: functions), and gets out of your way. There are no types, no syntax to speak of, and almost no built-in abstractions (though I do not know what the different implementations offer). The language specification fits in a few pages.
- Everything is on the stack. There are no local variables (well, almost). Function arguments, intermediate values, return values, everything lives on the data stack. Reading Forth code means mentally simulating a stack machine, which is both a meditative exercise and a mild form of torture. I definitely have “first semester of undergrad” flashbacks whenever I do that exercise. I really sucked at it then, and maybe you can still tell.
- Opacity as a feature. Forth code is pretty hard to read for me. But it’s a trade-off, like most syntax. The extreme brevity and regularity of the language mean that once you can read it, you can read all of it. There are no dark corners, no hidden machinery. What you see is what gets executed. In a talk I gave a few years ago I talked about abstractions and how comments are implemented in Forth, because it’s a nice mind-bender.
- Real-world power. Forth has been used in embedded systems, spacecraft (it ran on the Philae lander), telescopes, and boot loaders. It’s a language that punches well above its weight class in terms of what you can do with it.
I’m not going to pretend I’m a Forth expert. Writing this solver was a lot of staring at stack diagrams, muttering “swap… over… nip…” under my breath, and occasionally getting it right. But I did come away with a genuine appreciation for the language’s strange beauty, and I expanded my horizon, and maybe you will, too.
A quick primer on reading Forth
Before we dive in, a few things will help you read the code.
Forth code is a sequence of words separated by spaces. Words are executed left to right. Each word can push values onto or pop values off the stack. That’s basically it.
By convention, stack effects are documented in parentheses:
( before -- after ). For example, ( n -- n n )
means “takes one value and leaves two copies of it”. The most common
stack manipulation words you’ll see are:
dup— duplicate the top value( a -- a a )swap— swap the top two( a b -- b a )over— copy the second value over the top( a b -- a b a )nip— remove the second value( a b -- b )drop— remove the top value( a b -- a )
Control flow uses if ... else ... then. Yes,
then comes last, it closes the conditional, it doesn’t
start one. Loops use do ... loop with i as the
loop variable. : starts a word definition, ;
ends it. And \ starts a line comment.
With that, you should be able to follow along even if you’ve never touched Forth. If it still feels confusing, that’s normal. It gets better, I think.
The implementation
As with all our solvers, we’ll go outside-in, starting with the high-level solver and working our way into the guts. But first, a quick look at how we represent the board.
Board and geometry
create board 81 allot
: board@ ( idx -- val ) board + c@ ;
: board! ( val idx -- ) board + c! ;
create board 81 allot reserves 81 bytes of memory.
That’s our 9×9 grid, flattened into a single array. board@
and board! read from and write to it by offsetting from the
base address. c@ and c! are Forth’s byte-level
memory access words. The c stands for “character” and is a
remnant from Forth’s early days.
Since we’re working with a flat array, we need helpers to find the starts of rows, columns, and boxes:
: row-start ( idx -- r ) 9 / 9 * ;
: col-start ( idx -- c ) 9 mod ;
: box-start ( idx -- b ) dup 27 / 27 * swap 9 mod 3 / 3 * + ;
These convert a cell index (0–80) into the starting
index of its row, column, or 3x3 box. box-start is the
gnarliest, it finds the top-left corner of the box by separately
computing the row contribution (which block-row times 27) and the column
contribution (which block-column times 3). It’s not elegant, but it’s
correct, and by now you might even know all the magic numbers (we’ve
seen all of them before).
The solver
Let’s look at the main event. Fair warning: this is the most involved
functionword in the whole program.
: solve ( -- solved? )
propagate 0= if false exit then
solved? if true exit then
find-mrv dup 0< if drop false exit then
dup candidates swap \ ( mask idx )
9 0 do
over 1 and if
push-board
i 1+ over board!
recurse if drop drop true unloop exit then
pop-board
then
swap 1 rshift swap
loop
2drop false ;
Let me break this down, because there’s a lot happening in these few lines.
The first three lines are our familiar pattern: propagate (fail if
inconsistent), check if solved (return if so), find the MRV cell (fail
if none). find-mrv returns an index or -1 as a
marker, so we check for that.
Then things get properly Forth-y. We compute the candidate mask for
our chosen cell and swap to get the stack into
( mask idx ) order. This pair will live on the stack for
the entire loop—our implicit “local variables”.
The loop goes from 0 to 8 (nine iterations, one per possible digit). At each step:
over 1 andpeeks at the lowest bit of the mask without consuming it—is this digit a candidate?- If yes:
push-boardsaves the entire board (more on this shortly),i 1+ over board!fills in the digit (iis the loop counter, soi 1+gives us the digit 1–9), andrecursetries to solve the resulting board. - If
recursesucceedsdrop dropremoves our mask and idx,unloopcleans the loop’s bookkeeping off the return stack, and weexitwithtrue. - If it fails:
pop-boardrestores the saved board, and we try the next candidate. swap 1 rshift swapshifts the mask right by one bit to line up the next candidate, keepingidxon top.
If we exhaust all candidates without finding a solution,
2drop false cleans up the mask and index and signals
failure to our caller.
A few things worth noting here. recurse is Forth’s
mechanism for self-reference: a word can’t refer to itself by name
during its own definition (the word isn’t in the dictionary yet!), so
recurse exists as an explicit “call the word currently
being defined”. And unloop before exit is
required to clean up the loop’s internal state from the return stack.
Without it, we’d corrupt the return path. This is a detail that for me,
as someone who’s written a fair amount of compilers, comes
semi-naturally, but I don’t think it does for most other
programmers.
The backtracking stack
This is probably the most Forth-specific piece of the whole solver. In every other language, we either copied the board, relied on immutable structures, or made stack copies the compiler optimized for us. In Forth, we do it ourselves.
50 constant max-depth
create board-stack max-depth 81 * allot
variable sp 0 sp !
: push-board ( -- )
board board-stack sp @ 81 * + 81 move 1 sp +! ;
: pop-board ( -- )
-1 sp +! board-stack sp @ 81 * + board 81 move ;
We pre-allocate a stack of 50 board snapshots
(50 x 81 => 4,050 bytes should be acceptable).
sp is our stack pointer. push-board copies the
current board into the next free slot using move (Forth’s
memcpy) and bumps the pointer. pop-board
decrements and copies back.
This is raw but completely transparent. There’s no hidden allocation or copying strategy to reason about. You’re just moving bytes between two regions of memory. It’s as close to the metal as we’ll get in this series, at least until assembly in season three (stay tuned and whatnot).
50 levels of depth is extremely generous for our purposes. In practice, even the hardest Sudoku puzzles rarely need more than a handful of branching levels. But memory is cheap and we’re a paranoid bunch.
Propagation
Propagation is split into three small words working together:
: solved? ( -- flag )
true 81 0 do i board@ 0= if drop false leave then loop ;
: contradiction? ( -- flag )
false
81 0 do
i board@ 0= if i candidates 0= if drop true leave then then
loop ;
: fill-singles ( -- changed? )
false
81 0 do
i board@ 0= if
i candidates
dup popcount 1 = if lowest-bit 1+ i board! drop true
else drop then
then
loop ;
: propagate ( -- ok? )
begin fill-singles 0= until contradiction? invert ;
solved? checks if every cell is nonzero.
contradiction? checks if any empty cell has zero
candidates. fill-singles scans for cells with exactly one
candidate and fills them in, returning whether it changed anything.
propagate ties them together by running
fill-singles until it reports no changes
(begin ... until) and then checking for contradictions.
I want to stay on fill-singles for a moment, because it
showcases how Forth handles what other languages would express with
mutable flags and return statements. We start with false on
the stack as our flag of change. For each empty cell, we compute
candidates, check the popcount, and if it’s 1, we extract the digit
(lowest-bit 1+—the lowest set bit’s position plus one gives
us the digit), write it to the board, drop the leftover
mask, and push true as our new flag. Otherwise we just
drop the mask and move on. At the end of the loop, whatever
boolean sits on top is our answer.
There’s no assignment to a variable or return statement. The flag materializes as a side effect of the computation. Expressing control through data flow, if you want to sound dramatic.
Candidate computation
We use the same bitmask trick as in Common Lisp, Rust, APL, and Carp:
a 9-bit mask where bit k represents digit
k+1.
: digit>bit ( d -- bit ) 1- 1 swap lshift ;
511 constant full-mask
: row-used ( idx -- mask )
row-start 0
9 0 do over i + board@ dup 0> if digit>bit or else drop then loop
nip ;
: col-used ( idx -- mask )
col-start 0
9 0 do over i 9 * + board@ dup 0> if digit>bit or else drop then loop
nip ;
: box-used ( idx -- mask )
box-start 0
3 0 do 3 0 do
over j 9 * i + + board@ dup 0> if digit>bit or else drop then
loop loop
nip ;
: candidates ( idx -- mask )
dup row-used over col-used or over box-used or nip full-mask xor ;
The three *-used words follow the same pattern: start
with the anchor index and an accumulator of 0, loop over
the relevant cells, and or each nonzero digit’s bit into
the accumulator. The nip at the end discards the anchor,
leaving just the mask.
candidates combines all three and flips the result. The
dup ... over ... over ... nip thing-a-majig keeps the index
alive across three word calls without ever storing it in a variable
(pure stack threading). If you trace through it carefully, you’ll see
the index getting duplicated, consumed by each *-used call,
duplicated again, consumed again, and finally discarded when we’re
done.
I find box-used particularly satisfying with its nested
do loops and the j 9 * i + offset computation.
It’s doing the same 3x3 walk we’ve done in every solver, but here the
loop variables i and j come from Forth’s own
loop mechanism, and the whole thing reads almost like pseudocode (albeit
in postfix).
Bit helpers
Since gforth doesn’t give us popcount or trailing-zero-count as primitives, we roll our own, it’s not our first rodeo:
: popcount ( mask -- n )
0 swap 9 0 do dup 1 and rot + swap 1 rshift loop drop ;
: lowest-bit ( mask -- pos )
0 begin over 1 and 0= while 1+ swap 1 rshift swap repeat nip ;
popcount walks through 9 bits, adding each to an
accumulator. lowest-bit shifts until it finds a set bit and
counts how many shifts it took. Neither is clever, but they get the job
done.
MRV selection
The last piece: finding the cell with the fewest candidates for branching.
variable mrv-best
: find-mrv ( -- idx )
10 mrv-best !
-1
81 0 do
i board@ 0= if
i candidates popcount
dup mrv-best @ < if
mrv-best ! drop i
else drop then
then
loop ;
Here we break our “no variables” streak. mrv-best holds
the current best candidate count, and the top of the data stack holds
the current best index (starting at -1 as a marker). For
each empty cell, we compute the popcount and compare. If it’s an
improvement, we update both.
This is one of the few places where a variable felt genuinely more natural than pure stack manipulation. Keeping both the best count and the best index on the stack while also juggling the loop counter and the candidate count probably would have been possible, but the resulting stack acrobatics would have made the code much harder to follow for very little benefit. Even my madness has its limits.
Fin
Forth asks more of you than almost any other language. It requires more mental computation and more discipline. A lot of the language features we’ve used over the last 1.5 seasons are notably absent.
And yet, there is something deeply compelling about it. The
transparency of every word doing exactly what it says, nothing being
hidden, and the entire system fitting in one person’s head without it
being too inscrutable. The way you build up from nothing, defining words
that become the vocabulary for your specific problem. By the end,
solve reads almost like a sentence: propagate, check if
solved, find the best cell, try each candidate, recurse. The fact that
this was built on nothing but a stack and some memory still feels like a
magic trick to me.
Our solver is about 140 lines, on the longer side in terms of raw code, but also the simplest in terms of what the language gives us to work with. No macros, no monads, no constraint solvers, no type inference. And yet we have all that we need.
If you’ve made it this far, thank you for walking through the desert with me. The next stop has considerably more amenities: we’re heading to Smalltalk, a language that is Forth’s philosophical opposite in almost every way. See you there!