cj: Making a minimal, complete JIT

Seven years ago, I had an idea for a JIT compiler in C for x86 architectures with a completely autogenerated backend based on machine-readable information on the instructions. Sounds cool, right?!

I immediately built the hand-written parts: a byte buffer I could turn into an executable function, plus all the infrastructure needed for two instructions, NOP and RET, the minimum I needed to prove it worked. Then I found a JS library that contained the instruction information I needed, covering all of the x86 ISA (thanks, asmjit!). I would work with Node.js, no problem; I could just write an emitter struct!
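To make the "byte buffer turned into an executable function" idea concrete, here is a minimal sketch of the NOP-and-RET proof of concept. It is not cj's actual code: the names `make_nop_fn` and `free_fn` are made up for illustration, and it assumes a POSIX system with `mmap`/`mprotect` and a GCC- or Clang-compatible compiler (for `__builtin___clear_cache`). Some hardened platforms need extra steps before a page may become executable.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef void (*jit_fn)(void);

/* NOP followed by RET, hand-encoded for the two architectures in this post. */
#if defined(__x86_64__) || defined(__i386__)
static const uint8_t code[] = {0x90, 0xC3};             /* nop; ret */
#elif defined(__aarch64__)
static const uint8_t code[] = {0x1F, 0x20, 0x03, 0xD5,  /* nop (0xD503201F) */
                               0xC0, 0x03, 0x5F, 0xD6}; /* ret (0xD65F03C0) */
#else
#error "this sketch only covers x86 and AArch64"
#endif

/* Hypothetical helper: copies the bytes into a fresh mapping, flips it from
   read+write to read+execute, and returns it as a callable function pointer.
   Returns NULL on failure. */
static jit_fn make_nop_fn(size_t *len_out) {
  assert(len_out != NULL);
  size_t len = sizeof code;
  void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return NULL;
  memcpy(mem, code, len);
  if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
    munmap(mem, len);
    return NULL;
  }
  /* Needed on ARM so the instruction cache sees the new bytes; harmless on x86. */
  __builtin___clear_cache((char *)mem, (char *)mem + len);
  *len_out = len;
  return (jit_fn)mem;
}

/* Hypothetical helper: releases the mapping created by make_nop_fn. */
static void free_fn(jit_fn fn, size_t len) { munmap((void *)fn, len); }
```

Calling `make_nop_fn(&len)`, invoking the returned pointer, and then passing it to `free_fn` should do nothing and return cleanly, which is exactly the point: if that works, the buffer really is executable.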

And then I abandoned the project. I got a bit of writer’s block, priorities shifted, and I ignored the idea for a few years, until last month. I was moving repositories around and the project resurfaced. I realized my writer’s block had gone and I was actually quite eager to attack it again. So I did!

Over the course of around two weeks, I reworked the project to semi-autogenerate the x86 ISA (the code is still full of warts and special cases, because I had trouble finding the right abstraction at first and just kept going). I then found mra_tools, which provides a machine-readable spec for ARM, so I also added a backend generator for that architecture (full disclosure: I had an LLM program the specification extractor for me, because I couldn’t figure out the project layout quickly and figured it would be easy enough; I also let it write code comments for me, and they’re probably hilariously wrong). The JS code is admittedly brutally messy, but hopefully one doesn’t have to touch it all that much anymore (famous last words)!

Anyway, what we ended up with looks somewhat like this:

// x86 version of a tri-number calculator
#include <stdio.h>
#include "ctx.h"
#include "op.h"

int main(void) {
  cj_ctx* cj = create_cj_ctx();
  cj_operand rax = cj_make_register("rax");
  cj_operand rdi = cj_make_register("rdi");  // n (argument)
  cj_operand rcx = cj_make_register("rcx");  // loop counter
  cj_operand zero = cj_make_constant(0);
  cj_operand one = cj_make_constant(1);

  cj_mov(cj, rax, zero);         // sum = 0
  cj_mov(cj, rcx, one);          // i = 1
  cj_label loop = cj_create_label(cj);
  cj_label done = cj_create_label(cj);
  cj_mark_label(cj, loop);
  cj_cmp(cj, rcx, rdi);          // if (i > n) break;
  cj_jg(cj, done);
  cj_add(cj, rax, rcx);          // sum += i;
  cj_add(cj, rcx, one);          // ++i;
  cj_jmp(cj, loop);              // loop;
  cj_mark_label(cj, done);
  cj_ret(cj);                    // result already in rax

  typedef int (*tri_fn)(int);
  tri_fn fn = (tri_fn)create_cj_fn(cj);
  printf("tri(10) = %d\n", fn(10));  // prints 55

  destroy_cj_fn(cj, (cj_fn)fn);
  destroy_cj_ctx(cj);
  return 0;
}

Already quite good! We can basically hand-roll assembly at run-time, have it generate a function, and then jump in.
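One detail the example glosses over: `cj_jg(cj, done)` is emitted before `done` has been marked, so the jump target is not known yet. A common way to handle this is backpatching: remember where the unresolved offset lives and fill it in once the label gets a position. The sketch below shows that technique for x86 `jmp rel32` only. It is my own illustration, not how cj does it internally; `code_buf`, `label`, `emit_jmp`, `emit_nop`, and `mark_label` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FIXUPS 8

typedef struct {
  uint8_t buf[64];
  size_t len;
} code_buf;

typedef struct {
  int bound;                 /* 1 once the label has a position */
  size_t pos;                /* byte offset of the label in the buffer */
  size_t fixups[MAX_FIXUPS]; /* offsets of rel32 fields still waiting for pos */
  size_t nfixups;
} label;

/* Writes the little-endian rel32 for a jump whose 4-byte field starts at
   `field`. x86 measures the displacement from the end of that field. */
static void patch_rel32(code_buf *c, size_t field, size_t target) {
  int32_t rel = (int32_t)((int64_t)target - (int64_t)(field + 4));
  uint32_t u = (uint32_t)rel;
  for (int i = 0; i < 4; i++) c->buf[field + i] = (uint8_t)(u >> (8 * i));
}

/* Emits `jmp rel32` (opcode 0xE9). If the label is not bound yet, the field
   is zeroed and recorded so mark_label can patch it later. */
static void emit_jmp(code_buf *c, label *l) {
  assert(c->len + 5 <= sizeof c->buf);
  c->buf[c->len++] = 0xE9;
  size_t field = c->len;
  c->len += 4;
  if (l->bound) {
    patch_rel32(c, field, l->pos);
  } else {
    assert(l->nfixups < MAX_FIXUPS);
    for (int i = 0; i < 4; i++) c->buf[field + i] = 0;
    l->fixups[l->nfixups++] = field;
  }
}

/* Emits a one-byte NOP, used here only as filler between jumps. */
static void emit_nop(code_buf *c) {
  assert(c->len < sizeof c->buf);
  c->buf[c->len++] = 0x90;
}

/* Binds the label to the current position and resolves pending jumps. */
static void mark_label(code_buf *c, label *l) {
  l->bound = 1;
  l->pos = c->len;
  for (size_t i = 0; i < l->nfixups; i++) patch_rel32(c, l->fixups[i], l->pos);
  l->nfixups = 0;
}
```

Backward jumps (like the one to `loop`) are resolved immediately, while forward jumps (like the one to `done`) sit in the fixup list until the label is marked. A real backend would also need conditional jumps, short encodings, and ARM's differently shaped branch offsets.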

There are some limitations, however. Firstly, it’s not portable yet. The backend is detected automatically, but the register names and most of the instructions are still different, much like when we write assembly.

I then decided I would at least work on a few very simple “high-level” abstractions to also show how we could build out the JIT compiler to allow for backend-independent code:

#include <stdio.h>
#include "builder.h"

typedef int (*sum_fn)(int);

int main(void) {
  cj_ctx* cj = create_cj_ctx();
  cj_builder_frame frame;
  cj_builder_fn_prologue(cj, 0, &frame);

  cj_operand n = cj_builder_arg_int(cj, 0);
  cj_operand sum = cj_builder_scratch_reg(0);
  cj_operand i = cj_builder_scratch_reg(1);
  cj_operand one = cj_make_constant(1);

  cj_builder_assign(cj, sum, cj_builder_zero_operand());

  cj_builder_for_loop loop = cj_builder_for_begin(cj, i, one, n, one, CJ_COND_GE);
  cj_builder_add_assign(cj, sum, i);
  cj_builder_for_end(cj, &loop);

  cj_builder_return_value(cj, &frame, sum);

  sum_fn fn = (sum_fn)create_cj_fn(cj);
  printf("tri(5) = %d\n", fn ? fn(5) : -1);

  destroy_cj_fn(cj, (cj_fn)fn);
  destroy_cj_ctx(cj);
  return 0;
}

As you can see above, the code is playing at a whole other level of abstraction! We have for-loops, and calling conventions and register names no longer concern us.
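How might a call like `cj_builder_arg_int(cj, 0)` hide the calling convention? I have not shown the builder's internals here, so the following is only a sketch of one plausible approach: pick a register table at compile time based on the target, then look arguments up by index. The names `arg_int_reg` and `ret_int_reg` are hypothetical; the register lists themselves come from the System V AMD64, Windows x64, and AArch64 (AAPCS64) calling conventions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Integer argument registers and the integer return register per target. */
#if defined(__x86_64__) && !defined(_WIN32)
static const char *const int_arg_regs[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
static const char *const int_ret_reg = "rax";
#elif defined(__x86_64__) && defined(_WIN32)
static const char *const int_arg_regs[] = {"rcx", "rdx", "r8", "r9"};
static const char *const int_ret_reg = "rax";
#elif defined(__aarch64__)
static const char *const int_arg_regs[] = {"x0", "x1", "x2", "x3",
                                           "x4", "x5", "x6", "x7"};
static const char *const int_ret_reg = "x0";
#else
#error "this sketch only covers x86-64 and AArch64"
#endif

/* Hypothetical helper: name of the register holding the index-th integer
   argument, or NULL if that argument would be passed on the stack. */
static const char *arg_int_reg(size_t index) {
  size_t count = sizeof int_arg_regs / sizeof int_arg_regs[0];
  return index < count ? int_arg_regs[index] : NULL;
}

/* Hypothetical helper: name of the register that carries an integer result. */
static const char *ret_int_reg(void) { return int_ret_reg; }
```

With something like this underneath, `arg_int_reg(0)` yields `"rdi"` on Linux x86-64 and `"x0"` on AArch64, so user code can ask for "argument 0" without ever naming a register. Stack-passed arguments, floating-point registers, and callee-saved registers are left out of the sketch.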

For now, I do not intend to move much further into building this out. The concept has more than been proven, and I can already feel the all-encompassing obsession consume me. Instead, I leave it as an artifact to play with as-is.

I’m quite proud of it. It took me seven years, but in the end I accomplished much of what I set out to do: understand the x86 ISA (and now ARM too!) and build a JIT in the process! Even if it’s just a toy, it’s a powerful toy, and it required a lot of fiddling to get it (hopefully somewhat) right.

So, what do I intend to do with it? To be honest, I’m not quite sure yet. I have budgeted time for a timeboxed prototype of a simple Forth to see how language implementation in this framework would feel. From there, we will see where things take us. If I do end up finishing the Forth, you can look forward to another post!