Mikael Ronstrom

Experiences from the new wave of AI programming

How we built RonSQL in RonDB with Claude and Codex

A false start

A few years ago at Hopsworks we made an early attempt at using ChatGPT for coding. It turned out to be an expensive mistake: the generated code cost us several months of rewriting. So when I came back to AI programming this year, I did so with a fair amount of skepticism.

Trying again — the small wins

In January we tried again, and the first success came from an unexpected place. Our marketing manager put together a simple RonDB client in about two hours. That caught my interest. I took the code over and, in a few more hours, had extended it significantly.

Encouraged, I tried something closer to real engineering: extending our REST API server so that it could handle not only batched key reads, but also batched key inserts, writes, and deletes. The original code had taken significant effort to write; the extension took just two days. Clearly, extending existing code was something Claude and Codex could do well.

The big bet: a ten-year TODO item

With those successful experiments behind me, I realised this new tooling might be extremely useful for developing genuinely new features in RonDB.

Something our customers have wanted for a long time is more complex queries in real-time AI inferencing. Supporting that requires CTEs and parallel join aggregation — a capability that has sat on the RonDB TODO list for more than ten years. It is very complex development work; with conventional coding I would estimate it as a two-year project, at least.

What this really amounts to is adding subsets of complex SQL support to a key-value store. RonDB sits in the same category as key-value stores such as Redis and DynamoDB, which are traditionally used for exactly this kind of real-time, low-latency serving — but which have no support for complex SQL queries at all. Bringing CTEs and parallel join aggregation into that world is what makes the effort both unusual and worthwhile.
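To make the query shape concrete, here is a minimal sketch of a CTE feeding a join with aggregation. It uses Python's built-in sqlite3 module purely as a stand-in; it is not RonSQL syntax or the RonDB API, and the tables, columns, and data are hypothetical.

```python
import sqlite3

# SQLite is only a stand-in engine here; the schema and data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
INSERT INTO users VALUES (1, 'SE'), (2, 'SE'), (3, 'DE');
INSERT INTO purchases VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 20.0), (4, 3, 7.5);
""")

# The CTE selects a subset of users; the main query joins it to purchases
# and aggregates per user.
QUERY = """
WITH swedish_users AS (
    SELECT user_id FROM users WHERE country = 'SE'
)
SELECT p.user_id, COUNT(*) AS n_purchases, SUM(p.amount) AS total
FROM swedish_users AS s
JOIN purchases AS p ON p.user_id = s.user_id
GROUP BY p.user_id
ORDER BY p.user_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

A plain key-value lookup can fetch one user's rows, but it cannot express the filter, join, and aggregation in a single request. That gap is what this kind of query closes.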

Would it be feasible now, with AI programming? I decided to find out. This post shares what I learned over the past five months.

The short version: today we released RonDB 26.04.1, which ships beta support for RonSQL with CTEs and pushdown join aggregation.

The development model

From the start it was clear that the proven development model still applied. You begin with a high-level plan, move to a detailed implementation plan, and only then implement it step by step.

We went through this loop several times. Each plan typically had 10 to 20 phases, and sometimes many more.

Claude vs. Codex

Working with both models, my conclusion is that their strengths are complementary. Claude is better at keeping track of plans and remaining work. Codex is often better at solving hard problems, but it is sloppier about maintaining the plan and can lose track of what to do next. So for the most part I let Claude own plan maintenance and the less complex tasks, and handed the hard bugs to Codex.

Massive testing across four layers

One thing that struck me immediately is that AI programming makes it possible to generate massive amounts of unit tests — even for the distributed cases that are normally painful to test.

RonSQL query execution spans four layers in RonDB:

LDM (Local Data Manager) — the lowest layer, six modules that handle local query execution, local recovery, and local checkpointing.
TC (Transaction Coordinator) — coordinates join execution for complex queries. Partial results flow from the first tables in the join to the later ones. The TC drives execution from a Query Tree: essentially a program that specifies how to run the query, including interpreted code to be sent down to the LDMs.
NDB API — defines the Query Tree through an API. Applications can use it directly, though that is uncommon because it is low-level.
RonSQL — the top layer, which translates SQL into NDB API calls. The NDB API builds a Query Tree and sends it to the TC; the TC distributes execution across the RonDB nodes, and each part runs on the LDM in every node.
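The division of labour between the layers can be sketched as a toy model. This is an illustration under strong assumptions, not RonDB code: QueryTree, ldm_execute, and tc_execute are hypothetical names, and both tables are assumed to be partitioned by the join key so that each partition can join locally. In the real system, partial results flow between tables and nodes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryTree:
    """Hypothetical stand-in for a query tree: join on a key, sum a column per group."""
    join_key: str
    group_by: str
    sum_column: str


def ldm_execute(tree, left_rows, right_rows):
    """Toy 'LDM': join and aggregate only the rows stored in this partition."""
    right_by_key = defaultdict(list)
    for r in right_rows:
        right_by_key[r[tree.join_key]].append(r)
    partial = defaultdict(float)
    for l in left_rows:
        for r in right_by_key[l[tree.join_key]]:
            partial[l[tree.group_by]] += r[tree.sum_column]
    return dict(partial)


def tc_execute(tree, partitions):
    """Toy 'TC': run the same tree on every partition and merge the partial sums."""
    total = defaultdict(float)
    for left_rows, right_rows in partitions:
        for group, value in ldm_execute(tree, left_rows, right_rows).items():
            total[group] += value
    return dict(total)


tree = QueryTree(join_key="user_id", group_by="country", sum_column="amount")
partitions = [
    ([{"user_id": 1, "country": "SE"}],
     [{"user_id": 1, "amount": 10.0}, {"user_id": 1, "amount": 5.0}]),
    ([{"user_id": 2, "country": "SE"}, {"user_id": 3, "country": "DE"}],
     [{"user_id": 2, "amount": 20.0}, {"user_id": 3, "amount": 7.5}]),
]
result = tc_execute(tree, partitions)
print(result)
```

The aggregation happens next to the data, and only small partial results travel upward to be merged. Because each layer is a separate function, the lowest one can be exercised on its own before the layer above it exists.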

Normally you write test programs in RonSQL, because writing and maintaining programs against the LDM, TC, and NDB API layers by hand is very hard. AI programming changes that. It became feasible to develop and test the LDM layer first, before touching the TC layer, and the TC layer before touching the NDB API layer. This made it much easier to reach a working implementation across all four layers quickly.

The asynchronous challenge

AI programming works very well on the kind of sequential code the models have seen in training. But RonDB uses an asynchronous programming model, and here the models sometimes needed firm guidance to understand how things actually work. Things that were obvious to me were sometimes completely invisible to the model, but it was usually easy enough to teach it through the prompts. The reverse was true just as often: things that were not obvious to me were clear to the model, which also generates new code far faster than a human can think.
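To show why asynchronous code is harder to follow than sequential code, here is a minimal sketch of a signal-driven request. It is a generic illustration and not RonDB's implementation: the Scheduler, Storage, and Coordinator classes and the LOOKUP_REQ and LOOKUP_CONF signal names are all hypothetical.

```python
from collections import deque


class Scheduler:
    """Minimal single-threaded signal loop (illustrative only)."""

    def __init__(self):
        self.queue = deque()
        self.handlers = {}

    def register(self, signal, handler):
        self.handlers[signal] = handler

    def send(self, signal, payload):
        self.queue.append((signal, payload))

    def run(self):
        while self.queue:
            signal, payload = self.queue.popleft()
            self.handlers[signal](payload)


class Storage:
    def __init__(self, scheduler, data):
        self.scheduler, self.data = scheduler, data
        scheduler.register("LOOKUP_REQ", self.on_lookup_req)

    def on_lookup_req(self, payload):
        value = self.data.get(payload["key"])
        self.scheduler.send("LOOKUP_CONF", {"req_id": payload["req_id"], "value": value})


class Coordinator:
    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.pending = {}  # request state that must survive between signals
        self.results = {}
        scheduler.register("LOOKUP_CONF", self.on_lookup_conf)

    def start_lookup(self, req_id, key):
        # Returns immediately; the answer arrives later as LOOKUP_CONF.
        self.pending[req_id] = key
        self.scheduler.send("LOOKUP_REQ", {"req_id": req_id, "key": key})

    def on_lookup_conf(self, payload):
        key = self.pending.pop(payload["req_id"])
        self.results[key] = payload["value"]


scheduler = Scheduler()
storage = Storage(scheduler, {"a": 1, "b": 2})
coordinator = Coordinator(scheduler)
coordinator.start_lookup(1, "a")
coordinator.start_lookup(2, "b")
scheduler.run()
print(coordinator.results)
```

In sequential code the lookup would be a single function call with its state on the stack. Here the logic is split across handlers, and whatever is needed later has to be stored explicitly in pending. A model trained mostly on sequential code can easily miss that kind of detail.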

How do you verify the AI got it right?

So how do you verify that the generated code is correct? Here is the model I try to follow.

Phase 1 — Build to the plan

Follow the high-level plan, then the implementation plan, then implement step by step. In this phase you are the architect: you tell the model what to do at a high level, and for the most part you act as an operator who accepts its suggestions. I found it essential not to let the model change any code without stating exactly what it changed and how. With Claude this was the default behaviour; Codex had to be kept on a tight leash to stop it running ahead and generating code without checking whether it was correct. Seeing every diff gives you at least a working idea of what the new code does — and this is the phase to generate lots of tests.

Crucially, each step includes writing at least one test case and verifying that the step has been successfully executed before moving on to the next one. A step is not “done” until there is a passing test that proves it. This keeps the implementation grounded and stops small mistakes from compounding across the many phases of a plan.

Phase 2 — Review

Once the steps are done, review all the new code. I focus mostly on the code that affects actual execution, and only glance at test cases and plans. I used the model to generate documentation, with images and signalling diagrams, to better understand the new code. This phase is where you ensure the model follows RonDB's memory model and correctly handles node failures and other failures in query execution. The models also tend to generate a lot of duplicated code, so removing duplicated code paths is an essential step: some of these clean-ups removed thousands of lines of production code that would otherwise have had to be maintained.

Phase 3 — Stress the feature with tests

We then wrote CTE test cases without regard to whether the features they exercised were supported. When a feature turned out not to be supported, we either implemented it or tagged the test case as unsupported.
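The triage step can be sketched as a small harness. This is a hypothetical illustration, not our actual test framework: toy_engine, NotSupportedError, and triage are made-up names, and the choice of recursive CTEs as the unsupported feature is only an example, not a statement about what RonSQL supports.

```python
class NotSupportedError(Exception):
    """Raised by the stand-in engine for features it does not implement."""


def toy_engine(sql):
    # Hypothetical engine: pretend recursive CTEs are not implemented.
    if "RECURSIVE" in sql.upper():
        raise NotSupportedError("recursive CTE")
    return "ok"


def triage(test_cases, engine):
    """Run every test case and sort it into passed or unsupported."""
    report = {"passed": [], "unsupported": []}
    for name, sql in test_cases:
        try:
            engine(sql)
            report["passed"].append(name)
        except NotSupportedError:
            report["unsupported"].append(name)
    return report


test_cases = [
    ("simple_cte", "WITH t AS (SELECT 1 AS x) SELECT x FROM t"),
    ("recursive_cte",
     "WITH RECURSIVE r(n) AS (SELECT 1 UNION ALL SELECT n + 1 FROM r WHERE n < 3) "
     "SELECT n FROM r"),
]
report = triage(test_cases, toy_engine)
print(report)
```

Every entry in the unsupported list then becomes an explicit decision: implement the feature, or keep the tag so the gap stays visible.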

What remains is to minimise query latency and to keep reviewing the generated code.

Working in parallel

One final observation: while the model is thinking, there is plenty of time to work on something else. Most of the time I had two to four projects running in parallel. That is how, alongside RonSQL, prototypes appeared for a JIT compiler for the RonDB interpreters, for using fibers in RonDB, and for Deadlock Discovery. Only Deadlock Discovery is in the source tree so far, and it is disabled by default. All three are still work in progress.

Takeaway

Looking back, a feature that had been on the TODO list for over a decade — and that I would have scoped as a two-year effort — reached beta in five months. The work did not disappear; it changed shape. My role shifted from writing the code to architecting it, guiding the models, reviewing the result, and above all testing it. That, more than raw code generation, is where the new wave of AI programming earned its place in how we build RonDB.