Friday, February 13, 2026

Developing RonDB with Claude: Lessons from 30 Years of Database Engineering

After 30 years of developing MySQL NDB Cluster and RonDB, a new tool has fundamentally changed how I write code. Here is what I have learned about working with Claude on a large-scale distributed database project.

A New Era for Database Development

A while ago I got a new tool that completely changes the way I develop code. I have worked on developing MySQL NDB Cluster — and now the fork RonDB — for more than 30 years. Over that time, many of the original findings about software engineering have shifted. In the 1990s, I learned that unit testing was important. However, I quickly discovered that in a startup with limited resources, it was not feasible to write unit tests for distributed database functionality. I even wrote a tool for generating distributed unit test programs, but the overhead remained too high.

With Claude, this equation completely changes. I can now modify 1,000 lines of code in a day — often more — and in parallel, instruct Claude to write 3–5x as many lines of test code for comprehensive unit testing. Claude does not just improve my coding productivity; it also makes the path to high-quality code faster by enabling test coverage that was previously impractical.

Will AI Cause Unemployment?

Some ask a philosophical question: does this mean the world will see unemployment due to AI coding tools? Personally, I think not. For me, it simply means that features I have been wanting to build for 20+ years are now suddenly possible. The only reason for unemployment would be if humanity ran out of ideas for what to develop next. I do not believe that will ever happen — just look into space and realise that God's creation is far too vast for us to explore, even with 1,000x more compute power than we have today. There will always be a new thing to understand and develop around the next corner.

A Word of Caution

Not all my experiences with AI coding have been positive. We had a REST API server written in Go that needed to be ported to C++. The AI performed a straightforward translation, but this created significant performance issues and produced code that was unreadable unless you had studied every new C++ feature in the latest standard. The translation took two months; fixing the resulting issues took six months. In retrospect, writing the C++ implementation from scratch would likely have been more efficient than using AI translation.

The lesson: AI works best when you guide it with clear architectural direction, not when you use it for mechanical translation without oversight.

Getting Started: The RonDB CLI

My first real attempt at using Claude for RonDB was creating a CLI tool. A colleague initially created it, and I developed it further. I realised how well-suited this type of boilerplate code was for AI assistance. I extended our REST API and added new CLIs for Rondis, the REST API, and even a MySQL client interface. This was straightforward work — it would have been fairly easy even without Claude — but it still would have taken two to three months. With Claude, it was done within a week.

The Big Challenge: Pushdown Join Aggregation

Encouraged by the CLI experience, I decided to tackle something far more ambitious. For over ten years, I had wanted to develop Pushdown Join Aggregation in NDB/RonDB. This feature would allow complex join queries with aggregation to execute directly in the data nodes rather than pulling data up to the MySQL Server. However, it was a task that would have taken a year or more, so it never rose high enough on the priority list. With Claude, I estimated I could complete it in one to two months.

Background

NDB/RonDB already had two key building blocks in place. First, Pushdown Join has been supported for a long time, enabling complex join queries to run with higher parallelism and improving performance by more than 10x compared to executing them via the MySQL Server. Second, we developed RonSQL, which supports pushdown aggregation on a single table. Many pieces were already there, but extending from single-table aggregation to complex join queries was still a major undertaking.

The Development Approach

In his blog post How I Use Claude Code, Boris Tane describes the importance of planning your work before handing it to Claude. That is definitely true, but for a task of this complexity, even more structure was needed.

Divide and Conquer: Four Modules

The task naturally breaks down into four modules:

  1. Local Database — Where the actual aggregation happens and intermediate results are stored during query execution
  2. Coordinator — Distributes query fragments across nodes and coordinates their execution
  3. API — The application interface through which clients interact with the system
  4. SQL — Transforms SQL statements into query plans sent to the NDB API and coordinator
[Figure: RonDB Pushdown Join Aggregation Architecture. SQL Layer (transforms SQL statements into query plans) → NDB API Layer (application interface for query execution) → Coordinator (top-level query coordination) → Sub-coordinators 1..N (node-level coordination) → Local DB Nodes 1..N (aggregation and storage).]

Figure 1: The five-layer architecture of Pushdown Join Aggregation in RonDB

Each module had to be developed separately. I started with the local database part, where the core aggregation logic lives.

Architecture First, Then Implementation

I began by asking Claude for an architecture description, providing the fundamentals of how I wanted the aggregation handling to work — something I had been thinking about for many years. Claude produced a phased development plan. The original plan contained six phases; by the end, I had gone through 15–20 phases with constant refinements.

[Figure: Claude-Assisted Development Workflow, an iterative module development cycle: 1. Architecture Plan (define modules and interfaces) → 2. Implementation Plan (break into phases) → 3. Code with Claude (iterative implementation) → 4. Review Critical Code (performance and correctness) → 5. Write Unit Tests (signal-based test programs) → 6. Expand Test Coverage (benchmarks and edge cases) → iterate and refine. Key insight: you are the architect and reviewer; Claude handles volume, you ensure correctness and performance.]

Figure 2: The iterative development workflow when working with Claude

From Implementation to Testing

After about two to three days, the local database implementation was ready. At that point I realised that Claude made it possible to unit test the new code — something that would have been prohibitively expensive before. I started a RonDB cluster and wrote a client that could send and receive signals directly, bypassing the real NDB API. With some modifications to the debug build of RonDB, I had a working unit test framework. It took slightly longer than expected since Claude needed to learn a few things about writing this kind of test program — very few existing test programs did similar things.
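
To make the shape of such a test concrete, here is a heavily simplified, hypothetical C++ sketch. The transport is stubbed out so the example is self-contained; the names and signal numbers (Signal, send_and_wait, 1001/1002) are placeholders for illustration and are not the actual RonDB signal names or classes. The real test client attaches to a running debug cluster and exchanges real signals with the data nodes.

    // Hypothetical sketch of the shape of a signal-level unit test.
    // All names and signal numbers are placeholders; the real client talks
    // to a live RonDB debug cluster instead of this in-process stub.
    #include <cassert>
    #include <cstdint>
    #include <numeric>
    #include <vector>

    struct Signal { uint32_t gsn; std::vector<int64_t> data; };

    // Stub transport: a real test client sends the request signal to a data
    // node and blocks until the corresponding CONF signal arrives.
    static Signal send_and_wait(const Signal& req, const std::vector<int64_t>& rows) {
      int64_t sum = std::accumulate(rows.begin(), rows.end(), int64_t{0});
      return Signal{req.gsn + 1, {sum, static_cast<int64_t>(rows.size())}};
    }

    int main() {
      std::vector<int64_t> rows = {10, 20, 30, 40};  // rows stored in the data node
      Signal req{/*gsn=*/1001, {/*sumColumn=*/3}};   // hypothetical aggregation request

      Signal conf = send_and_wait(req, rows);        // wait for the CONF signal

      // Compare the node's aggregate against a reference computed in the client.
      assert(conf.gsn == 1002);
      assert(conf.data[0] == 100 && conf.data[1] == 4);
      return 0;
    }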

Scaling Up with Parallel Sessions

After writing the first test case, I wanted three things: deeper test coverage, a performance benchmark, and support for aggregation with CASE statements (very common in real-world queries, but not yet supported in single-table aggregation). Each of these was a self-contained mini-project that needed a test program similar to the one I had already built.

I had learned that Claude spends significant time thinking, so to maximise productivity, I launched three parallel Claude sessions — one for each task. All three were completed within two to three hours, even though I started late in the evening. The next morning, I could build a real-world benchmark running TPC-H Q12.
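
TPC-H Q12 joins the orders and lineitem tables and aggregates with CASE expressions, exactly the pattern discussed above. Below is a minimal sketch of issuing a Q12-shaped query through the MySQL client library; the connection parameters and schema name are placeholders, and the query text follows the standard TPC-H definition with fixed date parameters.

    // Minimal sketch: running a TPC-H Q12-shaped aggregation join through the
    // MySQL client library. Host, user, password and schema are placeholders.
    #include <mysql/mysql.h>
    #include <cstdio>

    int main() {
      MYSQL* conn = mysql_init(nullptr);
      if (!mysql_real_connect(conn, "127.0.0.1", "bench", "secret",
                              "tpch", 3306, nullptr, 0)) {
        std::fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
        return 1;
      }

      // TPC-H Q12: join orders and lineitem, aggregate with CASE expressions.
      // With pushdown join aggregation the join, filters and SUM(CASE ...) all
      // execute in the RonDB data nodes instead of in the MySQL Server.
      const char* q12 =
          "SELECT l_shipmode, "
          "       SUM(CASE WHEN o_orderpriority IN ('1-URGENT','2-HIGH') "
          "                THEN 1 ELSE 0 END) AS high_line_count, "
          "       SUM(CASE WHEN o_orderpriority NOT IN ('1-URGENT','2-HIGH') "
          "                THEN 1 ELSE 0 END) AS low_line_count "
          "FROM orders JOIN lineitem ON o_orderkey = l_orderkey "
          "WHERE l_shipmode IN ('MAIL','SHIP') "
          "  AND l_commitdate < l_receiptdate "
          "  AND l_shipdate < l_commitdate "
          "  AND l_receiptdate >= '1994-01-01' "
          "  AND l_receiptdate < '1995-01-01' "
          "GROUP BY l_shipmode ORDER BY l_shipmode";

      if (mysql_query(conn, q12) == 0) {
        MYSQL_RES* res = mysql_store_result(conn);
        for (MYSQL_ROW row = mysql_fetch_row(res); row; row = mysql_fetch_row(res))
          std::printf("%s high=%s low=%s\n", row[0], row[1], row[2]);
        mysql_free_result(res);
      }
      mysql_close(conn);
      return 0;
    }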

[Figure: Parallel Claude Sessions, three concurrent tasks completed in 2-3 hours. The initial test program (first working test case) fed three sessions: Session 1 (expanded test coverage, edge cases and error paths), Session 2 (CASE statement support, real-world SQL patterns) and Session 3 (benchmark program, performance measurement), leading to the TPC-H Q12 benchmark for real-world validation the next morning.]

Figure 3: Running three Claude sessions in parallel to maximise development throughput

Key Takeaways

  1. Unit testing distributed systems is now feasible — even with limited budgets, Claude can generate the 3–5x test code volume needed alongside your implementation.
  2. Divide your task into modules — start from the low-level parts and build upward. In this case, beginning with the local database layer worked best.
  3. Architecture first, then implementation — for each module, start by asking Claude for an architecture plan, then an implementation plan.
  4. Expect many iterations — plan for constant reviews of the code Claude produces, especially the performance-critical parts.
  5. Start with a simple test, then expand — write a basic test program first, then use it as a template for comprehensive coverage, benchmarks, and edge cases.

Your New Role: Architect, Manager, and Performance Expert

Claude can be remarkably productive when used correctly, but your role as a programmer fundamentally changes. You become an architect and manager while simultaneously needing to understand code at the deepest level. Learning low-level performance characteristics is just as important as it ever was — the performance-critical parts must still be fully understood by the developer. Claude can assist, but you need to know how to direct it.

Let Claude handle what it does best: building hash tables, linked lists, and other data structures it probably understands better than most developers. Let Claude suggest approaches where you are not certain of the best path forward. But keep the architectural vision firmly in your own hands.

Teaching Claude About Your Codebase

Programming with Claude is teamwork where you are the director, but your assistant has deep knowledge in some areas and can quickly absorb new information. Sometimes, though, it needs your high-level understanding to truly grasp what is happening in the code. Do not expect the code itself to describe all the details — the architecture is often invisible when you dive into the implementation. If it is invisible to a human, it is likely invisible to Claude as well.

To manage the knowledge Claude builds, we developed a structure using a root CLAUDE.md file that indexes all the domain knowledge in a directory called claude_files, with one subdirectory for each area we have built knowledge about. This is an early approach to managing institutional knowledge for AI assistants, but it is an important consideration for any team adopting these tools.


This article was written by the author and refined with Claude.

Monday, December 08, 2025

RonDB development moves on

A few months ago we released RonDB 24.10 with 11 new features.

Development of RonDB doesn't stop there. We have continued developing RonDB 25.10; whether this release will be an LTS release or merely an intermediate release depends on the needs of our customers. RonDB 24.10 is currently being integrated into Hopsworks 4.6 and will be released imminently.

The work on RonDB 25.10 has taken on some challenges that have been lingering for many years. One of those is the number of API nodes supported: the maximum number of nodes has been limited to 255. For most users this is enough, but for massively large clusters, and for setups that shard across many clusters, this limit is a bottleneck. Thus in RonDB 25.10 we increased the maximum number of nodes to 2039. It is also very straightforward to extend this limit to higher values, but for now this should be sufficient.

Sorted index scans have previously been single-threaded, which has limited the speed of such scans. With RonDB 25.10 we have increased the parallelism by at least a factor of 3-4, and in some setups probably even more than that.

In Hopsworks, RonDB tables are used to store features in feature groups. It is fairly common to have hundreds and even thousands of features in a feature group. To support this, RonDB now supports up to 4096 columns, which is also the maximum number of columns supported by MySQL. This feature is now ready for inclusion into RonDB 25.10.

While working on more columns it was natural to also extend the maximum row size. Columns in RonDB are separated into three parts: fixed-size in-memory columns, which can be at most 8052 bytes in size; variable-sized and dynamic in-memory columns, which can be at most 32000 bytes; and finally disk columns, which can be at most 31120 bytes. MySQL has a row size limit of 65535 bytes. Extending these parts is future work.
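
As a sketch of how these three column classes are chosen when a table is defined through the C++ NDB API, the example below uses the standard NdbDictionary calls, with setDynamic and setStorageType selecting the column class; table, column and tablespace names are examples and error handling is omitted.

    // Sketch: defining the three column classes through the C++ NDB API
    // (fixed-size in-memory, dynamic in-memory, disk). Table, column and
    // tablespace names are examples only.
    #include <NdbApi.hpp>

    void define_feature_table(NdbDictionary::Dictionary* dict) {
      NdbDictionary::Table tab("feature_group");

      NdbDictionary::Column pk("feature_id");           // fixed-size in-memory column
      pk.setType(NdbDictionary::Column::Unsigned);
      pk.setPrimaryKey(true);
      tab.addColumn(pk);

      NdbDictionary::Column dyn("rarely_used_feature"); // dynamic in-memory column
      dyn.setType(NdbDictionary::Column::Unsigned);
      dyn.setNullable(true);
      dyn.setDynamic(true);
      tab.addColumn(dyn);

      NdbDictionary::Column payload("payload");         // disk column in a tablespace
      payload.setType(NdbDictionary::Column::Longvarbinary);
      payload.setLength(8000);
      payload.setStorageType(NdbDictionary::Column::StorageTypeDisk);
      tab.addColumn(payload);

      tab.setTablespaceName("ts_1");                    // required for disk columns
      dict->createTable(tab);                           // returns 0 on success
    }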

We also have a new Vector Search feature ready for inclusion. It allows a vector search on a traditional index scan or a full table scan to search for nearest neighbours combined with normal filters on the table. A vector index is future work.

We are also working on extending the REST API server with new features, such as more elaborate handling of rate limits, and more things will come as well.

Thursday, September 25, 2025

Datagraph releases an extension of RonDB, a Common Lisp NDB API

Datagraph develops a graph database called Dydra that can handle SPARQL, GraphQL and the Linked Data Platform (LDP). Dydra stores a revisioned history, meaning that you have access to the full history of your data. This could be the development of some document, a piece of software, a piece of HW like a SoC (System-on-a-Chip), a building, or something else. Essentially any data.

This blog post describes this development by the team at Datagraph.

Traditionally, Dydra has used a memory-mapped key-value store for this. Hopsworks and Dydra have now worked together for a while to provide features in RonDB that make it possible to run Dydra on a highly available, distributed platform that can parallelise many of the searches.

RonDB is a distributed key-value store with SQL capabilities. Traditionally, distributed key-value stores offer the possibility to read and write data highly efficiently using key lookups. RonDB offers this capability as well with extremely good performance (RonDB showed how to achieve 100M key lookups per second using a REST API). However, the SQL capabilities mean that RonDB can also push down filters and projections to the RonDB data nodes. This means that searches can be parallelised.
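
As an illustration, here is a minimal sketch of pushing a filter and a projection down to the data nodes through the C++ NDB API scan interface; table and column names are examples, and connection and transaction setup is omitted.

    // Sketch: pushing a filter and a projection to the RonDB data nodes
    // using the NDB API scan interface. Table and column names are examples.
    #include <NdbApi.hpp>
    #include <cstdio>

    int scan_with_pushdown(NdbTransaction* trans) {
      NdbScanOperation* scan = trans->getNdbScanOperation("sensor_readings");
      if (scan == nullptr) return -1;
      scan->readTuples(NdbOperation::LM_CommittedRead);

      // Filter pushdown: rows where column 2 is below the threshold are
      // discarded inside the data nodes, so they never cross the network.
      NdbScanFilter filter(scan);
      Uint32 threshold = 100;
      filter.begin(NdbScanFilter::AND);
      filter.cmp(NdbScanFilter::COND_GE, /*column id*/ 2, &threshold, sizeof(threshold));
      filter.end();

      // Projection pushdown: only the columns asked for are shipped back.
      NdbRecAttr* sensor_id = scan->getValue("sensor_id");
      NdbRecAttr* value     = scan->getValue("value");

      if (trans->execute(NdbTransaction::NoCommit) != 0) return -1;
      while (scan->nextResult(/*fetchAllowed=*/true) == 0) {
        std::printf("sensor %u value %u\n", sensor_id->u_32_value(), value->u_32_value());
      }
      return 0;
    }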

Thus, RonDB will also be able to handle complex joins efficiently in many cases. Some of the queries in TPC-H (a standard analytical database benchmark) can be executed 50x faster in RonDB compared to using MySQL/InnoDB.

Working with Dydra on their searches, we realised that they store data structures in columns using the data type VARBINARY. SQL doesn't really have any way to define searches on complex data structures inside a VARBINARY.

When using RonDB there are many ways to access it. Many people find the MySQL APIs to be the preferable method. These APIs are well known and there is plenty of literature on how to use them. However, RonDB is a key-value store as well, which means that a lower-level interface is much more efficient.

The base of all interactions with RonDB is the C++ NDB API. On top of this API there is a Java API called ClusterJ and a NodeJS API called Database Jones. As mentioned, there are the MySQL APIs, since RonDB is an NDB storage engine for MySQL. With RonDB 24.10 we also introduced a C++ REST API server that can be used to retrieve batches of key lookups at very low latency. There is even an experimental Redis interface for RonDB that we call Rondis; it is integrated in the RonDB REST API server (RDRS).
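
To show what the lower-level interface looks like, here is a minimal sketch of a single primary-key read through the C++ NDB API; the connect string, database, table and column names are examples only.

    // Sketch: a single primary-key read through the C++ NDB API.
    // Connect string, database, table and column names are examples only.
    #include <NdbApi.hpp>
    #include <cstdio>

    int main() {
      ndb_init();
      Ndb_cluster_connection conn("localhost:1186");       // RonDB management server
      if (conn.connect() != 0 || conn.wait_until_ready(30, 0) < 0) return 1;

      Ndb ndb(&conn, "test_db");
      if (ndb.init() != 0) return 1;

      NdbTransaction* trans = ndb.startTransaction();
      NdbOperation* op = trans->getNdbOperation("customers");
      op->readTuple(NdbOperation::LM_CommittedRead);
      op->equal("customer_id", (Uint32)42);                // primary key lookup
      NdbRecAttr* balance = op->getValue("balance");       // column to return

      if (trans->execute(NdbTransaction::Commit) == 0)
        std::printf("balance = %u\n", balance->u_32_value());
      ndb.closeTransaction(trans);
      ndb_end(0);
      return 0;
    }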

In 2022, Datagraph released one more option: Common Lisp bindings for the C++ NDB API. With the release of RonDB 24.10, they have just released a much improved version of cl-ndbapi for RonDB 24.10.

As discussed above, a Dydra query often entails a scan operation where one has to analyse the content of the VARBINARY column. In a first step, all this functionality was performed by shipping the VARBINARY to the Common Lisp environment. This gave pretty decent performance, but we realised we could do better.

RonDB has had a simple interpreter for things such as filters, auto increment, and the like. However, to perform complex analysis of VARBINARY columns, we needed to extend the RonDB interpreter.

MySQL has a similar feature where one can integrate a C program into MySQL, called user-defined functions (UDF). However, doing something similar for RonDB has two implications. First, it is a security issue: such a program could easily crash the RonDB data nodes, which is in conflict with the high availability features of RonDB. Second, RonDB is a distributed architecture, so the program would be required on every RonDB data node, complicating the installation process of RonDB.

Instead we opted for the approach of extending the RonDB interpreter. The RonDB interpreter has 8 registers; these registers store 64-bit signed integers. An interpreted execution always has access to a single row; it cannot access any other rows or data outside the interpreter. Interpreted execution has several steps: one can first read columns, next execute the interpreted program, then write some columns, and finally read columns again. In this manner one can combine normal simple reads with an interpreted program. In MySQL, the interpreted program is used to execute WHERE clauses to filter away those rows not interesting for the query. The program can also have a section of input parameters, making it possible to reuse an interpreted program with different input. It is also possible to return calculated results using output parameters.
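
From the client side, such programs are built with the NDB API's NdbInterpretedCode class. Here is a minimal sketch of a program that reads a column into a register, compares it against a constant and decides whether the row is returned; the column id and register numbers are examples, and attaching the finished program to a scan or read operation is omitted.

    // Sketch: building a small interpreted program with NdbInterpretedCode.
    // The code object is assumed to have been constructed against the table,
    // e.g. NdbInterpretedCode code(tab). Column id and register numbers are
    // examples; attaching the program to an operation is omitted.
    #include <NdbApi.hpp>

    int build_status_filter(NdbInterpretedCode& code) {
      const Uint32 REG_STATUS = 1, REG_WANTED = 2;
      const Uint32 LABEL_KEEP = 0;

      code.read_attr(REG_STATUS, /*attrId=*/2);         // register 1 := column 2 of this row
      code.load_const_u32(REG_WANTED, 1);               // register 2 := 1 ("active")
      code.branch_eq(REG_STATUS, REG_WANTED, LABEL_KEEP);
      code.interpret_exit_nok();                        // status != 1: filter the row out
      code.def_label(LABEL_KEEP);
      code.interpret_exit_ok();                         // status == 1: return the row
      return code.finalise();                           // 0 on success
    }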

To handle the new requirements the RonDB interpreter was extended with a memory area of a bit more than 64 kB.

To ensure that one can handle a generic program, RonDB added a long list of new operations such as shift left/right, multiplication, division, modulo and so forth. In addition, there are instructions to read columns into the memory area, and even to read only parts of a column if desired, as well as instructions to write columns.

Dydra used these new approaches and saw a handsome improvement to the results delivered by RonDB.

Analysing the Dydra use case further, we found that they used some variants of binary search on parts of the VARBINARY columns. Thus RonDB also implemented a set of higher-level instructions such as binary search, search intervals, memory copy, and text-to-number conversion and vice versa.

Using those new instructions, Dydra saw further improvements. The new instructions also make the interpreted programs quicker to develop. As requirements for other algorithms arise, it is fairly easy to add new instructions to the RonDB interpreter, and this should be possible for other community developers as well.

The most innovative part of the new Common Lisp NDB API is the handling of the interpreted instructions. It contains a language-to-language compiler, so you can write the interpreted program as a Lisp program using normal IF, WHEN and COND (the IF/ELSE constructs of Lisp). You can even decide to run the program in the client using Lisp (mainly for testing and debugging) or push it down to RonDB for execution in the RonDB data nodes (for speed).

One benchmark that Dydra used to evaluate RonDB performance compared MySQL/InnoDB using a UDF against RonDB using pushdown of the evaluation. The data set consisted of 4.5M rows, where essentially all rows were scanned and, for each row, a program was executed that checked whether the row was visible in the requested revision. About 2% of the rows were returned.

In MySQL/InnoDB the query took 8.89 seconds to execute; in RonDB it took 0.51 seconds. That is a nice speedup of around 17 times. Most of the speedup comes from the amount of parallelism used in RonDB, while the MySQL execution is single-threaded. The cost of scanning one row in MySQL/InnoDB and in RonDB is very similar; RonDB is a bit faster, but there is no major difference in per-row speed.

Wednesday, August 27, 2025

How to design a DBMS for Telco requirements

My colleague Zhao Song presented a walkthrough of the evolution of DBMSs and how it relates to Google Spanner, Aurora, PolarDB and MySQL NDB Cluster.

I had some interesting discussions with him on the topic, and it makes sense to return to the 1990s, when I designed NDB Cluster, and to the impact the requirements of a Telco DBMS had on the recovery algorithms.

A Telco DBMS is a DBMS that operates in a Telco environment. This DBMS is involved in each interaction smartphones have with the Telco system, such as call setup, location updates, SMS and mobile data. If the DBMS is down, there is no service available for smartphones. Obviously there is no time of day or night when it is OK to be down. Thus even a few seconds of downtime is important to avoid.

Thus in the design of NDB Cluster I had to take into account the following events:

  • DBMS Software Upgrade
  • Application Software Upgrade
  • SW Failure in DBMS Node
  • SW Failure in Application Service
  • HW Failure in DBMS Node
  • HW Failure in Application Service
  • SW Failure in DBMS Cluster
  • Region Failure
It was clear that the design had to be a distributed DBMS. In Telcos it was not uncommon to build HW-redundant solutions with a single node but redundant HW. However, this solution obviously has difficulties with SW failures. It also requires very specialised HW which costs hundreds of millions of dollars to develop. Today this solution is very rarely used.

One of the first design decisions was to choose between a disk-based DBMS and an in-memory DBMS. This was settled by the fact that the latency requirement was to handle transactions involving tens of rows within around 10 milliseconds, which was not really possible with the hard drives of those days. Today, with the introduction of SSDs and NVMe drives, there is still a latency impact of at least 3x when using disk drives compared to an in-memory DBMS.

If we play with the thought of using a Shared Disk DBMS on modern HW, we still have a problem. The Shared Disk solution requires a storage solution which is itself a Shared Nothing solution. In addition, a Shared Disk DBMS commits by writing the REDO log to the Shared Disk. This means that at recovery we need to replay part of the REDO log to allow a node to take over after a failed node. Since the latest state of some disk pages is only available on the Shared Disk, we cannot serve any transactions on these pages until we have replayed the REDO log. This used to be a period of around 30 seconds; it is shorter now, but it is still not good enough for the Telco requirements.

Thus we settled for a Shared Nothing DBMS solution using in-memory tables. The next problem is how to handle replication in a Shared Nothing architecture. The replication sends REDO logs, or something similar, towards the backup replicas. Now one has a choice: either one applies the REDO logs immediately, or one only writes them to the REDO log and applies them later.

Again, applying them later means that we will suffer downtime if the backup replica is forced to take over as primary replica. Thus we have to apply the REDO logs immediately as part of the transaction execution. This means we are able to take over within milliseconds after a node failure.

Failures can happen in two ways. Most SW failures will be discovered immediately by the other nodes in the cluster, so in this case node failures are detected very quickly. However, HW failures in particular can lead to silent failures, where one is required to use some sort of I-am-alive protocol (heartbeat in NDB). The discovery time here is a product of the real-time properties of the operating system and of the DBMS.
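
The following is a generic sketch of the idea behind such an I-am-alive protocol; it is an illustration of the principle, not RonDB's actual heartbeat implementation, and the interval and threshold are examples.

    // Generic sketch of heartbeat-based failure detection, in the spirit of the
    // I-am-alive protocol described above. Illustration only, not RonDB's
    // actual implementation; intervals and thresholds are examples.
    #include <chrono>
    #include <cstdio>
    #include <unordered_map>

    using Clock = std::chrono::steady_clock;

    struct HeartbeatMonitor {
      std::chrono::milliseconds interval{100};   // expected heartbeat period
      int max_missed = 3;                        // declare failure after 3 missed beats
      std::unordered_map<int, Clock::time_point> last_seen;

      void heartbeat(int node_id) { last_seen[node_id] = Clock::now(); }

      // Called periodically; returns true if the node should be declared failed,
      // after which the cluster runs node-failure handling and take-over protocols.
      bool is_failed(int node_id) const {
        auto it = last_seen.find(node_id);
        if (it == last_seen.end()) return false;   // node not registered yet
        auto silence = Clock::now() - it->second;
        return silence > interval * max_missed;
      }
    };

    int main() {
      HeartbeatMonitor monitor;
      monitor.heartbeat(2);                        // node 2 reports in
      std::printf("node 2 failed: %s\n", monitor.is_failed(2) ? "yes" : "no");
      return 0;
    }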

Transaction execution can be done either using a replication protocol such as Paxos, where a global order of transactions is maintained, or through a non-blocking 2PC protocol. Both are required to handle failures of the coordinator through a leader-selection algorithm and to handle the ongoing transactions that are affected by such a failure.

The benefit of the non-blocking 2PC is that it can handle millions of concurrent transactions, since the coordinator role can be handled by any node in the cluster. There is no central point limiting the transaction throughput. To be non-blocking, the 2PC is required to handle failed coordinators by finishing ongoing transactions using a take-over protocol. To handle cluster recovery, an epoch transaction regularly creates consistent recovery points. This epoch transaction can also be used to replicate to other regions, even supporting Active-Active replication using various Conflict Detection Algorithms.
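
To make the take-over idea concrete, here is a simplified, generic sketch of the decision a new coordinator can make after collecting the state of an in-flight transaction from the surviving participants; it illustrates the principle only and is not NDB's actual take-over protocol.

    // Simplified, generic sketch of coordinator take-over in a non-blocking 2PC.
    // The surviving participants report their local state for an in-flight
    // transaction and the new coordinator completes it. Illustration of the
    // principle only; not NDB's actual take-over protocol.
    #include <cstdio>
    #include <vector>

    enum class ParticipantState { Preparing, Prepared, Committed, Aborted };
    enum class Outcome { Commit, Abort };

    // In this scheme the commit point is the first participant commit: if any
    // participant has committed, the commit must be completed everywhere;
    // if no participant has committed, the transaction can safely be aborted.
    Outcome decide_after_takeover(const std::vector<ParticipantState>& states) {
      for (ParticipantState s : states)
        if (s == ParticipantState::Committed) return Outcome::Commit;
      return Outcome::Abort;
    }

    int main() {
      std::vector<ParticipantState> tx = {ParticipantState::Prepared,
                                          ParticipantState::Committed};
      std::printf("takeover outcome: %s\n",
                  decide_after_takeover(tx) == Outcome::Commit ? "commit" : "abort");
      return 0;
    }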

So the conclusion of how to design a DBMS for Telco requirements is:
  • Use an in-memory DBMS
  • Use a Shared Nothing DBMS
  • Apply the changes on both primary replica and backup replica as part of transaction
  • Use non-blocking 2PC for higher concurrency of write transactions
  • Implement Heartbeat protocol to discover silent failures in both APIs and DBMS nodes
  • Implement Take-over protocols for each distributed protocol, especially Leader-Selection
  • Implement Software Upgrade mechanisms in both APIs and DBMS nodes
  • Implement Failure Handling of APIs and DBMS nodes
  • Support Online Schema Changes
  • Support Regional Replication
The above implementation makes it possible to run a DBMS with Class 6 availability (less than 30 seconds of downtime per year). This means that all SW, HW and regional failures, including the catastrophic ones, are accounted for within these 30 seconds per year.

MySQL NDB Cluster has been used at this level for more than 20 years and continues to serve billions of people with a highly available service.

At Hopsworks, MySQL NDB Cluster was selected as the platform to build a highly available real-time AI platform. To make MySQL NDB Cluster accessible for anyone to use, we forked it and call it RonDB. RonDB has brought many improvements in ease of use and scalable reads, and a managed service that makes it possible to easily install and manage RonDB. We have also added a set of new interfaces: a REST server to handle batches of lookups for generic database lookups and for feature store lookups, RonSQL to handle optimised aggregation queries that are very common in AI applications, and finally an experimental Redis interface called Rondis.

Check out rondb.com for more information and try it out. If you want to read the latest scalability benchmark, go directly here. If you want a walkthrough of the challenges of running a highly scalable benchmark, you can find it here.

Happy reading!