Thursday, June 04, 2026

RonDB gets an RDMA transporter supporting pNFS Lattice

Today, PEAK-AIO announced pNFS Lattice, a metadata server for Parallel NFS. It is running on top of RonDB using an RDMA transporter.

Now you might wonder: does RonDB have an RDMA transporter? A few weeks ago the answer would have been no. It has been on the TODO list for more than a decade, but it was never prioritized. However, a few weeks ago Eyal Lemberger from PEAK-AIO contributed an RDMA transporter to RonDB. Eyal and I worked on the contribution for a few weeks, and it became stable enough to integrate into RonDB 26.02, our latest stable release.

In the normal builds the RDMA transporter is disabled, so using it requires building RonDB manually. The aim is to include the RDMA transporter in the RonDB release planned for the end of 2026.

3x Latency Improvement

RDMA improved the latency of metadata operations by a factor of 3. Actually, this brings RonDB back to its roots. The very first transporter that NDB Cluster (RonDB is a fork of NDB Cluster) had was a Dolphin SCI transporter. I have worked with Dolphin for more than 30 years. Today Dolphin technology can be used with a normal TCP/IP transporter using normal sockets.

The Wakeup Cost Challenge

The biggest challenge of an RDMA transporter is the same as for the shared memory transporter: how to wake up the receiving process. It is important not to pay the wakeup cost for every signal.

In RDMA, one can send batches, and the wakeup is handled by the network card. Thus, the sender has no cost attached to sending other than writing to the receiver's memory. The wakeup cost is only on the receiving side and is integrated into the network card, so the only cost is an occasional interrupt to ensure that the receiver is awake.
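The principle of not paying the wakeup cost for every signal can be illustrated with a minimal, single-threaded C++ model. This is not RonDB's transporter code: the receiver sets a hypothetical `receiver_sleeping` flag before it sleeps, and a wakeup happens at most once per batch, only when that flag is set. In the RDMA case the wakeup comes from the network card rather than from the sender, but the counting works the same way. All names are illustrative assumptions, and a real concurrent version would need an atomic flag plus a re-check of the queue by the receiver after setting it, to avoid a lost wakeup.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model (not RonDB code): a receive queue plus a flag that
// the receiver sets just before it goes to sleep.
struct Channel {
  std::deque<int> queue;         // stands in for the receiver's memory
  bool receiver_sleeping = false;
  std::size_t wakeups = 0;       // how many times the expensive wakeup ran
};

// Write a whole batch, then wake the receiver at most once, and only if
// it announced that it was sleeping.
void send_batch(Channel& ch, int first, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    ch.queue.push_back(first + static_cast<int>(i));
  }
  if (ch.receiver_sleeping) {
    ch.receiver_sleeping = false;
    ++ch.wakeups;                // e.g. an interrupt or a futex wake
  }
}

// Receiver side: drain everything that has arrived, then go to sleep.
std::size_t receive_all(Channel& ch) {
  std::size_t n = ch.queue.size();
  ch.queue.clear();
  ch.receiver_sleeping = true;
  return n;
}
```

With ten batches of 100 signals and a receiver that sleeps between batches, this model performs 10 wakeups rather than 1000; while the receiver is busy, batches accumulate without any wakeup at all.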

It is great to see RonDB being put to good use and the open-source license making the technology available to many. Many thanks to PEAK-AIO for the contribution.

Friday, February 13, 2026

Developing RonDB with Claude: Lessons from 30 Years of Database Engineering

After 30 years of developing MySQL NDB Cluster and RonDB, a new tool has fundamentally changed how I write code. Here is what I have learned about working with Claude on a large-scale distributed database project.

A New Era for Database Development

A while ago I got a new tool that completely changes the way I develop code. I have worked on developing MySQL NDB Cluster — and now the fork RonDB — for more than 30 years. Over that time, many of the original findings about software engineering have shifted. In the 1990s, I learned that unit testing was important. However, I quickly discovered that in a startup with limited resources, it was not feasible to write unit tests for distributed database functionality. I even wrote a tool for generating distributed unit test programs, but the overhead remained too high.

With Claude, this equation completely changes. I can now modify 1,000 lines of code in a day — often more — and in parallel, instruct Claude to write 3–5x as many lines of test code for comprehensive unit testing. Claude does not just improve my coding productivity; it also makes the path to high-quality code faster by enabling test coverage that was previously impractical.

Will AI Cause Unemployment?

Some ask a philosophical question: does this mean the world will see unemployment due to AI coding tools? Personally, I think not. For me, it simply means that features I have been wanting to build for 20+ years are now suddenly possible. The only reason for unemployment would be if humanity ran out of ideas for what to develop next. I do not believe that will ever happen — just look into space and realise that God's creation is far too vast for us to explore, even with 1,000x more compute power than we have today. There will always be a new thing to understand and develop around the next corner.

A Word of Caution

Not all my experiences with AI coding have been positive. We had a REST API server written in Go that needed to be ported to C++. The AI performed a straightforward translation, but this created significant performance issues and produced code that was unreadable unless you had studied every new C++ feature in the latest standard. The translation took two months; fixing the resulting issues took six months. In retrospect, writing the C++ implementation from scratch would likely have been more efficient than using AI translation.

The lesson: AI works best when you guide it with clear architectural direction, not when you use it for mechanical translation without oversight.

Getting Started: The RonDB CLI

My first real attempt at using Claude for RonDB was creating a CLI tool. A colleague initially created it, and I developed it further. I realised how well suited this type of boilerplate code was to AI assistance. I extended our REST API and added new CLIs for Rondis, the REST API, and even a MySQL client interface. This was straightforward work; it would have been fairly easy even without Claude, but it still would have taken two to three months. With Claude, it was done within a week.

The Big Challenge: Pushdown Join Aggregation

Encouraged by the CLI experience, I decided to tackle something far more ambitious. For over ten years, I had wanted to develop Pushdown Join Aggregation in NDB/RonDB. This feature would allow complex join queries with aggregation to execute directly in the data nodes rather than pulling data up to the MySQL Server. However, it was a task that would have taken a year or more, so it never rose high enough on the priority list. With Claude, I estimated I could complete it in one to two months.

Background

NDB/RonDB already had two key building blocks in place. First, Pushdown Join has been supported for a long time, enabling complex join queries to run with higher parallelism and improving performance by more than 10x compared to executing them via the MySQL Server. Second, we developed RonSQL, which supports pushdown aggregation on a single table. Many pieces were already there, but extending from single-table aggregation to complex join queries was still a major undertaking.

The Development Approach

In his blog post How I Use Claude Code, Boris Tane describes the importance of planning your work before handing it to Claude. That is definitely true, but for a task of this complexity, even more structure was needed.

Divide and Conquer: Four Modules

The task naturally breaks down into four modules:

  1. Local Database — Where the actual aggregation happens and intermediate results are stored during query execution
  2. Coordinator — Distributes query fragments across nodes and coordinates their execution
  3. API — The application interface through which clients interact with the system
  4. SQL — Transforms SQL statements into query plans sent to the NDB API and coordinator
[Figure: RonDB Pushdown Join Aggregation Architecture. SQL Layer (transforms SQL statements into query plans) → query plan → NDB API Layer (application interface for query execution) → signals → Coordinator (top-level query coordination) → query fragments → Sub-coordinators 1..N (node-level coordination) → Local DB Nodes 1..N (aggregation + storage).]

Figure 1: The five-layer architecture of Pushdown Join Aggregation in RonDB

Each module had to be developed separately. I started with the local database part, where the core aggregation logic lives.

Architecture First, Then Implementation

I began by asking Claude for an architecture description, providing the fundamentals of how I wanted the aggregation handling to work — something I had been thinking about for many years. Claude produced a phased development plan. The original plan contained six phases; by the end, I had gone through 15–20 phases with constant refinements.

[Figure: Claude-Assisted Development Workflow, an iterative module development cycle. 1. Architecture Plan (define modules & interfaces) → 2. Implementation Plan (break into phases) → 3. Code with Claude (iterative implementation) → 4. Review Critical Code (performance & correctness) → 5. Write Unit Tests (signal-based test programs) → 6. Expand Test Coverage (benchmarks & edge cases) → iterate & refine. Key insight: you are the architect and reviewer; Claude handles volume, you ensure correctness and performance.]

Figure 2: The iterative development workflow when working with Claude

From Implementation to Testing

After about two to three days, the local database implementation was ready. At that point I realised that Claude made it possible to unit test the new code — something that would have been prohibitively expensive before. I started a RonDB cluster and wrote a client that could send and receive signals directly, bypassing the real NDB API. With some modifications to the debug build of RonDB, I had a working unit test framework. It took slightly longer than expected since Claude needed to learn a few things about writing this kind of test program — very few existing test programs did similar things.

Scaling Up with Parallel Sessions

After writing the first test case, I wanted three things: deeper test coverage, a performance benchmark, and support for aggregation with CASE statements (very common in real-world queries, but not yet supported in single-table aggregation). Each of these was a self-contained mini-project that needed a test program similar to the one I had already built.

I had learned that Claude spends significant time thinking, so to maximise productivity, I launched three parallel Claude sessions — one for each task. All three were completed within two to three hours, even though I started late in the evening. The next morning, I could build a real-world benchmark running TPC-H Q12.

[Figure: Parallel Claude Sessions: Maximising Throughput. Three concurrent tasks completed in 2-3 hours. Starting from the initial test program (first working test case), Session 1 expands test coverage (edge cases & error paths), Session 2 adds CASE statement support (real-world SQL patterns) and Session 3 builds a benchmark program (performance measurement), leading to the TPC-H Q12 benchmark (real-world validation the next morning).]

Figure 3: Running three Claude sessions in parallel to maximise development throughput
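TPC-H Q12 shows why CASE support matters: its aggregates are `SUM(CASE WHEN o_orderpriority = '1-URGENT' OR o_orderpriority = '2-HIGH' THEN 1 ELSE 0 END)` and its complement, grouped by `l_shipmode`. The C++ sketch below is not RonDB's implementation; it only illustrates the general idea of partial aggregation in each local database node followed by a merge in the coordinator. It assumes the join and Q12's WHERE filters have already produced (l_shipmode, o_orderpriority) pairs, and the `JoinedRow`, `Q12Agg` and function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One joined row as seen by a local database node (hypothetical layout):
// l_shipmode from LINEITEM and o_orderpriority from ORDERS.
struct JoinedRow {
  std::string shipmode;
  std::string orderpriority;
};

// Partial aggregate per GROUP BY key (l_shipmode).
struct Q12Agg {
  long high_line_count = 0;  // SUM(CASE WHEN priority IN ('1-URGENT','2-HIGH') THEN 1 ELSE 0 END)
  long low_line_count = 0;   // SUM(CASE WHEN priority NOT IN (...) THEN 1 ELSE 0 END)
};

using GroupTable = std::map<std::string, Q12Agg>;

// Local phase: each node aggregates only the rows it stores.
GroupTable local_aggregate(const std::vector<JoinedRow>& rows) {
  GroupTable groups;
  for (const auto& r : rows) {
    bool high = r.orderpriority == "1-URGENT" || r.orderpriority == "2-HIGH";
    Q12Agg& agg = groups[r.shipmode];
    agg.high_line_count += high ? 1 : 0;
    agg.low_line_count += high ? 0 : 1;
  }
  return groups;
}

// Coordinator phase: SUMs are decomposable, so partial results merge by addition.
GroupTable merge_partials(const std::vector<GroupTable>& partials) {
  GroupTable result;
  for (const auto& part : partials) {
    for (const auto& [mode, agg] : part) {
      result[mode].high_line_count += agg.high_line_count;
      result[mode].low_line_count += agg.low_line_count;
    }
  }
  return result;
}
```

Because only one small partial row per group travels from each node to the coordinator, the data shipped is proportional to the number of groups rather than the number of joined rows.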

Key Takeaways

  1. Unit testing distributed systems is now feasible — even with limited budgets, Claude can generate the 3–5x test code volume needed alongside your implementation.
  2. Divide your task into modules — start from the low-level parts and build upward. In this case, beginning with the local database layer worked best.
  3. Architecture first, then implementation — for each module, start by asking Claude for an architecture plan, then an implementation plan.
  4. Expect many iterations — plan for constant reviews of the code Claude produces, especially the performance-critical parts.
  5. Start with a simple test, then expand — write a basic test program first, then use it as a template for comprehensive coverage, benchmarks, and edge cases.

Your New Role: Architect, Manager, and Performance Expert

Claude can be remarkably productive when used correctly, but your role as a programmer fundamentally changes. You become an architect and manager while simultaneously needing to understand code at the deepest level. Learning low-level performance characteristics is just as important as it ever was — the performance-critical parts must still be fully understood by the developer. Claude can assist, but you need to know how to direct it.

Let Claude handle what it does best: building hash tables, linked lists, and other data structures it probably understands better than most developers. Let Claude suggest approaches where you are not certain of the best path forward. But keep the architectural vision firmly in your own hands.

Teaching Claude About Your Codebase

Programming with Claude is teamwork where you are the director, but your assistant has deep knowledge in some areas and can quickly absorb new information. Sometimes, though, it needs your high-level understanding to truly grasp what is happening in the code. Do not expect the code itself to describe all the details — the architecture is often invisible when you dive into the implementation. If it is invisible to a human, it is likely invisible to Claude as well.

To manage the knowledge Claude builds, we developed a structure using a root CLAUDE.md file that indexes all the domain knowledge in a directory called claude_files, with one subdirectory for each area we have built knowledge about. This is an early approach to managing institutional knowledge for AI assistants, but it is an important consideration for any team adopting these tools.


This article was written by the author and refined with Claude.

Monday, December 08, 2025

RonDB development moves on

A few months ago we released RonDB 24.10 with 11 new features.

Development of RonDB doesn't stop there. We have continued developing RonDB 25.10; whether this release will be an LTS release or merely an intermediate release depends on the needs of our customers. RonDB 24.10 is currently being integrated into Hopsworks 4.6, and this will be released very soon.

The work on RonDB 25.10 has tackled some challenges that have been lingering for many years. One of those is the number of API nodes supported: the maximum number of nodes has been limited to 255. For most users this is enough, but for massively large clusters, and for setups that shard data across many clusters, this limit is a bottleneck. Thus, in RonDB 25.10 we increased the maximum number of nodes to 2039. It is also very straightforward to extend this limit further, but for now this should be sufficient.

Sorted index scans have previously been single-threaded, which has limited the speed of such scans. With RonDB 25.10 we have increased their parallelism by at least a factor of 3-4, and in some setups probably even more.
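The post does not describe how RonDB 25.10 parallelises sorted index scans. One common general technique, shown here only as a hedged C++ sketch and not as RonDB's implementation, is to scan several partitions in parallel, each producing an already sorted stream, and then merge the streams with a min-heap so that the client still sees one globally ordered result. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Merge already-sorted streams (e.g. one per partition scanned in
// parallel) into one globally sorted result using a min-heap.
std::vector<int> merge_sorted_streams(const std::vector<std::vector<int>>& streams) {
  using Entry = std::tuple<int, std::size_t, std::size_t>;  // (key, stream, position)
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (std::size_t s = 0; s < streams.size(); ++s) {
    if (!streams[s].empty()) heap.emplace(streams[s][0], s, 0);
  }
  std::vector<int> out;
  while (!heap.empty()) {
    auto [key, s, pos] = heap.top();  // smallest remaining key across streams
    heap.pop();
    out.push_back(key);
    if (pos + 1 < streams[s].size()) heap.emplace(streams[s][pos + 1], s, pos + 1);
  }
  return out;
}
```

The merge itself is cheap (logarithmic in the number of streams per row), so the expensive part, reading and filtering index entries, can run in parallel.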

In Hopsworks, RonDB tables are used to store features in feature groups. It is fairly common to have hundreds or even thousands of features in a feature group. To support this, RonDB now supports up to 4096 columns, which is also the maximum number of columns supported by MySQL. This feature is now ready for inclusion in RonDB 25.10.

While working on more columns, it was natural to also extend the maximum row size. Columns in RonDB are separated into 3 parts. Fixed-size in-memory columns can take up at most 8052 bytes. Variable-sized and dynamic in-memory columns can take up at most 32000 bytes. Finally, disk columns can take up at most 31120 bytes. MySQL has a limit of 65536 bytes for the row size. Extending these parts is future work.
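As a rough illustration of how these limits combine, the C++ sketch below checks a proposed set of columns against the three per-part limits quoted above. It assumes each limit applies to the total of its part and ignores per-row overhead such as length bytes and headers, so it is a simplified budget check, not RonDB's actual row-size validation; the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-part limits quoted in the text.
constexpr std::size_t kMaxFixedInMemory = 8052;
constexpr std::size_t kMaxVarInMemory = 32000;  // variable-sized + dynamic
constexpr std::size_t kMaxDisk = 31120;

enum class Part { FixedInMemory, VarInMemory, Disk };

struct Column {
  Part part;
  std::size_t bytes;  // simplified: maximum stored size, no per-row overhead
};

// Returns true if every part of the row stays within its limit.
bool fits_row_limits(const std::vector<Column>& cols) {
  std::size_t fixed = 0, var = 0, disk = 0;
  for (const auto& c : cols) {
    switch (c.part) {
      case Part::FixedInMemory: fixed += c.bytes; break;
      case Part::VarInMemory:   var += c.bytes;   break;
      case Part::Disk:          disk += c.bytes;  break;
    }
  }
  return fixed <= kMaxFixedInMemory && var <= kMaxVarInMemory && disk <= kMaxDisk;
}
```

For example, two fixed-size columns of 4000 and 4100 bytes would not fit (8100 > 8052 bytes), even though the same 8100 bytes would fit easily in the variable-sized part.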

We also have a new Vector Search feature ready for inclusion. It allows a traditional index scan or a full table scan to perform a nearest-neighbour search combined with normal filters on the table. A vector index is future work.
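Without a vector index, a nearest-neighbour search combined with filters amounts to a brute-force scan: apply the normal filter to each row, compute the distance to the query vector, and keep the k best candidates. The C++ sketch below illustrates that idea under assumed names (`Row`, `category`, `embedding`, `knn_scan`) and squared Euclidean distance; it is not RonDB's implementation or API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical row: a key, one ordinary column and an embedding.
struct Row {
  int id;
  int category;
  std::vector<float> embedding;
};

// Squared Euclidean distance; enough for ranking, no sqrt needed.
float squared_l2(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    float d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Full scan: apply the normal filter first, then keep the k nearest rows
// in a max-heap keyed on distance. Returns ids ordered nearest first.
std::vector<int> knn_scan(const std::vector<Row>& rows, const std::vector<float>& query,
                          std::size_t k, const std::function<bool(const Row&)>& filter) {
  std::priority_queue<std::pair<float, int>> heap;  // (distance, id), largest on top
  for (const auto& r : rows) {
    if (!filter(r)) continue;
    float d = squared_l2(r.embedding, query);
    if (heap.size() < k) {
      heap.emplace(d, r.id);
    } else if (k > 0 && d < heap.top().first) {
      heap.pop();  // drop the current worst candidate
      heap.emplace(d, r.id);
    }
  }
  std::vector<int> result;
  while (!heap.empty()) {
    result.push_back(heap.top().second);
    heap.pop();
  }
  std::reverse(result.begin(), result.end());
  return result;
}
```

Note that a row which is nearest to the query but fails the filter is never returned; the filter and the distance ranking are evaluated in the same pass.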

We are also working on extending the REST API server with new features and more elaborate handling of rate limits, and more things will come as well.

Thursday, September 25, 2025

Datagraph releases an extension of RonDB, a Common Lisp NDB API

Datagraph develops a graph database called Dydra that can handle SPARQL, GraphQL and Linked Data Platform (LDP). Dydra stores a revisioned history, which means that you have access to the full history of your data. This could be the development of a document, a piece of software, a piece of hardware such as a SoC (System-on-a-Chip), a building, or something else. Essentially any data.

This blog describes this development by the team at Datagraph.

Traditionally, Dydra has used a memory-mapped key-value store for this. Hopsworks and Dydra have now worked together for a while to provide features in RonDB that make it possible to run Dydra on a highly available, distributed platform that can parallelise many of the searches.

RonDB is a distributed key-value store with SQL capabilities. Traditionally, distributed key-value stores offer the possibility to read and write data highly efficiently using key lookups. RonDB offers this capability as well, with extremely good performance (RonDB showed how to achieve 100M key lookups per second using a REST API). However, the SQL capabilities mean that RonDB can also push down filters and projections to the RonDB data nodes. This means that searches can be parallelised.
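The benefit of pushing filters and projections down can be sketched in a few lines of C++. This is a self-contained simulation, not NDB API code: each thread stands in for a data node scanning its own partition, evaluates the filter locally and returns only the projected column, so the client receives small, already filtered results. The `Row` layout and function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical row with three columns; only `id` is projected back.
struct Row {
  int id;
  int price;
  int quantity;
};

using Partition = std::vector<Row>;

// Each thread (standing in for a data node) scans its own partition,
// evaluates the filter locally and keeps only the projected column.
std::vector<int> parallel_pushdown_scan(const std::vector<Partition>& partitions,
                                        const std::function<bool(const Row&)>& filter) {
  std::vector<std::vector<int>> per_partition(partitions.size());
  std::vector<std::thread> workers;
  for (std::size_t p = 0; p < partitions.size(); ++p) {
    workers.emplace_back([&, p] {
      for (const Row& r : partitions[p]) {
        if (filter(r)) per_partition[p].push_back(r.id);  // projection: id only
      }
    });
  }
  for (auto& w : workers) w.join();
  // The client only receives the small, already filtered result sets.
  std::vector<int> result;
  for (const auto& part : per_partition) {
    result.insert(result.end(), part.begin(), part.end());
  }
  return result;
}
```

Each worker writes only to its own result vector, so no locking is needed; the scans are independent, which is what makes this kind of search parallelisable.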

Thus, RonDB will also be able to handle complex joins efficiently in many cases. Some of the queries in TPC-H (a standard analytical database benchmark) can be executed 50x faster in RonDB compared to using MySQL/InnoDB.

When working with Dydra on their searches, we realised that they store data structures in columns using the data type VARBINARY. SQL doesn't really have any way to define searches on complex data structures inside a VARBINARY.

There are many ways to access RonDB. Many people find the MySQL APIs to be the preferable method. These APIs are well known, and there is plenty of literature on how to use them. However, RonDB is also a key-value store, which means that a lower-level interface is much more efficient.

The base of all interactions with RonDB is the C++ NDB API. On top of this API there is a Java API called ClusterJ and a NodeJS API called Database Jones. As mentioned, there are also the MySQL APIs, since RonDB is an NDB storage engine for MySQL. With RonDB 24.10 we also introduced a C++ REST API server that can be used to retrieve batches of key lookups at very low latency. There is even an experimental Redis interface for RonDB that we call Rondis; it is integrated in the RonDB REST API server (RDRS).

In 2022, Datagraph released one more option: Common Lisp bindings for the C++ NDB API. With the release of RonDB 24.10, they have now released a much-improved version of cl-ndbapi for RonDB 24.10.

As discussed above, a Dydra query often entails a scan operation where one has to analyse the content of the VARBINARY column. Initially, all this functionality was performed by shipping the VARBINARY to the Common Lisp environment. This gave pretty decent performance, but we realised we could do better.

RonDB has had a simple interpreter for things such as filters, auto increment and the like. However, to perform complex analysis of VARBINARY columns we needed to extend the RonDB interpreter.

MySQL has a similar feature, called user-defined functions (UDFs), that lets one integrate a C program into MySQL. However, using something similar in RonDB would raise two issues. First, it is a security issue: such a program could easily crash the RonDB data nodes, which conflicts with the high-availability features of RonDB. Second, RonDB is a distributed architecture, so the program would have to be installed on every RonDB data node, complicating the installation of RonDB.

Instead, we opted to extend the RonDB interpreter. The RonDB interpreter has 8 registers, each storing a 64-bit signed integer. An interpreted execution always has access to a single row; it cannot access any other rows or data outside the interpreter. Interpreted execution has several steps: one can first read columns, next execute the interpreted program, next write some columns, and finally read columns again. In this manner one can combine normal simple reads with an interpreted program. In MySQL the interpreted program is used to execute WHERE clauses, filtering away rows that are not interesting for the query. The program can also have a section of input parameters, making it possible to reuse an interpreted program with different input. It is also possible to return calculated results using output parameters.
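To make the register model concrete, here is a minimal C++ interpreter in the same spirit: 8 signed 64-bit registers, access to a single row, input parameters, and an exit that either keeps or filters away the row. The instruction set (`ReadCol`, `LoadParam`, `Add`, `BranchGe`, `ExitOk`, `ExitSkip`) is hypothetical and far smaller than RonDB's real one, and the separate read/write phases around the program are not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical instruction set, loosely modelled on the description above.
enum class Op {
  ReadCol,    // reg[a] = row[b]
  LoadParam,  // reg[a] = params[b]
  Add,        // reg[a] += reg[b]
  BranchGe,   // if (reg[a] >= reg[b]) pc = target
  ExitOk,     // keep the row
  ExitSkip    // filter the row away
};

struct Instr {
  Op op;
  int a = 0;
  int b = 0;
  std::size_t target = 0;
};

// Runs a program against a single row; the program can see nothing else.
bool run_filter(const std::vector<Instr>& prog, const std::vector<std::int64_t>& row,
                const std::vector<std::int64_t>& params) {
  std::array<std::int64_t, 8> reg{};  // 8 signed 64-bit registers
  std::size_t pc = 0;
  while (pc < prog.size()) {
    const Instr& in = prog[pc++];
    switch (in.op) {
      case Op::ReadCol:   reg.at(in.a) = row.at(in.b); break;
      case Op::LoadParam: reg.at(in.a) = params.at(in.b); break;
      case Op::Add:       reg.at(in.a) += reg.at(in.b); break;
      case Op::BranchGe:  if (reg.at(in.a) >= reg.at(in.b)) pc = in.target; break;
      case Op::ExitOk:    return true;
      case Op::ExitSkip:  return false;
    }
  }
  throw std::runtime_error("program ended without an exit instruction");
}
```

For example, `WHERE col0 + col1 >= :p0` becomes ReadCol r0,c0; ReadCol r1,c1; Add r0,r1; LoadParam r2,p0; BranchGe r0,r2 → 6; ExitSkip; ExitOk. The same program can then be reused with different parameter values, and bounds-checked register and column access keeps a bad program from reading outside the row.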

To handle the new requirements the RonDB interpreter was extended with a memory area of a bit more than 64 kB.

To ensure that one can handle generic programs, RonDB added a long list of new operations such as Shift Left/Right, multiplication, division, modulo and so forth. In addition, there are instructions to read columns into the memory area, even reading only part of a column if desired, and similarly instructions to write columns.
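The following C++ sketch illustrates how a scratch memory area, a partial column read and shift operations work together to decode a value stored inside a VARBINARY. The 64 kB size, the method names and the big-endian encoding are assumptions for illustration, not RonDB's actual instructions. The bounds checks and the checked division reflect the concern raised earlier: an interpreted program must never be able to crash a data node.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical scratch memory for one interpreted execution. The text says
// "a bit more than 64 kB"; exactly 64 kB is an assumption made here.
constexpr std::size_t kMemSize = 64 * 1024;

struct InterpreterMemory {
  std::vector<std::uint8_t> bytes = std::vector<std::uint8_t>(kMemSize);

  // Copy `len` bytes, starting at `col_offset` in a VARBINARY column, into
  // the memory area at `mem_offset`, i.e. read only part of the column.
  void read_column_part(const std::vector<std::uint8_t>& column, std::size_t col_offset,
                        std::size_t len, std::size_t mem_offset) {
    if (col_offset > column.size() || len > column.size() - col_offset ||
        mem_offset > bytes.size() || len > bytes.size() - mem_offset) {
      throw std::out_of_range("read outside the column or the memory area");
    }
    if (len > 0) std::memcpy(bytes.data() + mem_offset, column.data() + col_offset, len);
  }

  // Combine `len` big-endian bytes into one value using shift left and OR.
  std::uint64_t load_be(std::size_t mem_offset, std::size_t len) const {
    assert(len <= 8 && mem_offset + len <= bytes.size());
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < len; ++i) v = (v << 8) | bytes[mem_offset + i];
    return v;
  }
};

// Division must never crash a data node, so bad operands become errors
// instead of undefined behaviour.
std::int64_t checked_div(std::int64_t a, std::int64_t b) {
  if (b == 0) throw std::domain_error("division by zero");
  if (a == std::numeric_limits<std::int64_t>::min() && b == -1) {
    throw std::overflow_error("division overflow");
  }
  return a / b;
}
```

Reading only the four bytes of interest, rather than the whole column, keeps the work per row proportional to what the program actually needs.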

Dydra used these new capabilities and saw a handsome improvement in the results delivered by RonDB.

Analysing Dydra's use case further, we found that they used variants of binary search on parts of the VARBINARY columns. Thus, RonDB also implemented a set of higher-level instructions such as binary search, search intervals, memory copy, and text-to-number conversion and vice versa.
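A binary search instruction replaces what would otherwise be a long loop of low-level instructions. The C++ sketch below shows the underlying algorithm on an assumed layout: the VARBINARY value holds a sorted array of fixed-width, big-endian unsigned keys. The layout and the function names are hypothetical; Dydra's actual data format is not described in the post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed layout (hypothetical): the VARBINARY value is a sorted array of
// fixed-width big-endian unsigned keys, `width` bytes each.
std::uint64_t key_at(const std::vector<std::uint8_t>& blob, std::size_t width, std::size_t idx) {
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < width; ++i) v = (v << 8) | blob[idx * width + i];
  return v;
}

// Binary search for `key`; returns its entry index if present.
std::optional<std::size_t> binary_search_blob(const std::vector<std::uint8_t>& blob,
                                              std::size_t width, std::uint64_t key) {
  assert(width > 0 && width <= 8 && blob.size() % width == 0);
  std::size_t lo = 0, hi = blob.size() / width;  // search in [lo, hi)
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    std::uint64_t k = key_at(blob, width, mid);
    if (k == key) return mid;
    if (k < key) lo = mid + 1; else hi = mid;
  }
  return std::nullopt;
}
```

For a blob of n entries this touches about log2(n) keys instead of all n, which matters when the check runs once for every scanned row.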

Using those new instructions, Dydra saw further improvements. The new instructions also make interpreted programs quicker to develop. As requirements for other algorithms arise, it is fairly easy to add new instructions to the RonDB interpreter, and it should be possible for other community developers to do so as well.

The most innovative part of the new Common Lisp NDB API is the handling of the interpreted instructions. It contains a language-to-language compiler, so you can write the interpreted program as a Lisp program using normal IF, WHEN and COND (IF, ELSE constructs in Lisp). You can even decide to run the program in the client using Lisp (mainly for testing and debugging) or push it down to RonDB for execution in the RonDB data nodes (for speed).
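The idea of writing the program once and then either running it on the client or pushing it down can be illustrated without Lisp. In the hedged C++ sketch below, a small expression tree with IF nodes is either evaluated directly (client mode, useful for testing and debugging) or compiled into a flat instruction list and executed by a tiny stack machine (standing in for execution in the data nodes). The tree types, the instruction set and all function names are hypothetical and are not the cl-ndbapi API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical expression tree for a filter written in a high-level language.
struct Expr {
  enum Kind { Const, Col, Less, If } kind;
  std::int64_t value = 0;         // Const: literal value; Col: column index
  std::shared_ptr<Expr> a, b, c;  // operands, or condition/then/else for If
};
using P = std::shared_ptr<Expr>;
P cnst(std::int64_t v) { return std::make_shared<Expr>(Expr{Expr::Const, v}); }
P col(std::int64_t i) { return std::make_shared<Expr>(Expr{Expr::Col, i}); }
P lt(P l, P r) { return std::make_shared<Expr>(Expr{Expr::Less, 0, l, r}); }
P if_(P c, P t, P e) { return std::make_shared<Expr>(Expr{Expr::If, 0, c, t, e}); }

// Client mode: evaluate the tree directly.
std::int64_t eval(const P& e, const std::vector<std::int64_t>& row) {
  switch (e->kind) {
    case Expr::Const: return e->value;
    case Expr::Col:   return row.at(static_cast<std::size_t>(e->value));
    case Expr::Less:  return eval(e->a, row) < eval(e->b, row) ? 1 : 0;
    case Expr::If:    return eval(e->a, row) != 0 ? eval(e->b, row) : eval(e->c, row);
  }
  return 0;
}

// Pushdown mode: compile to flat instructions with jumps.
enum class Op { PushConst, PushCol, Less, JumpIfZero, Jump };
struct Ins { Op op; std::int64_t arg = 0; };

void compile(const P& e, std::vector<Ins>& out) {
  switch (e->kind) {
    case Expr::Const: out.push_back({Op::PushConst, e->value}); break;
    case Expr::Col:   out.push_back({Op::PushCol, e->value}); break;
    case Expr::Less:
      compile(e->a, out); compile(e->b, out); out.push_back({Op::Less}); break;
    case Expr::If: {
      compile(e->a, out);
      std::size_t jz = out.size(); out.push_back({Op::JumpIfZero});
      compile(e->b, out);
      std::size_t jmp = out.size(); out.push_back({Op::Jump});
      out[jz].arg = static_cast<std::int64_t>(out.size());   // start of else branch
      compile(e->c, out);
      out[jmp].arg = static_cast<std::int64_t>(out.size());  // end of the IF
      break;
    }
  }
}

// Stack machine standing in for execution inside a data node.
std::int64_t run(const std::vector<Ins>& code, const std::vector<std::int64_t>& row) {
  std::vector<std::int64_t> stack;
  std::size_t pc = 0;
  while (pc < code.size()) {
    const Ins& in = code[pc++];
    switch (in.op) {
      case Op::PushConst: stack.push_back(in.arg); break;
      case Op::PushCol: stack.push_back(row.at(static_cast<std::size_t>(in.arg))); break;
      case Op::Less: {
        std::int64_t r = stack.back(); stack.pop_back();
        std::int64_t l = stack.back(); stack.pop_back();
        stack.push_back(l < r ? 1 : 0);
        break;
      }
      case Op::JumpIfZero: {
        std::int64_t c = stack.back(); stack.pop_back();
        if (c == 0) pc = static_cast<std::size_t>(in.arg);
        break;
      }
      case Op::Jump: pc = static_cast<std::size_t>(in.arg); break;
    }
  }
  return stack.back();
}
```

Because both paths start from the same tree, a program can be debugged with `eval` on the client and then shipped as compiled instructions, with the expectation that both give identical answers.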

One benchmark that Dydra used to evaluate RonDB performance compared MySQL/InnoDB using a UDF with RonDB using pushdown of the evaluation. The data set consisted of 4.5M rows where essentially all rows were scanned, and for each row a program checked whether the row was visible in the requested revision. About 2% of the rows were returned.

In MySQL/InnoDB the query took 8.89 seconds to execute; in RonDB it took 0.51 seconds. That is a nice speedup of around 17 times. Most of the speedup comes from the parallelism used in RonDB, since the MySQL execution is single-threaded. The cost of scanning one row is very similar in MySQL/InnoDB and RonDB; RonDB is a bit faster, but there is no major difference in speed.
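Working through the quoted figures makes the scale of the difference concrete. The rates below assume all 4.5M rows are scanned, as described, and ignore any fixed per-query overhead.

```latex
\begin{aligned}
\text{speedup} &= \frac{8.89\,\text{s}}{0.51\,\text{s}} \approx 17.4 \\
\text{MySQL/InnoDB scan rate} &\approx \frac{4.5 \times 10^{6}\ \text{rows}}{8.89\,\text{s}} \approx 5.1 \times 10^{5}\ \text{rows/s} \\
\text{RonDB scan rate} &\approx \frac{4.5 \times 10^{6}\ \text{rows}}{0.51\,\text{s}} \approx 8.8 \times 10^{6}\ \text{rows/s} \\
\text{rows returned} &\approx 0.02 \times 4.5 \times 10^{6} = 9 \times 10^{4}
\end{aligned}
```

Since the per-row scan cost is similar in both systems, a ratio of about 17 is roughly what one would expect from somewhat fewer than 17 threads' worth of parallel scanning, given RonDB's slightly cheaper per-row cost; this is an interpretation of the figures, not a measured thread count.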