tag:blogger.com,1999:blog-144551772024-03-07T18:57:26.411+01:00Mikael RonstromMy name is Mikael Ronstrom and I work for Hopsworks AB as
Head of Data. I also assist companies working with NDB Cluster as a self-employed consultant. I am a member of The Church of
Jesus Christ of Latter-day Saints. I can be contacted at mikael dot ronstrom at gmail dot com for NDB consultancy services.
The statements and opinions expressed on this blog are my own and do not necessarily represent those of Hopsworks AB.Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.comBlogger265125tag:blogger.com,1999:blog-14455177.post-87721023242050479982024-03-05T17:40:00.000+01:002024-03-05T17:40:27.135+01:00Testing of RonDB releases<p> Since RonDB is a fork of MySQL NDB Cluster it contains a lot of tests that are part of the RonDB development tree. This includes unit tests for various functionalities. It also includes many hundreds of MTR test cases that take between a few seconds and a few minutes to run. These tests mostly use SQL commands to test the functionality of RonDB; in addition they test backup and restore and a few other tools in RonDB. These tests are executed with debug-compiled binaries, binaries compiled with error injection, binaries compiled for production and finally the binaries we use in the releases.</p><p>Another very important part of RonDB testing is the autotests. These tests use the NDB API to test its functionality, with a lot of focus on testing recovery. This test suite contains thousands of tests that take 36 hours for one test run when executed serially. It can be parallelised by running it on multiple clusters. This test suite can also be executed on different configurations with different numbers of replicas, different numbers of node groups, different numbers of CPUs per node and different memory sizes in the nodes.</p><p>RonDB is heavily used in Hopsworks. One part of Hopsworks is HopsFS. This is a distributed file system built on top of RonDB. It is written in Java and thus interfaces with ClusterJ, the Java API to RonDB that offers an easy-to-program model of the NDB API. 
HopsFS has a whole range of test cases that are also executed on a daily basis; this includes both functional tests and load tests.</p><p>RonDB is also used to handle metadata in Hopsworks and serves as the Online Feature Store in Hopsworks. This means that Hopsworks users define new tables and new table structures on the fly. These parts of Hopsworks again have a set of functional tests and load tests.</p><p>Next there are upgrade tests verifying that we can perform an online upgrade of RonDB; these tests also verify that we can downgrade back to the old version if the upgrade didn't work as it should.</p><p>There are also test cases for replication to other clusters. It is a very important part of the Hopsworks framework that we support setups with multiple regions.</p><p>There are also benchmark suites, mostly Sysbench, DBT2 (~ TPC-C), DBT3 (~ TPC-H) and YCSB, that we execute regularly.</p><p>Hopsworks supports managed RonDB in the cloud. This offering includes support for reconfiguration of the RonDB cluster as an online operation where we can scale resources such as MySQL Servers, REST API servers and RonDB data nodes. This management framework also has its own set of test suites that are regularly executed.</p><p>We are developing a REST API server; a Go version is already completed and a new C++ version is in development. This adds yet more tests of the RonDB functionality.</p><p>The latest addition is that we are now also developing a Kubernetes operator for RonDB. This operator contains CI/CD that ensures that every RonDB release can be handled in this Kubernetes framework.</p><p>When a RonDB release is finished it has gone through all of those stages.</p><p>After release the RonDB software is used by community users and Hopsworks customers. Any bugs found by them are immediately fed into the development process. 
Among other things a community user has added a Common Lisp NDB API to what is supported by RonDB.</p><p>As is hopefully clear from this picture, a RonDB release is heavily tested before it ships. The next LTS version of RonDB will be RonDB 22.10.1. This software has been moving through all these test frameworks and will be made GA very soon. Since this is a new LTS version we have been especially careful in our testing of it. At the moment the RonDB 22.10.2 version is going through heavy MTR testing.</p><p>This hopefully makes it clear that building both a DBMS and a data platform that uses it is very beneficial for the quality of the DBMS product. Thus RonDB has been through a much more varied set of tests than most products that work strictly as a DBMS.</p><p>We often find bugs that originate in MySQL NDB Cluster. We try our best to be good open source citizens by feeding those back as contributions to Oracle so that they can be included in future releases of MySQL NDB Cluster. In our view a bigger community for MySQL NDB Cluster is also good for RonDB.</p><p>Similarly, of course, we benefit from bug fixes that originate from Oracle. We are currently integrating MySQL 8.0.35 and 8.0.36 into the RonDB 22.10 series. RonDB 22.10.1 is based on MySQL 8.0.34.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-78263516448330980572024-01-31T00:38:00.000+01:002024-01-31T00:38:08.821+01:00The completion of a 12 year long project in RonDB<p> In 2012 a project was started to change the memory model of MySQL NDB Cluster. The first step was some early prototypes developed in 2012 and 2013 by Jonas Oreland. When Jonas left Oracle for Google it took a while before the project got up and running again. The first subproject was to change the memory model for the operation records used by transactions. 
This project started in 2015.</p><p>It took quite some time to complete. The requirement on maintained performance was high; this meant going through the changes ensuring that we either gained or at most lost 1-2% performance. The traditional model used in NDB was very simple and had extremely good performance. To maintain that performance the developer Mauritz experimented with eight different new memory models before settling on a model we call TransientPool. This pool relies on memory objects being allocated only for a short time (typical for short transactions). I assisted Mauritz in ensuring that we maintained performance.</p><p>Finishing this step completed most of the framework for the new memory management model. However it only took care of a fairly small part of all the memory areas in NDB. Another step, completed around 2018-2019, finalised all work on operation records. This was the most significant and most important part of the change.</p><p>When I joined Hopsworks we wanted to avoid having loads of configuration parameters affecting the setup of RonDB 21.04. To handle this we simply configured RonDB to support 20000 table objects (tables, ordered indexes and unique indexes). This still used the old memory management model. In RonDB 22.10 the work was finalised; the final part was to move all memory related to metadata (called SchemaMemory) and the memory used by replication to other RonDB clusters (called ReplicationMemory) to the new memory management model.</p><p>Thus with the release of RonDB 22.10.1 we have finished this very long project transitioning MySQL NDB Cluster to a new memory management model in RonDB. This means that all memory parts share a common memory pool that is allocated at startup. 
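</p><p>The shared-pool idea can be sketched as follows. This is a toy model, not RonDB's actual code: the class name, the page counts and the priority scheme are invented for illustration. Each subpool has a reserved quota; anything beyond that is served from shared memory, which only higher-priority requesters may take when free memory runs low.</p>

```python
# Toy sketch of a shared global memory pool with per-subpool reservations
# and a priority rule for low-memory situations (invented, for illustration).

class SharedGlobalPool:
    def __init__(self, total_pages, low_mem_threshold=10):
        self.free = total_pages
        self.low_mem_threshold = low_mem_threshold
        self.reserved = {}            # subpool name -> reserved pages left

    def register(self, name, reserved_pages):
        self.reserved[name] = reserved_pages
        self.free -= reserved_pages   # reservation is carved out up front

    def alloc(self, name, pages, priority):
        taken = min(pages, self.reserved[name])   # use own reservation first
        remainder = pages - taken
        if remainder > 0:
            low = self.free - remainder < self.low_mem_threshold
            if remainder > self.free or (low and priority < 2):
                return False          # reject; reservation untouched
            self.free -= remainder    # rest comes from shared memory
        self.reserved[name] -= taken
        return True

pool = SharedGlobalPool(total_pages=100)
pool.register("TransactionMemory", 20)
pool.register("SchemaMemory", 20)
print(pool.alloc("SchemaMemory", 30, priority=1))  # True: 20 reserved + 10 shared
```

<p>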
This pool has around 11 different parts; when one part requires more memory it can take memory from the shared global pool, and there is a priority order deciding who gets memory when free memory is low.</p><p>The new memory management model in RonDB 22.10.1 also makes it possible to use a malloc/free-like model to get memory from the different pools. This will be useful for all sorts of new developments in RonDB. </p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-61608811273676253992024-01-02T16:17:00.005+01:002024-01-02T16:17:46.854+01:00Major update to the RonDB documentation<p>My colleague Vincent has spent some time improving the RonDB documentation. </p><p>New/rewritten chapters/sections are:</p><ul><li>Main page: <a href="https://docs.rondb.com/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com</a></li><li>Installing: <a href="https://docs.rondb.com/rondb_installation/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/rondb_installation/</a></li><li>Local Quickstart: <a href="https://docs.rondb.com/rondb_quickstart_local/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/rondb_quickstart_local/</a></li><li>Start Distributed: <a href="https://docs.rondb.com/rondb_programs/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/rondb_programs/</a></li><li>Recovery (entire chapter): <a href="https://docs.rondb.com/rondb_high_availability/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/rondb_high_availability/</a></li><li>Two-Phase Commit Protocol: <a href="https://docs.rondb.com/rondb_nonblocking_2pc/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/rondb_nonblocking_2pc/</a></li><li>Transaction Model (only ACID section, further PR incoming): <a href="https://docs.rondb.com/intro_transactions/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/intro_transactions/</a></li></ul><p>Further UI changes/fixes:</p><ul><li>Added dark mode</li><li>HTTP links are visible again</li><li>Recognition of programming language in code snippets (using Lua filter)</li><li>Order & naming of chapters</li><li>A number of new images based on our Cheetah logo</li></ul>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-51570461547277056362023-11-03T08:12:00.001+01:002023-11-03T08:12:15.688+01:00Presentation of RonDB at Meetup<p>For those who didn't have a chance to come to Stockholm and listen to the presentation of RonDB, <a href="https://www.slideshare.net/mikael329498/rondb-a-newsql-feature-store-for-ai-applicationspdf" target="_blank">here are the slides</a> from the presentation.</p><p>The slides cover the requirements, architecture and status of RonDB and its use in Hopsworks and other applications.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-12883566331120240802023-10-26T10:42:00.006+02:002023-10-26T10:42:41.849+02:00Results on comparing new Intel/AMD 
VMs with older VM types using RonDB<p> In the <a href="https://hopsworks.ai" target="_blank">Hopsworks</a> cloud offering for GCP one can select a fairly large variety of VM types. I am currently working on extending this list to also include the latest generation of VM types. This blog will focus on the impact of those new VM types on benchmarks using <a href="https://www.rondb.com" target="_blank">RonDB</a>.</p><p>The newer VM types are the c3d series, which uses AMD EPYC CPUs of the 4th generation, and the c3 series, which contains VMs using Intel Sapphire Rapids CPUs. AWS has also introduced similar new VM types, but this blog discusses tests performed on VMs in GCP.</p><p>The older VM type we compared with for the MySQL Servers was the n2-standard-16 VM type. This VM uses an Intel Cascade Lake Xeon processor. This represents the second generation of Intel Xeon chips whereas Intel Sapphire Rapids represents the 4th generation.</p><p>The RonDB data nodes used e2-highmem-16 as the baseline for comparison. This VM type uses either an Intel Xeon of the second generation or an AMD EPYC of the second generation.</p><p>The benchmark used was Sysbench OLTP RW based on version 0.4.12.19, which is included in the RonDB tarball and is set up in the API nodes automatically by our cloud offering. This makes it extremely easy to replicate the benchmarks. We use Consul as a load balancer, so the benchmark process is set up to use a single host, <b>onlinefs.mysql.service.consul</b>. In reality this address maps to the set of MySQL Servers in the RonDB cluster. We used 3 MySQL Servers in the tests. The setup used 2 RonDB data nodes in one node group.</p><p>Thus in the Hopsworks cloud we get a load balanced RonDB Data Service as part of the infrastructure of the Hopsworks Feature Store.</p><p>We first executed the benchmark using the old VM types to get a baseline. The next step was to upgrade the RonDB MySQL Servers to use c3d-highmem-16. 
Thus the same amount of memory and number of CPUs as in n2-standard-16, but upgraded from Intel 2nd generation to AMD 4th generation.</p><p>This mainly impacted throughput. The baseline experiment executed 9000 TPS and was limited by the CPUs in the MySQL Servers (they used 1550% of the 1600% available). The c3d-highmem-16 delivered 11400 TPS while only using 1000% of the available 1600%. Thus the throughput per CPU increased by around 100%. In this execution the bottleneck of the benchmark was the RonDB data nodes.</p><p>The benchmark API node was consistently an n2-standard-48 VM. This meant that most communication went from an API VM of the old type, to a MySQL Server of the new type, to a RonDB data node VM of the old type. Thus an old VM type was involved in all communication. The network latency was the same in this experiment as in the baseline experiment.</p><p>The change from one VM type to another used the reconfiguration support RonDB has in its cloud offering. This change is an online operation where the cluster remains operational and the new MySQL Servers are included in the Consul setup as soon as they have started up. Only when nodes are stopped could temporary errors happen, and these can be handled with simple retry logic.</p><p>Next we also changed the VM type of the RonDB data nodes to c3d-highmem-16 using the same online reconfiguration as for the MySQL Servers.</p><p>What we quickly noted in this setup was that the latency per transaction was cut in half; the latency using a single thread decreased to less than half of the baseline. Thus it is clear that communication between 2 VMs of the new type cuts network latency by more than half. The throughput now increased to 17800 TPS and the bottleneck was now in the MySQL Servers. 
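</p><p>As an aside on the reconfiguration mentioned above, the simple retry logic can be sketched like this. This is hypothetical code, not part of any real RonDB client API: TemporaryError and flaky_query are stand-ins for a transient "node is stopping" error and an application query.</p>

```python
# Hypothetical sketch of retrying a query that may hit a transient error
# while a node is being stopped during an online reconfiguration.
import time

class TemporaryError(Exception):
    """Stands in for a transient 'node is stopping' error."""

def run_with_retry(operation, max_retries=5, backoff_seconds=0.1):
    for attempt in range(max_retries):
        try:
            return operation()
        except TemporaryError:
            if attempt == max_retries - 1:
                raise                      # give up after the last attempt
            time.sleep(backoff_seconds * (attempt + 1))

attempts = []
def flaky_query():
    attempts.append(1)
    if len(attempts) < 3:
        raise TemporaryError()             # first two attempts hit a stopping node
    return "result"

print(run_with_retry(flaky_query))         # -> result
```

<p>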
Thus the throughput improvement is almost 98% and network latency was cut by more than half.</p><p>When reading the announcement of the C3 machine series and the description of the C3D machine series, it is clear that the new IPU (Infrastructure Processing Unit) that takes care of offloading networking is a major reason for this improved network latency.</p><p>Analysing the Sysbench transaction in this setup, there will be around 100 network messages, most of them in serial order. Still, the latency of a transaction execution is no more than 6 milliseconds to execute the 20 SQL queries involved in the OLTP RW transaction. Thus a mean of 60 microseconds per message, and this includes the time to also execute the RonDB data node code and the RonDB MySQL Server code.</p><p>The next step was to again change the MySQL Server VMs. This time we changed to c3-highmem-22. Unfortunately the VM type c3-highmem-16 didn't exist, so the comparison isn't perfect, but at least it gives a good estimate of the improvements in Intel's 4th generation CPUs.</p><p>The network latency was the same for Intel and AMD 4th generation VM types. The throughput increased by around 40% up to around 24000 TPS. Since the number of CPUs increased by around 40% as well, it seems that the c3 series and c3d series are very similar in throughput when used for RonDB MySQL Servers.</p><p>To test the throughput of those new VMs we ran the test using c3-highmem-8 and c3d-highmem-8 VM types as RonDB data node VMs. The performance of those two VM types was almost indistinguishable, to the point where I started wondering if they were the same CPUs. Throughput was half the throughput of the 16 vCPU VMs.</p><p>The main conclusion of these tests is that upgrading from 2nd generation x86 CPUs to 4th generation x86 CPUs in the GCP cloud provides a 100% improvement in throughput and a similar improvement in network latency.</p><p>The price of those VMs is higher, but the increase is substantially less than 100%. 
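</p><p>To recap the arithmetic from the latency analysis above, using only the numbers quoted in this post:</p>

```python
# Re-deriving the per-message figure and the throughput improvement.
transaction_latency_ms = 6.0     # one OLTP RW transaction, 20 SQL queries
messages_per_transaction = 100   # mostly serial network messages

per_message_us = transaction_latency_ms * 1000 / messages_per_transaction
print(per_message_us)            # -> 60.0 microseconds per message

baseline_tps = 9000              # old VM types throughout
new_tps = 17800                  # both tiers on c3d-highmem-16
print(round((new_tps - baseline_tps) / baseline_tps * 100))  # -> 98 (percent)
```

<p>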
So it makes a lot of sense to start using those new VM types for new applications.</p><p>The tests were performed using RonDB version 21.04.15. We are about to release a new LTS version of RonDB, version 22.10.1. There will be a more thorough benchmark report when this is released.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-46592793864606482062023-09-29T18:22:00.000+02:002023-09-29T18:22:31.131+02:00Release of RonDB 21.04.15<p> We have worked hard lately on ensuring stability and adding the features required by our customers. Thus the RonDB 21.04.15 release has reached a very high quality level and will be able to sustain its users until they desire to upgrade to a newer release of RonDB.</p><p>Most of the changes in this release are related to the new REST API server that makes it possible to read using single reads or batch reads with primary key lookups through a REST protocol or through a gRPC protocol. The REST API server also supports reading directly from the Hopsworks Feature Store, taking into account the metadata model of the Hopsworks Feature Store.</p><p>Much of the work around RonDB is centered on automated management of RonDB. To this end we have developed the ndb-agent that makes it possible to create a cluster, stop the cluster, start the cluster again, take a backup, delete a backup, restore from backup and finally reconfigure the cluster as an online operation.</p><p>Reconfiguring the cluster means adding or removing replicas and increasing the size of data node VMs. It also means that MySQL Server VMs can be added, changed and dropped as needed by the application.</p><p>All of those operations are already operational and working. We are now working on an improvement that speeds up the change process significantly. 
Adding a new MySQL Server can now be done in 2-3 minutes, and most of this time is spent on creating the new VM in the chosen cloud (Hopsworks supports AWS, GCP and Azure).</p><p>The new ndb-agent works in the same fashion as Kubernetes, by maintaining a desired state. This means that it is fairly straightforward for the ndb-agent to support both our cloud offering and a Kubernetes setup.</p><p>RonDB development is now focused on the new RonDB release 22.10.1. This will introduce 8 new features, the most important being support for variable sized disk columns. RonDB 22.10 has been in development and testing for almost 3 years, so it is already a very stable release. In addition it brings a number of performance improvements.</p><p>The release notes for <a href="https://docs.rondb.com/release_notes_210415/" target="_blank">RonDB 21.04.15</a>.</p><p>The full set of new features in <a href="https://docs.rondb.com/new_features_2104/" target="_blank">RonDB 21.04</a>.</p><p>The full set of new features in <a href="https://docs.rondb.com/new_features_2210/" target="_blank">RonDB 22.10</a>.</p><p>The new Hopsworks release also makes use of replication between RonDB clusters. A Hopsworks cluster can use a single small RonDB cluster and can grow into an Enterprise setup with several large RonDB clusters replicated between regions far away from each other.</p><p>RonDB is used to handle the Online Feature Store, the metadata of the Hopsworks Feature Store and the metadata of HopsFS. HopsFS, the storage layer of the Offline Feature Store, is a distributed file system that can store many petabytes of data in an efficient manner. The Hopsworks Offline Feature Store makes use of DuckDB to perform complex analysis of the data to train AI models and perform batch inferencing.</p><p>Thus RonDB is a critically important component in the next generation AI system developed at Hopsworks. 
Large companies all around the world are considering how they can build their AI models and supporting systems. Hopsworks provides a platform for those companies, both small ones and very large ones.</p><p>Hopsworks provides a free service where anyone can get a free Hopsworks account at <a href="https://app.hopsworks.ai">https://app.hopsworks.ai</a> and try out the service themselves.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-25420062916188079332023-08-01T17:28:00.001+02:002023-08-02T12:24:25.129+02:00Modernising a Distributed Hash implementation<p> As part of writing my Ph.D thesis about 30 years ago I invented a new distributed hash algorithm called LH^3. The idea is that I apply the hashing at 3 levels. The first level uses the hash to find the table partition where the key is stored, the second level uses the hash to find the page where the key is stored, and the final step uses the hash to find the hash bucket where the key is stored.</p><p>The algorithm is based on linear hashing and on distributed linear hashing developed by Witold Litwin, with whom I had the privilege at the time to have many interesting discussions. My professor Tore Risch had collaborated a lot with Witold Litwin. I also took the idea of storing the hash value in the hash bucket, to avoid having to compare every key, from Mikael Pettersson, another researcher at Linköping University.</p><p>The basic idea is described in my <a href="http://www.it.uu.se/research/group/udbl/Theses/MikaelRonstromPhD.pdf" target="_blank">Ph.D thesis</a>. The implementation in MySQL Cluster and in RonDB (a fork of MySQL Cluster) is still very similar to this. 
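</p><p>A highly simplified sketch of this three-level idea: the same hash value picks the partition, then the page within the partition, then the bucket within the page. The constants and the in-memory layout here are invented for illustration; the real implementation works on native memory pages, not Python lists.</p>

```python
# Three-level lookup: partition -> page -> bucket, all derived from one hash.

NUM_PARTITIONS = 4
PAGES_PER_PARTITION = 8
BUCKETS_PER_PAGE = 16

def locate(hash_value):
    partition = hash_value % NUM_PARTITIONS
    page = (hash_value // NUM_PARTITIONS) % PAGES_PER_PARTITION
    bucket = (hash_value // (NUM_PARTITIONS * PAGES_PER_PARTITION)) % BUCKETS_PER_PAGE
    return partition, page, bucket

def find_in_bucket(bucket_entries, hash_value, key):
    # Storing the hash value next to each key means most non-matching
    # entries are rejected on a cheap integer compare, without touching
    # the (possibly long) key itself.
    for stored_hash, stored_key, row_ref in bucket_entries:
        if stored_hash == hash_value and stored_key == key:
            return row_ref
    return None

print(locate(123456))  # -> (0, 0, 2)
```

<p>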
This hash table is one of the reasons for the good performance of RonDB; it makes sure that a hash lookup normally incurs only one CPU cache miss during the hash search.</p><p>At Hopsworks we are moving the implementation of RonDB forward with a new generation of developers; in this particular work I am collaborating with Axel Svensson. The best method to learn the code is, as usual, to rewrite the code. RonDB has access to more memory allocation interfaces compared to MySQL Cluster, so I thought this could be useful.</p><p>Interestingly, going through the requirements on memory allocations with a fresh mind leads to more or less the same conclusions as 30 years ago. So after 30 years of developing the product one can rediscover the basic ideas underlying it.</p><p>The original implementation made it possible to perform scan operations using the hash index. However this led to a 3x increase in the complexity of the implementation. Luckily, nowadays one can also scan using the row storage. Thus in RonDB we have removed the possibility to scan using the hash index. This opens up rewriting the hash index with much less complexity.</p><p>A hash implementation thus consists of the following parts: a dynamic array to find the page, a method to handle the memory layout of the page, a method to handle the individual hash buckets and finally a method to handle overflow buckets.</p><p>What we found is that the dynamic array can be implemented much more efficiently using the new memory allocation interfaces. 
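</p><p>For readers unfamiliar with linear hashing, here is a minimal, self-contained sketch of the core growth mechanism (this is not the RonDB code, just the textbook idea): the table grows one bucket at a time, splitting only the bucket the split pointer points at and rehashing only that bucket's entries.</p>

```python
# Minimal linear hashing sketch: incremental growth, one bucket split at a time.

class LinearHashTable:
    def __init__(self, max_fill=4):
        self.level = 0          # current round: 2**level base buckets
        self.split = 0          # next bucket to split in this round
        self.buckets = [[]]
        self.max_fill = max_fill

    def _addr(self, h):
        addr = h % (2 ** self.level)
        if addr < self.split:   # bucket already split this round
            addr = h % (2 ** (self.level + 1))
        return addr

    def insert(self, key, value):
        self.buckets[self._addr(hash(key))].append((key, value))
        if len(self.buckets[self._addr(hash(key))]) > self.max_fill:
            self._split_one()

    def _split_one(self):
        victim = self.buckets[self.split]
        self.buckets.append([])
        self.split += 1
        if self.split == 2 ** self.level:   # round finished, start the next
            self.level += 1
            self.split = 0
        entries, victim[:] = victim[:], []
        for k, v in entries:                # rehash only the split bucket
            self.buckets[self._addr(hash(k))].append((k, v))

    def get(self, key):
        for k, v in self.buckets[self._addr(hash(key))]:
            if k == key:
                return v
        return None

table = LinearHashTable()
for i in range(100):
    table.insert(i, i * 2)
print(table.get(42))   # -> 84
```

<p>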
Overflow can potentially be handled with techniques other than plain overflow buckets; one could also handle it using recursive hashing.</p><p>What we have found is that using pages with hash buckets inside those pages is still a very good idea for a hash table that must adapt well to both increasing and decreasing sizes.</p><p>Modern CPUs have new instructions for parallel execution of searches; these can be used to speed up the lookup in the hash buckets.</p><p>On top of this, the hash function used in RonDB has been MD5; this is replaced with a new hash function, XX3_HASH64, that is about 30x faster.</p><p>A new requirement in RonDB compared to MySQL Cluster is that we work with applications that constantly create and drop tables; the number of tables can also be substantial, and thus there could be many very small tables. This means that a small table could make use of an alternative, much simpler implementation to save memory.</p><p>This is work in progress and it serves a number of purposes: it is a nice way for new developers to learn the RonDB code base, it means that we can save memory for hash indexes, it means that we can make the implementation even more optimised, it simplifies the code thus making it easier to support, and it makes use of modern CPU instructions to substantially speed up hash index lookups.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-55675447952734874542023-07-08T16:02:00.002+02:002023-07-08T16:02:51.272+02:00Number theory for birthdays and IQ tests<p> I have been interested in numbers and playing with them since I was a small kid. Every time someone has a birthday I am always ready to provide an alternative to the normal decimal birthday. So e.g.
having your 100th birthday when you really have your 49th birthday in decimal numbers.</p><p>So here is some number theory for birthdays and IQ tests that you can play around with on your vacation days to prepare for future birthdays and IQ tests. Have fun.</p><p>First a short introduction to numbers and number bases. When we use numbers we normally assume we're counting with decimal numbers. Decimal numbers means that we are using 10 as the base. Thus when we say that someone is 25 years old we really mean that he is 2 * 10^1 + 5 * 10^0 = 2 * 10 + 5 years old. If instead someone has his 25th birthday in octal numbers, what we are saying is that he is 2 * 8^1 + 5 * 8^0 = 2 * 8 + 5 = 21 in decimal numbers.</p><p>So by varying the number base we can turn almost any birthday into a round birthday. For example, when can we say that we have our 100th birthday? The smallest base is 2; this means that our first 100th birthday happens already at our 4th birthday, using base 2. Later in life we can have a 100th birthday at our 9th, 16th, 25th, 36th, 49th, 64th, 81st and 100th birthdays. It is very unlikely that someone will celebrate their 100th birthday in base 11, which would happen at the age of 121.</p><p>Thus celebrating 100 years happens quite a few times, but still not very often.</p><p>Other round numbers are more common. We can have our 20th birthday every second year from our 6th birthday. To be 20, the minimum base is 3, since the digit 2 cannot be used in base 2, which only has the digits 0 and 1. Thus 2 * 3 + 0 = 6 is the minimum age to become 20.</p><p>However, after 6 years old you can have your 20th birthday at any birthday which is an even number. Thus e.g. at age 38 you will be 20 using base 19, since 2 * 19 + 0 = 38.</p><p>If you want to search for an appropriate age to celebrate on your next birthday, start by dividing your age into a product of prime numbers. So e.g. 38 is the product of 2 and 19, which are both prime numbers. 
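</p><p>Here is a small helper to play this game of rewriting an age in another base (letters cover digit values 10-35, so bases up to 36 work):</p>

```python
# Write a (non-negative) age in the given base, most significant digit first.

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base(n, base):
    digits = ""
    while n > 0:
        digits = DIGITS[n % base] + digits
        n //= base
    return digits or "0"

print(to_base(49, 7))    # -> 100: the 49th birthday is a 100th birthday in base 7
print(to_base(38, 19))   # -> 20
print(to_base(38, 2))    # -> 100110
```

<p>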
Thus the roundest numbers you can get here are 20 in base 19 and 100110 in base 2. If your age is 18 you have more options, since 18 is divided into the prime factors 2, 3 and 3 (18 = 2 * 3 * 3). So here you can have your 200th birthday in base 3 and your 10010th in base 2.</p><p>However, you stumble into issues with the above approach when the age you have reached is itself a prime number. For example, when your 37th birthday approaches, how will you turn this into a round number to celebrate? The only obvious round number to reach here is 10, which can be achieved for any prime age by using the prime number itself as the base.</p><p>Here the age 25 comes to the rescue, which is seen as a round birthday by most people. Actually we can prove that every birthday with an odd number of years can become 25 in some base if the odd number is at least 17.</p><p>Proof: The proof is fairly simple. First of all, an odd number can always be written as 2 * k + 1 where k is some number. Second, the minimum base to use for an age of 25 is 6 since the digit 5 doesn't exist in bases 2, 3, 4 and 5. Thus the first time to have your 25th birthday is on your 2 * 6 + 5 = 17th birthday. </p><p>So choose any odd number larger than or equal to 17. This number can always be written as 2 * k + 1 where k is at least 8. But it can also be written as 2 * (k - 2) + (1 + 2 * 2) = 2 * (k - 2) + 5, which is 25 in base k - 2. Thus to calculate the number base to use one calculates:</p><p>(Odd - 5) / 2. Thus with 37 you get (37 - 5) / 2 = 16. Thus on your 37th birthday you have your 25th birthday in base 16.</p><p>Isn't it nice to know that you can always claim to be 20 or 25 years old for the rest of your life after reaching 17 years of age :)</p><p>Have fun on future birthdays figuring out what age you want to have this time.</p><p>Our base 10 came to us via Arabia; in many older cultures the base 12 was used, and some money systems still have the number 12 in them.
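The 25th-birthday rule above can be checked mechanically (a small sketch; the function name is my own):

```python
def base_for_25(age):
    """Return the base in which an odd age >= 17 reads as the digits 25."""
    assert age >= 17 and age % 2 == 1
    return (age - 5) // 2

# Every odd age from 17 upwards is "25" in base (age - 5) / 2:
for age in range(17, 101, 2):
    b = base_for_25(age)
    assert b >= 6            # the digit 5 requires at least base 6
    assert 2 * b + 5 == age  # the digits "25" in base b equal the decimal age

print(base_for_25(37))  # 16
```

The loop is exactly the proof restated: 2 * (k - 2) + 5 recovers the odd age for every base k - 2 from 6 upwards.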
If you are working with computer programs it is very popular to use hexadecimal numbers, using base 16 with the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F.</p><p>So on to IQ tests. Most of you have seen tests like the one below:</p><p>1, 4, 9, 16, ?</p><p>This one is fairly easy: it is the square of the index, thus x^2 is the function in this number series. The next number in the series is therefore 5 * 5 = 25.</p><p>Let's take a slightly more complex number series now.</p><p>2, 9, 28, 65, ?</p><p>This one is a bit more difficult to see directly, so I will give a hint: it is based on the function x^3 + 1. Thus the next number is 5 * 5 * 5 + 1 = 126.</p><p>Now let's take another one, using the function x^2 - 2 * x - 2.</p><p>-3, -2, 1, 6, ?</p><p>This looks difficult at the outset, but since we know the function we can simply compute 5 * 5 - 2 * 5 - 2 = 13.</p><p>So how does one solve this type of IQ test in a quick manner? It is fairly simple using difference techniques, building a triangle of numbers a bit like Pascal's triangle.</p><p>Write down the differences between the numbers and then the differences of the differences.</p><p>In the above calculation we write it up as follows.</p><p>-3, -2, 1, 6, 13</p><p> 1, 3, 5, 7</p><p> 2, 2, 2</p><p>Interestingly, the first differences form a linearly increasing function, which is very easy to see, and the second differences are simply constant, which is even easier.</p><p>We can see that the difference function is 2 * x - 1 and the second difference is the constant 2.</p><p>For those familiar with derivatives, the difference function 2 * x - 1 closely tracks the derivative 2 * x - 2 of x^2 - 2 * x - 2, and 2 is the second derivative of this function.</p><p>So now let's see if this works in practice, here is a number series again:</p><p>0, 1, 8, 27, ?</p><p>We use the difference technique:</p><p>0, 1, 8, 27, => 64</p><p> 1, 7, 19, => 37</p><p> 6, 12, => 18</p><p>So we write the answer to be 64.
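The difference technique can be automated; here is a small Python sketch (assuming, as in all the examples here, that some level of differences eventually becomes constant, which holds for any polynomial series):

```python
def next_term(seq):
    """Predict the next term by building the difference triangle
    down to a constant row, then summing back up."""
    rows = [list(seq)]
    while len(set(rows[-1])) > 1:  # stop when a row is constant
        prev = rows[-1]
        rows.append([b - a for a, b in zip(prev, prev[1:])])
    nxt = rows[-1][-1]
    # extend every row by one element, bottom-up
    for row in reversed(rows[:-1]):
        nxt = row[-1] + nxt
    return nxt

print(next_term([-3, -2, 1, 6]))  # 13
print(next_term([0, 1, 8, 27]))   # 64
```

It reproduces all the answers above, including 25 for the squares and 126 for x^3 + 1.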
Now let's check the answer. The function I used in this case was:</p><p>x^3 - 3 * x^2 + 3 * x - 1</p><p>Thus using x = 5 we get 5^3 - 3 * 5^2 + 3 * 5 - 1 = 125 - 75 + 15 - 1 = 64</p><p>Thus we found the correct answer to a fairly complex IQ test, and we can claim to be more intelligent than we really are :)</p><p>Have fun showing off your capabilities in IQ tests.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-24644584747997605402023-04-29T02:15:00.005+02:002023-04-29T02:17:33.563+02:00Status report RonDB development<p> What is going on with <a href="http://www.rondb.com" target="_blank">RonDB</a> development? Actually a lot, but most of it happens under the radar at the moment. So this blog will give anyone interested some idea of what is going on.</p><p>RonDB core development is the further development of the fork of MySQL NDB Cluster. For the most part this development is focused on our production version RonDB 21.04, which is used at numerous companies in production. Development is very much centered around supporting the <a href="http://www.hopsworks.ai" target="_blank">Hopsworks</a> platform. We have now added 27 new features on top of MySQL NDB Cluster and 127 bug fixes. The latest feature is an improvement of node recovery that can bring 4-8x shorter restart times. This was seen as important to ensure that Online Reconfiguration of RonDB in our cloud setting is speedy.</p><p>We now have 3 main versions of RonDB core: RonDB 21.04, which we use in production; RonDB 22.10, which is prepared for production use and brings the possibility to store 10x more data than RonDB 21.04, important for large customers and large applications; and the next RonDB generation, RonDB 23.04, which is already integrated with MySQL 8.0.33.</p><p>Managed RonDB has been delivered in two steps.
The first step integrated the possibility to start up, back up, stop and restore a RonDB database. The configuration is specified as the number of replicas, the number of MySQL Servers and the type of VMs for the various node types. One can start the cluster either through a UI or through Terraform.</p><p>Now the second step is working as well; it introduces Online RonDB Reconfiguration. One can change the number of replicas, change the VM types of the nodes and increase/decrease the number of MySQL Servers. This is currently an experimental feature available to our customers on request. The change is fully online and has been verified in internal hackathons where our developers test various <a href="http://www.hopsworks.ai" target="_blank">Hopsworks</a> features while the RonDB cluster is reconfiguring.</p><p>We are now working on a third step that makes changes more efficient and uses the Kubernetes model with desired state. The cloud specifies the new desired state, and the agent software ensures that the RonDB cluster moves to this new desired state. Anyone can run RonDB in Docker and try out those new changes on their own laptop.</p><p>Those steps are also available using Docker with the <a href="https://github.com/logicalclocks/rondb-docker" target="_blank">rondb-docker GitHub tree</a>. We use Docker as a development platform, making it easy to test thousands of state transformations at various levels. Soon there will be videos and blogs, accessible from the GitHub tree, describing how to use Docker to test RonDB Reconciliation.</p><p>It doesn't stop there; a major focus is currently on developing the first version of the RonDB REST API server. This makes it easy to access RonDB through a REST service in parallel with the MySQL Server and more efficient NDB API applications.
We have already seen great interest in this API even before it is completed.</p><p>We are also working on automating replication between clusters in different regions.</p><p>As usual there is also a set of interesting product ideas on how to improve the RonDB core: even more flexibility in growing and shrinking, making use of SIMD operations to speed up various parts of RonDB, and some thoughts on long-term development projects as well.</p><p>A benchmark or two is in the works as well. These are further developments of the benchmark described on <a href="https://www.rondb.com/benchmark-rondb-the-fastest-key-value-store-on-the-cloud" target="_blank">www.rondb.com</a> where we show throughput and latency of YCSB both in normal operation and during recovery.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-46183075626755540722023-03-23T12:42:00.003+01:002023-03-23T12:42:30.210+01:00Laptop vs Desktop for RonDB development<p> Most developers today use laptops for all development work. For the last 15 years I have considered desktops and laptops to be very similar in performance and use cases. This is no longer the case, as I will discuss in this blog.</p><p>Personally I use a mix of laptops and desktops. For me the most important things as a developer are the screen resolution and the speed of compilation. But I have now found that desktops can be very useful for the test phase of a development project, in particular the later testing phases.</p><p>Many years ago I read that one can increase productivity by 30% by using a screen with higher resolution, thus fitting more things on the screen at the same time. Personally I have so far found 27" screens to be the best size; larger sizes mean neck pain and smaller ones mean that productivity suffers.
The screen resolution should be as high as your budget allows.</p><p>My experience is that modern laptops can be quite efficient at compilation. There is very little difference in compilation time compared to desktops.</p><p>However, recently I tested running our new <a href="http://www.rondb.com" target="_blank">RonDB</a> <a href="https://github.com/logicalclocks/rondb-docker" target="_blank">Docker</a> clusters on laptops and desktops. What I have seen is that the performance of these tests can differ by up to 4x.</p><p>I think the reason for this large difference is that desktops can sustain high performance for a long time. Some modern desktops can handle CPUs that use more than 200W, whereas most laptops will be limited to about 45W. For a compilation that only runs for about 5 minutes and has some serialisation, the difference becomes very small. What matters most for compilation is the CPU's single-threaded performance and its ability to scale the compilation across a decent number of CPUs.</p><p>However, running a development environment for RonDB means running a cluster on a single machine with two data node processes, two MySQL server processes, a management process and of course any number of application processes. A laptop can handle this nicely, and the performance of a single-threaded application is the same on laptop and desktop. But when scaling the test to many threads the laptop hits a wall, whereas the desktop simply continues to scale.</p><p>The reason is twofold. First, desktop CPUs can have more CPU cores: most high-end laptops today have around 8-10 CPU cores, while high-end desktops today go to around 16-24 CPU cores. In addition, the desktop can usually handle more than 4x as much power. The power difference and the core difference together deliver a 4x higher throughput in heavy testing.</p><p>Thus my conclusion is that laptops are good enough for the development phase together with an external screen.
However, when you enter the testing phase, when you need to run heavy functional tests and load tests on your software, a desktop or a workstation will be very useful.</p><p>In my tests on a high-end desktop I ran a Sysbench OLTP RW benchmark using the RonDB Docker environment and managed to run up to 15,000 TPS. This means running 300,000 SQL queries per second against the MySQL servers and the data nodes. The laptop could handle roughly 25% of this throughput.</p><p>Obviously the desktop could be a virtual desktop in a modern development environment. But a physical machine is still a lot more fun.</p><p>RonDB is part of the <a href="http://www.hopsworks.ai" target="_blank">Hopsworks</a> Feature Store platform.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-84638816341708182552023-03-02T18:09:00.001+01:002023-03-02T18:09:48.770+01:003 commands to start a RonDB cluster<p> RonDB is a key-value store with SQL capabilities. We are working on making it really easy to develop applications against RonDB. You can now get a RonDB cluster up and running using 3 commands on your development machine, assuming you have Docker installed there.</p><p>Here are the commands:</p><p>1. git clone https://github.com/logicalclocks/rondb-docker rondb-docker</p><p>2. cd rondb-docker</p><p>3. ./run.sh</p><p>The prerequisites are that you have git installed and Docker or Docker Desktop. With Docker Desktop and a new resource extension one can see the memory and CPU usage of the various containers. Using it on Windows also requires WSL 2 to be installed.</p><p>If you are using Windows it is important that you have set Docker Desktop to use WSL 2 as the engine. One might also have to activate WSL 2 integration with the Linux distribution you are using in WSL 2. Both of those can be set from the Docker Desktop settings pages.
One needs to start a new Linux terminal after changing those settings before they actually take effect.</p><p>When trying it on Windows 11 it has worked like a charm for me. But trying it on Windows 10 I had issues with firewalls preventing the MySQL Server from starting. Feel free to post comments to this blog if you find issues and workarounds for those.</p><p>The run.sh command will fetch the Docker image by pulling it from Docker Hub. It is a download of several hundred MBytes, so the time it takes depends on the speed of your internet connection. Next it starts a RonDB cluster with 1 MGM server, 2 MySQL Servers and 2 data nodes.</p><p>Once it has started you can access the MySQL Servers on ports 15000 and 15001 using a normal MySQL client or the application you are developing.</p><p>To access the MySQL Servers you can run the command below using a MySQL client.</p><p>mysql --protocol=tcp --user=mysql --host=localhost --port=15000 -p</p><p>Enter the password Abc123?e and you are connected to the MySQL Server and can use it as a normal MySQL client connected to a MySQL Cluster. The mysql user has full access to the ycsb% databases, the sbtest% databases, the sysbench% databases and the dbt% databases.</p><p>You can enter the Docker containers in the normal manner using</p><p>docker exec -it docker_id /bin/bash</p><p>You find the docker_id using the docker ps command.</p><p>You can use the run.sh script to create the RonDB cluster of your choice. It has 5 predefined profiles (mini, small, medium, large, xlarge). All profiles have the same nodes except mini, which only creates 1 MySQL Server and 1 data node.</p><p>We have tested this using Docker Desktop on Mac OS X, Docker Desktop on Windows using WSL 2 and Docker on Linux. So most developers should be able to try it out in their environment of choice.
</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-76649935774471621502023-01-10T19:12:00.000+01:002023-01-10T19:12:26.025+01:00The flagship feature in the new LTS version RonDB 22.10.0<p> In RonDB 22.10.0 we added a new major feature to RonDB. This feature means that variable sized disk columns in RonDB are stored in variable sized rows instead of fixed size rows.</p><p>The history of disk data in RonDB starts already in 2004, when the NDB team at Ericsson had been acquired by MySQL AB. NDB Cluster was originally designed as an in-memory DBMS. The reason for this was that a disk-based DBMS couldn't handle the latency requirements of telecom applications.</p><p>Thus NDB was developed as a distributed architecture using Network Durability (meaning that a transaction is made durable by writing the transaction into memory in several computers in a network). Long-term durability of data is achieved by a background process ensuring that data is written to disk.</p><p>When the NDB team joined MySQL we looked at many other application categories as well, and thus increasing the database sizes NDB could handle was seen as important. So we started developing support for disk-based columns. The design was accepted as a <a href="https://www.semanticscholar.org/paper/Recovery-Principles-in-MySQL-Cluster-5.1-Ronstr%C3%B6m-Oreland/bb294f18c25c877a453b14e80b40b56707753592" target="_blank">paper at VLDB in Trondheim in 2005</a>.</p><p>The use of this feature didn't really take off in any significant manner for a few years, since the latency and performance of hard drives made it too different from the performance of in-memory data.</p><p>That problem has been solved by the technology development of SSDs, with the introduction of NVMe drives and newer versions of PCI Express 3, 4 and now 5.
As an anecdote, I installed a set of NVMe drives on my workstation capable of handling millions of IOPS and of delivering 66 GBytes per second of bandwidth. However, while installing them I discovered that I had only 1 memory card, which meant that I had 3x more bandwidth to my NVMe drives than memory bandwidth. So in order to make use of those NVMe drives I had to install more memory cards to get the memory bandwidth required to keep up with them.</p><p>So with the introduction of NVMe drives the feature became useful. Actually, one of the main users of this feature is HopsFS, a distributed file system in the Hopsworks platform which uses RonDB for metadata management. HopsFS can use disk columns in RonDB for storing small files.</p><p>Performance of disk columns is really good. This <a href="https://mikaelronstrom.blogspot.com/2020/10/ycsb-disk-data-benchmark-with-ndb.html" target="_blank">blog</a> presents a benchmark with YCSB using disk-based columns in NDB Cluster. We get a bandwidth of more than 1 GByte per second of application data read and written.</p><p>The latency of NVMe drives is 100x lower than that of hard drives. Previously, the latency of hard drives was far more than 100x higher than in-memory latency for database operations; with modern NVMe drives the latency difference between in-memory columns and disk columns is down to a factor of 2. We analysed performance and latency using the YCSB benchmark and compared it to in-memory columns in this <a href="https://www.rondb.com/benchmark-rondb-the-fastest-key-value-store-on-the-cloud" target="_blank">blog</a>.</p><p>One problem with the original implementation is that the disk columns were always stored in fixed size rows. In HopsFS we found ways to handle this by using multiple tables for different row sizes.</p><p>In a traditional application and in the Feature Store it is very common to store data in variable sized columns.
To ensure that the data fits, the maximum size of a column can be 10x larger than its average size. Thus we can easily waste 90% of the disk space. This means that to use disk columns in Feature Store applications we had to enable support for variable sized rows on disk.</p><p>Thus with the release of the new LTS RonDB version 22.10.0 the disk columns are now as useful as the in-memory columns. They have excellent performance, the latency is very good (even better than the in-memory latency of some competitors) and the storage efficiency is now high as well.</p><p>This means that with RonDB 22.10.0 we can handle nodes with TBytes of in-memory data and many tens of TBytes of disk columns. Thus RonDB can scale all the way up to database sizes at the petabyte level with read and write latencies of less than a millisecond.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-21057637748178620082023-01-10T17:41:00.001+01:002023-01-10T17:41:20.900+01:00Summary of RonDB 21.04.9 changes<p> RonDB 21.04's main use case is being the base of the data management platform in Hopsworks. As such, every now and then new requirements on RonDB emerge. But obviously the most important focus of RonDB 21.04 development is stability.</p><p>Hopsworks provides a free Serverless use case to try out the Hopsworks platform. Check it out on <a href="https://app.hopsworks.ai">https://app.hopsworks.ai</a>. Each user gets their own database in RonDB and can create a number of tables. Then one can load data from various sources using OnlineFS (a feature using Kafka and ClusterJ to load data from external sources into Feature Groups; a Feature Group is a table in RonDB).</p><p>Previously ClusterJ was limited to using only one database per cluster connection, which led to a lot of unnecessary connects and disconnects to the RonDB cluster.
In RonDB 21.04.9 it is now possible for one cluster connection to use any number of databases.</p><p>In addition we made a few changes to RonDB to make it easier to manage in our managed platform.</p><p>In preparation for releasing Hopsworks 3.1, which includes RonDB 21.04.9, we extended the tests for the Hopsworks platform, among other things for HopsFS, a distributed file system that uses RonDB to store metadata and small files. We fixed all issues found in these extended tests and any other problems found in the last couple of months.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-79353535310834793242023-01-09T18:02:00.006+01:002023-01-09T18:13:56.937+01:00RonDB News<p> The RonDB team has been busy with development in 2022. Now is the time to start releasing things. There are 5 things that we are planning to release in Q1 2023.</p><p>RonDB 21.04.9: A new version of RonDB with a few new features required by the Hopsworks 3.1 release and a number of bug fixes. This is released today and will be described in a separate blog.</p><p>RonDB 22.10.0: This is a new Long-Term Support (LTS) version that will be maintained at least until 2025. It is also released today. It has a number of new features on top of RonDB 21.04, of which the most important is support for variable sized disk columns, which makes it much more interesting to use RonDB with large data sets. More on this feature in a separate blog post.</p><p>In addition, RonDB 22.10.0 is updated to be based on MySQL 8.0.31; RonDB 21.04 is based on MySQL 8.0.23. I will post a separate blog with more about the content of RonDB 22.10.0.</p><p>The release content is shown in detail in the release notes and new features chapters of the <a href="https://docs.rondb.com" target="_blank">RonDB docs</a>.</p><p>We will very soon release a completely revamped version of RonDB Docker using Docker Compose.
This is intended to support developing applications on top of RonDB in your local development environment. RonDB developers use it to develop new features, but it is also very useful for developing any type of application on top of RonDB using any of the numerous APIs by which you can connect.</p><p>We are also close to finishing the first version of our completely new RonDB REST API, which will make it possible to issue REST requests towards RonDB as well as the same queries using gRPC calls. The first version will support primary key lookups and batched key lookups. Batched key lookups are very useful in some Feature Store applications where it is necessary to read hundreds of rows in RonDB to rank query results. Our plan is to further develop this REST API service so that it can also be used efficiently in multi-tenant setups, enabling the use of RonDB in Serverless applications.</p><p>Finally, we have completed the development and test phases of RonDB Reconfiguration in the Hopsworks cloud using AWS. The Hopsworks cloud is implemented using Amplify in AWS, so the Hopsworks cloud service is handled by Amplify even if the actual Hopsworks cluster is running in GCP or Azure. RonDB Reconfiguration means that you can start by creating a Hopsworks cluster with 2 data node VMs with 8 VCPUs and 64 GB of memory each and 2 MySQL Server VMs using 16 VCPUs. When you see that this cluster needs to grow, you can simply tell the Hopsworks UI that you want e.g. 3 data node VMs with 16 VCPUs and 128 GB of memory each and 3 MySQL Server VMs with 32 VCPUs each. The Hopsworks cloud service will then reconfigure the cluster as an online operation. No downtime will happen during the reconfiguration.
There might be some queries that get temporary errors, but those can simply be retried.</p><p>The Hopsworks cloud applications use virtual service names through Consul. This means that the services using the MySQL service will automatically pick up the new MySQL Servers as they come online and will use the MySQL Servers in a round-robin fashion.</p><p>It is possible to scale data node VM sizes upwards; we currently don't support scaling them downwards. It is possible to scale the number of replicas up and down between 1 and 3. The number of MySQL Servers can be increased and decreased, and the size of the MySQL Server VMs can go both upwards and downwards. At the moment we don't allow adding more node groups of data nodes as an online operation; this requires an offline change.</p><p>This reconfiguration feature is going to be integrated into the Hopsworks cloud in the near future.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-44970358062055182302022-07-29T12:25:00.000+02:002022-07-29T12:25:38.052+02:00The world's first LATS benchmark<p> LATS stands for low Latency, high Availability, high Throughput and scalable Storage. When testing an OLTP DBMS it is important to look at all those aspects. This means that the benchmark should test how the DBMS works both in scenarios where data fits in memory and where it doesn't. In addition, tests should measure both throughput and latency. Finally, it isn't enough to run the benchmarks while the DBMS is in normal operation; there should also be tests that verify the performance when nodes fail and when nodes rejoin the cluster.</p><p>We have executed a battery of tests on RonDB, a key-value store with SQL capabilities, that together make up a complete LATS benchmark. We used the Yahoo! Cloud Serving Benchmark (YCSB) for this.
These benchmarks were executed on Amazon EC2 servers with 16 VCPUs, 122 GBytes of memory, 2x1900 GB NVMe drives and up to 10 Gb Ethernet. These virtual machines were used both for in-memory tests and for tests of on-disk performance.</p><p><a href="https://www.rondb.com/benchmark-rondb-the-fastest-key-value-store-on-the-cloud" target="_blank">Link to full benchmark presentation</a>. The full benchmark presentation contains lots of graphs and a lot more detail about the benchmarks. Here I will present a short summary of the results we saw.<br /></p><p>YCSB contains 6 different workloads, and all 6 workloads were tested in different aspects. In most workloads the average latency is around 600 microseconds, and the 99th percentile is usually around 1 millisecond and almost always below 2 milliseconds.</p><p>The availability tests start by shutting down one of the RonDB data nodes. Ongoing transactions affected by this node failure will see a latency of up to a few seconds, since node failure handling requires the transaction state to be rebuilt to decide whether each transaction should be committed or aborted. New transactions can start as soon as the cluster has been reconfigured to remove the failed node. After the node failure is discovered, this reconfiguration only takes about one millisecond. The time to discover the failure depends on how it occurs: if it is a software failure in the data node it will be discovered by the OS and discovery is immediate, since the connection is broken; if there is a HW failure, the heartbeat mechanism may be what discovers the failure, and the time to discover failures using heartbeats depends on the configured heartbeat interval.</p><p>After the node failure has been handled, the throughput decreases by around 10% and latency goes up by about 30%. The main reason is that we now have fewer data nodes to serve the reads. The impact will be higher with 2 replicas than with 3 replicas.
When the recovery reaches the synchronisation phase, where the starting node synchronises its data with the live data nodes, we see a minor decrease in throughput, which actually leads to shorter latency. Finally, when the process is completed and the starting node can serve reads again, throughput and latency return to normal levels.</p><p>Thus it is clear from those numbers that one should design RonDB clusters with a small amount of extra capacity to handle node recovery; the requirement is not very high, and a bit more is needed with 2 replicas than with 3 replicas.</p><p>Performance when data doesn't fit in memory decreases significantly, since it is limited by how many IOPS the NVMe drives can sustain. We did similar experiments a few years ago and saw that RonDB performance can scale to as many as 8 NVMe drives and handle read and write workloads of more than a GByte per second using YCSB. The HW development of NVMe drives is even faster than that of CPUs, so this bottleneck is likely to diminish as HW development proceeds.</p><p>The latency for reads is higher, and the update latency is substantially higher for on-disk storage; the update latency at high throughput reaches up to 10 milliseconds. We expect latency and throughput to improve substantially with the new generation of VMs using substantially improved NVMe drives. It will be even more interesting to see how this performance improves when moving to PCI Express 5.0 NVMe drives.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-47792342134971055912022-07-28T15:49:00.001+02:002022-07-28T15:49:43.110+02:00New stable release of RonDB, RonDB 21.04.8<p>Today we released a new version of RonDB 21.04, the stable release series of RonDB. RonDB 21.04.8 fixes a few critical bugs and adds two new features.
<a href="https://docs.rondb.com/release_notes_21048/">See the docs for more details of this released version</a>.</p><h2 style="text-align: left;">Make it possible to use IPv4 sockets between ndbmtd and API nodes</h2><p>In MySQL NDB Cluster all sockets have been converted to use IPv6 format even when IPv4 sockets are used. This led to MySQL NDB Cluster no longer being able to interact with device drivers that only work using IPv4 sockets. This is the case for <a href="http://www.dolphinics.no" target="_blank">Dolphin SuperSockets</a>.</p><p>Dolphin SuperSockets make it possible to use extremely low latency HW to connect the nodes in a cluster and improve latency significantly. This feature makes it possible for RonDB 21.04.8 to make use of interconnect cards from Dolphin through Dolphin SuperSockets. RonDB has been tested and benchmarked using Dolphin SuperSockets; we will soon release a benchmark report on this.</p><h2 style="text-align: left;">Two new ndbinfo tables to check memory usage</h2><div>RonDB is now used by <a href="http://app.hopsworks.ai" target="_blank">app.hopsworks.ai</a>, a Serverless Feature Store. This means that thousands of users can share RonDB. To ensure this multi-tenant usage of RonDB is working, we have introduced two new ndbinfo tables that make it possible to track exactly how much memory a specific user is using. A user in Hopsworks is mapped to a project, and a project uses its own database in RonDB. Thus those two new tables make it possible to implement quotas both at the user level and at the Feature Group level.</div><p>Two new ndbinfo tables are created, ndb$table_map and ndb$table_memory_usage.
The ndb$table_memory_usage table lists four properties for all table fragment replicas: in_memory_bytes (the number of bytes used by a table fragment replica in DataMemory), free_in_memory_bytes (the number of bytes free of the previous, these bytes are always in the variable sized part), disk_memory_bytes (the number of bytes in the disk columns, essentially the number of extents allocated to the table fragment replica times the size of the extents in the tablespace) and free_disk_memory_bytes (the number of bytes free in the disk memory for disk columns).</p><p>Since each table fragment replica provides one row, we use a GROUP BY on table id and fragment id and the MAX of those columns to ensure we only have one row per table fragment.</p><p>We want to provide the memory usage in memory and in disk memory per table or per database. However, a table in RonDB is spread out over several tables. There are four places a table can use memory. First, the table itself uses memory for rows and for a hash index; when disk columns are used this table also makes use of disk memory. Second, there are ordered indexes that use memory for the index information. Third, there are unique indexes that use memory for the rows in the unique index (a unique index is simply a table with the unique key as primary key and the primary key as columns) and for the hash index of the unique index table. This table is not necessarily colocated with the base table. Finally, there are also BLOB tables that can contain a hash index, row storage and even disk memory usage.</p><p>The user isn't particularly interested in this level of detail, so we want to display information about memory usage for the tables and databases that the user sees. 
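</p><p>The aggregation and mapping described above could be sketched roughly as follows. This is a hypothetical query: the internal column names (table_id, fragment_id) follow the description in this post and may differ in the released views.</p>

```sql
-- Collapse the per-replica rows into one row per table fragment with MAX,
-- then map internal table ids to user-visible names via ndb$table_map
-- and sum the fragments per table.
SELECT m.database_name, m.table_name,
       SUM(f.in_memory_bytes)      AS in_memory_bytes,
       SUM(f.free_in_memory_bytes) AS free_in_memory_bytes,
       SUM(f.disk_memory_bytes)    AS disk_memory_bytes
FROM (SELECT table_id, fragment_id,
             MAX(in_memory_bytes)      AS in_memory_bytes,
             MAX(free_in_memory_bytes) AS free_in_memory_bytes,
             MAX(disk_memory_bytes)    AS disk_memory_bytes
      FROM `ndb$table_memory_usage`
      GROUP BY table_id, fragment_id) AS f
JOIN `ndb$table_map` AS m USING (table_id)
GROUP BY m.database_name, m.table_name;
```

<p>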
Thus we have to gather data for this. The tool to gather the data is the new ndbinfo table ndb$table_map; this table lists the table name and database name given a table id. The table id can be the table id of a table, an ordered index, a unique index or a BLOB table, but the mapping will always present the name of the actual table defined by the user, not the name of the index table or BLOB table.</p><p>Using those two tables we create two ndbinfo views. The first view, table_memory_usage, lists the database name, the table name and the above 4 properties for each table in the cluster. The second view, database_memory_usage, lists the database name and the 4 properties summed over all table fragments in all tables that RonDB creates for the user, including the BLOB tables and indexes.</p><p>To make things a bit more efficient we keep track of all ordered indexes attached to a table internally in RonDB. Thus ndb$table_memory_usage will list the memory usage of tables including the ordered indexes on the table; there will be no separate rows presenting the memory usage of an ordered index.</p><p>These two views make it easy for users to see how much memory they are using in a certain table or database. This is useful in managing a RonDB cluster.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-9289120650085987972022-04-23T12:29:00.000+02:002022-04-23T12:29:53.697+02:00Variable sized disk rows in RonDB<p> RonDB was originally a pure in-memory database engine. The main reason for this was to support low latency applications in the telecom business. However, already in 2005 we presented a design at VLDB in Trondheim for the introduction of columns stored on disk. These columns cannot be indexed, but are very suitable for columns of large size.</p><p>RonDB is currently targeting Feature Store applications. 
These applications often access data through a set of primary key lookups where each row can have hundreds of columns of varying size.</p><p>In RonDB 21.04 the support for disk columns uses a fixed size disk row. This works very well to support the handling of small files in HopsFS. HopsFS is a distributed file system that can handle petabytes of storage in an efficient manner. On top of it Hopsworks builds the offline Feature Store applications.</p><p>The small files are stored in a set of fixed size rows in RonDB with suitable sizes. YCSB benchmarks have shown that RonDB can handle writes of up to several GBytes per second. Thus the disk implementation of RonDB is very efficient.</p><p>Applications using the online Feature Store will however store much of their data in variable sized columns. These work perfectly well as in-memory columns. They also work in the disk columns in RonDB 21.04. However, to make storage more efficient we are designing a new version of RonDB where the row parts on disk are stored on variable sized disk pages.</p><p>These pages use the same data structure as the in-memory variable sized pages. So the new format only affects the handling of free space and recovery. This design has now reached a state where it is passing our functional test suites. We will still add more tests, perform system tests and search for even more problems before we release it for production usage.</p><p>One interesting challenge with variable sized rows is that an update might need more space in a data page. If this space isn't available we have to find a new page where space is available. It becomes an interesting challenge when taking into account that we can abort operations on a row while still committing other operations on the same row. 
The conclusion here is that one can never release any allocated resources until the transaction is fully committed or fully aborted.</p><p>This type of challenge is one reason why it is so interesting to work with the internals of a distributed database engine. After 30 years of education, development and support, there are still new interesting challenges to handle.</p><p>Another challenge we faced was that we need to page in multiple data pages to handle an operation on the row. This means that we have to ensure that, while paging in one data page, the other pages that we already paged in won't be paged out before we have completed our work on the row. This work also prepares the stage for handling rows that span multiple disk pages. RonDB already supports rows that span multiple in-memory pages and one disk page.</p><p>If you want to learn more about RonDB requirements, LATS properties, use cases and internal algorithms, join us on Monday for the <a href="https://db.cs.cmu.edu/events/vaccination-2022-rondb-a-key-value-store-with-sql-capabilities-and-lats-properties-mikael-ronstrom/" target="_blank">CMU Vaccination database presentation</a>. Managed RonDB is supported on AWS, Azure and GCP and on-prem.</p><p>If you would like to join the effort to develop RonDB and a managed RonDB version, we have open positions at Hopsworks AB. Contact me on LinkedIn if you are interested.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-47023368270902081772022-01-31T16:47:00.003+01:002022-01-31T16:47:52.256+01:00RonDB receives ARM64 support and large transaction support<p> RonDB is the base platform for all applications in Hopsworks. 
Hopsworks is a machine learning platform featuring a Feature Store that can be used in online applications as well as offline applications.</p><p>This means that RonDB development is driven towards ensuring that operating RonDB in this environment is as good as possible.</p><p>RonDB is designed for millions of small transactions reading and writing data. However, occasionally applications perform rather large transactions. Previous versions of RonDB had some weaknesses in this area. The new versions of RonDB now also support large transactions, although the focus is still on many smaller transactions.</p><p>Designing this new support for large transactions required a fairly large development effort. Doing this in a stable release is a challenge; therefore it was decided to combine this effort with a heavy testing period focused on fixing bugs.</p><p>This effort has been focused on achieving three objectives. First, to stabilise the new RonDB 21.04 releases, the stable release series of RonDB. Second, to stabilise the next RonDB release at the same level as RonDB 21.04. Third, we also wanted the same level of support for ARM64 machines.</p><p>We are now proud to release RonDB 21.04.3, a new stable release of RonDB that supports much larger transactions. Since the release of RonDB 21.04.1 in July 2021 we have fixed more than 50 bugs in RonDB and we are very satisfied with the stability also on ARM64 machines.</p><p>The original plan was to release the next version of RonDB in October 2021; however, we didn't want to release a new version with any less stability than the RonDB 21.04 release. Thus instead we release this new version of RonDB now, RonDB 22.01.0.</p><p>ARM64 support covers both RonDB 21.04.3 and RonDB 22.01.0. RonDB is now also supported on both Linux and Mac OS X, and on Windows it is supported using WSL 2 (Linux on Windows) on Windows 11. 
We have extensively tested RonDB on the following platforms:</p><p></p><ol style="text-align: left;"><li>Mac OS X 11.6 x86_64</li><li>Mac OS X 12.2 ARM64</li><li>Windows WSL 2 Ubuntu x86_64</li><li>Ubuntu 21.04 x86_64</li><li>Oracle Linux 8 Cloud Developer version ARM64</li></ol><p></p><p>It is used in production on AWS and Azure and has been extensively tested also on GCP and Oracle Cloud.</p><p>As part of the new RonDB release we have also updated the documentation of RonDB at <a href="http://docs.rondb.com">docs.rondb.com</a>. Among other things it contains a new section on Contributing to RonDB that shows how you can build, test and develop extensions to RonDB. In the documentation you will also find an extensive list of the improvements made in the two new RonDB releases.</p><p>ARM64 support is still in beta phase; our plan is to make it available for production use in Q2 2022. There are no known bugs, but we want to give it a bit more time before we assign it to production workloads. This includes adding more test machines and also performing benchmarks on ARM64 VMs.</p><p>Our experience with ARM64 machines so far is that they are fairly stable, but not yet at the same level as x86; it is still possible to find bugs in the compilers. The support around ARM64 is however maturing very quickly, and not surprisingly Mac OS X is leading the way here, since Apple has fully committed its future to ARM. 
We also have great help from participating in the OCI ARM Accelerator program, which provides access to ARM VMs in the Oracle Cloud, making it possible to test on Oracle Linux using ARM with both small and large VMs.</p><p>RonDB 22.01.0 comes with a set of new features:</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>Now possible to scale reads using locks onto more threads</li><li>Improved placement of primary replicas to enable</li><li>All major memory areas now managed by global memory manager</li><li>Even more flexibility in thread configurations</li><li>Removing a scalability hog in index statistics handling</li><li>Merged with MySQL Cluster 8.0.28</li></ol><p></p><p>You can download RonDB tarballs either from <a href="https://github.com/logicalclocks/rondb">https://github.com/logicalclocks/rondb</a> or from <a href="https://repo.hops.works/master">https://repo.hops.works/master</a>; for exact links to the various versions of the binary tarballs, see the <a href="https://docs.rondb.com" target="_blank">Release Notes</a> of each version.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-59024077286916553952021-12-22T23:37:00.001+01:002021-12-22T23:38:03.694+01:00Merry Christmas from the RonDB team<p> This year we bring a new Christmas present in the form of a new release of RonDB 21.04.</p><p>It is packed with improvements; our focus has been on extending support for more platforms while at the same time increasing the quality of RonDB.</p><p>Normally RonDB 21.04.2 would have been released in October 2021. However, we had a number of support issues where we had crashes due to running very large transactions. RonDB is designed for OLTP with small to moderate sizes of transactions. 
However, some applications make use of foreign keys that use ON DELETE CASCADE or ON UPDATE CASCADE, and these transactions can easily become hundreds of thousands of operations.</p><p>This meant changing the handling of transactions. Since this was a rather large change in a stable release, we wanted to ensure that we didn't introduce any quality issues. We used this opportunity to make an extensive effort in fixing all sorts of other bugs at the same time.</p><p>The new RonDB release has been tested with transactions of up to several million row operations in one transaction. We still recommend keeping transaction sizes at moderate levels, since very large transactions will make heavy use of CPU and memory resources during commit and abort processing. In addition, very large transactions will lock large parts of the database, thus making it more difficult for other transactions. Generally, an OLTP database behaves much better if transaction sizes are kept small.</p><p>RonDB development is very much focused on supporting cloud operations. This means that our focus is on supporting Linux for production installations. Quite a few cloud vendors are now supporting ARM64 VMs in addition to the traditional Intel and AMD x86 VMs. Apple also released a set of new ARM64 laptops lately.</p><p>Our development platform is both Mac OS X and Linux; thus it makes sense to also release RonDB on Mac OS X.</p><p>Thus we took the opportunity in RonDB 21.04.2 to provide support for ARM64 as a new platform for RonDB. This support covers both Linux and Mac OS X. The ARM64 support is still in beta state.</p><p>In addition we test RonDB extensively on Windows using WSL 2, the Windows Subsystem for Linux that runs Linux on top of Windows. 
Thus our Linux tarballs should work just fine for testing also on Windows platforms through WSL 2.</p><p>RonDB 21.04.2 contains a large set of bug fixes that are described in detail in the RonDB documentation at <a href="https://docs.rondb.com">https://docs.rondb.com</a>. With these changes RonDB 21.04 contains around 100 bug fixes on top of the stable release of MySQL NDB Cluster 8.0.23 and around 15 new features.</p><p>Further development happens in RonDB 21.10 and upcoming new versions of RonDB. These versions will be released when they are ready for more general consumption, but the development can be tracked on <a href="https://github.com/logicalclocks/rondb">RonDB's git</a>. If you want early access to the binary tarballs of RonDB 21.04.2 you can visit the <a href="https://github.com/logicalclocks/rondb" target="_blank">git homepage of RonDB</a>.</p><p>Early next year we will return with benchmarks of RonDB that show all four qualities of a LATS database: low L(atency), high A(vailability), high T(hroughput) and S(calable storage).</p><p>So finally a Merry Christmas and a Happy New Year from the RonDB team.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-76226404769917117152021-10-20T19:10:00.000+02:002021-10-20T19:10:37.174+02:00Running Linux in Windows 11<p> Most people that know me also know how much I dislike working with Windows. I actually worked with Microsoft products already in the 1980s, so I haven't been able to stay away from them completely.</p><p>Developing MySQL on a Windows platform hasn't been so easy, since building MySQL on Windows is difficult and there are also a number of issues in running even the simple test suites of MySQL.</p><p>At the same time the world is full of developers with only Windows as their development platform. 
So when I read about the possibility of running Linux inside Windows 11, I thought it was time to test whether development of RonDB could now be done on Windows.</p><p>My 9-year old laptop needed replacement, so I went ahead and found a nice Windows laptop at a fair price. I wanted to test the very latest developments that I could find in Windows 11. Since the laptop was delivered with Windows 10, I did an unusual thing: I wanted to upgrade the OS :)</p><p>This wasn't exactly easy and took a few hours. After upgrading Windows 10 a few times and searching for a Windows 11 download, I eventually found one via Google on the Microsoft website. After a couple of upgrades I was at the latest Windows 11 release.</p><p>Installing Linux was quite easy; it was one simple command. I installed an Ubuntu variant.</p><p>Most of the installation went ok. The graphics installation didn't work, but the installation of terminal software was good enough for at least the testing part. For development I use graphical Vim, so this needs to wait for a working version of the graphical parts of Linux (or a Windows editor, but likely not, since they tend to add extra line feeds at the end of lines).</p><p>Downloading the RonDB git tree went fine. Compiling RonDB required installing a number of extra packages, but that is normal and will happen also in standard Linux (build essentials, cmake, openssl dev, ncurses, bison, ..).</p><p>Running the MTR test suite also went almost fine. I had to install zip and unzip as well for this to work.</p><p>Running the MTR test suite takes about an hour and here I found a few challenges. First I had to find the parallelism it could survive. I was hoping for a parallelism of 12 (which in reality for RonDB means 6 parallel tests running). 
But in reality it was only stable with a parallelism of 6.</p><p>However, since I wasn't sitting by the Windows laptop while the test was running, the screen saver decided to interfere (although I had configured it to go into screen saver mode after 4 hours). Unfortunately the screen saver decided that the Linux VM should be put to sleep, which meant that all test cases running when the screen saver kicked in failed. This seems to be a new issue in WSL 2 that didn't exist in WSL 1.</p><p>However, I am still happy with what I saw. Running an operating system inside another and making it feel like Linux is a part of Windows isn't an easy task. So here I must give some kudos to the development team. If they continue working on this integration, I think I am going to get good use of my new laptop.</p><p>I must admit that I don't fully understand how they have solved the issue of running Linux inside Windows. But it definitely looks like the Linux kernel makes use of Windows services to implement the Linux services. Running top in an idle system is interesting: there are only a few init processes and a bash process. So obviously all the Linux kernel processes are missing and presumably implemented inside Windows in some fashion.</p><p>The best part is that the Linux VM configuration isn't static. The Linux VM could make use of all 16 CPUs in the laptop, but it could also allow Windows to grab most of them. So obviously the scheduler can handle both Linux and Windows programs.</p><p>Memory-wise the Linux VM defaults to being able to grow to a maximum of 80% of the memory in the laptop. However, in my case top in Linux constantly stated that it saw 13.5 GB of memory in a machine with 32 GB of memory. I saw some post on the internet stating that Linux can return memory to Windows if it is no longer needed. 
I am not sure I saw this, but it is a new feature, so I feel confident it will be there eventually.</p><p>So at least working with RonDB on Windows 11 is going to be possible. How exactly this will pan out I will write about in future worklogs. At least it is now possible for me to do some development in Windows. It was more than 30 years ago that I last had a development machine with a Microsoft OS, so to me, Linux on Windows definitely makes Windows as a platform a lot more interesting.</p><p>My development environments have shifted a bit over the years. It started with a mix of Microsoft OSs, Unix and some proprietary OSs in the 80s. In the 90s I was mostly working in Solaris on Sun workstations. In the early 2000s I was working with Linux as my development machine. But since 2003 I have been working mostly on Mac OS X (and of course lots of test machines on all sorts of platforms).</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-42243542089453831122021-09-28T13:05:00.005+02:002021-09-28T13:39:26.281+02:00Memory Management in RonDB<p> Most of the memory allocated in <a href="https://www.rondb.com" target="_blank">RonDB</a> is handled by the global memory manager. Exceptions are architecture objects and some fixed size data structures. 
In this presentation we will focus on the parts handled by the global memory manager.</p><p>In the global memory manager we have 13 different memory regions as shown in the figure below:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtHSDqYSLgLvJILxkQQ5BXKQwKfDj03XDDeI4fya_laqUszy1Virg5YHPDLbhzgP-ihwNtFwaTBPFaOl7dQpoDI9NztvPnvfSZP0USf3X0o-UdsCNmbmlXiKupWJr_WFcXyomt/s1920/memory_arch_rondb.001.jpeg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtHSDqYSLgLvJILxkQQ5BXKQwKfDj03XDDeI4fya_laqUszy1Virg5YHPDLbhzgP-ihwNtFwaTBPFaOl7dQpoDI9NztvPnvfSZP0USf3X0o-UdsCNmbmlXiKupWJr_WFcXyomt/w561-h360/memory_arch_rondb.001.jpeg" width="561" /></a></div><br /><p><br /></p><p>- DataMemory</p><p>- DiskPageBufferCache</p><p>- RedoBuffer</p><p>- UndoBuffer</p><p>- JobBuffer</p><p>- SendBuffers</p><p>- BackupSchemaMemory</p><p>- TransactionMemory</p><p>- ReplicationMemory</p><p>- SchemaMemory</p><p>- SchemaTransactionMemory</p><p>- QueryMemory</p><p>- DiskOperationRecords</p><p>One could divide those regions into a set of qualities. We have a set of regions that are fixed in size; another set of regions that are critical and cannot handle failure to allocate memory; a set of regions that have no natural upper limit and are unlimited in size; and a set of regions that are flexible in size and can work together to achieve the best use of memory. We can also divide regions based on whether the memory is short term or long term. 
Each region can belong to multiple categories.</p><p>To handle these qualities of the regions we have priorities on each memory region; this priority can be affected by the amount of memory that the region has allocated.</p><p>Fixed regions have a fixed size; this is used for database objects, the Redo log buffer, the Undo log buffer, the DataMemory and the DiskPageBufferCache (the page cache for disk pages). There is code to ensure that we queue up when those resources are no longer available. DataMemory is a bit special and we will describe it separately below.</p><p>Critical regions are regions where a failed memory allocation would cause a crash. This relates to the job buffer, which is used for internal messages inside a node; it also relates to send buffers, which are used for messages to other nodes. DataMemory is a critical region during recovery: if we fail to allocate memory for database objects during recovery, we would not be able to recover the database. Thus DataMemory is a critical region in the startup phase, but not during normal operation. DiskOperationRecords are also a critical resource, since otherwise we cannot maintain the disk data columns. Finally, we also treat BackupSchemaMemory as critical, since not being able to perform a backup would make it very hard to manage RonDB.</p><p>Unlimited regions have no natural upper limit; thus as long as memory is available at the right priority level, the memory region can continue to grow. The regions in this category are BackupSchemaMemory, QueryMemory and SchemaTransactionMemory. QueryMemory is memory used to handle complex SQL queries such as large join queries. SchemaTransactionMemory can grow indefinitely, but the meta data operations try to avoid growing too big.</p><p>Flexible regions are regions that can grow indefinitely but that have to set limits on their own growth to ensure that other flexible regions are also allowed to grow. 
Thus one flexible resource isn't allowed to grab all the shared memory resources. There are limits to how much memory a resource can grab before its priority is significantly lowered.</p><p>Flexible regions are TransactionMemory, ReplicationMemory, SchemaMemory, QueryMemory, SchemaTransactionMemory, SendBuffers, BackupSchemaMemory and DiskOperationRecords.</p><p>Finally we have short term versus long term memory regions. A short term memory region allocation is of smaller significance compared to a long term one. In particular this relates to SchemaMemory. SchemaMemory contains metadata about tables, indexes, columns, triggers, foreign keys and so forth. This memory, once allocated, will stay for a very long time. Thus if we allow it to grow too much into the shared memory we will not have space to handle large transactions that require TransactionMemory.</p><p>Each region has a reserved space, a maximum space and a priority. In some cases a region can also have a limit where its priority is lowered.</p><p>4% of the shared global memory is only accessible to the highest priority regions plus half of the reserved space for job buffers and communication buffers.</p><p>10% of the shared global memory is only available to high priority requesters. The remainder of the shared global memory is accessible to all memory regions that are allowed to allocate from the shared global memory.</p><p>The actual limits might change over time as we learn more about how to adapt the memory allocations.</p><p>Most regions also have access to the shared global memory. A region will first use its reserved memory, and if there is shared global memory available it can allocate from this as well.</p><p>The most important regions are DataMemory and DiskPageBufferMemory. Any row stored in memory and all indexes in RonDB are stored in the DataMemory. The DiskPageBufferMemory contains the page cache for data stored on disk. 
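</p><p>The reserved and maximum sizes of these regions can be observed at runtime through the ndbinfo schema. A sketch (the resources table exists in stock NDB Cluster and reports sizes in 32 KiB pages; the exact resource names in RonDB may differ):</p>

```sql
-- Reserved, currently used and maximum pages per memory region
-- on each data node.
SELECT node_id, resource_name, reserved, used, `max`
FROM ndbinfo.resources
ORDER BY node_id, resource_name;
```

<p>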
To ensure that we can always handle recovery, DataMemory is fixed in size, and since recovery can sometimes grow the data size a bit, we don't allow the DataMemory to be filled beyond 95% in normal operation. In recovery it can use the full DataMemory size. Those extra 5% of memory resources are also reserved for critical operations such as growing the cluster with more nodes and reorganising the data inside RonDB. The DiskPageBufferCache is fixed in size; operations towards the disk are queued using DiskOperationRecords.</p><p>Critical regions have higher priority to get memory compared to the rest of the regions. These are job buffers used for sending messages between modules inside a data node, send buffers used for sending messages between nodes in the cluster, the meta data required for handling backup operations and finally operation records to access disk data.</p><p>These regions will be able to allocate memory even when all other regions fail to allocate memory. Failure to access memory for those regions would lead to failure of the data node or failure to back up the data, which are not acceptable events in a DBMS.</p><p>We have 2 more regions that are fixed in size, the Redo log buffer and the Undo log buffer (the Undo log is only used for operations on disk pages). Those allocate memory at startup and use that memory; there is some functionality to handle overload on those buffers by queueing operations when the buffers are full.</p><p>The remaining 4 regions we will go through in detail.</p><p>The first one is TransactionMemory. This memory region is used for all sorts of operations such as transaction records, scan records, key operation records and many more records used to handle the queries issued towards RonDB.</p><p>The TransactionMemory region has a reserved space, but it can grow up to 50% of the shared global memory beyond that. 
It can even grow beyond that, but in this case it only has access to the lowest priority region of the shared global memory. Failure to allocate memory in this region leads to aborted transactions.</p><p>The second region in this category is SchemaMemory. This region contains a lot of meta data objects representing tables, fragments, fragment replicas, columns, and triggers. These are objects that will stay around long-term. Thus we want this region to be flexible in size, but we don't want it to grow so much that it diminishes the possibility to execute queries. Thus we calculate a reserved part and allow this part to grow into at most 20% of the shared memory region in addition to its reserved region. This region cannot access the higher priority memory regions of the shared global memory.</p><p>Failure to allocate SchemaMemory causes meta data operations to be aborted.</p><p>The next region in this category is ReplicationMemory. These are memory structures used to represent replication towards other clusters supporting Global Replication. It can also be used to replicate changes from RonDB to other systems such as ElasticSearch. The memory in this region is of a temporary nature, with memory buffers used to store the changes that are being replicated. The meta data of the replication is stored in the SchemaMemory region.</p><p>This region has a reserved space, but it can also grow to use up to 30% of the shared global memory. After that it will only have access to the lower priority regions of the shared global memory.</p><p>Failure to allocate memory in this region leads to failed replication. Thus replication has to be set up again. This is a fairly critical error, but it is something that can be handled.</p><p>The final region in this category is QueryMemory. This memory has no reserved space; it can use the lower priority regions of the shared global memory. This memory is used to handle complex SQL queries. 
Failure to allocate memory in this region will lead to complex queries being aborted.</p><p>This blog presents the memory management architecture in RonDB that is currently in a branch called schema_mem_21102. This branch is intended for RonDB 21.10.2, but could also be postponed to RonDB 22.04. The main difference in RonDB 21.04 is that the SchemaMemory and ReplicationMemory are fixed in size and cannot use the shared global memory. The BackupSchemaMemory is also introduced in this branch; it was previously part of the TransactionMemory.</p><p>In the next blog on this topic I will discuss how one configures the automatic memory in RonDB.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-41745331785786212322021-09-24T19:37:00.000+02:002021-09-24T19:37:14.820+02:00Automatic Memory Management in RonDB<p><a href="https://www.rondb.com" target="_blank">RonDB</a> has now grown up to the same level of memory management as you find in expensive commercial DBMSs like Oracle, IBM DB2 and Microsoft SQL Server.</p><p>Today I made the last development steps in this large project. This project started with a prototype effort by Jonas Oreland already in 2013, after being discussed for a long time before that. After he left for Google the project was taken over by Mauritz Sundell, who implemented the first steps for operational records in the transaction manager.</p><p>Last year I added the rest of the operational records in NDB. Today I completed the programming of the final step in RonDB. This last step meant moving around 30 more internal data structures towards using the global memory manager. 
These memory structures are used to represent meta data about tables, fragments, fragment replicas, triggers and global replication objects.</p><p>One interesting part of this work is a malloc-like implementation that interacts with the record-level data structures that already exist in RonDB to handle linked lists, hash tables and so forth for internal data structures.</p><p>So after more than 5 years it feels like a major step forward in the development of RonDB.</p><p>What does this mean for a user of RonDB? It means that the user won't have to bother much with memory management configuration. If RonDB is started in a cloud VM, it will simply use all memory in the VM and ensure that the memory is handled as a global resource that can be used by all parts of RonDB. This feature already exists in RonDB 21.04. What this new step means is that the memory management is even more flexible: there is no need to allocate more memory than needed for meta data objects (and vice versa, if more memory is needed, it is likely to be accessible).</p><p>Thus memory can be used for other purposes as well. The end result is that more memory is made available to all parts of RonDB, both to store data and to perform more parallel transactions and more query handling.</p><p>Another important aspect is that this step opens up many new developments to handle larger objects in various parts of RonDB.</p><p>In later blogs we will describe how the memory management in RonDB works. This new development will appear either in RonDB 21.10 or in RonDB 22.04.</p><div><br /></div>
Most attention has thus gone to various replication algorithms.</p><p>However, truly achieving AlwaysOn availability requires more than just a clever replication algorithm.</p><p><a href="http://rondb.com" target="_blank">RonDB</a> is based on NDB Cluster, and NDB has proven in practice that it can deliver capabilities that make it possible to build systems with less than 30 seconds of downtime per year.</p><p>So what is required to achieve this type of availability?</p><p></p><ol style="text-align: left;"><li>Replication</li><li>Instant Failover</li><li>Global Replication</li><li>Failfast Software Architecture</li><li>Modular Software Architecture</li><li>Advanced Crash Analysis</li><li>Managed software</li></ol><p></p><p>Thus a clever replication algorithm is only 1 of 7 very important parts in achieving the highest possible level of availability. Managed software is one of the additions RonDB makes to NDB Cluster; it won't be discussed in this blog.</p><p>Instant Failover means that the cluster must handle failover immediately. This is the reason why RonDB implements a Shared Nothing DBMS architecture. Other HA DBMSs such as Oracle, MySQL InnoDB Cluster and Galera Cluster rely on replaying the logs at failover to catch up. Before this catch-up has happened, the failover hasn't completed. In RonDB every updating transaction updates both data and logs as part of the transaction itself, so at failover we only need to update the distribution information.</p><p>In a DBMS, updating information about node state must itself be a transaction. This transaction takes less than one millisecond to perform in a cluster. Thus in RonDB the time it takes to fail over depends on the time it takes to discover that the node has failed. In most cases the failure is a software failure, which usually leads to dropped network connections that are discovered within microseconds.
Thus most failovers are handled within milliseconds, and the cluster is repaired and ready to handle all transactions again.</p><p>The hardest failures to discover are silent failures, which can happen e.g. when the power to a server is cut. In this case the detection time depends on the time configured for heartbeat messages. How low this time can be set depends on the operating system and how reliably it can be trusted to send a message under high load. Usually this time is a few seconds.</p><p>But even with replication and instant failover we still have to handle failures caused by power outages, thunderstorms and many other problems that can take down an entire cluster. A DBMS cluster is usually located within a confined space to achieve low latency on database transactions.</p><p>To handle this we need failover from one RonDB cluster to another RonDB cluster. This is achieved in RonDB by using asynchronous replication from one cluster to another. This second RonDB cluster needs to be physically separated from the first to ensure greater independence of failures.</p><p>Having global replication implemented also means that one can handle complex software changes, such as a massive rewrite of the data model in your application.</p><p>Ok, are we done now? Is this sufficient to get a DBMS cluster which is AlwaysOn?</p><p>Nope, more is needed.
After implementing these features it is also necessary to be able to quickly find bugs and to support your customers when they hit issues.</p><p>The nice thing about this architecture is that a software failure will most of the time cause nothing more than a few aborted transactions, which the application layer should be able to handle.</p><p>However, in order to build an AlwaysOn architecture one has to be able to quickly get rid of bugs as well.</p><p>When NDB Cluster joined MySQL, two different software architectures met. MySQL was a standalone DBMS, which meant that when it failed the database was no longer available. Thus MySQL strived to avoid crashes, since a crash meant that the customer could no longer access their data.</p><p>With NDB Cluster the idea was that there would always be another node available to take over if we fail. Thus NDB, and thus also RonDB, implements a Failfast Software Architecture. In RonDB this is implemented using a macro called ndbrequire, similar to how most software uses assert. However, ndbrequire stays in the code also when we run production binaries.</p><p>Thus every transaction performed in RonDB causes thousands of error checks to be evaluated. If one of those ndbrequires returns false, we immediately fail the node. Thus RonDB will never proceed when we have an indication that we have reached a disallowed state. This ensures that the likelihood of a software failure leading to incorrect data is minimised.</p><p>However, crashing is only a short-term solution. To solve the problem for real we also have to fix the bug, and fixing bugs in a complex DBMS requires a modular software architecture. The RonDB software architecture is based on experiences from AXE, a telephone switch developed in the 1970s at Ericsson.</p><p>The predecessor of AXE at Ericsson was AKE, the first electronic switch developed at Ericsson.
It was built as one big piece of code without clear boundaries between the code parts. When this software reached sizes of millions of lines of code it became very hard to maintain.</p><p>Thus when AXE was developed in a joint project between Ericsson and Telia (a Swedish telco operator), the engineers needed to find a new software architecture that was more modular.</p><p>The engineers had lots of experience designing hardware as well. In hardware the only way to communicate between two integrated circuits is by sending signals on an electrical wire. Since this made it possible to design complex hardware with a small number of failures, the engineers reasoned that the same architecture should work for software as well.</p><p>Thus the AXE software architecture used blocks instead of integrated circuits and signals instead of electrical signals. In modern software terminology these would most likely have been called modules and messages.</p><p>A block owns its own data and cannot peek at other blocks' data; the only way to communicate between blocks is by using signals that send messages from one block to another.</p><p>RonDB is designed like this, with 23 blocks that implement different parts of the RonDB software architecture. The method of communication between blocks is mainly through signals. These blocks are implemented as large C++ classes.</p><p>This leads to a modular architecture that makes it easy to find bugs. If a state is wrong in a block it can either be caused by code in the block or by a signal sent to the block.</p><p>In RonDB signals can be sent between blocks in the same thread, to blocks in another thread in the same node, and to a thread in another node in the cluster.</p><p>In order to find a problem in the software we want access to a number of things.
The most important is the code path that led to the crash.</p><p>To find this, the RonDB software contains a macro called jam (Jump Address Memory). This means that we can track a few thousand of the last jumps before the crash. The code is filled with these jam macros. This is obviously extra overhead that makes RonDB a bit slower, but delivering the best availability is even more important than being fast.</p><p>Just watch Formula 1: the winner of a season will never be a car that fails every now and then; the car must be both fast and reliable. Thus in RonDB reliability has priority over speed, even though we mainly talk about the performance of RonDB.</p><p>Now this isn't enough. The jam only tracks jumps in the software; it doesn't provide any information about which signals led to the crash. This is also important. In RonDB each thread tracks a few thousand of the last signals it executed before the crash. Each signal carries a signal id that makes it possible to follow signals also as they are sent between threads within RonDB.</p><p>Let's take an example of how useful this information is. Recently we had an issue in the NDB forum where a user complained that he hadn't been able to produce any backups for the last couple of months, since one of the nodes in the cluster failed each time a backup was taken.</p><p>In the forum the point in the code was described in the error log, together with a stack trace of the code executed while crashing. However, this information wasn't sufficient to find the software bug.</p><p>I asked for the trace information, which includes both the jam entries and the signal logs of all the threads in the crashed node.</p><p>Using this information one could quickly discover how the fault occurred. It would only happen in high-load situations and required very tricky races to occur; thus the failure wasn't seen by most users.
However, with the trace information it was fairly straightforward to find what caused the issue, and based on this information a workaround was found as well as a fix for the software bug. The user could once again produce backups with confidence.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-30871405929222079702021-08-12T15:24:00.002+02:002021-08-27T09:35:50.276+02:00RonDB and Docker Compose<p>After publishing the <a href="http://mikaelronstrom.blogspot.com/2021/08/rondb-and-docker.html" target="_blank">Docker container for RonDB</a> I got a suggestion to simplify it further by using Docker Compose. After some quick learning using Google I came up with a Docker Compose configuration file that will start the entire RonDB cluster, and stop it, using a single command.</p><p>First of all I had to consider networking. I decided that using an external network was the best solution. This makes it easy to launch an application that uses RonDB as a back-end database. Thus I presume that an external network has been created with the following command before using Docker Compose to start RonDB:</p><p>docker network create mynet --subnet=192.168.0.0/16</p><p>The docker-compose.yml is available on GitHub at</p><p><a href="https://github.com/logicalclocks/rondb-docker/rondb/21.04/docker-compose.yml" target="_blank">https://github.com/logicalclocks/rondb-docker</a><br /></p><p>The file is rondb/21.04/docker-compose.yml for RonDB 21.04 and rondb/21.10/docker-compose.yml for RonDB 21.10.
<a href="https://github.com/logicalclocks/rondb-docker/blob/main/rondb/21.04/docker-compose.yml" target="_blank">Link to docker-compose.yml</a></p><p>To start a RonDB cluster, run this command from the directory where you have placed docker-compose.yml:</p><p>docker-compose up -d</p><p>After about 1 minute the cluster should be up and running and you can access it using:</p><p>docker exec -it compose_test_my1_1 mysql -uroot -p</p><p>password: password</p><p>The MySQL Server is available at port 3306 on IP 192.168.0.10 on the mynet subnet.</p><p>When you want to stop the RonDB cluster, use the command:</p><p>docker-compose stop</p><p>Docker Compose creates normal Docker containers that can be viewed using the docker ps and docker logs commands as usual.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com2tag:blogger.com,1999:blog-14455177.post-69687342369371514542021-08-12T12:43:00.001+02:002021-08-12T14:01:07.131+02:00RonDB and Docker<div style="text-align: left;"><div>There was a request to be able to test RonDB using Docker.
This is now working.</div><div>These commands will set up a RonDB cluster on your local machine that can be used to test RonDB:</div><div><br /></div><div>Step 1: Download the Docker containers for RonDB</div><div><br /></div><div>docker pull mronstro/rondb</div><div><br /></div><div>Step 2: Create a Docker subnet</div><div><br /></div><div>docker network create mynet --subnet=192.168.0.0/16</div><div><br /></div><div>Step 3: Start the RonDB management server</div><div><br /></div><div>docker run -d \</div><div> --net=mynet \</div><div> -v /path/datadir:/var/lib/rondb \</div><div> --ip 192.168.0.2 \</div><div> --name mgmt1 \</div><div> mronstro/rondb ndb_mgmd --ndb-nodeid=65</div><div><br /></div><div>Step 4: Start the first RonDB data node</div><div><br /></div><div>docker run -d \</div><div> --net=mynet \</div><div> -v /path/datadir:/var/lib/rondb \</div><div> --ip 192.168.0.4 \</div><div> --name ndbd1 \</div><div> mronstro/rondb ndbmtd --ndb-nodeid=1</div><div><br /></div><div>Step 5: Start the second RonDB data node</div><div><br /></div><div>docker run -d \</div><div> --net=mynet \</div><div> -v /path/datadir:/var/lib/rondb \</div><div> --ip 192.168.0.5 \</div><div> --name ndbd2 \</div><div> mronstro/rondb ndbmtd --ndb-nodeid=2</div><div><br /></div><div>Step 6: Check that the cluster has started and is working</div><div><br /></div><div>This step isn't required, but just to show that the cluster is</div><div>up and running, start the RonDB management client and issue the</div><div>show command.</div><div><br /></div><div>docker exec -it mgmt1 ndb_mgm</div><div>ndb_mgm> show</div><div><br /></div><div>This should show a starting cluster, and after about</div><div>half a minute the cluster should be started.</div><div><br /></div><div>Step 7: Start a MySQL Server</div><div><br /></div><div>Note that the MySQL Server uses /var/lib/mysql as datadir internally</div><div>whereas the RonDB management server and data node
use</div><div>/var/lib/rondb.</div><div><br /></div><div>docker run -d \</div><div> --net=mynet \</div><div> -v /path/datadir:/var/lib/mysql \</div><div> -e MYSQL_ROOT_PASSWORD=your_password \</div><div> --ip 192.168.0.10 \</div><div> --name mysqld1 \</div><div> mronstro/rondb mysqld --ndb-cluster-connection-pool-nodeids=67</div><div><br /></div><div>Step 8: Start a MySQL client</div><div><br /></div><div>docker exec -it mysqld1 mysql -uroot -p</div><div>Password: your_password</div><div><br /></div><div>Now you are connected to a MySQL client that can issue SQL commands</div><div>against the RonDB cluster. Below is a very simple example of such</div><div>commands:</div><div><br /></div><div>mysql> CREATE DATABASE TEST;</div><div>mysql> USE TEST;</div><div>mysql> CREATE TABLE t1 (a int primary key) engine=ndb;</div><div>mysql> INSERT INTO t1 VALUES (1),(2);</div><div>mysql> SELECT * FROM t1;</div><div><br /></div><div>I tested this on my development machine using Mac OS X. To succeed with the setup,</div><div>my Docker installation required at least 8 GBytes of memory. RonDB is optimised for use</div><div>in VMs in the cloud where a minimum of 8 GBytes of memory is available for the</div><div>data node VMs. Since the default configuration of Docker will presumably mainly</div><div>be used for simple tests I decided to decrease the size of the RonDB data nodes</div><div>such that they fit in 3 GBytes of memory.
It is definitely possible to run</div><div>RonDB in an even smaller environment, but I think that the default should</div><div>be able to load at least 1 GByte of data and a fair number of tables into RonDB.</div><div><br /></div><div>RonDB on Docker is documented at <a href="https://docs.rondb.com/rondb_docker/">https://docs.rondb.com/rondb_docker/</a></div><div><br /></div><div>The RonDB documentation has also been improved at the same time.</div><div><br /></div><div>The GitHub tree for the Docker containers can be found at:</div><div><a href="https://github.com/logicalclocks/rondb-docker" target="_blank">https://github.com/logicalclocks/rondb-docker</a><br /></div><div><br /></div><div>The GitHub tree is based on the MySQL Docker tree at GitHub.</div><div><br /></div><div>The Docker Hub repository is found at:</div><div><a href="https://hub.docker.com/r/mronstro/rondb/" target="_blank">https://hub.docker.com/r/mronstro/rondb/</a><br /></div></div>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com1