Mikael Ronstrom: 2024

Thursday, December 05, 2024

Release of RonDB 22.10.7

Today we released a new release of the stable series of RonDB. This version RonDB 22.10.7 is mostly a bug fix release, but also contains a few new features that were required for the Kubernetes integration of RonDB.

The major development in RonDB is currently around RonDB 24.10 which is aimed for a first release in 1-2 months.

RonDB 22.10.7 contains the following new features:

RONDB-789: Find out memory availability in a container

RonDB uses Automatic Memory Configuration as default. In this setting RonDB will discover the amount of memory available and allocate most of the available memory to the RonDB data nodes. In Linux using VMs or bare metal servers this information is found in /proc/meminfo. However running in a container the information is instead stored in /sys/fs/cgroup/memory.max. The setting of the amount of memory to be available is set in the RonDB Helm charts and can thus now be automatically detected without extra configuration variables. Setting TotalMemoryConfig will still override the discovered memory size.

RONDB-785: Set LocationDomainId dynamically

With RonDB Kubernetes support it is very easy to setup the cluster in such a way that nodes in the RonDB cluster are spread in several Availability Zones (Availability Domains in Oracle Cloud). In order to avoid sending network messages over Availability Zone boundaries more than necessary we try to locate the transaction coordinator in our domain and read data from our domain if possible.

To avoid complex Kubernetes setups this required the ability to set the domain in the RonDB data node container using a RonDB management client command. In RonDB we use a Location Domain Id to figure out which Availability Zone we are in. How this is set is up to the management software (Kubernetes and containers in our case).

This features makes it possible to access RonDB through a network load balancer that chooses a MySQL Server (or RDRS server) in the same domain, this MySQL Server will contact a RonDB data node in the same domain and finally the RonDB data node will ensure that it reads data from the same domain. Thus we can completely avoid any network messages that passes over domain boundaries for key lookups that reads.

RONDB-784: Performance improvement for Complex Features in Go REST API Server

Bump GO version to 1.22.9.
Use hamba avro as a replacement for linkedin avro library to deserialize complex features
Avoid json.Unmarshal when parsing complex feature field
Use Sonic library to serialize JSON before sending to the client.

This feature cuts latency of complex feature processing to half of what it used to be. This significantly improves latency of Feature Store REST API lookups.

RONDB-776: Changed hopsworks.schema to use TEXT data type

Impacts the GO REST API Server and its Feature Store REST API.

Wednesday, October 16, 2024

Coroutines in RonDB,

A while ago C++ standard added a new feature to C++ 20 called coroutines. I thought it was an interesting thing to try out for RonDB and used some time this summer to read more about it. My findings was that C++ coroutines can only be used for special tasks that require no stacks. The problem is that a coroutine cannot save the stack.

My hope was to find that I could have a single thread that could work using fibers where the fiber belongs to one thread and the thread could switch between different fibers. The main goal of fibers would be to improve throughput by doing work instead of blocking the CPU on cache misses. However fibers can also be a tool to allow a scalable server where the process runs in a single thread when the load is low, and scale up to hundreds of threads (if the process has access to that many CPUs) when required. This will provide better power efficiency, better latency at low loads. Also if we ever get VMs that can dynamically scale up and down the number of CPUs we can use fibers to scale up and down the number of threads in this case.

My read up on C++ 20 coroutines was that it could not deliver on this.

However my read up found an intriguingly simple and elegant solution to the problem. See this blog for a description and here is the GitHub tree with the code. So a small header file of around 300 lines solves the problem elegantly for x86_64 both on Macs and on Linux and similarly for ARM64. Thus all the platforms RonDB supports. The header file can also be used on Windows (Windows supports fibers).

I developed a small test program to see the code in action:

#include <iostream>

#include "tiny_fiber.h"

/**

* A very simple test program checking how fibers and threads interact.

* The program will printout the following:

hello from fibermain

hello from main

hello from fibermain 2

hello from main 2

tiny_fiber::FiberHandle thread_fiber;

tiny_fiber::FiberHandle fiber;

void fibermain(void* arg) {

tiny_fiber::FiberHandle fiber =

*reinterpret_cast<tiny_fiber::FiberHandle*>(arg);

std::cout<<"hello from fibermain"<<std::endl;

tiny_fiber::SwitchFiber(fiber, thread_fiber);

std::cout<<"hello from fibermain 2"<<std::endl;

tiny_fiber::SwitchFiber(fiber, thread_fiber);

}

int main(int argc, char** argv) {

const int stack_size = 1024 * 16;

thread_fiber = tiny_fiber::CreateFiberFromThread();

fiber = tiny_fiber::CreateFiber(stack_size, fibermain, &fiber);

tiny_fiber::SwitchFiber(thread_fiber, fiber);

std::cout<<"hello from main"<<std::endl;

tiny_fiber::SwitchFiber(thread_fiber, fiber);

std::cout<<"hello from main 2"<<std::endl;

return 0;

}

Equipped with this I have the tools I need to develop an experiment and see how fibers works with RonDB. Good news is that I need no learn any complex C++ syntax to do this. It is all low level system programming. I have learnt through long experience that it is not a certain success if you have a theory. A computer is sufficiently complex to not understand the impact of changes that one does. So I am excited to see how this particular new idea works out.

The concept of fibers fits very nicely into the RonDB runtime scheduler and the division of work between threads. It even provides the ability for a thread to be turned into a fiber and moved to another OS thread and it can be returned to its original thread again as well.

Friday, October 04, 2024

Early design choices for RonDB and InnoDB

I have had many interesting discussions with Zhao Song about RonDB and its internals. Since both Zhao and myself also worked on MySQL/InnoDB features as well, it becomes natural that we sometimes compare the features of RonDB with the features in InnoDB.

In this blog I will discuss what is the basis for the very different solutions that we have in RonDB compared to what we find in InnoDB. Since this blog is mostly about the early history of RonDB and InnoDB I will use the NDB name which is still the name of the product Oracle develops and that RonDB is a fork of.

The story starts more than 30 years ago in the early 1990s. I was just starting my Ph.D studies of databases. I was working at Ericsson, the world's leading telecom provider. In the late 1980s the telecom industry started using databases in the telecom applications. The first application was the network databases for enterprise companies called SCPs (Service Control Point). These network databases was used when calling a company number to intelligently control who would pick up the phone. These numbers are still in use today, 020 numbers in Sweden and 800 numbers in the US.

Traditionally DBMSs had not been used in the telecom industry due to the real-time requirements that the application had on the DBMS. This started to change in the early 1990s when many managers at Ericsson started to understand how important it was to be on top of this new technology in the telecom networks. Actually Ericsson started 4 projects within 10 years to develop new DBMS engines. The first was DBS, an internal SQL database used in AXE switches, the second was DBN, developed as part of a large research project called AXE-N. The third DBMS developed was NDB Cluster and finally the fourth one was Mnesia. All these DBMSs are still in active use, DBS still in AXE systems, DBN renamed to TelORB and used in a number legacy applications within Ericsson, NDB Cluster is nowadays known as MySQL NDB Cluster and RonDB is fork of this DBMS and finally Mnesia is an open source DBMS for Erlang applications.

At the same time the database industry had introduced the relational database and Oracle had won this category in the early 1990s. However Oracle wasn't anything that could be used in the telecom applications in the 1990s.

My personal start in the database market was to develop a course in DBS (Database Subsystem) a new subsystem in the telecom switch AXE that used a compiler to compile SQL queries to assembly (always primary key lookup queries). Actually I still think it might be the most efficient SQL implementation in the world. A SQL Select query could be translated to as little as 2 assembler instructions!

However my research started more in earnest with a EU research project on 3G mobile networks that I participated in 1992-1995. I worked in the network simulation task where we analysed the requirements on the various network nodes in the telecom network. I focused on the requirements of the network databases in the 3G mobile network. The network database was used to track where the mobile was through location updates, the services of the mobile was also stored in the network database.

So from these requirements through network simulations, it was clear that the network database had to handle multiple queries within a time span of about 10 millisecond. The telecom network would be completely dependent on that these network databases would be available at all times. Downtime meant no phone calls for the mobiles, obviously unacceptable.

There was lots of speculation about the killer application for mobiles, today we know this is the smartphone with all its apps. At this time we speculated on multimedia email, on-demand news and I also looked into genealogy applications.

This meant that I had a pretty good idea about the performance requirements, the latency requirements, the availability requirements and the storage requirements of any future telecom DBMS.

In the database industry at the same time there had been lots of research on how to build recovery algorithms. The most important report was written by C. Mohan at IBM that presented the ARIES algorithm.

Both InnoDB and NDB (yes, it is short for Network DataBase) were invented and developed in the 1990s (Ericsson was really keen on three-letter abbreviations). I don't know all the details about the reasons behind InnoDB, but it developed a fairly straightforward architecture based on the knowledge in the 1990s. Thus it has used many ideas from ARIES and from the development of scalable disk-based B+trees. Essentially providing an open source solution that enterprise databases at the time was built like.

NDB on the other hand tried to solve a new problem. How to write an always available DBMS with very low latency and making extremely efficient use of CPUs to ensure that the performance requirements could be met. Much of the early prototypes of NDB used special hardware (Dolphin SCI) for networking that made it possible already in the 1990s to have response times within less than a millisecond.

The requirements and the research led to the following decisions.

The DBMS must primarily be an in-memory DBMS to meet both performance and latency requirements.
The Always Available means that it must have failover times that are instant for Software failures and at most a few seconds at Hardware failures.
The complexity of the Availability requirements meant that a traditional 2-phase locking was used since it simplified the reasoning in recovery algorithms.
The DBMS must avoid context switches as much as possible. This led to NDB becoming the first database that used an internal asynchronous programming style. This led to both superior performance and latency. NDB never lost a benchmark competition, not even when the rules were set by a competing department! The basis of this asynchronous technology came from the AXE system that used blocks (modules) and signals (messages) as main concept in the programming style.
The most important query in a telecom DBMS is the key lookup for read and write.
The DBMS must have a hash index built for efficient use CPU caches as the main index.
SQL was not a suitable programming API for the network database, it required too much overhead for key lookup queries.

If you read those requirements, you probably understand why NDB Cluster became the first Key-value data store, even 10-15 years before the concept was even introduced. Our sales guy went into a market where database was equivalent to use of SQL, so the marketing was mainly done through showing off significant performance benefits and this continues unto this day, RonDB is still the most performant key-value DBMS in the world.

In ARIES the idea was to use the REDO log to roll forward and then using the UNDO log to roll back to a state where there are only completed transactions left. In addition Compensation Log records was used. The writing of the log had to abide by the WAL (Write Ahead Log) algorithm.

Both REDO and UNDO was page-oriented logs, that could sometimes use logical writes within each page.

This meant extra overhead on logging for updating transactions. It is normal that 100 transactions updating 1 row in a DBMS is much more costly than running 1 transaction with 100 rows updated. Not so in NDB, in NDB the cost is more or less the same for the two, can even be more efficient to split into small transactions in a large cluster.

NDB stored transaction state in memory and thus no dirty state was written to the database memory. Thus only a REDO log was required and this was logical, thus only storing changed columns, no need to store all the columns unless if they were all updated.

NDB used a two-phase commit protocol, we used a combination of linear 2-phase commit protocol (between replicas of a row) and a normal 2-phase commit protocol between operations. To avoid blocking states the transaction state can be rebuilt at node failures to ensure that transaction can be finished quickly even in crash situations.

Thus we are ready to make some comparisons between InnoDB and NDB.

1. InnoDB use a traditional ARIES algorithm with REDO and UNDO log.

NDB use a logical REDO log.

2. InnoDB was designed as a single node DBMS that uses replication algorithms on top to handle availability.

NDB is a distributed DBMS that is designed for high availability environments and HA is built into the product, thus replication is part of the architecture.

3. InnoDB is a traditional disk-based DBMS

NDB is an in-memory DBMS

4. InnoDB failover times depends on the time it takes to roll forward the log after a crash.

NDB failover time is instant.

5. InnoDB use a traditional B+tree, thus best used for small number of larger queries

NDB tables always have a distributed hash table as main index for efficient key lookups NDB have shown already 11 years ago the ability to handle 200M queries per second.

6. InnoDB use a traditional OS model with lots of threads interacting through mutexes, condition variables and atomic variables. (The author spent a few years making this architecture scale to 64 CPUs).

NDB used as a base a single-threaded without context switches to execute a query. It has now scaled this up to a set of single-CPU threads that interact and can scale to hundreds of CPUs.

7. InnoDB was designed with most of the focus on access to disk pages being as efficient as possible.

NDB was designed with data structures that focused on minimising the number of CPU cache misses. NDB can still execute with 2-4x more instructions per CPU cycle compared to traditonal DBMSs. In the early days the difference was even bigger. All disk writes are sequential in nature and thus very efficient.

Both NDB and InnoDB have developed a lot since its early days in the 1990s and both Zhao and myself have participated in both InnoDB and NDB developments. The main reason for their difference is that they focus on solving quite different problems.

Zhao has spent the last year implementing pushdown aggregates in RonDB meaning that aggregate queries can be evaluated right at the time when we access the data and can also be parallelised. This means that those queries can be 10-20x faster in RonDB than previously in NDB and about 5-10x faster than in MySQL/InnoDB.

RonDB is now focused on the requirements of AI applications. This still means a focus on key loookups, but also specialised aggregate queries and a lot of data changes that flows in and out of the RonDB database. The high availability is still a very important requirement. Nowadays RonDB can also stores large parts of the rows on disk using a traditional disk page cache.

In conclusion InnoDB is a very capable database backend for traditional database applications. It has been able to handle competition from many competing products, both within MySQL and outside.

RonDB based on NDB is also a very capable database backend for real-time applications with high-availability surpassing that accomplished with the Oracle DBMS and a perfect fit for the new era of AI applications. It has been used to develop an LDAP server on top of it, an SQL database, a distributed file system and many HA applications. Thus RonDB is an important tool in your toolbox as developer of the most demanding applications.

But InnoDB and RonDB serve very different customer segments, thus their differences simply comes from serving different customer requirements and this has led to quite different technology choices.

Wednesday, September 11, 2024

Rate limits and Quotas in RonDB

Hopsworks-RonDB Background

One of the services that Hopsworks provides is a free service to run Hopsworks workloads in a managed cloud. This service has been used by many thousands of individuals and companies wanting to experiment with AI Lakehouse applications of various sorts such as predicting weather in your location and other experimental machine learning applications.

This Hopsworks service means that thousands of projects can run concurrently on one Hopsworks cluster. A Hopsworks cluster uses RonDB for three different things. It is used as an online feature store. This means that users import data from their data pipelines and some of this is directed to the online feature store and some of it is directed towards the offline feature store. The data in the online feature store is used for machine learning inferencing in real-time applications. This could be services such as personalised search services, credit fraud analysis and many, many other applications.

The offline feature store uses HopsFS, a distributed file system that is built on top of RonDB. It stores the file data in a backend storage system, often using storage systems such as S3, Scality, Google Cloud Storage, Azure Blob Storage or other storage systems. The metadata of the file system is stored in RonDB.

The data in HopsFS can be used by Hudi, Apache Iceberg, DuckDB and other query services for training machine learning models and performing batch inferencing.

Thirdly RonDB is also used to store metadata about the features, job control and many other things that are used to operate Hopsworks.

Having a multi-tenant DBMS with thousands of concurrent projects running in one RonDB cluster is obviously a challenge. RonDB is required to provide response times down to less than a millisecond and tens of milliseconds while fetching a batch of hundreds of rows to serve a single personalised search request.

Tables in RonDB can store the payload data which could be hundreds of features in one table (feature group) either in memory for best latency and performance or it could be stored in a disk column that provides a cheaper storage at a bit higher cost of accessing the data.

To handle these thousands of projects each project has one database in RonDB. In order to handle many tables in RonDB we have added the capability to configure RonDB to support hundreds of thousands of tables concurrently.

The figure below shows how the data server side is implemented by the RonDB data nodes together with the RonDB management server. The HopsFS access RonDB through ClusterJ, the native Java NDB API. The applications can either access RonDB using a set of MySQL servers or through the RonDB REST API Server. The REST API server delivers capabilities for key lookups and batched key lookups in real-time. It also provides in RonDB 24.10 a new endpoint of RonSQL. RonSQL can handle simple SQL queries that retrieve aggregate information from a table in RonDB. The same query in RonSQL is about 20x times faster than sending it through the MySQL Server.

Why Rate Limits and Quotas?

To manage thousands of concurrent users in a real-time DBMS with each using only a fraction of a CPU is indeed a challenge. In RonDB 24.10 which will be released in a few weeks we have added the capability to limit the use of CPUs, memory and disk space per database.

Managing memory and disk space is fairly straightforward. RonDB tracks each and every memory page and disk page used by a specific database. When the database has reached its limit, it will no longer be possible to insert, write and update the data. It will still be possible to delete and read the data.

Managing CPU resources is handled by keeping track of the CPU usage for each database in real-time. As long as the application using the database is within the limits of its CPU rate, the operation works as normal.

If the application tries to use more CPU than it has requested it will soon be discovered. At this point RonDB needs to slow down this project to ensure that all the other projects get a real-time service. The more overload the project tries to create, the more it will be slowed down. If the slow down doesn't work, eventually the project will not be able to complete any queries in RonDB until it has paid back its CPU "debt". Each time interval the database pays the "debt" and if things go as normal without rate limitation this will put the "debt" back to zero every time interval.

When defining the rate limits we measure it in microseconds of CPU time per second per data node. Thus in a 2-node RonDB cluster setting the rate limit to 100.000 means you get access to 0.1 CPUs in each data node. Setting it to 1000 means you get access to 1 millisecond of CPU time per second per data node. Thus 0.001 CPUs in each data node.

Different projects can have very different requirements, one could require 0.001 CPUs and another one could 1.5 CPUs, these databases can easily co-exist in the RonDB cluster and they will both be able to get a reliable real-time service.

An example of how this works was that we ran a Sysbench OLTP RW benchmark and set the rate limit to 100.000 (0.1 CPU per data node), this made it possible to run 330 TPS (thus 6600 SQL queries per second delivering almost 150.000 rows to the application). This TPS was achieved whatever the number of threads was used from 1 thread to 256 thread. Increasing the rate limit 500.000 meant that we got 5x more TPS.

Obviously this service can also be useful for a company with many departments that want to share one Hopsworks cluster as well.

Now this shows how we can operate the actual data servers in RonDB. The data server clients are stateless clients and can thus scale up and down as needs come and go. In addition the clients in Hopsworks can access the data server clients using load balancers. Together with Hopsworks 4.0 that is operated by Kubernetes it means that RonDB 24.10 will be extremely flexible in scaling up and down CPU, memory and disk on the data server side as well as CPU on the client side.

Monday, May 27, 2024

875X improvement from RonDB 21.04.17 to 22.10.4

At Hopsworks we are working on ensuring that the online feature store will be able to perform complex join operations in real-time. This means that queries that could use data from multiple tables can be easily integrated into machine learning applications.

Today most feature stores use key-value stores like Redis and DynamoDB. These systems have no capability to issue complex join queries, if this is required the feature store will have to write complex code to handle this and this is likely to involve multiple roundtrips and thus cause unwanted latency.

Hopsworks feature store uses RonDB as its online feature store. RonDB can handle any SQL operations that MySQL can handle. Actually RonDB has even support for parallelising the join queries and pushing the filtering and joining down to the RonDB data nodes where data resides.

This means that users of the Hopsworks feature store can integrate more features from multiple feature groups in online inferencing requests. This means that things credit fraud detection can be made much more intelligent by taking more features into account in the inferencing requests.

This means that performance of real-time join queries becomes more important in RonDB. To evaluate how RonDB develops in this are I ran a set of tests using TPC-H queries from DBT3 against RonDB 21.04.17 and RonDB 22.10.4 (not released yet). I also ran tests against MySQL 8.0.35 (RonDB 22.10.4 is based on MySQL 8.0.35 with loads of added RonDB features).

The results were interesting, the improvement in Q20 was the highest I have seen in my career. The performance improved from 70 seconds to 80 milliseconds, thus an 875x speedup or 87500% improvement. Q2 had a 360x improvement. So RonDB 22.10.4 is much better equipped for more complex queries compared to RonDB 21.04. MySQL 8.0.35 had similar performance to RonDB 22.10.4 with an average of around 20% slower, this is mostly due to performance improvements in RonDB, not algorithmic changes.

When using complex queries the query optimiser tries to find an optimal plan, sometimes however better plans are available and one can add hints in the SQL query to ensure a better plan is used.

The RonDB team isn't satisfied with this however, we have realised that evaluating aggregation is also very important when the online feature store stores a time window of certain features. This means that RonDB can compute aggregate dynamically and thus provide more accurate predictions.

Early tests of some simple single table queries showed an improvement of 4-5x and we expect we will be able to get to 10-20x improvements in quite a few queries of this sort.

Friday, May 24, 2024

What's cooking in RonDB

Here is a short update on what is going on in the RonDB development. We recently launched the new RonDB release 22.10.2.

We are now working on a number of major improvements and new features.

The first feature we are working on is what we call Pushdown Aggregation. In the first step it will be able to perform Pushdown of aggregation on a single table. This is useful for Hopsworks Feature Store in that it will enable us to store time windows of certain features and compute aggregates as an online feature. This means that data will live in RonDB for a certain time and after that it will be deleted. So a part of this feature will be a Time-To-Live (TTL) feature that will enable users to declare what time window they want the data to visible. The deletion process will be handled by the new C++ REST API Server that will replace the current Go REST API Server. The C++ REST API Server have superior performance to the Go REST Server and will have exactly the same features with a set of new ones added as well.

So Pushdown Aggregation contains five parts, first a new interpreter that calculates the aggregates as part of scan operations. Second new additions to the NDB API to be able to issue those aggregation queries towards the RonDB data nodes. Third a new SQL engine that can execute SQL queries using aggregates on a single table. Fourth, the ability to call this SQL engine through a REST call and finally the TT'L feature to ensure that the rows can automatically disappear when no longer needed by the online data store.

The second feature we are working is a Kubernetes operator for RonDB. This will make it possible to start and stop RonDB clusters in on-prem settings, cloud settings and hybrid cloud settings. In addition it will contain Autoscaling of both MySQL Servers, REST servers and RonDB data nodes. Eventually it will also support global replication between multiple RonDB clusters.

We are also working on a Rate limit and Quota feature that will enable setting limits on both memory and disk usage and the amount of CPU usage a specific database is allowed to use.

So in short a number of very interesting new additions to RonDB that will make it even easier to use RonDB and prosper from the use of it.

There is also some minor features added to support even more pushdown of calculations that makes it possible to push filters based on values of array objects.

Friday, March 22, 2024

New LTS version of RonDB, RonDB 22.10.2

After a very thorough development and test period we are proud to announce the general availability of the RonDB 22.10 LTS serie with the release of RonDB 22.10.2 today. There is also a new version of the old LTS version RonDB 21.04.16 released today.

A complete list of the new features is provided in the RonDB Documentation.

The most important new feature is the support of variable sized disk rows. This means a very significant saving of disk space, up to 10x more data can be stored on the same disk space as with RonDB 21.04. In addition RonDB 22.10 contains a major quality improvement using disk columns. This means that RonDB is now prepared for the introduction of features in the Hopsworks Feature Store using disk columns. This will provide significant cost savings in storing lots of features in the Hopsworks Online Feature Store.

Another important feature in RonDB 22.10 is that all major data structures are now using the Global Data Manager, this ensures that the memory management inside RonDB is much more flexible. This was the final step of a long project described here.

As usual we have also been working on performance in this new version. Write throughput has been siginficantly improved, enterprise applications using read locks will see greater scalability and also throughput at very high load has been significantly improved leading to up to 30% better throughput in RonDB 22.10.2 compared to RonDB 21.04. This improvement comes from further improvements to ideas presented in this blog. This change makes it possible to be much more flexible in using CPU resources. More details on performance comes later.

In this blog we have described the testing process that we have had with RonDB. RonDB 22.10 is already running in production in a number of installations and is part of Hopsworks 3.7 released recently. This Hopsworks version introduces support of fine-tuning LLMs for GenAI. This version of Hopsworks also supports multi-region support.

Come to our webinar where we will present more information about RonDB and what it can be used for.

Since Oracle now made MySQL 8.0 an LTS version we plan to merge the changes from MySQL 8.0 series into future RonDB 22.10 versions. Bug fixes and features of interest to Oracle is contributed back to MySQL.

For those interested in following the development of RonDB in real-time the tree is here, the branch where RonDB 22.10.2 is found is called 22.10.1. The development branch of RonDB 22.10 is found in 22.10-main. We are also working on new features of RonDB in forks of this tree by the developers. 22.10-main already have a new feature supporting more than 20k table objects and an even more improved thread model providing an improvement of around 5% better throughput. There is also long-term development of rate limits and quotas, enabling RonDB to be used in multi-tenant environments and pushdown of aggregations, enabling a speedup of 10-1000x of some queries used in Feature Stores.

Tuesday, March 05, 2024

Testing of RonDB releases

Since RonDB is a fork of MySQL NDB Cluster it contains a lot of tests that is part of the RonDB development tree. This includes unit tests for various functionalities. It includes many hundreds of MTR test cases that takes between a few seconds to a few minutes to run. These tests are mostly test cases that use SQL commands to test the functionality of RonDB, in addition it tests backup and restore and a few other tools in RonDB. These tests are executed with debug compiled binaries, binaries compiled with error injection, binaries compiled for production and finally the binaries we use in the releases.

Another very important part of RonDB testing is the autotests. These tests are using the NDB API to test its functionality, it also has a lot of focus on testing recovery. This test suite contains thousands of tests that takes 36 hours to go through one test run when executed serially. It can be parallelised by running it on multiple clusters. This test suite can also be executed on different configurations with different number of replicas, different number of node groups, different number of CPUs per node and different memory sizes in the nodes.

RonDB is heavily used in Hopsworks. One part of Hopsworks is HopsFS. This is a distributed file system which is built on top of RonDB. It is written in Java and thus interfaces with ClusterJ, the Java API to RonDB that uses an easy to program model of the NDB API. HopsFS has a whole range of test cases related to it that also will be executed on a daily basis, this includes both functional tests and load tests.

RonDB is also used to handle metadata in Hopsworks and it is used as the Online Feature Store in Hopsworks. This means that the Hopsworks users will define new tables and new table structures on the fly. These parts of Hopsworks again have a set of functional tests and load tests.

Next there are upgrade tests verifying that we can perform an online upgrade of RonDB and these tests also include verifying that we can downgrade back to the old version if the upgrade didn't work as it should.

There are test cases also to handle replication to other clusters. This is a very important part of the Hopsworks framework that we support setups with multiple regions.

There are also benchmark suites, mostly Sysbench, DBT2 (~ TPC-C), DBT3 (~TPC-H) and YCSB that we regularly execute.

Hopsworks supports managed RonDB in the cloud. This offering includes support for reconfiguration of the RonDB Cluster as an online operation where we can scale resources such as MySQL Servers, REST API servers, RonDB data nodes. This management framework also has its own set of test suites that is regularly executed.

We are developing a REST API server, it is already completed in a Go version and a new C++ version of it is in development. This adds yet more tests of the RonDB functionality.

The latest addition is that we are now also developing a Kubernetes operator for RonDB. Again this operator contains CI/CD that ensures that every RonDB releases can be handled in this Kubernetes framework.

When a RonDB release is finished it has gone through all of those stages.

After release the RonDB software is used by community users and the Hopsworks customers. Any bugs found by them is immediately fed into the development process. Among other things a community user has added a CommonLisp NDB API to what is supported by RonDB.

As is hopefully clear from this picture a RonDB release is heavily tested before its release. The next LTS version of RonDB will be RonDB 22.10.1. This software have been moving through all these test frameworks and is going to be made into GA very soon. Since this is a new LTS version we have been especially careful in our testing of this version. At the moment the RonDB 22.10.2 version is going through heavy MTR testing.

This hopefully makes it clear that building a DBMS and building a data platform that uses it actually is very beneficial for the quality of the DBMS product. Thus RonDB have been through a much more varied set of tests than most DBMSs are facing that works strictly as a DBMS.

We often find bugs that originates from MySQL NDB Cluster. We try our best to be a good open source citizens by feeding back those as contributions to Oracle so that they can be included in future releases of MySQL NDB Cluster. In our view a bigger community for MySQL NDB Cluster is also good for RonDB.

Similarly of course we benefit from bug fixes that originates from Oracle. We are currently integrating MySQL 8.0.35 and 8.0.36 into RonDB 22.10 series. RonDB 22.10.1 is based on MySQL 8.0.34.

Wednesday, January 31, 2024

The completion of a 12 year long project in RonDB

In 2012 a project was started to change the memory model of MySQL NDB Cluster. The first step was some early prototypes developed in 2012 and 2013 by Jonas Oreland. When Jonas left Oracle for Google it took a while before the project got up and running again. The first project was to change the memory model for the operation records used by transactions. This project started in 2015.

It took quite some time to complete. The requirement on maintained performance was high, this required going through the changes ensuring that we either gained or at most lost 1-2% performance. The traditional model used in NDB had a very simple model that had extremely good performance, to maintain the good performance the developer Mauritz experimented with eight different new memory models before settling for a model we call TransientPool. This pool relies on that memory objects are allocated for a short time (typical for short transactions). I assisted Mauritz in ensuring that we maintained performance.

Finishing this step completed most of the framework for the new memory management model. However it only took care of a fairly small part of all memory parts in NDB. Another step was completed around 2018-2019 that finalised all work on operation records. This was the most significant part of the change and the most important one.

When I joined Hopsworks we wanted to avoid having loads of configuration parameters affecting setup of RonDB 21.04. To handle this we simply configure to support 20000 table objects (table, ordered indexes and unique indexes). This still used the old memory management model. In RonDB 22.10 the work was finalised, the final part was to move also all memory related to metadata to the new memory management model (called SchemaMemory) and also the memory used by replication to other RonDB clusters (called ReplicationMemory).

Thus with the release of RonDB 22.10.1 we have finished this very long project transitioning MySQL NDB Cluster to a new memory management model in RonDB. This means that all memory parts share a common memory pool that is allocated at startup. This pool have around 11 different parts and when one part requires much memory it can get from the shared global memory and there is a priority of who gets memory in a situation when the free memory is low.

The new memory management model in RonDB 22.10.1 also includes that one can use a malloc and free-like model to get memory from the different pools. This will be useful for all sorts of new developments in RonDB.

Tuesday, January 02, 2024

Major update to the RonDB documentation

My colleague Vincent has spent some time improving the RonDB documentation.

New/rewritten chapters/sections are:

Main page: https://docs.rondb.com
Installing: https://docs.rondb.com/rondb_installation/
Local Quickstart: https://docs.rondb.com/rondb_quickstart_local/
Start Distributed: https://docs.rondb.com/rondb_programs/
Recovery (entire chapter): https://docs.rondb.com/rondb_high_availability/
Two-Phase Commit Protocol: https://docs.rondb.com/rondb_nonblocking_2pc/
Transaction Model (only ACID section, further PR incoming) https://docs.rondb.com/intro_transactions/

Further UI changes/fixes:

Added dark mode
HTTP links are visible again
Recognition of programming language in code snippets (using Lua filter)
Order & naming of chapters
A number of new images based on our Cheetah logo