Thursday, July 29, 2021

Improvements of DBT2 benchmark in RonDB 21.10.1

In the development of RonDB 21.10.1 we have had some focus on improving the performance of the DBT2 benchmark for RonDB.  Actually NDB Cluster already had very good performance for DBT2. However this performance relies on a thread configuration that uses a lot of LDM threads and this means that tables will have very many partitions.

For an application like DBT2 (open source variant of TPC-C) this is not an issue since it is a very scalable application. But most real applications are not as scalable as DBT2 when the number of table partitions increases.

In RonDB we have focused on decreasing the number of table partitions. Thus in RonDB the number of partitions are independent of the number of LDM threads. In DBT2 most of the load are generated towards one of the tables, this means that only a subset of the LDM threads are used in executing DBT2. Even more most of the load is directed towards the primary replicas.

In RonDB 21.10.1 we improved the placement of the primary replicas such that more LDM threads were active in executing the queries. This improved DBT2 performance by about 20%.

Already in RonDB 21.04 we have introduced query threads that can be used for reads using Committed Reads. This makes application using Committed Reads scale very well such as the Online Feature Store in used by Hopsworks. However DBT2 uses a very small number of Committed Reads, most reads are using reads that lock rows. To handle this we modified RonDB 21.10 to allow also locked reads to use query threads.

Query thread already have an efficient scheduling of read queries towards LDM threads and query threads, thus ensuring that all CPUs used for LDM threads and query threads are efficiently used. With the ability to schedule locked read operations towards query threads we automatically make more efficient use of the CPU resources in the DBT2 benchmark. This improvement gives 50% better DBT2 performance for RonDB.

Another feature we made use of in the DBT2 benchmark is the ndb_import tool. Thus the load phase for DBT2 is using the ndb_import tool. This provides a very efficient parallel load tool. Both RonDB 21.04 and RonDB 21.10 contains improvements of the ndb_import tool to enable DBT2 to use it as a load tool.

Finally in RonDB 21.10.1 we also removed the index statistics mutex in the NDB storage engine as a bottleneck. This improves Sysbench throughput by about 10% at high load. We haven't measured how much it impacts the DBT2 performance.

New RonDB releases

It is in the middle of the summer, but we found some time to prepare a new RonDB release. Today we are proud to release new RonDB versions.

RonDB is a stable distribution of NDB Cluster, a key-value store with SQL capabilities. It is based on a release of MySQL, an SQL database server.

RonDB 21.04.1 is the second release of the stable version of RonDB. It contains 3 new features and 18 bug fixes. We plan to support RonDB 21.04 until 2024.

RonDB 21.10.1 is the first beta version of RonDB 21.10. It contains 4 new features that improves throughput of the DBT2 benchmark by 70% compared to RonDB 21.04.1.

Detailed release notes are available in the RonDB documentation.

The new features in RonDB 21.04.1 are:

  1. Support for primary keys using Longvarchar in ClusterJ, the native Java API for RonDB
  2. Support for autoincrement in the ndb_import tool
  3. Killing ndbmtd now uses a graceful shutdown avoiding unnecessary abort
The new features in RonDB 21.10.1 are:
  1. Improved placement of primary replicas
  2. Removing index statistics mutex as a bottleneck in MySQL Server
  3. More flexibility in thread configuration
  4. Use query threads also for Reads which locks the row
Work on RonDB 21.04.2 is already ongoing and is mainly focused on backporting bug fixes from MySQL 8.0.24 through 8.0.26 that are deemed safe and important enough for a back port. The branch used for development of RonDB 21.04 is called 21.04.

Work on RonDB 21.10.2 has already started where we integrated changes from MySQL 8.0.26. The branch used for RonDB 21.10 is called 21.10.1.

There is ongoing work on RonDB to improve use of memory resources. This includes making schema memory use the global memory resources. It also introduces some common malloc functions using global memory resources. This development is a base for many future RonDB improvements that will make it easier to develop new features in RonDB. This work is found in the branch schema_memory_21102 currently.

Here is the GitHub tree for RonDB.

The flexible thread configuration was used for some research on thread pipelines presented in this blog.

Tuesday, May 25, 2021

HA vs AlwaysOn

 In the 1990s I spent a few years studying requirements on databases used in 3G telecom networks. The main requirement was centered around three keywords, Latency, Throughput and Availability. In this blog post I will focus on Availability.

If a telecom database is down it means that no phone calls can be made, internet connections will not work and your app on your smartphone will cease to work. So more or less impacting each and everyone's life immediately.

The same requirements on databases now also start to appear in AI applications such as online Fraud detection, self-driving cars, smartphone apps.

Availability is measured in percent and for telecom databases the requirement is to reach 99.9999% availability. One often calls this Class 6 availability where 6 is the number of nines in the availability percentage.

Almost every database today promises High Availability, this has led to inflation in the term. Most databases that promise HA today reach Class 4 availability, thus 99.99% availability. Class 4 availability means that 50 minutes each year your system is down and unavailable for transactions. For many this is sufficient, but imagine relying on such a system for self-driving cars, or phone networks or modern AI applications used to make online decisions for e.g. hospitals. Thus I sometimes use the term AlwaysOn to refer to a higher availability reaching Class 5 and Class 6 and beyond.

To analyse Availability we need to look at the reasons for unavailability. There are the obvious ones like hardware and software failure. The main solution for those is to use replication. However as I will show below, most replication solutions of today incur downtime on every failure. Thus it is necessary to analyse the replication solution in more detail to see what happens at a failure. Does the database immediately deliver the same latency and throughput after a failure as required by the application?

There are also a lot of planned outages that can happen in many HA systems. We could have downtime when a node goes down before the cluster has been reorganised again, we can have downtime at Software upgrades, at Hardware upgrades, at schema reorganisations. Major upgrades involving lots of systems may cause significant downtime.

Thus to be called AlwaysOn a Database must have zero downtime solutions for at least the following events:

  1. Hardware Failure

  2. Software Failure

  3. Software Upgrade

  4. Hardware Upgrade

  5. Schema Reorganisation

  6. Online Scaling

  7. Major Upgrades

  8. Major disasters

Most of these problems require replication as the solution. However the replication must be at 2 levels. There must be a local replication that makes it possible to sustain the Latency requirements of your application even in the presence of failures. In addition there is a need to handle replication on a global level that makes it possible to survive major upgrades and major disasters.

Finally, to handle schema reorganisations and scaling the system without downtime requires careful design of algorithms in your database.

Today replication is used in almost every database. However the replication scheme in most databases leads to significant downtime for AlwaysOn applications at each failure and upgrade.

To handle a SW/HW failure and SW/HW upgrade without any downtime requires that the replicas are ready to start serving requests within one millisecond after discovering the configuration change required by the failure/upgrade.

Most replication solutions today rely on eventual consistency, thus they are by design unavailable even in normal operation and even more so when the primary replica fails. Many other replication solutions only ship the log records to the backup replica, thus at failure the backup replica must execute the log before it can serve all requests. Again downtime for the application which makes it impossible to meet the requirements of AlwaysOn applications.

To reach AlwaysOn availability it is necessary to actually perform all write operations on the backup replicas as well as on the primary replica. This ensures that the data is immediately available in the presence of failures or other configuration changes.

This has consequences for the latency, many databases take shortcuts when performing writes and doesn’t ensure that the backup replicas are immediately available at failures.

Thus obviously if you want your application to reach AlwaysOn availability, you need to ensure that your database software delivers a solution that doesn't involve log replay at failure (thus all shared disk solutions and eventual consistency solutions are immediately discarded).

As part of my Ph.D research in the 1990s I concluded those things. All those requirements was fed into the development of NDB Cluster, NDB Cluster was the base for MySQL Cluster (nowadays often referred to as MySQL NDB Cluster) and today Logical Clocks AB develops RonDB a distribution of NDB Cluster.

To reach AlwaysOn it is necessary that your database software can deliver solutions to all 8 problems listed above without any downtime (not even 10 seconds). Actually the limit on the availability is the time it takes to discover a failure, for the most part this is immediate since most failures are SW failures, but for some HW failures it is based on a heartbeat mechanism that can lead to a few seconds of downtime.

Thus to reach AlwaysOn availability it is necessary that your database software can handle failures with zero downtime, it is necessary that your database software can handle the algorithms for online schema management and online scaling of your database, it is necessary that your database software supports global replication to also be able to handle major upgrades and major disasters where entire regions are affected. Finally you need to have an organisation that operates your systems in a reliable manner.

RonDB is designed to handle all the requirements on the database software which have been proven in production sites reaching Class 6 availability for more than 15 years. Logical Clocks is building the competence to operate RonDB for customers at these availability levels.

Wednesday, May 19, 2021

5.5M Key Lookups per second on 16 VCPU VMs

 As introduced in a previous blog RonDB enables us to easily execute benchmarks on RonDB using the Sysbench benchmark.

In this blog I will present some results where the RonDB cluster had 2 data nodes, each using a r5.4xlarge VM in AWS that has 16 VCPUs and 128 GB memory. The Sysbench test uses SQL to access RonDB.

In this particular test case we wanted to test the Key-Lookup performance using SQL. Key-Lookup performance is essential in the RonDB use case as an online Feature Store in Hopsworks.

In this case we use the following settings in the Sysbench configuration file (see RonDB documentation for more details on how to set up those tests):







These settings means that each transaction has 10 SQL queries, each of those queries fetch 100 rows using a primary key lookup. Thus each transaction performs 1000 key lookups.

We will look at two things here, how many key lookups can be delivered per second and what is the latency of each of those batched SELECT statements.

In the image above we see how the number of key value lookups grows extremely quickly, even with only 2 threads per MySQL Server we reach beyond 1M key lookups per second. The total throughput at 32 threads is 5.56M key lookups per second.

Even more interesting is the response time of each of those SELECT statements. Remember that each of those SELECT fetch 100 rows using a primary key lookup. Using 4 threads per MySQL Server we respond within 1.0 millisecond with a throughput of 2.17M key lookups per second. At 12 threads we deliver around 4.5M key lookups with a latency around 1.5 millisecond per SELECT query. At 16 threads we deliver 5.27M key lookups at a response time of 2.28 milliseconds.

Thus as can be seen from these numbers, even using SQL, RonDB can provide astonishing key lookup performance. Obviously using C++, Java or JavaScript key-value APIs we expect to perform a lot better yet and in particular it will use much less CPU on the client side.