Wednesday, September 11, 2024

Rate limits and Quotas in RonDB

Hopsworks-RonDB Background 

One of the services that Hopsworks provides is a free service to run Hopsworks workloads in a managed cloud. This service has been used by many thousands of individuals and companies wanting to experiment with AI Lakehouse applications of various sorts such as predicting weather in your location and other experimental machine learning applications.

This Hopsworks service means that thousands of projects can run concurrently on one Hopsworks cluster. A Hopsworks cluster uses RonDB for three different things. First, it is used as an online feature store. This means that users import data from their data pipelines, some of which is directed to the online feature store and some towards the offline feature store. The data in the online feature store is used for machine learning inferencing in real-time applications. These could be services such as personalised search, credit fraud analysis and many, many other applications.

Secondly, the offline feature store uses HopsFS, a distributed file system built on top of RonDB. HopsFS stores the file data in a backend storage system, often S3, Scality, Google Cloud Storage, Azure Blob Storage or other storage systems, while the metadata of the file system is stored in RonDB.

The data in HopsFS can be used by Hudi, Apache Iceberg, DuckDB and other query services for training machine learning models and performing batch inferencing.

Thirdly, RonDB is used to store metadata about the features, job control and many other things that are used to operate Hopsworks.

Having a multi-tenant DBMS with thousands of concurrent projects running in one RonDB cluster is obviously a challenge. RonDB is required to provide response times down to less than a millisecond, and of tens of milliseconds when fetching a batch of hundreds of rows to serve a single personalised search request.

Tables in RonDB store the payload data, which could be hundreds of features in one table (feature group), either in memory, for the best latency and performance, or in disk columns, which provide cheaper storage at a somewhat higher cost of accessing the data.

To handle these thousands of projects, each project has its own database in RonDB. To handle the resulting number of tables, we have added the capability to configure RonDB to support hundreds of thousands of tables concurrently.

The figure below shows how the data server side is implemented by the RonDB data nodes together with the RonDB management server. HopsFS accesses RonDB through ClusterJ, the native Java NDB API. Applications can access RonDB either through a set of MySQL servers or through the RonDB REST API Server. The REST API server delivers capabilities for key lookups and batched key lookups in real-time. In RonDB 24.10 it also provides a new endpoint, RonSQL. RonSQL can handle simple SQL queries that retrieve aggregate information from a table in RonDB. The same query in RonSQL is about 20x faster than sending it through the MySQL Server.

Why Rate Limits and Quotas?

Managing thousands of concurrent users in a real-time DBMS, with each using only a fraction of a CPU, is indeed a challenge. In RonDB 24.10, which will be released in a few weeks, we have added the capability to limit the use of CPU, memory and disk space per database.

Managing memory and disk space is fairly straightforward. RonDB tracks each and every memory page and disk page used by a specific database. When a database has reached its limit, it is no longer possible to insert, write or update data. It is still possible to delete and read the data.
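A minimal sketch of this enforcement policy follows. It models only the policy (writes rejected at the limit, reads and deletes still allowed), not RonDB's actual page-level accounting inside the data nodes:

```python
# Sketch of per-database storage quota enforcement: once a database
# has reached its quota, writes are rejected but deletes still work
# and free up quota. Conceptual model only, not RonDB's implementation.

class DatabaseQuota:
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def check_write(self, row_bytes):
        """Inserts/updates are refused once the quota would be exceeded."""
        if self.used_bytes + row_bytes > self.limit_bytes:
            raise MemoryError("database quota exceeded")
        self.used_bytes += row_bytes

    def check_delete(self, row_bytes):
        """Deletes always succeed and release quota."""
        self.used_bytes = max(0, self.used_bytes - row_bytes)

db = DatabaseQuota(limit_bytes=1000)
db.check_write(900)           # fits
try:
    db.check_write(200)       # would exceed the quota -> rejected
except MemoryError as e:
    print("write rejected:", e)
db.check_delete(500)          # deleting data is still allowed
db.check_write(200)           # now fits again
print("used:", db.used_bytes)
```

The key property is that a full database can always shrink itself back under its quota, so a project can recover without operator intervention.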

Managing CPU resources is handled by keeping track of the CPU usage for each database in real-time. As long as the application using the database is within the limits of its CPU rate, the operation works as normal.

If the application tries to use more CPU than it has requested, this is soon discovered. At this point RonDB needs to slow the project down to ensure that all the other projects still get a real-time service. The more overload the project tries to create, the more it is slowed down. If slowing down isn't enough, the project will eventually be unable to complete any queries in RonDB until it has paid back its CPU "debt". Each time interval the database pays off the debt, so when things go as normal, without rate limitation, the debt returns to zero every time interval.
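The debt mechanism described above can be sketched roughly as follows. This is a simplified model assuming a fixed allowance paid off per interval; the real accounting is internal to the RonDB data nodes:

```python
# Simplified model of the CPU "debt" mechanism: a database that uses
# more CPU than its rate allows accumulates debt; each time interval
# the debt is paid down by the configured allowance, and queries are
# blocked while any debt remains. Conceptual sketch only.

class CpuRateLimiter:
    def __init__(self, rate_us_per_interval):
        self.rate = rate_us_per_interval  # allowed CPU microseconds/interval
        self.debt = 0                     # microseconds owed

    def end_of_interval(self, used_us):
        # Overuse adds to the debt; the allowance pays it down.
        self.debt = max(0, self.debt + used_us - self.rate)

    def may_execute(self):
        # Queries are only admitted once the debt has been repaid.
        return self.debt == 0

lim = CpuRateLimiter(rate_us_per_interval=100_000)  # 0.1 CPUs
lim.end_of_interval(used_us=350_000)  # project overshoots heavily
print(lim.may_execute())              # blocked: 250_000 us still owed
lim.end_of_interval(used_us=0)        # each idle interval repays 100_000
lim.end_of_interval(used_us=0)
print(lim.may_execute())              # still 50_000 us owed
lim.end_of_interval(used_us=0)
print(lim.may_execute())              # debt cleared, queries admitted again
```

Note how the repayment time is proportional to the overshoot, which matches the behaviour described above: the more overload a project creates, the longer it is held back.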

When defining the rate limits, we measure them in microseconds of CPU time per second per data node. Thus, in a 2-node RonDB cluster, setting the rate limit to 100,000 means you get access to 0.1 CPUs in each data node. Setting it to 1,000 means you get access to 1 millisecond of CPU time per second per data node, thus 0.001 CPUs in each data node.
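The unit conversion can be checked in a few lines (assuming, as described above, that the rate is expressed per data node):

```python
# Rate limits are expressed in microseconds of CPU time per second per
# data node, so dividing by 1,000,000 gives the CPU fraction per node.

US_PER_SECOND = 1_000_000

def cpus_per_data_node(rate_limit_us):
    return rate_limit_us / US_PER_SECOND

print(cpus_per_data_node(100_000))  # 0.1 CPUs per data node
print(cpus_per_data_node(1_000))    # 0.001 CPUs per data node
```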

Different projects can have very different requirements: one could require 0.001 CPUs and another 1.5 CPUs. These databases can easily co-exist in the RonDB cluster, and both will get a reliable real-time service.

As an example of how this works, we ran a Sysbench OLTP RW benchmark with the rate limit set to 100,000 (0.1 CPUs per data node). This made it possible to run 330 TPS (thus 6,600 SQL queries per second, delivering almost 150,000 rows per second to the application). This TPS was achieved whatever the number of threads used, from 1 thread to 256 threads. Increasing the rate limit to 500,000 gave us 5x more TPS.
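The arithmetic behind these figures can be cross-checked; the per-transaction row count below is derived from the numbers above, not a separately measured figure:

```python
# Cross-check of the benchmark figures: 330 TPS at 6,600 queries per
# second implies 20 SQL queries per Sysbench OLTP RW transaction, and
# ~150,000 rows per second implies roughly 455 rows per transaction.

tps = 330
queries_per_txn = 6600 // tps         # 20 queries per transaction
rows_per_txn = 150_000 / tps          # ~455 rows per transaction
print(queries_per_txn, round(rows_per_txn))

# A 5x higher rate limit (500,000 us = 0.5 CPUs per node) gave 5x TPS:
print(tps * 5)
```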

Obviously this service can also be useful for a company with many departments that want to share one Hopsworks cluster.

This shows how we can operate the actual data servers in RonDB. The data server clients are stateless and can thus scale up and down as needs come and go. In addition, the clients in Hopsworks can access the data server clients through load balancers. Together with Hopsworks 4.0, which is operated by Kubernetes, this means that RonDB 24.10 will be extremely flexible in scaling CPU, memory and disk up and down on the data server side, as well as CPU on the client side.



Monday, May 27, 2024

875X improvement from RonDB 21.04.17 to 22.10.4

At Hopsworks we are working on ensuring that the online feature store will be able to perform complex join operations in real-time. This means that queries that could use data from multiple tables can be easily integrated into machine learning applications.

Today most feature stores use key-value stores like Redis and DynamoDB. These systems have no capability to issue complex join queries; if joins are required, the feature store has to write complex code to handle them, which is likely to involve multiple roundtrips and thus cause unwanted latency.

The Hopsworks feature store uses RonDB as its online feature store. RonDB can handle any SQL operations that MySQL can handle. RonDB even has support for parallelising join queries and pushing the filtering and joining down to the RonDB data nodes where the data resides.

This means that users of the Hopsworks feature store can integrate more features from multiple feature groups in online inferencing requests. Applications such as credit fraud detection can thus be made much more intelligent by taking more features into account in the inferencing requests.

This makes the performance of real-time join queries more important in RonDB. To evaluate how RonDB is developing in this area, I ran a set of tests using TPC-H queries from DBT3 against RonDB 21.04.17 and RonDB 22.10.4 (not released yet). I also ran tests against MySQL 8.0.35 (RonDB 22.10.4 is based on MySQL 8.0.35 with loads of added RonDB features).

The results were interesting; the improvement in Q20 was the highest I have seen in my career. The performance improved from 70 seconds to 80 milliseconds, an 875x speedup. Q2 had a 360x improvement. So RonDB 22.10.4 is much better equipped for more complex queries compared to RonDB 21.04. MySQL 8.0.35 had similar performance to RonDB 22.10.4, being on average around 20% slower; this difference is mostly due to performance improvements in RonDB, not algorithmic changes.

When using complex queries the query optimiser tries to find an optimal plan, sometimes however better plans are available and one can add hints in the SQL query to ensure a better plan is used.

The RonDB team isn't satisfied with this, however. We have realised that evaluating aggregations is also very important when the online feature store stores a time window of certain features. This means that RonDB can compute aggregates dynamically and thus provide more accurate predictions.

Early tests of some simple single-table queries showed an improvement of 4-5x, and we expect to reach 10-20x improvements in quite a few queries of this sort.

Friday, May 24, 2024

What's cooking in RonDB

Here is a short update on what is going on in RonDB development. We recently launched the new RonDB release 22.10.2.

We are now working on a number of major improvements and new features.

The first feature we are working on is what we call Pushdown Aggregation. In the first step it will be able to push down aggregation on a single table. This is useful for the Hopsworks Feature Store in that it will enable us to store time windows of certain features and compute aggregates as an online feature. Data will live in RonDB for a certain time and after that it will be deleted. So part of this feature is a Time-To-Live (TTL) feature that will enable users to declare for what time window they want the data to be visible. The deletion process will be handled by the new C++ REST API Server that will replace the current Go REST API Server. The C++ REST API Server has superior performance to the Go REST Server and will have exactly the same features, with a set of new ones added as well.
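Conceptually, the TTL process amounts to periodically removing rows whose timestamp has fallen outside the declared window. A minimal client-side sketch of the idea (in RonDB the deletion will be done server-side by the C++ REST API Server, not like this):

```python
import time

# Conceptual sketch of a Time-To-Live (TTL) sweep: rows older than the
# declared time window are removed. The window length and row shape
# here are illustrative assumptions.

TTL_SECONDS = 3600  # example: keep one hour of feature history

def ttl_sweep(rows, now=None):
    """Return only the rows still inside the TTL window."""
    now = time.time() if now is None else now
    return [r for r in rows if now - r["ts"] <= TTL_SECONDS]

now = 10_000.0
rows = [
    {"key": "a", "ts": now - 100},    # fresh, kept
    {"key": "b", "ts": now - 5000},   # expired, deleted
]
print([r["key"] for r in ttl_sweep(rows, now=now)])
```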

So Pushdown Aggregation contains five parts: first, a new interpreter that calculates the aggregates as part of scan operations; second, new additions to the NDB API to be able to issue those aggregation queries towards the RonDB data nodes; third, a new SQL engine that can execute SQL queries using aggregates on a single table; fourth, the ability to call this SQL engine through a REST call; and finally, the TTL feature to ensure that rows automatically disappear when no longer needed by the online data store.
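The benefit of the first part, aggregating inside the scan, can be modelled as follows: instead of shipping every row to the client and aggregating there, each data node returns one small partial aggregate that the client merges. This is a conceptual sketch, not the NDB API:

```python
# Conceptual model of pushdown aggregation: each data node computes a
# partial aggregate during its scan; the client merges the partials
# instead of receiving and aggregating every matching row itself.

def scan_with_pushdown(partition, predicate):
    """Runs on a data node: aggregate while scanning, return one tuple."""
    count, total = 0, 0
    for row in partition:
        if predicate(row):
            count += 1
            total += row["amount"]
    return count, total

partitions = [  # rows spread over two data nodes (illustrative data)
    [{"amount": 10}, {"amount": 25}],
    [{"amount": 5}, {"amount": 60}],
]
predicate = lambda row: row["amount"] >= 10

# The client merges one small partial result per node...
partials = [scan_with_pushdown(p, predicate) for p in partitions]
count = sum(c for c, _ in partials)
total = sum(t for _, t in partials)
print(count, total)  # ...instead of pulling all rows over the network.
```

The network traffic is one tuple per data node regardless of table size, which is where the large speedups for aggregate queries come from.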

The second feature we are working on is a Kubernetes operator for RonDB. This will make it possible to start and stop RonDB clusters in on-prem, cloud and hybrid cloud settings. In addition it will contain autoscaling of MySQL Servers, REST servers and RonDB data nodes. Eventually it will also support global replication between multiple RonDB clusters.

We are also working on a Rate Limit and Quota feature that will enable setting limits on memory and disk usage as well as on the amount of CPU a specific database is allowed to use.

So, in short, a number of very interesting new additions to RonDB that will make it even easier to use and to prosper from.

There are also some minor features added to support even more pushdown of calculations, making it possible to push down filters based on values of array objects.

Friday, March 22, 2024

New LTS version of RonDB, RonDB 22.10.2

After a very thorough development and test period, we are proud to announce the general availability of the RonDB 22.10 LTS series with the release of RonDB 22.10.2 today. A new version of the old LTS series, RonDB 21.04.16, was also released today.

A complete list of the new features is provided in the RonDB Documentation.

The most important new feature is the support for variable-sized disk rows. This means a very significant saving of disk space: up to 10x more data can be stored in the same disk space as with RonDB 21.04. In addition, RonDB 22.10 contains a major quality improvement when using disk columns. This means that RonDB is now prepared for the introduction of features in the Hopsworks Feature Store using disk columns, which will provide significant cost savings when storing lots of features in the Hopsworks Online Feature Store.

Another important feature in RonDB 22.10 is that all major data structures now use the Global Data Manager; this makes the memory management inside RonDB much more flexible. This was the final step of a long project described here.

As usual we have also been working on performance in this new version. Write throughput has been significantly improved, enterprise applications using read locks will see greater scalability, and throughput at very high load has also been improved, leading to up to 30% better throughput in RonDB 22.10.2 compared to RonDB 21.04. This improvement comes from further development of the ideas presented in this blog, a change that makes it possible to be much more flexible in using CPU resources. More details on performance will come later.

In this blog we have described the testing process that we have had with RonDB. RonDB 22.10 is already running in production in a number of installations and is part of Hopsworks 3.7, released recently. This Hopsworks version introduces support for fine-tuning LLMs for GenAI and also adds multi-region support.

Come to our webinar where we will present more information about RonDB and what it can be used for.

Since Oracle has now made MySQL 8.0 an LTS version, we plan to merge changes from the MySQL 8.0 series into future RonDB 22.10 versions. Bug fixes and features of interest to Oracle are contributed back to MySQL.

For those interested in following the development of RonDB in real-time, the tree is here; the branch where RonDB 22.10.2 is found is called 22.10.1. The development branch of RonDB 22.10 is found in 22.10-main. We are also working on new features of RonDB in forks of this tree by the developers. 22.10-main already has a new feature supporting more than 20k table objects, and an even more improved thread model providing around 5% better throughput. There is also long-term development of rate limits and quotas, enabling RonDB to be used in multi-tenant environments, and of pushdown of aggregations, enabling a speedup of 10-1000x for some queries used in Feature Stores.