Thursday, July 29, 2021

Improvements of DBT2 benchmark in RonDB 21.10.1

During the development of RonDB 21.10.1 we put some focus on improving RonDB's performance in the DBT2 benchmark. NDB Cluster already performs very well on DBT2. However, that performance relies on a thread configuration with a large number of LDM threads, which means that tables end up with a very large number of partitions.

For an application like DBT2 (an open source variant of TPC-C) this is not an issue, since DBT2 is a very scalable application. Most real applications, however, do not scale as well as DBT2 when the number of table partitions increases.

In RonDB we have focused on decreasing the number of table partitions, so in RonDB the number of partitions is independent of the number of LDM threads. In DBT2 most of the load is directed at one of the tables, which means that only a subset of the LDM threads is used when executing DBT2. Moreover, most of that load is directed at the primary replicas.

In RonDB 21.10.1 we improved the placement of the primary replicas such that more LDM threads were active in executing the queries. This improved DBT2 performance by about 20%.
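To make the placement argument concrete, here is a small toy model. It is a hypothetical illustration, not RonDB's actual placement algorithm. It assumes two data nodes with the same number of LDM threads, one replica of every partition in each node stored on LDM thread `p % num_ldm`, and that only the primary replica receives load. The functions `grouped_primary` and `spread_primary` are invented names for two ways of choosing which node holds the primary replica.

```python
from typing import Callable, Set, Tuple

# Hypothetical toy model, not RonDB's placement code.
# Two data nodes, each with num_ldm LDM threads. Partition p has one replica
# in each node, stored on LDM thread p % num_ldm in that node. Only the
# primary replica is assumed to receive load.
Placement = Callable[[int, int], int]  # (partition, num_ldm) -> node of primary


def grouped_primary(p: int, num_ldm: int) -> int:
    # Alternate the primary between the nodes partition by partition.
    return p % 2


def spread_primary(p: int, num_ldm: int) -> int:
    # Switch node after each block of num_ldm partitions.
    return (p // num_ldm) % 2


def busy_ldm_threads(
    num_partitions: int, num_ldm: int, placement: Placement
) -> Set[Tuple[int, int]]:
    """Return the (node, ldm_thread) pairs that hold at least one primary."""
    busy: Set[Tuple[int, int]] = set()
    for p in range(num_partitions):
        node = placement(p, num_ldm)
        busy.add((node, p % num_ldm))
    return busy


if __name__ == "__main__":
    NUM_LDM = 4
    NUM_PARTITIONS = 8
    total = 2 * NUM_LDM
    for name, policy in [("grouped", grouped_primary), ("spread", spread_primary)]:
        busy = busy_ldm_threads(NUM_PARTITIONS, NUM_LDM, policy)
        print(f"{name}: {len(busy)} of {total} LDM threads handle primary replicas")
    # grouped: 4 of 8 LDM threads handle primary replicas
    # spread: 8 of 8 LDM threads handle primary replicas
```

With 4 LDM threads per node and 8 partitions, the grouped placement leaves half of the LDM threads without any primary replica, while the spread placement gives every LDM thread one. The model only illustrates the mechanism; it does not reproduce the roughly 20% gain measured in DBT2.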

Already in RonDB 21.04 we introduced query threads that can be used for reads using Committed Read. This makes applications that use Committed Reads scale very well, such as the Online Feature Store used by Hopsworks. However, DBT2 uses very few Committed Reads; most of its reads lock rows. To handle this we modified RonDB 21.10 to also allow locked reads to use query threads.

Query threads already come with efficient scheduling of read queries across LDM threads and query threads, which ensures that all CPUs used by LDM threads and query threads are put to good use. By also scheduling locked read operations to query threads, we automatically make more efficient use of the CPU resources in the DBT2 benchmark. This improvement gives 50% better DBT2 performance for RonDB.
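The effect of letting locked reads use query threads can be sketched with a toy scheduler. This is a hypothetical model, not RonDB's scheduler. It assumes a single LDM thread `ldm0` that owns the partition, two query threads (`qt0`, `qt1`) that can serve reads on it, that writes always run in the owning LDM thread, and that an eligible read goes to the least-loaded thread. The operation mix is invented to resemble a workload dominated by locked reads, as DBT2 is.

```python
from collections import Counter
from typing import List

# Hypothetical toy scheduler, not RonDB's implementation.
QUERY_THREADS = ["qt0", "qt1"]


def schedule(ops: List[str], owner: str, offload_locked_reads: bool) -> Counter:
    """Assign each operation to a thread and return the per-thread load."""
    load: Counter = Counter()
    for kind in ops:
        # Committed Reads may always use query threads; locked reads only
        # when offload_locked_reads is True. Writes never leave the owner.
        can_offload = kind == "committed_read" or (
            kind == "locked_read" and offload_locked_reads
        )
        if can_offload:
            candidates = [owner] + QUERY_THREADS
            # min() picks the first candidate with the lowest current load.
            target = min(candidates, key=lambda t: load[t])
        else:
            target = owner
        load[target] += 1
    return load


if __name__ == "__main__":
    # Invented workload dominated by locked reads, all on partitions owned by ldm0.
    ops = ["write"] * 2 + ["locked_read"] * 8 + ["committed_read"] * 2
    before = schedule(ops, "ldm0", offload_locked_reads=False)
    after = schedule(ops, "ldm0", offload_locked_reads=True)
    print("busiest thread before:", max(before.values()))  # 10
    print("busiest thread after:", max(after.values()))  # 4
```

When only Committed Reads may use query threads, the owning LDM thread executes 10 of the 12 operations; once locked reads may use them too, the same work is spread evenly at 4 operations per thread.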

Another feature we made use of in the DBT2 benchmark is the ndb_import tool: the load phase of DBT2 uses ndb_import, which provides very efficient parallel loading. Both RonDB 21.04 and RonDB 21.10 contain improvements to ndb_import that enable DBT2 to use it as its load tool.

Finally, in RonDB 21.10.1 we also removed a bottleneck caused by the index statistics mutex in the NDB storage engine. This improves Sysbench throughput by about 10% at high load. We have not measured how much it affects DBT2 performance.
