Wednesday, May 19, 2021

5.5M Key Lookups per second on 16 VCPU VMs

 As introduced in a previous blog RonDB enables us to easily execute benchmarks on RonDB using the Sysbench benchmark.

In this blog I will present some results where the RonDB cluster had 2 data nodes, each using a r5.4xlarge VM in AWS that has 16 VCPUs and 128 GB memory. The Sysbench test uses SQL to access RonDB.

In this particular test case we wanted to test the Key-Lookup performance using SQL. Key-Lookup performance is essential in the RonDB use case as an online Feature Store in Hopsworks.

In this case we use the following settings in the Sysbench configuration file (see RonDB documentation for more details on how to set up those tests):







These settings means that each transaction has 10 SQL queries, each of those queries fetch 100 rows using a primary key lookup. Thus each transaction performs 1000 key lookups.

We will look at two things here, how many key lookups can be delivered per second and what is the latency of each of those batched SELECT statements.

In the image above we see how the number of key value lookups grows extremely quickly, even with only 2 threads per MySQL Server we reach beyond 1M key lookups per second. The total throughput at 32 threads is 5.56M key lookups per second.

Even more interesting is the response time of each of those SELECT statements. Remember that each of those SELECT fetch 100 rows using a primary key lookup. Using 4 threads per MySQL Server we respond within 1.0 millisecond with a throughput of 2.17M key lookups per second. At 12 threads we deliver around 4.5M key lookups with a latency around 1.5 millisecond per SELECT query. At 16 threads we deliver 5.27M key lookups at a response time of 2.28 milliseconds.

Thus as can be seen from these numbers, even using SQL, RonDB can provide astonishing key lookup performance. Obviously using C++, Java or JavaScript key-value APIs we expect to perform a lot better yet and in particular it will use much less CPU on the client side.

1 comment:

Mark Callaghan said...

An FYI - there is a CPU regression for large inlists that started in 8.0.22. I spotted it with the sysbench query you used to get these numbers.