Thursday, October 26, 2023

Results on comparing new Intel/AMD VMs with older VM types using RonDB

In the Hopsworks cloud offering for GCP one can select from a fairly large variety of VM types. I am currently working on extending this list to also include the latest generation of VM types. This blog focuses on the impact of those new VM types on benchmarks using RonDB.

The newer VM types are the c3d series, which uses 4th generation AMD EPYC CPUs, and the c3 series, which uses Intel Sapphire Rapids CPUs. AWS has also introduced similar new VM types, but this blog discusses tests performed on VMs in GCP.

The older VM type we compared against for the MySQL Servers was n2-standard-16. This VM uses an Intel Cascade Lake Xeon processor, which represents the second generation of Intel Xeon chips, whereas Intel Sapphire Rapids represents the 4th generation Intel Xeon.

The RonDB data nodes used e2-highmem-16 as the baseline for comparison. This VM type uses either an Intel Xeon of the second generation or an AMD EPYC of the second generation.

The benchmark used was Sysbench OLTP RW, based on version 0.4.12.19, which is included in the RonDB tarball and is set up on the API nodes automatically by our cloud offering. This makes it extremely easy to replicate the benchmarks. We use Consul as a load balancer, so the benchmark process is set up against a single host name, onlinefs.mysql.service.consul. In reality this address maps to the set of MySQL Servers in the RonDB cluster. We used 3 MySQL Servers in the tests. The setup used 2 RonDB data nodes in one node group.

Thus in the Hopsworks cloud we get a load balanced RonDB Data Service as part of the infrastructure of the Hopsworks Feature Store.

We first executed the benchmark using the old VM types to get a baseline. The next step was to upgrade the RonDB MySQL Servers to use c3d-highmem-16. Thus the same amount of memory and number of CPUs as in n2-standard-16 but upgraded from Intel 2nd generation to AMD 4th generation.

This mainly impacted the throughput. The baseline experiment executed 9000 TPS and was limited by the CPUs in the MySQL Servers (they used 1550% of the 1600% available). The c3d-highmem-16 delivered 11400 TPS while using only 1000% of the available 1600%. Thus the throughput per CPU increased by around 100%. In this execution the bottleneck of the benchmark was the RonDB data nodes.
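The per-CPU figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the helper name tps_per_cpu is my own, and it assumes that 100% CPU usage corresponds to one fully used CPU.

```python
# Figures quoted in the text above.
baseline_tps = 9000        # n2-standard-16 MySQL Servers
baseline_cpu_pct = 1550    # CPU used, out of 1600% available
new_tps = 11400            # c3d-highmem-16 MySQL Servers
new_cpu_pct = 1000         # CPU used, out of 1600% available


def tps_per_cpu(tps, cpu_pct):
    """Transactions per second per fully used CPU (assumes 100% = one CPU)."""
    return tps / (cpu_pct / 100.0)


baseline_eff = tps_per_cpu(baseline_tps, baseline_cpu_pct)
new_eff = tps_per_cpu(new_tps, new_cpu_pct)
gain_pct = (new_eff / baseline_eff - 1.0) * 100.0

print(f"baseline: {baseline_eff:.0f} TPS per CPU")
print(f"c3d:      {new_eff:.0f} TPS per CPU")
print(f"gain:     {gain_pct:.0f}%")
```

This gives roughly 581 versus 1140 TPS per CPU, a gain of about 96%, which is what "around 100%" refers to.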

The benchmark API node was consistently an n2-standard-48 VM. This meant that most communication went from an API VM of the old type, to a MySQL Server of the new type, to a RonDB data node VM of the old type. Thus an old VM type was involved in all communication. The network latency was the same in this experiment as in the baseline experiment.

The change from one VM type to another was done using the reconfiguration support that RonDB has in its cloud offering. This change is an online operation where the cluster remains operational and the new MySQL Servers are included in the Consul setup as soon as they have started up. Temporary errors can only happen when nodes are stopped, and these can be handled with simple retry logic.
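As an illustration of what such retry logic could look like on the application side, here is a minimal sketch. It is not taken from RonDB or Hopsworks: the exception class TemporaryError and the function run_with_retry are hypothetical names, and in a real application one would map the temporary errors reported by the database driver onto something like this.

```python
import time


class TemporaryError(Exception):
    """Hypothetical marker for an error that is safe to retry."""


def run_with_retry(transaction, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Run transaction(), retrying on TemporaryError with growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except TemporaryError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Back off before the next attempt: 0.05s, 0.1s, 0.2s, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

The caller passes a function that executes one complete transaction, so that a retry starts the transaction again from the beginning instead of resuming it halfway.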

Next we also changed the VM type of the RonDB data nodes to c3d-highmem-16, using the same online reconfiguration as for the MySQL Servers.

What we quickly noted in this setup was that the latency per transaction was cut in half. Thus the response time when using a single thread decreased to less than half. It is therefore clear that communication between 2 VMs of the new type has more than a 100% improvement in network latency. The throughput now increased to 17800 TPS and the bottleneck was now in the MySQL Servers. Thus the throughput improvement over the baseline is almost 98% and network latency improved by more than 100%.

When reading the announcement of the C3 machine series and the description of the C3D machine series, it is clear that the new IPU (Infrastructure Processing Unit) that takes care of offloading networking is a major reason for this improved network latency.

Analysing the Sysbench transaction in this setup, there are around 100 network messages, most of them in serial order. Still, the latency of a transaction is no more than 6 milliseconds to execute the 20 SQL queries involved in the OLTP RW transaction. This gives an average of 60 microseconds per message, and this includes the time to also execute the RonDB data node code and the RonDB MySQL Server code.
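The per-message number is simple division, shown here as a small sketch. The inputs are the approximate figures from the paragraph above; the per-query and single-thread numbers are derived by me from those figures, not measured separately.

```python
# Approximate figures from the text above.
messages_per_txn = 100    # network messages in one OLTP RW transaction
txn_latency_ms = 6.0      # upper bound on the latency of one transaction
queries_per_txn = 20      # SQL queries in one OLTP RW transaction

# Average time budget per network message and per SQL query, in microseconds.
us_per_message = txn_latency_ms * 1000.0 / messages_per_txn
us_per_query = txn_latency_ms * 1000.0 / queries_per_txn

# One thread running transactions back to back completes this many per second.
single_thread_tps = 1000.0 / txn_latency_ms

print(f"{us_per_message:.0f} us per message")
print(f"{us_per_query:.0f} us per SQL query")
print(f"{single_thread_tps:.0f} TPS from a single thread")
```

Since most messages are sent in serial order, the per-message average is a reasonable approximation of one network hop plus the processing done at the receiving node.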

The next step was to again change the MySQL Server VMs. This time we changed to c3-highmem-22. Unfortunately the VM type c3-highmem-16 doesn't exist, so the comparison isn't perfect, but at least it gives a good estimate of the improvements in Intel's 4th generation CPUs.

The network latency was the same for the Intel and AMD 4th generation VM types. The throughput increased by around 40% up to around 24000 TPS. Since the number of CPUs increased by around 40% as well, it seems that the c3 series and the c3d series are very similar in handling throughput when used in RonDB MySQL Servers.
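A quick way to compare the two series is to normalise throughput by the number of vCPUs in the MySQL Servers. The sketch below assumes 3 MySQL Servers in both runs, as stated earlier, and uses the rounded figure of 24000 TPS, so the result is only indicative.

```python
mysql_servers = 3                # MySQL Servers used in the tests

c3d_tps, c3d_vcpus = 17800, 16   # c3d-highmem-16 (AMD)
c3_tps, c3_vcpus = 24000, 22     # c3-highmem-22 (Intel), rounded TPS

# Relative increase in total throughput and in vCPU count.
tps_increase_pct = (c3_tps / c3d_tps - 1.0) * 100.0
vcpu_increase_pct = (c3_vcpus / c3d_vcpus - 1.0) * 100.0

# Throughput per MySQL Server vCPU for each series.
c3d_tps_per_vcpu = c3d_tps / (mysql_servers * c3d_vcpus)
c3_tps_per_vcpu = c3_tps / (mysql_servers * c3_vcpus)

print(f"throughput +{tps_increase_pct:.0f}%, vCPUs +{vcpu_increase_pct:.1f}%")
print(f"c3d: {c3d_tps_per_vcpu:.0f} TPS per vCPU, c3: {c3_tps_per_vcpu:.0f} TPS per vCPU")
```

With these inputs the two series end up within a few percent of each other per vCPU, which is consistent with the observation above.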

To test the throughput of those new VMs we ran the test using the c3-highmem-8 and c3d-highmem-8 VM types as RonDB data node VMs. The performance of those two VM types was almost indistinguishable, to the point where I started wondering if they were the same CPUs. Throughput was half the throughput of the 16-vCPU VMs.

The main conclusion of these tests is that upgrading from 2nd generation x86 CPUs to 4th generation x86 CPUs in the GCP cloud provides a 100% improvement in throughput and a similar improvement of the network latency.

The price of those VMs is higher, but the increase is substantially less than 100%. So it makes a lot of sense to start using those new VM types for new applications.

The tests were performed using RonDB version 21.04.15. We are about to release a new LTS version of RonDB, version 22.10.1. There will be a more thorough benchmark report when this is released.