As mentioned in blog post 1, we have improved the capability to handle
very large write bandwidth, and blog post 2 described the improved
checkpointing of disk data columns in MySQL Cluster 8.0.20.
We wanted to verify that these changes were successful. To do this we
chose to use the YCSB benchmark. This is a very simple benchmark;
in our variant it contains one table with 2 columns. The first
column is the primary key and the second column contains the payload
data; it is a VARCHAR(29500) column that is stored on disk.
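As an illustration of what such a table could look like, a minimal sketch
is shown below. The table and column names, the key length, the character
set and the tablespace name ts_1 are assumptions for illustration (a
single-byte character set is assumed so that the 29,500 byte payload fits
within the NDB row size limit); a sketch of how the tablespace itself can
be created follows further down.

CREATE TABLE usertable (
  ycsb_key VARCHAR(255) NOT NULL PRIMARY KEY,
  field0 VARCHAR(29500) STORAGE DISK
) CHARACTER SET latin1
  TABLESPACE ts_1
  ENGINE NDBCLUSTER;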
The first part of the benchmark fills the database. The database is
mostly stored in a tablespace that is set up in accordance with
blog post 3. This means that we had a tablespace size of 20 TBytes.
We loaded 600M rows, thus creating a database size of 18 TByte.
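As a rough check of those numbers, 600M rows with a payload of up to
29,500 bytes each amounts to about 600,000,000 x 29.5 KByte, which is
roughly 17.7 TByte and thus in line with the stated database size of
about 18 TByte.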
The load phase inserted 44,500 rows per second. This means that we
loaded 1.25 GByte per second into the database. The bottleneck in
both the load phase and the benchmark run phase was mainly the
NVMe drives, but in some cases the 25G Ethernet also became the
bottleneck. The CPUs were never loaded more than 20% and thus never
became a bottleneck.
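As a sanity check of the load rate, 44,500 rows per second with close to
the full 29,500-byte payload per row corresponds to roughly 1.3 GByte of
payload per second, which agrees with the stated 1.25 GByte per second
written into the database.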
From this we can conclude that the setup and use of the NVMe drives
is the most important part of achieving extreme write rates for
use cases where NDB Cluster is used for file write loads.
The cluster setup used 2 data nodes; each data node was a bare
metal server in the Oracle Cloud (OCI) with 8 NVMe drives (2 used
for logs and checkpoints and 6 used for the tablespace). The servers
had 52 CPU cores each. Instead of setting up a RAID on the 6 NVMe drives,
we opted for one file system per NVMe drive and added one
data file per NVMe drive to NDB. This meant that NDB handled the
spread of writes over the different data files. Thus no complex RAID
solution was required. However, to get the best possible performance
it was necessary to use SSD overprovisioning.
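As a minimal sketch of that approach, a tablespace with one data file per
NVMe file system could be created along the following lines. The logfile
group and tablespace names, the mount point paths and the sizes are
assumptions for illustration and not the exact configuration used here;
blog post 3 describes the actual setup.

CREATE LOGFILE GROUP lg_1
  ADD UNDOFILE 'undo_1.log'
  INITIAL_SIZE 128G
  UNDO_BUFFER_SIZE 512M
  ENGINE NDBCLUSTER;

CREATE TABLESPACE ts_1
  ADD DATAFILE '/ndb_data/nvme0/ts_1_data_0.dat'
  USE LOGFILE GROUP lg_1
  INITIAL_SIZE 512G
  ENGINE NDBCLUSTER;

ALTER TABLESPACE ts_1
  ADD DATAFILE '/ndb_data/nvme1/ts_1_data_1.dat'
  INITIAL_SIZE 512G
  ENGINE NDBCLUSTER;

-- ...and so on, adding one or more data files on each of the remaining
-- NVMe file systems until the tablespace reaches the desired 20 TBytes.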
The actual benchmark consisted of 50% reads and 50% writes. Here we
achieved almost 70,000 transactions per second. This meant that we
read 1 GByte per second in parallel with writing 1 GByte per second.
The mean latency of those transactions was a bit more than 2 milliseconds,
where reads took a bit more than 1 ms and writes around 4-5 milliseconds.
The disk usage, CPU usage and network usage during this benchmark can
be found in blog post 4.
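For reference, a 50/50 read/write mix like this corresponds roughly to a
YCSB core workload configuration along the lines sketched below, assuming
the writes are YCSB updates. The property values are assumptions for
illustration and not the exact settings used in the benchmark:

# workload class name as in recent YCSB releases
workload=site.ycsb.workloads.CoreWorkload
recordcount=600000000
fieldcount=1
fieldlength=29500
readproportion=0.5
updateproportion=0.5
requestdistribution=uniform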