Thursday, February 13, 2020

NDB Cluster, the World's Fastest Key-Value Store

Using numbers produced already with MySQL Cluster 7.6.10 we have
shown that NDB Cluster is the world's fastest Key-Value store using
the Yahoo Cloud Serving Benchmark (YCSB) Workload A.

Presentation at Slideshare.net.

We reached 1.4M operations using 2 Data Nodes and 2.8M operations
using a 4 Data Node setup. All this using a standard JDBC driver.
Obviously using a specialised ClusterJ client will improve performance
further. These benchmarks was executed by Bernd Ocklin.

The benchmark was executed in the Oracle Cloud. Each Data Node used
a Bare Metal Server using DenseIO which have 52 CPU cores with
8 NVMe drives.

The MySQL Servers and Benchmark clients was executed on Bare Metal
servers with 2 MySQL Server per server (1 MySQL Server per CPU socket).
These Bare Metal servers contained 36 CPU cores each.

All servers used Oracle Linux 7.

YCSB Workload A means that 50% of the operations are reads that read the
full rows (1 kByte in size) and 50% perform updates of one of the fields
(100 bytes in size).

Oracle Cloud contains 3 different levels of domains. The first level is that
servers are placed in different Failure Domains within the same Availability
Domains. This means essentially that servers are not relying on the same
switches and power supplies. But they can still be in the same building.

The second level is Availability Domains that are in the same region, but each
Availability Domain is failing independently of the other Availability Domains
in the same region.

The third level is regions that are separated by long distances as well.

Most applications of NDB Cluster relies on a model that would use 2 or
more NDB Clusters in different regions, but each cluster contained inside
an Availability Domain. Next global replication between the NDB Clusters
is used for fail-over when one region or availability domain fails.

With Oracle Cloud one can also setup a cluster to have Data Nodes in
different Availability Domain. This increases the availability of the
cluster at the expense of higher latency for write operations. NDB Cluster
have configuration options to ensure that one always performs local
reads on either the same server or at least in the same Availability/Failure
Domain.

The Oracle Cloud have the most competitive real-time characteristics of
the enterprise clouds. Our experience is that the Oracle Cloud provides
2-4x better latency compared to other cloud vendors. Thus the Oracle
Cloud is perfectly suitable for NDB Cluster.

The DenseIO Bare Metal Servers or DenseIO VMs are suitable for
use for NDB Data Nodes or a NDB Data Nodes colocated with
MySQL Server. These servers have excellent CPU combined with
25G Ethernet links and extremely high performing NVMe drives.

This benchmark reported here stores the table as In-Memory tables.
We will later report on some benchmarks where we use a slightly
modified YCSB benchmark to show numbers when we instead use
Disk-based tables with much heavier update loads.

The Oracle Cloud contains a number of variants of Bare Metal servers
and VMs that are suitable for MySQL Servers and applications.

In NDB Cluster the MySQL Servers are actually stateless since all
the state is in the NDB Data Node. The only exception to this rule
is the MySQL Servers used for replication to another cluster that
requires disk storage for the MySQL binlog.

So usually a standard server can be setup without any special extra
disks for MySQL Servers and clients.

In the presentation we show the following important results.

The latency of DBMS operations is independent of the data size. We
get the same latency when Data set have 300M rows as when there are
600M rows.

We show that scaling to 8 Data Nodes with 4 Data Nodes in each Availability
Domains scales from 4 Data Nodes in the same Availability Domain. But
the extra latency increases the latency and this also some loss in throughput.
Still we reach 3.7M operations per second for this 8-node setup.

We show that an important decision for the cluster setup is the number of
LDM threads. These are the threads doing the actual database work. We get
best scalability when going for the maximum number of LDM threads which
is 32. Using 32 LDM threads can increase latency at low number of clients,
but when the clients increase the 32 LDM setup will scale much longer than
the 16 LDM thread setup.

In MySQL Cluster 8.0.20 we have made more efforts to improve scaling to
many LDM threads. So we expect the performance of large installations to
scale even further in 8.0.20.

The benchmark report above gives very detailed numbers of latency in various
situations. As can seen there we can handle 1.3M operations per second with
latency of reads below 1 ms and updates having latency below 2 ms!

Finally the benchmark report also shows the impact of various NUMA settings
on performance and latency. It is shown that Interlaced NUMA settings have
a slight performance disadvantage, but since it means that we get access to the
full DRAM and the full machine, it is definitely a good idea to use this setting.
In NDB Cluster this is the default setting.

The YCSB benchmark shows NDB Cluster in its home turf with enormous
throughput of key operations, both read and write, with predictable and low
latency.

Couple with the high availability features that have been proven in the field
with more than 15 years of continous operations with better than Class 6
availability we feel confident to claim that NDB Cluster is the World's
Fastest and Most Available Key-Value Store!

The YCSB benchmark is a standard benchmark, so any competing solution
is free to challenge our claim. We used a standard YCSB client of version
0.15.0 using a standard MySQL JDBC driver.

NDB Cluster supports full SQL through the MySQL Server, it can push joins
down to the NDB Data Nodes for parallel query filtering and joining.
NDB Cluster supports sharding transparently and complex SQL queries
executes cross-shard joins which most competing Key-Value stores don't
support.

One interesting example using NDB Cluster as a Key-Value Store is HopsFS
that implements a hierarchical file system based on Hadoop HDFS. It has been
shown to scale to 1.6M file operations per second and small files can be stored
in the NDB Cluster for low latency access to small files.

No comments: