Thursday, June 07, 2018

Benchmarking the Read Backup feature in the NDB storage engine

Read Backup was a new feature in MySQL Cluster 7.5. When MySQL
Cluster 7.5 was released I was already busy working on the
partial LCP feature that we have now released in 7.6, so I did not
have much time to produce benchmarks showing the impact of the
Read Backup feature.

Read Backup means that committed reads in NDB can use the backup
replicas as well. By default, reads of NDB tables are directed to
the primary replica. The reason is that MySQL Cluster wants to
ensure that applications can trust that a reader sees its own
updates. Many modern NoSQL DBMSs lack this guarantee since they
use eventually consistent replication and very flexible scheduling
of which replicas to read. NDB provides a stronger consistency
guarantee in that all applications can see their own updates and
replication is synchronous.

The reason that a read from a backup replica can fail to see the
reader's own changes in NDB is the order of the final commit steps:
we release the locks on the primary replica first, next we deliver
the committed message to the application and last we release the
locks on the backup. This means that a committed read (which reads
the latest committed row without locks) can only be guaranteed to
see the application's own updates if it reads the primary replica.

With tables that have the Read Backup feature we will delay the
sending of the committed message to the application until all rows
have been unlocked. This means that we can safely read any replica
for those tables and still see our own updates.
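
The difference between the two orderings can be sketched with a small
toy model in Python. This is only an illustration, not how NDB is
implemented: the Replica class, its methods and the function name are
all made up for the example, and it assumes that a committed read on a
replica keeps returning the old row until the lock on that replica has
been released.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    """Toy replica holding one row and one lock (hypothetical, not NDB code)."""
    committed: str                 # value returned by a lock-free committed read
    pending: Optional[str] = None  # new value, hidden until the lock is released

    def prepare(self, value: str) -> None:
        # The row is now locked and carries an uncommitted value.
        self.pending = value

    def release_lock(self) -> None:
        # Assumption: the new value becomes visible when the lock is released.
        if self.pending is not None:
            self.committed, self.pending = self.pending, None

    def committed_read(self) -> str:
        return self.committed


def read_backup_after_commit(read_backup: bool) -> str:
    """Update a row, then read the backup as soon as the commit is acknowledged."""
    primary, backup = Replica("old"), Replica("old")
    for replica in (primary, backup):
        replica.prepare("new")

    seen_at_ack: List[str] = []

    def send_committed_to_application() -> None:
        # The application reads the backup the moment it hears "committed".
        seen_at_ack.append(backup.committed_read())

    if read_backup:
        # Read Backup table: unlock every replica, then acknowledge.
        steps = [primary.release_lock, backup.release_lock,
                 send_committed_to_application]
    else:
        # Default table: unlock primary, acknowledge, unlock backup last.
        steps = [primary.release_lock, send_committed_to_application,
                 backup.release_lock]

    for step in steps:
        step()
    return seen_at_ack[0]


if __name__ == "__main__":
    print("default table    :", read_backup_after_commit(read_backup=False))
    print("Read Backup table:", read_backup_after_commit(read_backup=True))
```

Under these assumptions the default ordering lets the application read
"old" from the backup right after the commit is acknowledged, while the
Read Backup ordering returns "new", at the price of a slightly later
acknowledgement.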

The Read Backup feature can be set for a table through a special
syntax using the COMMENT in the CREATE TABLE statement
(COMMENT="NDB_TABLE=READ_BACKUP=1"). A much easier and likely more
useful approach is to set the ndb_read_backup configuration variable
in the MySQL Server to 1. This means that all tables created through
this MySQL Server will be created with the Read Backup feature. There
is a similar variable for ensuring that all tables are created with
the fully replicated feature: in this case the configuration variable
ndb_fully_replicated is set to 1. In MySQL Cluster 7.6 neither of
these configuration variables is enabled by default. But for SQL
applications it is a good idea to always enable the read backup
feature, and for applications that focus on read scalability with a
fairly limited data size, the fully replicated feature can also be
enabled.

Fully replicated tables have a replica in each data node and any replica
can be read for committed reads.

The benchmark we present here is performed in a setup with the optimal
network architecture. It consists of two machines where the MySQL Server
and the NDB data node can be colocated, and the machines are connected
by an InfiniBand network. Since most real setups have a slower network,
in almost every real case the impact of using read backup in a colocated
scenario is even bigger.

In this scenario the extra delay to go over the network is fairly small,
thus the impact at low concurrency is fairly small, but the extra overhead
of going over the network a lot shows at higher concurrency, where the
impact grows all the way up to 30%.
