Tuesday, June 05, 2018

Colocated MySQL Server and NDB data nodes

Historically the advice for MySQL Cluster has been to not colocate
the MySQL Server and the NDB data node for scalable applications.

There are still reasons to apply this principle in scalable setups
where the application isn't close to the data.

But with MySQL Cluster 7.6 we have added a number of reasons why it
makes sense to colocate the MySQL Server and the NDB data node.

Internally in the NDB development we have had a debate about whether
to integrate the NDB data node inside the MySQL Server. The reasons
for this is that the MySQL Server will be closer to the data. The
argument against is that the NDB data node and the MySQL Server are
designed with different recovery models. NDB data nodes are failfast,
as soon as we find a fault that is not supposed to happen we will
crash the data node. The MySQL Server on the other hand tries to
stay up as much as possible since a crash of the MySQL Server brings
down the data accessibility. In NDB we always expect another data
node to have a synchronous replica, thus data is accessible even in
the case of a crash.

With MySQL Cluster 7.6 we have gotten the best of both worlds. We
can now communicate from the MySQL Server to a NDB data node using
a shared memory transporter. This means that communication goes
entirely through the memory of the machine, the communication between
a thread in the MySQL Server and a thread in the NDB data node
goes through memory and when a thread needs to wake up a thread a
mutex is used with a condition variable exactly as in the MySQL
Server. Still the NDB data node and the MySQL Server is separate
programs that can reside on machines independent of each other
and they can crash independently of each other.

So with the release of MySQL Cluster 7.6 it is possible to have
clusters with locality of reads. Already in MySQL Cluster 7.5 we
introduced the possibility to declare tables as being able to
read from all replicas (Read Backup feature). In addition we
introduced tables that can be fully replicated in MySQL Cluster 7.5.
In these fully replicated tables access to a table is always local
to the data node we are in.

In MySQL Cluster 7.6 we are introducing a shared memory transporter
for efficient communication between a colocated MySQL Server and
an NDB data node. In addition we are introducing the possibility
to declare location domain ids for all nodes in the cluster. This
means that we can ensure that reads always stays local to the
Availability Domain in an Oracle Cloud (Availability Zone in Amazon
and Google clouds). Thus it is possible to design applications
without having to consider networking constraints as much as before
with NDB.

This means that we expect NDB to work very well in SQL applications.
We are also constantly working on improving the SQL performance of
NDB by supporting more and more push down of joins to the NDB data nodes.
We are working on improving the basic range scan mechanism in NDB,
we are working on improving the interface between the NDB storage
engine and the MySQL Server. Finally we are also working hard to
integrate all the changes in MySQL 8.0 into MySQL Cluster as well.

I will describe a number of different scenarios for how to build
applications in the cloud using a setup where we have 3 data nodes,
one in each availability domain of the Oracle Cloud.

But in this blog and a few more blogs I will start by looking
specifically at how the shared memory transporter improves performance
of standard sysbench benchmarks.

In the previous blog I showed how we have improved performance of
Sysbench OLTP RW even for the standard TCP transporter. This was
due to the use of a new wakeup thread and the use of locking the NDB API
receive thread to a CPU where it can work undisturbed. The receive
thread handles receive of all messages from the NDB data nodes and
must be prioritised over the other MySQL Server threads, the best way
to achieve this is to use CPU locking. In the benchmarks we present in
this blog we will always use this CPU locking.

In the figure above we show how the performance of a normal setup using
7.5.9 compares to the 7.6.6 with receive thread locked to a CPU using
the TCP transporter. Next we have a curve that shows performance when
simply replacing the TCP transporter with a shared memory transporter.
Next we show a curve of what happens if we configure the shared memory
transporter to use spinning for a while before it goes to sleep.

The final curve shows the performance when also spinning in the TC
threads and the LDM threads in the NDB data node. Spinning in those
threads is not likely to be beneficial if those threads are not locked
to their own CPU core, thus in this one should not use hyperthreading
for those threads.

The takeaways from the graph above are the following:

1) The shared memory transporter have similar performance at low
concurrency as the TCP transporter. As concurrency increases the
shared memory transporter has better performance, the improvement
is 10% at top performance and more than 40% at very high concurrency.

2) Using spinning in the configuration of the shared memory transporter
improves performance at low concurrency significantly, by more than
20%. Top performance is similar to not using spinning, but it is
easier to get to this top performance.

3) Using spinning in the TC threads and LDM threads improves performance
even more at low concurrency. Performance increases by more than 30% at
low concurrency compared to no spinning and by 10% compared to spinning
only in transporter. Performance at high concurrency is similar for all
variants using shared memory transporter. So spinning helps to make the
MySQL Server need less concurrency to reach high performance levels.

We have added a graph below where highlight the performance at 1 and 2
threads since it is difficult to see those differences in the first

Configuring NDB to use a shared memory transporter is easy, the easiest
way is to simply set a new configuration variable UseShm to 1 on the
NDB data nodes. With this setting we will create a shared memory transporter
between all API nodes and this node when the API node and the data node
share the same hostname. It is also possible to create a separate shared
memory section to describe the transporter setup between two specific
nodes in the cluster.

Spintime for the shared memory transporter is easiest to setup using the default
shared memory transporter section. Spintime for TC and LDM threads in the NDB
data nodes are configured using the ThreadConfig variable in NDB data nodes.

Since we are using mutex and condition variables in shared memory we are
only supporting shared memory transporters on Linux at the moment.

The conclusion is that using the shared memory transporter we can improve
performance at low concurrency by more than 30%, we can improve throughput
by 20% and at very high concurrency (1536 threads) we get about 100%
improvement, all comparing to the result in using 7.5.9.

In the graph below we show only the 7.5.9 curve and compare it to the curve
achieved with all improvements in 7.6.

No comments: