Friday, November 15, 2019

What's new in MySQL Cluster 8.0.18

MySQL Cluster 8.0.18 RC2 was released a few weeks back, packed with a set of
interesting new features.

One major change is that we have increased the maximum number of data nodes from
48 to 144. There is also ongoing work to fully support 3 and 4 replicas in MySQL
Cluster 8.0. NDB has actually always been designed to handle up to 4 replicas,
but our testing has previously focused entirely on 2 replicas. We have now
expanded our testing to also verify that 3 and 4 replicas work well, so
with NDB 8.0 we will be able to confidently support 3 and 4 replicas.
This also means that with NDB 8.0 it will be possible to have 48 node
groups with 3 replicas in each node group in one cluster.

The higher number of nodes in a cluster gives the possibility to store even more
data in the cluster. So with 48 node groups it is possible to store 48 TByte of
in-memory data in one NDB Cluster and on top of that one can also have
about 10x more data in disk data columns. Actually we have successfully
managed to load 5 TByte of data into a single node using the DBT2 benchmark,
so with 8.0 we will be able to store a few hundred TBytes of replicated
in-memory data and petabytes of data in disk data columns for key-value stores
with high demands on storage space.
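
The arithmetic behind these numbers can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation, not a sizing tool: the function names are made up for illustration, and it assumes that every data node holds the 5 TByte we loaded in the DBT2 test and that disk data columns hold about 10x the in-memory size. Replicas within a node group store the same data, so only the number of node groups adds capacity.

```python
def node_groups(data_nodes: int, replicas: int) -> int:
    """Number of node groups: data nodes divided by replicas per group."""
    if replicas < 1 or data_nodes % replicas != 0:
        raise ValueError("data_nodes must be a multiple of replicas")
    return data_nodes // replicas


def replicated_capacity_tb(data_nodes: int, replicas: int, per_node_tb: float) -> float:
    """Unique (replicated) data the cluster can hold, in TByte.

    Each node group stores a distinct part of the data, and every node
    in the group stores a copy of that part.
    """
    return node_groups(data_nodes, replicas) * per_node_tb


# Assumptions for illustration: 144 data nodes, 3 replicas,
# 5 TByte of in-memory data per node, disk data about 10x in-memory.
groups = node_groups(144, 3)
in_memory_tb = replicated_capacity_tb(144, 3, 5.0)
disk_tb = in_memory_tb * 10

print(groups, in_memory_tb, disk_tb)
```

With these assumptions the result is 48 node groups, 240 TByte of replicated in-memory data and roughly 2.4 PByte of disk data.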

Given that we now support so much bigger data sets it is natural that we focus
on the ability to load data at high rates, both into in-memory data and into
disk data columns. For in-memory data this was solved already in 7.6 and
there is even more work in this area ongoing in 8.0.

We also raised one limit in NDB: 7.6 limits row sizes to 14000 bytes, while
NDB 8.0 can handle row sizes of up to 30000 bytes.

Another obvious fact is that with so much data in the cluster it is important to
be able to analyze the data as well. Already in MySQL Cluster 7.2 we
implemented a parallel join operator inside of NDB Cluster available
from the MySQL Server for NDB tables. We made several important
improvements of this in 7.6 and even more has happened in NDB 8.0.

This graph shows the improvement made to TPC-H queries in 7.6 and in
8.0 up until 8.0.18. So chances are good that you will find that some of
your queries will be executed substantially faster in NDB 8.0 compared to
earlier versions. NDB is by design a parallel database machine, so what
we are doing here is ensuring that this parallel database machine can now
also be applied for more real-time analytics. We currently have parallel
filtering, parallel projection and parallel join in the data nodes. With
NDB 8.0 we also get everything new in MySQL 8.0, which provides a
set of new features in the query processing area.

The main feature added in 8.0.18 for query processing is the ability to push down
join execution of queries where we have conditions of the type t1.a = t2.b.
Previously this was only possible for the columns handled by the chosen index
in the join. Now it can be handled for any condition in the query.

8.0.18 also introduces a first step of improved memory management where the goal
is to make more efficient use of the memory available to NDB data nodes and also
to make configuration a lot simpler.

In NDB 8.0 we have also introduced a parallel backup feature. This means that taking
a backup will be much faster than before and the load will be shared across all LDM threads.

1 comment:

  1. Hi Mikael. It is great to see all the improvements coming in NDB.

    One thing I have a question about is the following from the MySQL 8.0 release notes:

    "Two MySQL storage engines currently provide native partitioning support—InnoDB and NDB; of these, only InnoDB is supported in MySQL 8.0. Any attempt to create partitioned tables in MySQL 8.0 using any other storage engine fails. "
    https://dev.mysql.com/doc/refman/8.0/en/mysql-nutshell.html

    Is this going to be true for the GA version of MySQL Cluster?
