Wednesday, August 08, 2018

More automated control in MySQL Cluster 7.6.7

Apart from bug fixes the 7.6.7 version of MySQL Cluster also brings
a major improvement of restart times through adaptively controlling
checkpoint speed.

Many DBMSs work hard on automating management of the database nodes.
In NDB automated management was a design point from the very first
version. This means that nodes crash and restart without operator
assistance.

For the last years we have also worked on developing algorithms that
require less configuration. This will greatly simplify the configuration
of NDB Cluster.

In 7.6.7 we have made it much easier to configure handling of checkpoints
(LCPs) and REDO logging.

In earlier versions of NDB the checkpoint speed has been controlled by
two things. The first is based on the following configuration variables:

MinDiskWriteSpeed: This is the minimum disk write speed we will attempt
to write during a checkpoint even in the presence of CPU overload and
disk overload. This defaults to 10 MByte per second, this is the sum
on all LDM threads, thus on the entire data node.

MaxDiskWriteSpeed: This is the maximum disk write speed we will attempt
to write during a checkpoint, if no CPU overload or disk overload is
seen, this is the checkpoint speed that will be used in normal operation.
Defaults to 20 MByte per second.

MaxDiskWriteSpeedOtherNodeRestart: This is the maximum disk write speed
we will write during a checkpoint when another node is restarting. It
defaults to 50 MByte per second.

MaxDiskWriteSpeedOwnNodeRestart: This is the maximum disk write speed
we will write during a checkpoint when our node is restarting. It defaults to
200 MByte per second.

The actual disk write speed achieved is using those configuration variables
in combination with an adaptive algorithm that will decrease the checkpoint
speed when the CPU or the disk is overloaded.

These parameters exists also in 7.6.7, but there is very little reason to
change them from their default value if the new configuration variable
EnableRedoControl is set to 1. By default this variable is set to 0 to
avoid changes of behaviour in a GA released version of MySQL Cluster.

In earlier versions of NDB it was necessary to have very large REDO logs.
The reason is that earlier versions (7.5 and earlier) wrote the entire
database to disk in each checkpoint. This meant that checkpoints during
massive inserts got larger and larger and to ensure successful insertion
of the entire data set it was necessary to have REDO logs that was about
twice the size of the DataMemory.

Now in 7.6.7 it should be quite enough to have 2-4 GByte of REDO log per
REDO log part (normally equal to the number of LDM threads). This REDO
log size works perfectly even when loading TBytes of data into NDB.
Remember that EnableRedoControl needs to be set to 1 for this to work.

Thus in MySQL Cluster 7.6.7 one can simplify the configuration of REDO logs
and checkpointing.

In earlier versions we need to set the following variables:
NoOfFragmentLogParts (always set equal to number of LDM threads)
NoOfFragmentLogFiles
FragmentLogFileSize
MinDiskWriteSpeed
MaxDiskWriteSpeed
MaxDiskWriteSpeedOtherNodeRestart
MaxDiskWriteSpeedOwnNodeRestart

The product of NoOfFragmentLogParts, NoOfFragmentLogFiles and
FragmentLogFileSize is the size of the REDO log. In earlier versions
this product should be roughly two times the setting of DataMemory.

The default setting of FragmentLogFileSize is 16 MByte. Personally I always
increase this setting to 256 MByte (set to 256M).

So e.g. with a DataMemory of 100 GByte and 8 LDM threads one can set those to
NoOfFragmentLogParts=8
NoOfFragmentLogFiles=50
FragmentLogFileSize=256M

This gives a REDO log size of 200 GByte.

The setting of disk write speed will be discussed a bit more in a coming blog.

In 7.6.7 one can instead configure as follows.

EnableRedoControl=1
NoOfFragmentLogParts=8
NoOfFragmentLogFiles=8
FragmentLogFileSize=256M

The setting of disk write speed variables need not be considered. The setting
of NoOfFragmentLogFiles to 8 and FragmentLogFileSize to 256M should work for
almost all setups of NDB. Only when dealing with data nodes larger than
one terabyte could it be considered to increase the REDO log size. The
NoOfFragmentLogParts should still be set to the number of LDM threads.

Thus in 7.6.7 a lot less thought has to go into configuration of REDO logs
and disk write speeds. Disk write speed still affects backup write speeds
as well though, so it could be a good idea to consider how fast you want to
write your backups using the variables MinDiskWriteSpeed and MaxDiskWriteSpeed.

The reason that disk write speeds for checkpoints is less important to consider
is that we calculate how fast we need to write the checkpoints based on the
write activity in NDB. This means that when setting EnableRedoControl the write
speed to the disk can be quite substantial. So this setting will not work very
well unless the disk subsystem is able to handle the load. The disk subsystem
should be able to handle around 100 MByte of disk writes per LDM thread.

With modern HW this should not be an issue, in particular not when using NVMe
drives. In our benchmarking we are using a RAID 0 setup of 6 SSD drives. With
8 LDM threads inserting at full speed we use about 50% of the disk bandwidth
in this case (500 MByte per second).

No comments: