Monday, August 20, 2018

Use cases for MySQL Cluster

Third chapter of "MySQL Cluster 7.5 inside and out".

NDB was designed for a number of networking applications. In particular the original
design focused a lot on telecom applications. The telecom applications have extreme
requirements on availability, many networking applications as well.

This chapter goes through a number of application types where NDB have been used
and also uses a number of public examples of use cases.

These application types include DNS servers, DHCP servers, Intelligent Network (IN)
applications, 3G/4G/5G applications, RADIUS servers, number portability, DIAMETER
server, Video-on-demand applications, Payment Gateways, Software Defined
Networking (SDN), Network Function Virtualization (NFV), Voice over IP.

In addition also many internet applications such as Gaming servers, financial
applications such as stock quote servers, whitelist/blacklist handling. eCommerce
applications, payment services, web applications, fraud detection, online banking,
session database.

In addition a number of frameworks have used NDB and presented it publically such
as a GE presentation, HopsFS (Hadoop distribution), HopsYARN, Oracle OpenStack
and OpenLDAP.

With the possibility to handle more data in MySQL Cluster 7.6 it is likely that this
list of application types will grow even more.

Friday, August 17, 2018

Rationale for MySQL Cluster

Second chapter of "MySQL Cluster 7.5 inside and out".

When I started developing the ideas for NDB Cluster in my Ph.D studies
about 25 years ago, the first thing I did was to perform a very thorough
study of the requirements.

At that time I participated in a European Research for UMTS. UMTS was
later marketed as 3G. My part of this big study group of more than 100
researchers was to simulate the network behaviour. We used the
protocols developed by other groups to see what the load would be on
the various nodes in the telecom network.

I was mostly interested in the load it contributed to the network databases.
Through these studies and also by studying the AXE system developed
in Ericsson I got a very good picture of the requirements on a DBMS
to be used for future telecom services.

In parallel with this I also studied a number of other areas such as
multimedia email servers, news-on-demand servers, genealogy
servers that would be other popular services in the telecom network.

In the above chapter I go through how those requirements was turned
into requirements on predictable response times, availability requirements,
throughput requirements and so forth.

In particular how the requirement led to a model that divided the Data
Server functionality (NDB data nodes) and the Query Server
functionality (MySQL Server nodes).

Also how predictable response times are achieved by building on ideas
from the AXE architecture developed in Ericsson.

Thursday, August 16, 2018

What is special with MySQL Cluster

The first chapter from the book "MySQL Cluster 7.5 inside and out".
This chapter presents a number of key features that makes NDB
unique.

Thursday, August 09, 2018

Optimising scan filter for checkpoints in NDB

When loading massive amounts of data into NDB when testing the new
adaptive checkpoint speed I noted that checkpoints slowed down as the
database size grew.

I could note in debug logs that the amount of checkpoint writes was
dropping significantly at times. After some investigation I discovered
the root cause.

The checkpoint algorithm in NDB requires all changed rows to be written
to the checkpoint even if it is not a part that is fully checkpointed.
This means that each row has to be scanned to discover if it has been
written.

When loading 600 GByte of DBT2 data we have more than two billion rows
in the database. Scanning two billion rows takes around 15-20 seconds
when simultaneously handling lots of inserts.

This slowed down checkpoints and in addition it uses a lot of CPU.
Thus we wanted a more efficient scanning algorithm in this case.

The solution is based on dividing the database into larger segments.
When updating a row, one has to ensure that a flag on the larger
segment is also updated. A simple first approach is to implement
this on page level for our fixed size pages. Every row has an entry
in the fixed size. This part contains the row header and all fixed
size columns that are not defined as using DYNAMIC storage.

In DBT2 this means that most fixed size pages have around 300 row
entries. Thus we can check one page and if no row has been changed
we can skip checking 300 row entries.

When data size grows to TBytes and we checkpoint every 10-20 seconds,
the risk of a row in a page being updated is actually fairly low.
Thus this simple optimisation brings down the slowdown of the
checkpoints to small parts of a second.

Obviously it is possible to use smaller regions and also larger regions
to control this if required.

This is an important improvement of the checkpointing in
MySQL Cluster 7.6.7.