Friday, February 21, 2020

Influences leading to Asynchronous Programming Model in NDB Cluster

A number of developments were especially important in influencing the development
of NDB Cluster. I was working at Ericsson, so when I didn't work on DBMS research
I was deeply involved in prototyping the next generation of telecom switches. I was the
lead architect in a project that we called AXE VM. AXE was the cash cow of Ericsson
in those days. It used an in-house developed CPU called APZ. In the early 1990s I was
involved in some considerations on how to develop the next generation of APZ.
However, I felt that the chosen architecture didn't make use of modern
ideas in CPU development. This opened up the possibility of using a commercial CPU
to build a virtual machine for APZ. At the end of the 1990s the next APZ project opted
for a development based on the ideas from AXE VM. At that time, however, I was
focusing my full attention on the development of NDB Cluster.

One interesting thing about the AXE is that it was the last single-CPU telecom switch on
the market. The AXE architecture was so successful because of the
concept of blocks and signals.

The idea of blocks came from applying ideas from HW development to SW
development. Each block is self-contained in that it contains all the
software and data for its operation. The only way to communicate between blocks is
through signals. More modern names for blocks and signals are modules and
messages. Thus AXE was entirely built on a message passing architecture.
However, to make the blocks truly independent of each other it is important to
communicate using only asynchronous signals. As soon as synchronous signals are used
between blocks, these blocks are no longer independent of each other.
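
As a rough illustration of the block and signal idea, here is a minimal sketch in Python. It is not how AXE or NDB Cluster is implemented; the class names, the block names and the ADD_REQ/ADD_CONF signal names are all made up for the example. The point it tries to show is that a block never calls another block directly: it only places a signal in a queue, and a scheduler delivers it later.

```python
from collections import deque


class Scheduler:
    """Hypothetical single-threaded executor that delivers queued signals in FIFO order."""

    def __init__(self):
        self.blocks = {}
        self.job_buffer = deque()

    def run(self):
        while self.job_buffer:
            sender, receiver, signal_name, data = self.job_buffer.popleft()
            self.blocks[receiver].receive(sender, signal_name, data)


class Block:
    """A self-contained module: it owns its data and reacts only to signals."""

    def __init__(self, name, scheduler):
        self.name = name
        self.scheduler = scheduler
        scheduler.blocks[name] = self

    def send(self, receiver, signal_name, data):
        # Asynchronous: the signal is queued, never executed inline.
        self.scheduler.job_buffer.append((self.name, receiver, signal_name, data))

    def receive(self, sender, signal_name, data):
        raise NotImplementedError


class Counter(Block):
    def __init__(self, name, scheduler):
        super().__init__(name, scheduler)
        self.total = 0  # private state, touched only by this block

    def receive(self, sender, signal_name, data):
        if signal_name == "ADD_REQ":
            self.total += data
            self.send(sender, "ADD_CONF", self.total)


class Client(Block):
    def __init__(self, name, scheduler):
        super().__init__(name, scheduler)
        self.replies = []

    def receive(self, sender, signal_name, data):
        if signal_name == "ADD_CONF":
            self.replies.append(data)


scheduler = Scheduler()
counter = Counter("COUNTER", scheduler)
client = Client("CLIENT", scheduler)
client.send("COUNTER", "ADD_REQ", 2)
client.send("COUNTER", "ADD_REQ", 3)
scheduler.run()
print(client.replies)
```

Since the client never waits for a reply inside send, neither block depends on the other's timing, and no locks are needed around the counter's private state.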

I became a very strong proponent of the AXE architecture. In my mind I saw that the
asynchronous model gave a 10x improvement in performance in a large distributed
system. The block and signal model carries a higher entrance fee for SW
development, but it provides large benefits when scaling the software for new
requirements.

One good example of this is when I worked on scaling MySQL towards higher
CPU core counts between 2008 and 2012. I worked on improving the scalability of both
NDB Cluster and the MySQL Server. The block and signal model made it possible to
scale the NDB data nodes with an almost lock-free model. There are very few
bottlenecks in NDB data nodes when scaling to a higher number of CPUs.
The main ones that remained have been extensively improved in NDB Cluster 8.0.20.

Thus it is no big surprise that NDB Cluster was originally based on AXE VM. This
heritage gave us some very important features that enable quick bug fixing in
NDB Cluster. All the asynchronous messages go through a job buffer. This means
that after a crash we can print the last few thousand messages that were executed in
each thread in the crashed data node. In addition we also use a concept called
Jump Address Memory (jam). This is implemented in our code as macros that write
the line number and file number into memory such that we can track exactly how we
came to the crash point in the software.
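
The two tracing ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NDB code: the real jam is a set of C++ macros, while the function names, buffer sizes and the INSERT_REQ signal name here are hypothetical, and the sketch stores a file name where the text above describes a file number. Both traces are fixed-size ring buffers, so recording is cheap and only the most recent entries survive.

```python
import sys
from collections import deque

JAM_ENTRIES = 8      # hypothetical size, kept tiny for the example
SIGNAL_HISTORY = 4   # hypothetical size, kept tiny for the example

jam_buffer = deque(maxlen=JAM_ENTRIES)         # last code locations visited
signal_history = deque(maxlen=SIGNAL_HISTORY)  # last signals executed


def jam():
    """Record the caller's file name and line number in the ring buffer."""
    frame = sys._getframe(1)  # CPython-specific way to inspect the caller
    jam_buffer.append((frame.f_code.co_filename, frame.f_lineno))


def execute(signal_name, handler, data):
    """Remember the signal before running it, so a crash still shows it."""
    signal_history.append((signal_name, data))
    handler(data)


def handle_insert(key):
    jam()
    if key < 0:
        jam()
        raise ValueError("negative key")
    jam()


def crash_report():
    return {"signals": list(signal_history), "jam": list(jam_buffer)}


try:
    for key in (1, 2, 3, 4, -5):
        execute("INSERT_REQ", handle_insert, key)
except ValueError:
    report = crash_report()
    print(report["signals"])
    print(report["jam"][-2:])
```

The last jam entries show which branch was taken just before the failure, and the signal history shows which messages led up to it.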

So NDB Cluster comes from marrying the requirements on a network database for
3G networks with the AXE model that was developed at Ericsson in the 1970s.
As can be seen, this model is still going strong, given that NDB Cluster is able to deliver
the best performance and highest availability of any DBMS for telecom applications,
financial applications, key-value stores and even distributed file systems.

To summarize, these are the most important requirements we have on the software
engineering model:

1) Fail-fast architecture (implemented through ndbrequire macro in NDB)
2) Asynchronous programming (provides much tracing information in crashes)
3) Highly modular SW architecture
4) JAM macros to track SW in crash events
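
To make the first point concrete, here is a minimal fail-fast sketch in Python. It is not the ndbrequire macro itself, which is C++ and stops the data node. The require function, the NodeCrash exception and the page-release example are hypothetical stand-ins. The idea is that a violated invariant stops execution immediately instead of letting a corrupted state spread.

```python
class NodeCrash(Exception):
    """Stands in for stopping the whole process in this sketch."""


def require(condition, message):
    # Fail fast: an inconsistent state stops execution at once.
    if not condition:
        raise NodeCrash(message)


def release_page(free_pages, page_id, total_pages):
    require(0 <= page_id < total_pages, f"page {page_id} out of range")
    require(page_id not in free_pages, f"page {page_id} released twice")
    free_pages.add(page_id)


free_pages = set()
release_page(free_pages, 3, total_pages=8)
try:
    release_page(free_pages, 3, total_pages=8)  # double release is a bug
    crashed = None
except NodeCrash as exc:
    crashed = str(exc)
print(crashed)
```

Unlike a debug-only assertion, a check of this kind is meant to stay enabled in production, so the crash happens close to the bug that caused it.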
