Friday, February 21, 2020

Influences leading to NDB Cluster using a Shared Nothing Model

The requirements on Class 5 availability and immediate failover had two important
consequences for NDB Cluster. The first is that we wanted a fail-fast architecture.
Thus as soon as we have any kind of inconsistency in our internal data structures we
immediately fail and rely on the failover and recovery mechanisms to make the failure
almost unnoticable. The second is that we opted for a shared nothing model where all
replicas are able to take over immediately.

The shared disk model requires replay of the REDO log before failover is completed
and this can be made fast, but not immediate. In addition as one quickly understands
with the shared disk model is that it relies on an underlying shared nothing storage
service. The shared disk implementation can never be more available than the
underlying shared nothing storage service.

Thus it is actually possible to build a shared disk DBMS on top of NDB Cluster.

The most important research paper influencing the shared nothing model used in NDB
is the paper presented at VLDB 1992 called "Dynamic Data Distribution in a
Shared-Nothing Multiprocessor Data Store".

Obviously it was required to fully understand the ARIES model that was presented
also in 1992 by a team at IBM. However NDB Cluster actually choose a very different
model since we wanted to achieve a logical REDO log coupled with a checkpoint
model that actually changed a few times in NDB Cluster.

No comments: