Almost all software designed for multithreaded use cases in the 1990s have some sort of big kernel mutex, as a matter of a fact this is also true for some hyped new software written in this millenium and even in this decade. Linux had its big kernel mutex, InnoDB had its kernel mutex, MySQL server had its LOCK_open mutex. All these mutexes are characterized by the fact that these mutexes protects many things that often have no connection with each other. Most of these mutexes have been fixed by now, the Linux big kernel mutex is almost gone by now, the LOCK_open is more or less removed as a bottleneck, the InnoDB kernel mutex has been split into ten different mutexes.
In MySQL Cluster we have had two types of kernel mutexes. In the data nodes the "kernel mutex" was actually a single-thread execution model. This model was very efficient but limited scalability. In MySQL Cluster 7.0 we extended the single threaded data nodes to be able to run up to 8 threads in parallel instead. In MySQL Cluster 7.2 we extended this to enable support of up to 32 or more threads.
The "kernel mutex" in the NDB API we call the transporter mutex. This mutex meant that all communication from a certain process, using a specific API node, used one protected region that protected communication with all other data nodes. This mutex could in some cases be held for substantial times.
This has meant that there has been limitation on how much throughput can be processed using one API node. It has been possible to process much throughput
anyways using multiple API nodes per process (this is the configuration parameter ndb-cluster-connection-pool).
What we have done in MySQL Cluster 7.3 is that we have fixed this bottleneck. We have split the transporter mutex and replaced it with mutexes that protects sending to a specific data node, mutexes that protects receiving from a certain data node, mutexes that protect memory buffers and mutexes that protect execution on behalf of a certain NDB API connection.
This means a significant improvement of throughput per API node. If we run a benchmark with just one data node using the flexAsynch benchmark that handles around 300-400k transactions per second per API node, this improvement increases throughput by around 50%. A Sysbench benchmark for one data node is improved by a factor of 3.3x. Finally a DBT2 benchmark with one data node is improved by a factor of 7.5x.
The bottleneck for an API node is that only one thread can process incoming messages for one connection between an API node and a data node. For flexAsynch there is a lot of processing of messages per node connection, it's much smaller in Sysbench and even smaller in DBT2 and thus these benchmarks see a much higher improvement due to this new feature.
If we run with multiple data nodes the improvement increases even more since the connections to different data nodes from one API node are now more or less independent of each other.
The feature will improve performance of applications without any changes of the application, the changes are entirely done inside the NDB API and thus improve performance both of NDB API applications as well as MySQL applications.
It is still possible to use multiple API nodes for a certain client process. But the need to do so is much smaller and in many cases even removed.