MySQL Cluster 7.6 using the shared memory transporter can improve
performance at very high thread counts by more than 100%. Even so,
performance still drops fairly significantly when moving from
512 to 1536 threads. The MySQL Server using the NDB storage engine
scales very well on all sorts of architectures and with very many
cores. But I have noted that when the number of connections goes
beyond some limit (in my benchmarks usually around 512 threads),
the performance starts to drop.
In fact, the commercial version of MySQL Cluster offers help
with this problem. The thread pool was developed by me and a team
of performance experts to ensure that MySQL using InnoDB would keep
the same performance even with a massive number of threads hitting the
MySQL server. It still works for this purpose. I have never mentioned
the use of thread pool for MySQL Cluster before, but the fact of the matter
is that it works perfectly fine to use the thread pool in combination
with MySQL Cluster.
There is one limitation in the current thread pool implementation. The maximum
number of thread groups is 64. This limit was set because MySQL didn't scale
beyond this number using InnoDB in those days. NDB is a distributed engine,
so it works a bit differently for NDB compared to InnoDB. It would be possible
to make the integration of the thread pool and NDB a bit tighter. But even with
the current implementation NDB can be used perfectly fine with the thread pool.
The limit of 64 means that the thread pool is less useful in combination with
NDB on MySQL servers that use more than 16 CPUs.
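To make the arithmetic behind this limit concrete, here is a small Python sketch. It assumes roughly 4 thread groups per CPU, a ratio inferred only from the 64-group limit and the 16-CPU figure above, not a documented rule. The function and constant names are made up for illustration.

```python
# Limit on the number of thread groups, as stated in the text.
MAX_THREAD_GROUPS = 64

# Assumption: about 4 thread groups per CPU, inferred from 64 groups / 16 CPUs.
GROUPS_PER_CPU = 4


def wanted_thread_groups(cpus):
    """Thread groups a server with `cpus` CPUs would want under the assumed ratio."""
    return cpus * GROUPS_PER_CPU


def capped_thread_groups(cpus, limit=MAX_THREAD_GROUPS):
    """Thread groups actually available once the limit is applied."""
    return min(wanted_thread_groups(cpus), limit)


if __name__ == "__main__":
    for cpus in (8, 16, 30):
        wanted = wanted_thread_groups(cpus)
        capped = capped_thread_groups(cpus)
        status = "limited by the cap" if wanted > capped else "within the cap"
        print(f"{cpus} CPUs: wants {wanted} groups, gets {capped} ({status})")
```

Under this assumption a server with up to 16 CPUs stays within the limit, while a 30-CPU server would want about 120 thread groups and is held back at 64.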
The graph above shows how the thread pool compares to the performance of
MySQL Cluster 7.6 on a small MySQL Server. It loses 1-2% at low thread
counts, but it continues to deliver good results even when passing the 512 thread limit.
The graph below shows how MySQL Cluster 7.6.6 using the thread pool compares to
MySQL Cluster 7.5.9 without thread pool. We see the usual linear curve at
high concurrency for the thread pool. In this case, however, it is limited by
the 64 thread groups, since the setup has access to 30 CPUs using
one cluster connection. I did some experiments where I raised the limit of 64
a bit (a minor change). In most experiments a thread pool size of 128 was a
good setting, and in this case the performance actually increases slightly
as the number of threads increases.