Mikael Ronstrom

Wednesday, May 13, 2009

More data on InnoDB Thread Concurrency

Here is the performance graph comparing
InnoDB Thread Concurrency set to 0 with
InnoDB Thread Concurrency set to 24, using
sysbench readwrite and the new InnoDB
Thread Concurrency algorithm introduced
in MySQL 5.4.0.

Analysis of Google patches on 4, 8 and 12 cores

One of the goals we had originally with the MySQL 5.4
development was to improve scaling from 4 cores to
8 cores. So in my early testing I ran comparisons of
the Google SMP + IO + tcmalloc patches on 4, 8 and 12
cores to see how it behaved compared with a stock
MySQL 5.1.28 version (note that the comparison here was
done on a very early version of 5.4; 5.4.0 has a
set of additional patches applied to it).

What we can see here is that the Google SMP patch and use
of tcmalloc makes a difference already on a 4-core server
using 4 threads. On 1 and 2 threads the difference is only
on the order of 1-2%, so not really of any significance.

An interesting note in the graph is that 8-core numbers using
the Google improvements outperform the 12-core stock MySQL
5.1.28.

So what we concluded from those graphs is that the scaling from 4 cores
to 8 cores had improved greatly and that there was also good scaling
from 8 cores to 12 cores. This improvement increased even more with
the 5.4 release. The main purpose of showing these numbers is to show
the difference between 4, 8 and 12 cores.

All benchmarks were executed on a 16-core x86 box with 4 cores
dedicated to running sysbench.

Analysis of Google patches in MySQL 5.4

Early on in the MySQL 5.4 development we tried out the
impact of the Google SMP patch and the Google IO patch.
At first we wanted to see which of the patches
made the most impact. The Google patches in MySQL 5.4
have at least 3 components that impact performance:
1) Replace InnoDB memory manager by a malloc variant
2) Replace InnoDB RW-lock implementation
3) Make InnoDB use more IO threads

When disabling the InnoDB memory manager one opens up a whole
array of potential candidates for malloc. Our work concluded
that tcmalloc behaved best on Linux and mtmalloc was
best on Solaris; see the blog posts on Solaris below.

Malloc on Solaris investigation

Battle of the Mallocators on Solaris

I also did some testing on Linux where I compared 4 different
cases (all variants were based on MySQL 5.1.28):
1) Using the Google SMP patch, Google IO patch (with 4 read and
4 write threads) and using tcmalloc
2) Using tcmalloc and no other Google patches
3) Using plain malloc from libc
4) Using plain MySQL 5.1.28 with the InnoDB memory manager

Here are the results:

So as we can see here the replacement of the InnoDB memory manager
by standard malloc had no benefits whereas replacing it with
tcmalloc gave 10% extra performance. The Google SMP patch added
another 10% performance in sysbench readwrite. We have also
tested other OLTP benchmarks where the Google SMP patch added
about 5-10% performance improvement. As shown by Mark Callaghan
there are however other benchmarks where the Google SMP patch
provides much greater improvements.

Tuesday, May 12, 2009

MySQL 5.4 Patches: InnoDB Thread Concurrency

When benchmarking MySQL with InnoDB we quickly discovered
that using InnoDB Thread Concurrency set to 0 was an
improvement to performance since the implementation of
InnoDB Thread Concurrency used a mutex which in itself was
a scalability bottleneck.

Given that InnoDB Thread Concurrency is a nice feature that
ensures that one gets good performance also on an overloaded
server I was hoping to find a way to make the implementation
of this more scalable.

I tried out many different techniques using a combination of
mutexes and atomic variables. However every technique fell
flat and performed worse than setting it to 0 and not
using the InnoDB Thread Concurrency implementation. So I was
ready to give up the effort and move on to other ideas.

However after sleeping on it an inspirational idea came up.
Why use a mutex at all? Let's see how it works to use the
OS scheduler to queue the threads that need to be blocked. This
should be more scalable than a mutex-based approach.
There is obviously one bad thing about this approach: new
arrivals can enter before old waiters. To ensure we don't
suffer too much from this, a limit on the wait
was necessary.

So I quickly put together a solution that called yield once
and then slept for 10 milliseconds at most twice. Every time it
woke up it checked an atomic variable to see if it was ok
to enter. After those three attempts it would enter without
checking.
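
To make the idea concrete, here is a minimal sketch of such an admission
gate in Python. It is not the InnoDB code. The class and method names are
hypothetical, the check made before any waiting is an assumption, and since
Python has no atomic integer a short lock stands in for the atomic variable.
time.sleep(0) is used as a best-effort stand-in for a yield call.

```python
import threading
import time


class TimerBasedGate:
    """Hypothetical gate: yield once, sleep up to twice, then enter anyway."""

    def __init__(self, limit, sleep_seconds=0.010, max_sleeps=2):
        self.limit = limit                  # 0 means no limit, as in the post
        self.sleep_seconds = sleep_seconds  # 10 milliseconds in the post
        self.max_sleeps = max_sleeps        # at most two sleeps in the post
        self._active = 0
        # Assumption: this short lock only emulates an atomic
        # check-and-increment. The waiting is done with yield and sleep.
        self._guard = threading.Lock()

    def _try_enter(self):
        with self._guard:
            if self.limit == 0 or self._active < self.limit:
                self._active += 1
                return True
            return False

    def enter(self):
        """Return (waits, forced): waits done, and whether the check was skipped."""
        if self._try_enter():               # assumed check before any waiting
            return (0, False)
        # One yield (a zero-length sleep here), then the timed sleeps.
        delays = [0.0] + [self.sleep_seconds] * self.max_sleeps
        for waits, delay in enumerate(delays, start=1):
            time.sleep(delay)
            if self._try_enter():
                return (waits, False)
        with self._guard:                   # attempts used up: enter unchecked
            self._active += 1
        return (len(delays), True)

    def leave(self):
        with self._guard:
            self._active -= 1

    def active(self):
        with self._guard:
            return self._active
```

With a limit of 1 and nobody leaving, a second call to enter() goes through
the yield and both sleeps and is then admitted anyway, so the number of
active threads can temporarily exceed the limit. That is the price of
putting a bound on the wait.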

I tried it and saw a 1% decrease at low concurrency and
improvements of 5% at 32 threads, 10% at 64 threads and 15% at 128
threads. Voila, it worked. Now I decided to search for the
optimal solution to see how many yields and sleeps would be best.
It turned out I had found the optimal number at the first attempt.

The implementation still has corner cases where it provides less
benefit, so I kept the possibility to use the old implementation by
adding a new variable here.

So currently the default in MySQL 5.4 is still 0 for InnoDB Thread
Concurrency. However we generally see optimal behaviour using
InnoDB Thread Concurrency set to around 24. Setting it higher
brings no real value in MySQL 5.4.0, and setting it lower
decreases the performance one can achieve. This seems
to be a fairly generic set-up that should work well in most cases.
We might change the defaults for this later.