Wednesday, May 20, 2009

MySQL 5.4 Webinar

The quality of MySQL 5.4.0 is very high for a beta product.
Four weeks after we released it as a beta, no really
serious bugs have been reported yet. There are some issues
due to deprecation of features and version numbers, a bug
in the SHOW INNODB STATUS printout, and some concerns with
the new defaults when running on low-end machines. As
usual, it's also important to read the documentation before
upgrading; it contains some instructions needed to make an
upgrade successful. The upgrade issue comes from changing
the defaults of the InnoDB log file sizes.

For those of you who want to know more about MySQL 5.4.0,
its characteristics, and why you should use it, please
join this webinar where Allan Packer will explain what
has been done in MySQL 5.4.0.

Tuesday, May 19, 2009

Patches ready for buf page hash split shootout

Today I created a patch that builds on the Google v3
patch where I added some ideas of my own and some ideas
from the Percona patches. The patch is here.

Here is a reference to the patch derived from the Google
v3 patch.

Here is a reference to my original patch (this is likely
to contain a bug somewhere, so using it for anything other
than benchmarking isn't recommended).

So it will be interesting to see a comparison of all those
variants directly against each other on a number of benchmarks.

Analysis of split flush list from buffer pool

In the Google v3 patch the buffer pool mutex has been
split into an array of buffer page hash mutexes and a
buffer flush list mutex, while the buffer pool mutex
itself also remains.

I derived the patch splitting out the buffer flush list
mutex from the Google v3 patch against the MySQL 5.4.0
tree. The patch is here.

I derived a lot of prototype patches based on MySQL 5.4.0
and Dimitri tried them out. This particular patch seems
to be the most successful in the pack of patches we
tested. It had a consistent positive impact.

The main contribution of this patch is twofold. First, it
decreases the pressure on the buffer pool mutex by
splitting out a critical part where the oldest dirty pages
are flushed out to disk. Second, it decreases the pressure
on the log_sys mutex by releasing it earlier for
mini-transactions. It also removes the interaction between
the buffer pool mutex and the log_sys mutex: previously
both mutexes had to be held for a while, which is no
longer necessary since only the flush list mutex is
needed, not the buffer pool mutex.
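
To illustrate the idea, here is a minimal sketch (not the
actual InnoDB code; the names and types are simplified for
illustration) of a flush list kept under its own mutex, so
that adding a newly dirtied page at mini-transaction commit
no longer involves the buffer pool mutex:

    /* Minimal sketch, assuming a simplified flush list: adding a
     * newly dirtied page only takes the flush list mutex, so the
     * buffer pool mutex is not involved and the log_sys mutex
     * (not modeled here) can be released earlier. */
    #include <pthread.h>
    #include <stdio.h>

    typedef struct dirty_page {
        unsigned long      oldest_lsn; /* LSN when the page was dirtied */
        struct dirty_page *next;
    } dirty_page_t;

    static pthread_mutex_t buf_pool_mutex   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t flush_list_mutex = PTHREAD_MUTEX_INITIALIZER;
    static dirty_page_t   *flush_list       = NULL;

    /* Called when a mini-transaction has dirtied a page. */
    static void add_to_flush_list(dirty_page_t *page)
    {
        pthread_mutex_lock(&flush_list_mutex);
        page->next = flush_list;
        flush_list = page;
        pthread_mutex_unlock(&flush_list_mutex);
    }

    int main(void)
    {
        dirty_page_t p = { 12345UL, NULL };
        add_to_flush_list(&p);
        printf("flush list head lsn: %lu\n", flush_list->oldest_lsn);
        /* The buffer pool mutex remains, but only for the other
         * buffer pool structures. */
        (void) buf_pool_mutex;
        return 0;
    }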

The new patch is the b11 variant, which is shown in red in
the comparison graphs.



As we can see, the read-write tests get a pretty significant
boost from this patch: it improves top performance by 5% and
by 10-20% at higher numbers of threads. It also moves the
maximum from 16 to 32 threads.



Even on read-only there are some positive improvements,
although it is quite possible those are more random in nature.



Finally, the above picture shows that this patch also moves
the optimal InnoDB thread concurrency up from 16 to 24, since
it allows for more concurrency inside InnoDB. This is also
visible in the numbers for InnoDB Thread Concurrency set to 0,
as seen below.

Friday, May 15, 2009

Shootout of split page hash from InnoDB buffer pool mutex

One of the hot mutexes in InnoDB is the buffer pool mutex.
Among other things this mutex protects the page hash where
pages reside when they are in the cache.

There are already a number of variants of how to split out
this mutex. Here follows a short description of the various
approaches.

1) Google v3 approach
Ben Hardy at Google took the approach of using an array of
64 mutexes, where each mutex only protects the actual read,
insert and delete from the page hash table. This makes for
a very simple patch, but it also means that once the block
has been locked one has to check that the owner of the
block hasn't changed, since the block isn't protected
between the read of the hash and the locking of the block,
so someone can come in between and grab the block for
another page before we get to lock it. In addition this
patch focuses mainly on optimising the path in
buf_page_get_gen, which is the routine used to get a page
from the page cache and is thus the hot-spot.
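
A rough sketch of that pattern, with illustrative names and
a toy page hash instead of the real one from the patch,
could look like this:

    /* Sketch only: an array of small mutexes guards the page hash
     * lookup, and after locking the block we re-check that it
     * still holds the page we looked up. */
    #include <pthread.h>
    #include <stdio.h>

    #define N_HASH_MUTEXES 64
    #define N_BLOCKS       4

    typedef struct buf_block {
        unsigned long   space;    /* tablespace id */
        unsigned long   page_no;  /* page number */
        pthread_mutex_t lock;     /* per-block lock */
    } buf_block_t;

    static pthread_mutex_t page_hash_mutex[N_HASH_MUTEXES];
    static buf_block_t     blocks[N_BLOCKS];   /* toy "buffer pool" */

    static unsigned hash_slot(unsigned long space, unsigned long page_no)
    {
        return (unsigned) ((space * 31 + page_no) % N_HASH_MUTEXES);
    }

    /* Toy stand-in for the real page hash search. */
    static buf_block_t *hash_lookup(unsigned long space, unsigned long page_no)
    {
        for (int i = 0; i < N_BLOCKS; i++)
            if (blocks[i].space == space && blocks[i].page_no == page_no)
                return &blocks[i];
        return NULL;
    }

    static buf_block_t *get_block(unsigned long space, unsigned long page_no)
    {
        for (;;) {
            unsigned     slot = hash_slot(space, page_no);
            buf_block_t *block;

            pthread_mutex_lock(&page_hash_mutex[slot]);
            block = hash_lookup(space, page_no);
            pthread_mutex_unlock(&page_hash_mutex[slot]);

            if (block == NULL)
                return NULL;            /* page not in the buffer pool */

            pthread_mutex_lock(&block->lock);
            if (block->space == space && block->page_no == page_no)
                return block;           /* still the page we looked up */

            /* The block was grabbed for another page between the hash
             * read and the block lock: release and retry. */
            pthread_mutex_unlock(&block->lock);
        }
    }

    int main(void)
    {
        for (int i = 0; i < N_HASH_MUTEXES; i++)
            pthread_mutex_init(&page_hash_mutex[i], NULL);
        for (int i = 0; i < N_BLOCKS; i++) {
            blocks[i].space   = 0;
            blocks[i].page_no = (unsigned long) i;
            pthread_mutex_init(&blocks[i].lock, NULL);
        }

        buf_block_t *b = get_block(0, 2);
        if (b != NULL) {
            printf("got block for page %lu\n", b->page_no);
            pthread_mutex_unlock(&b->lock);
        }
        return 0;
    }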

2) Percona approaches
Percona has made a series of approaches, where the first
splits out the page hash under a single mutex that still
protects the blocks from being changed while it is held.
The next step was to change this mutex into a read-write
lock.

3) My approach
My approach was inspired by Percona but added two main
things. First, it splits the page hash into a number of
page hashes with one RW-lock per page hash (this number
has been tested with 4, 8 and 16, and 4 was the optimal,
at least on Linux). Second, to avoid having to lock and
unlock multiple page hashes while going through the read
ahead code, the hash function that decides which page hash
to use picks the same page hash for all pages within
1 MByte (which is the unit of read ahead in InnoDB).
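
A minimal sketch of this partitioning, assuming 16 KByte
pages so that a 1 MByte read ahead area is 64 pages (the
names are illustrative, not the ones from the patch):

    /* Sketch only: the partition is chosen from page_no / 64, so
     * every page in the same 1 MByte read ahead area lands in the
     * same page hash and under the same RW-lock. */
    #include <pthread.h>
    #include <stdio.h>

    #define N_PAGE_HASHES  4    /* 4 was the best value found so far */
    #define PAGES_PER_AREA 64   /* 1 MByte / 16 KByte pages */

    static pthread_rwlock_t page_hash_lock[N_PAGE_HASHES];

    static unsigned page_hash_no(unsigned long space, unsigned long page_no)
    {
        unsigned long area = page_no / PAGES_PER_AREA; /* read ahead area */
        return (unsigned) ((space + area) % N_PAGE_HASHES);
    }

    int main(void)
    {
        for (int i = 0; i < N_PAGE_HASHES; i++)
            pthread_rwlock_init(&page_hash_lock[i], NULL);

        /* All 64 pages of one read ahead area map to the same
         * partition, so the read ahead code locks one RW-lock
         * instead of many. */
        unsigned first = page_hash_no(0, 128);
        unsigned last  = page_hash_no(0, 191);
        printf("partition for page 128: %u, for page 191: %u\n",
               first, last);

        pthread_rwlock_rdlock(&page_hash_lock[first]);
        /* ... scan the page hash for all pages in the area ... */
        pthread_rwlock_unlock(&page_hash_lock[first]);
        return 0;
    }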

Pros and Cons

The Google patch is the simplest, and by focusing only on
buf_page_get_gen it avoids a lot of the extra traps that
are likely if one tries to solve too much of the problem.

Using a RW-lock instead of a mutex seems like a way of
improving the concurrency, but it could of course impose a
higher overhead as well, so benchmarking should show which
is best.

When using an array of locks it makes sense to optimise
for read ahead functionality since this is a hot-spot
in the code as has been shown in some blogs lately.

4) Mixed approach
A natural solution is then to also try a mix of the Google
variant with my approach: still using an array of locks
(either mutexes or RW-locks, whichever has the optimal
performance), but ensuring that the pages within a read
ahead area are locked by the same lock.

This approach reuses the simplicity and the total lack of
deadlock problems of the Google approach, combined with
the optimised layout from my approach and the idea of
RW-locks from Percona.
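
As a small sketch of how such a mixed variant could keep
the lock type open until the benchmarks have decided (the
names and macros are purely illustrative):

    /* Sketch only: the lock type behind the page hash partitions is
     * a build-time choice, so the same code can be benchmarked with
     * mutexes and with RW-locks. */
    #include <pthread.h>

    #ifdef USE_RW_LOCKS
    typedef pthread_rwlock_t hash_lock_t;
    #define hash_lock_init(l)    pthread_rwlock_init((l), NULL)
    #define hash_lock_read(l)    pthread_rwlock_rdlock(l)
    #define hash_lock_write(l)   pthread_rwlock_wrlock(l)
    #define hash_lock_release(l) pthread_rwlock_unlock(l)
    #else
    typedef pthread_mutex_t hash_lock_t;
    #define hash_lock_init(l)    pthread_mutex_init((l), NULL)
    #define hash_lock_read(l)    pthread_mutex_lock(l)
    #define hash_lock_write(l)   pthread_mutex_lock(l)
    #define hash_lock_release(l) pthread_mutex_unlock(l)
    #endif

    /* The number of partitions is the other tunable the shootout
     * should settle. */
    #define N_PAGE_HASHES 4

    static hash_lock_t page_hash_lock[N_PAGE_HASHES];

    int main(void)
    {
        for (int i = 0; i < N_PAGE_HASHES; i++)
            hash_lock_init(&page_hash_lock[i]);

        hash_lock_read(&page_hash_lock[0]);    /* lookup path: shared */
        hash_lock_release(&page_hash_lock[0]);

        hash_lock_write(&page_hash_lock[0]);   /* insert/delete: exclusive */
        hash_lock_release(&page_hash_lock[0]);
        return 0;
    }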

We don't have any results from this shootout yet. The
shootout should also discover the optimum number of areas
to split the page cache into; Google has used 64, but my
results so far indicate that 4 is more appropriate.