Friday, June 05, 2009

Follow-up Analysis of Split Rollback Segment Mutex

I performed a new set of tests of the patch to split the
rollback segment mutex on Linux. All these tests gave
positive results with improvements in the order of 2%.

One could also derive from the results some conclusions.
The first conclusion is that this split mainly improves
things when the number of threads is high and thus
contention of mutexes is higher. At 256 threads a number
of results improved up to 15%.

The numbers on lower number of threads were more timid
although in many cases an improvement was still seen.

What was also noticeable was that the sysbench read-write
with less reads which makes the transactions much shorter
the positive impact was much greater and the positive
impact on long transactions was much smaller (+0.4%
versus +2.5%). The impact on the short transaction test
with less reads was very positive also on lower number
of threads, the result on 32 threads improved 7%.

So the conclusion is that this patch is a useful contribution
to improvements and in particular improves matters on high
number of threads and with short transactions. According to
a comment on the previous blog it is also very positive in
insert benchmarks.

Thursday, June 04, 2009

Results of shootout on split page hash in InnoDB

I have now tried out the buffer split page hash patches on
both a Linux/x86 box and a SPARC/Solaris server (tests done
by Dimitri).

The three variants in short description are:
1) The Google v3 derived patch. This introduces a new array
of mutexes that only protect the buffer page hash. Thus some
extra checking is needed to ensure the page hasn't been
removed from the hash before using it. This is a very simple
and attractive patch from that point of view. The patch uses
an array of 64 mutexes.

2) A variant I developed with some inspiration from the Percona
patches. This patch uses an array of page hashes which each has
its own read-write lock. I've tried this with 1, 4 and 16 page
hashes and 4 is the optimum number. The rw-lock protects the
page hash long enough to ensure that the block hasn't been
possible to remove from the hash before the mutex is acquired.

3) The last variant is a mix of the two first which uses the
simplicity of the Google patch, uses a rw-lock instead and
separate page hashes (to ensure read ahead doesn't have to
go into all mutexes). Used an array of 4 page hashes here.

The conclusion is that the only version that has consistently
improved the MySQL 5.4.0 numbers is the version I originally
developed (2 above).

On sysbench read-write all versions improve numbers compared to
MySQL 5.4.0. 2 and 3 improve 2% whereas the original Google
patch improved with 1%.

On sysbench read-only on Linux it was much harder to beat the
MySQL 5.4.0 version. Only 2) did so and only by 0.5%. This is
not so surprising since this mutex is not a blocker for read-only
workloads. 1) gave -1% and 3) gave -0.3%.

On a write intensive workload on Linux 1) and 3) performed 0.5%
better than MySQL 5.4.0 whereas 2) gave 2% improvement.

Finally on a sysbench read-write with less reads on Linux, all
variants lost to MySQL 5.4.0. 1) by 2%, 2) by 0.1% and 3) by

Also the numbers from SPARC/Solaris give similar data. The major
difference is that the positive impact on SPARC servers is much
bigger, all the way up to 30% improvements in some cases. The
most likely reason for this is that SPARC servers
have bigger CPU caches and are thus more held back by lack of
concurrency and not so much by increased working set. The x86
box had 512kB cache per core and a 2MB L3 cache and is likely
to be very sensitive to any increase of the working set.

So the likely rationale for worse numbers in some cases is that
more mutexes or rw-locks gives more cache misses.

So given the outcome I will continue to see if I can keep the
simplicity of the Google patch and still maintain the improved
performance of my patch.

Wednesday, June 03, 2009

Some ideas on InnoDB kernel_mutex

I've noted that one reason that InnoDB can get difficulties
when there are many concurrent transactions in the MySQL Server
is that the lock time of the kernel_mutex often increases
linearly with the number of active transactions. One such
example is in trx_assign_read_view where each transaction
that does a consistent read creates a copy of the transaction
list to be able to deduce the read view of the transaction or

This means that each transaction is copied to the local transaction
list while holding the critical kernel_mutex.

Another such case is that most operations will set some kind of
intention lock on the table. This lock code will walk through
all locks on the table to check for compatible locks and the
first time it will even do so twice. Thus if all threads use the
same table (as they do in e.g. sysbench) then the number of locks
on the table will be more or less equal to the number of active

Thus as an example when running with 256 threads compared to 16
threads the kernel_mutex lock will be held for 16 times longer
and possibly even more since with more contention the mutex is
needed for even longer time to start up waiting transactions.

So this is an obvious problem, so what is then the solution?
Not extremely easy but one thing one can do is to make the
kernel_mutex into a read-write lock instead of a mutex. Then
many threads can traverse those lists in parallel. It will
still block others needing write access to the kernel_mutex
but it should hopefully improve things.

Another solution that is also going to improve the problem is
to use thread pools. Thread pools ensure that not as many
threads are active at a time. However we still have a problem
that transactions can still be as many active in parallel as
there are connections (although InnoDB has a limit of 1024
concurrent active transactions). So the thread pool needs
to prioritize connections with active transactions in cases
where there are too many threads active at a time.

This type of load regulation is often used in telecom systems
where it is more important to give priority to those that have
already invested time in running the activity. Those that are
newcomer comes in when there are empty slots not taken by
already running activities.

Tuesday, June 02, 2009

Increasing log file size increases performance

I have been trying to analyse a number of new patches we've
developed for MySQL to see their scalability. However I've
have gotten very strange results which didn't at all compare
with my old results and most of changes gave negative impact :(
Not so nice.

As part of debugging the issues with sysbench I decided to go
back to the original version I used previously (sysbench 0.4.8).
Interestingly even then I saw a difference on 16 and 32 threads
whereas on 1-8 threads and 64+ threads the result were the same
as usual.

So I checked my configuration and it turned out that I had changed
log file size to 200M from 1300M and also used 8 read and write
threads instead of 4. I checked quickly and discovered that the
parameter that affected the sysbench results was the log file size.
So increasing the log file size from 200M to 1300M increased the
top result at 32 threads from 3300 to 3750, a nice 15% increase.
The setting of the number of read and write threads had no
significant impact on performance.

This is obviously part of the problem which is currently being
researched both by Mark Callaghan and Dimitri.
Coincidentally Dimitri has just recently blogged about this and
provided a number of more detailed comparisons of the
performance of various settings of the log file size in InnoDB.