Wednesday, July 08, 2009

New update to DBT2 clone with automated sysbench runs

I have made an update to the DBT2 clone, into which I have packed all my
benchmarking support scripts.

This update adds a new script, bench_prepare.sh, which should be run
from the benchmark server and takes 3 tarballs as input: the
DBT2 tarball, the sysbench tarball and a MySQL tarball. It
automatically builds all needed binaries on both the benchmark
server and the MySQL Server machine (these can be the same
machine or different machines).

The script requires only one parameter, --default-directory, which
names the directory where a configuration file called autobench.conf
should be placed. This directory is also used to house all result
files, builds and generated configuration files for all involved scripts.

The aim is to continue developing it so that we can also easily
benchmark on different Linux versions.

The tarball can be downloaded from here.

The script can also handle a MySQL Server which is Windows-based,
but the benchmark server cannot run Windows for the moment.

Friday, June 05, 2009

Follow-up Analysis of Split Rollback Segment Mutex

I performed a new set of tests of the patch to split the
rollback segment mutex on Linux. All these tests gave
positive results, with improvements on the order of 2%.

A few conclusions can also be drawn from the results.
The first conclusion is that this split mainly improves
things when the number of threads is high and mutex
contention is therefore higher. At 256 threads a number
of results improved by up to 15%.

The improvements at lower thread counts were more modest,
although in many cases an improvement was still seen.

It was also noticeable that on sysbench read-write with
fewer reads, which makes the transactions much shorter,
the positive impact was much greater, while the positive
impact on long transactions was much smaller (+0.4%
versus +2.5%). The short-transaction test with fewer
reads also gained clearly at lower thread counts; the
result at 32 threads improved by 7%.

So the conclusion is that this patch is a useful contribution
and in particular improves matters at high thread counts and
with short transactions. According to a comment on the
previous blog post, it is also very positive in insert
benchmarks.


Thursday, June 04, 2009

Results of shootout on split page hash in InnoDB

I have now tried out the patches that split the buffer page hash
on both a Linux/x86 box and a SPARC/Solaris server (the SPARC tests
were done by Dimitri).

In short, the three variants are:
1) The Google v3 derived patch. This introduces a new array
of mutexes that only protect the buffer page hash. Thus some
extra checking is needed to ensure the page hasn't been
removed from the hash before using it. This is a very simple
and attractive patch from that point of view. The patch uses
an array of 64 mutexes.

2) A variant I developed with some inspiration from the Percona
patches. This patch uses an array of page hashes, each of which
has its own read-write lock. I've tried this with 1, 4 and 16 page
hashes, and 4 is the optimum number. The rw-lock protects the
page hash long enough to ensure that the block cannot be
removed from the hash before the block mutex is acquired.

3) The last variant is a mix of the first two: it keeps the
simplicity of the Google patch, but uses rw-locks instead and
separate page hashes (to ensure read-ahead doesn't have to
acquire all the mutexes). It uses an array of 4 page hashes.
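To make variant 1) concrete, here is a minimal Python sketch of a page hash whose cells are protected by an array of 64 mutexes, including the extra re-check after the block mutex has been acquired. It is a model under stated assumptions, not the Google patch itself: the names PartitionedPageHash, Block and fold(), the number of cells and the fold formula are all hypothetical, and the real code is C inside InnoDB.

```python
import threading

N_CELLS = 1024          # hypothetical fixed number of hash cells
N_HASH_MUTEXES = 64     # variant 1) uses an array of 64 mutexes


def fold(space, page_no):
    # Hypothetical fold of (space id, page number) into an integer.
    return (space << 20) + space + page_no


class Block:
    def __init__(self, space, page_no):
        self.mutex = threading.Lock()   # per-block mutex
        self.space = space
        self.page_no = page_no          # set to None when the page is evicted


class PartitionedPageHash:
    def __init__(self):
        self.cells = [dict() for _ in range(N_CELLS)]
        self.mutexes = [threading.Lock() for _ in range(N_HASH_MUTEXES)]

    def _locate(self, space, page_no):
        cell_no = fold(space, page_no) % N_CELLS
        # Each mutex protects every N_HASH_MUTEXES-th cell of the hash.
        return self.cells[cell_no], self.mutexes[cell_no % N_HASH_MUTEXES]

    def insert(self, block):
        cell, mutex = self._locate(block.space, block.page_no)
        with mutex:
            cell[(block.space, block.page_no)] = block

    def remove(self, block):
        cell, mutex = self._locate(block.space, block.page_no)
        with mutex, block.mutex:
            del cell[(block.space, block.page_no)]
            block.page_no = None

    def get(self, space, page_no):
        """Return the block with its mutex held, or None if not cached."""
        cell, mutex = self._locate(space, page_no)
        with mutex:
            block = cell.get((space, page_no))
        if block is None:
            return None
        block.mutex.acquire()
        # The hash mutex is no longer held, so the page may have been removed
        # in between: this is the extra check that variant 1) needs.
        if block.space != space or block.page_no != page_no:
            block.mutex.release()
            return None
        return block
```

The caller of get() must release block.mutex when done. Because get() never holds the hash mutex and the block mutex at the same time, the re-check is required; variant 2) avoids it by keeping the page hash rw-lock until the block mutex is acquired.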

The conclusion is that the only version that has consistently
improved the MySQL 5.4.0 numbers is the version I originally
developed (2 above).

On sysbench read-write, all versions improve the numbers compared
to MySQL 5.4.0: 2) and 3) improve by 2%, whereas the original
Google-derived patch improved by 1%.

On sysbench read-only on Linux it was much harder to beat the
MySQL 5.4.0 version. Only 2) did so and only by 0.5%. This is
not so surprising since this mutex is not a blocker for read-only
workloads. 1) gave -1% and 3) gave -0.3%.

On a write-intensive workload on Linux, 1) and 3) performed 0.5%
better than MySQL 5.4.0, whereas 2) gave a 2% improvement.

Finally, on sysbench read-write with fewer reads on Linux, all
variants lost to MySQL 5.4.0: 1) by 2%, 2) by 0.1% and 3) by
1%.

The numbers from SPARC/Solaris show a similar pattern. The major
difference is that the positive impact on SPARC servers is much
bigger, up to 30% improvement in some cases. The most likely
reason is that SPARC servers have bigger CPU caches and are thus
held back more by lack of concurrency than by an increased
working set. The x86 box had 512kB cache per core and a 2MB L3
cache and is therefore likely to be very sensitive to any
increase of the working set.

So the likely explanation for the worse numbers in some cases is
that more mutexes or rw-locks give more cache misses.

Given this outcome, I will continue to see whether I can keep the
simplicity of the Google patch and still maintain the improved
performance of my patch.


Wednesday, June 03, 2009

Some ideas on InnoDB kernel_mutex

I've noted that one reason InnoDB runs into difficulties
when there are many concurrent transactions in the MySQL Server
is that the hold time of the kernel_mutex often increases
linearly with the number of active transactions. One such
example is trx_assign_read_view, where each transaction
that does a consistent read creates a copy of the transaction
list in order to deduce the read view of the transaction or
statement.

This means that every active transaction is copied into the local
transaction list while the critical kernel_mutex is held.
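A minimal Python sketch of why this cost grows with the number of transactions: the read view is built by copying the ids of all other active transactions while a single kernel_mutex is held. TrxSys, ReadView and the visibility rule below are simplified illustrations assumed for this sketch, not InnoDB's actual data structures.

```python
import threading

kernel_mutex = threading.Lock()


class ReadView:
    def __init__(self, creator_id, low_limit_id, active_ids):
        self.creator_id = creator_id
        self.low_limit_id = low_limit_id          # ids >= this started after the view
        self.active_ids = frozenset(active_ids)   # ids active when the view was made

    def sees(self, trx_id):
        # Simplified visibility rule assumed for this sketch.
        if trx_id == self.creator_id:
            return True
        return trx_id < self.low_limit_id and trx_id not in self.active_ids


class TrxSys:
    def __init__(self):
        self.next_trx_id = 1
        self.active_ids = []                      # the shared transaction list

    def begin(self):
        with kernel_mutex:
            trx_id = self.next_trx_id
            self.next_trx_id += 1
            self.active_ids.append(trx_id)
            return trx_id

    def commit(self, trx_id):
        with kernel_mutex:
            self.active_ids.remove(trx_id)

    def assign_read_view(self, trx_id):
        with kernel_mutex:
            # O(number of active transactions) work done under kernel_mutex:
            # every other active transaction is copied into the view.
            copied = [t for t in self.active_ids if t != trx_id]
            return ReadView(trx_id, self.next_trx_id, copied)
```

With 256 active transactions, each call to assign_read_view() copies about 256 entries while every other thread waits on the same mutex.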

Another such case is that most operations set some kind of
intention lock on the table. The lock code walks through
all locks on the table to check for compatible locks, and the
first time it even does so twice. Thus if all threads use the
same table (as they do in e.g. sysbench), the number of locks
on the table will be more or less equal to the number of active
transactions.

So, as an example, when running with 256 threads compared to 16
threads, the kernel_mutex will be held 16 times longer, and
possibly even longer than that, since with more contention the
mutex is also needed for a longer time to wake up waiting transactions.
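The scaling argument can be illustrated with a small Python model of a table lock queue. The compatibility sets are the standard multi-granularity locking rules for IS, IX, S and X; the two passes (first checking whether the transaction already holds the lock, then checking for conflicts) model the remark that the first lock request walks the list twice. The function names are hypothetical and the model only counts list steps; it is not InnoDB's lock code.

```python
# Which held modes each requested table-lock mode is compatible with
# (standard multi-granularity locking rules).
COMPATIBLE_WITH = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def lock_table_request(queue, trx, mode):
    """Return (granted, steps), where steps counts lock-queue entries visited."""
    steps = 0
    # Pass 1: does this transaction already hold the same lock on the table?
    for other_trx, other_mode in queue:
        steps += 1
        if other_trx == trx and other_mode == mode:
            return True, steps
    # Pass 2: does any other transaction hold an incompatible lock?
    for other_trx, other_mode in queue:
        steps += 1
        if other_trx != trx and other_mode not in COMPATIBLE_WITH[mode]:
            return False, steps
    queue.append((trx, mode))
    return True, steps


def steps_for_new_ix_lock(n_active):
    # Every active transaction already holds an IX lock on the same table,
    # as when all sysbench threads use one table.
    queue = [(t, "IX") for t in range(n_active)]
    _, steps = lock_table_request(queue, n_active, "IX")
    return steps


print(steps_for_new_ix_lock(16), steps_for_new_ix_lock(256))  # prints "32 512"
```

A new transaction visits 32 entries with 16 active transactions and 512 with 256, the same 16x ratio as above, and in this model all of that walking happens inside the kernel_mutex critical section.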

This is an obvious problem, so what is the solution?
It is not extremely easy, but one thing that can be done is to
turn the kernel_mutex into a read-write lock instead of a mutex.
Then many threads can traverse those lists in parallel. It will
still block threads needing write access to the kernel_mutex,
but it should hopefully improve things.
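Python's standard library has no readers-writer lock, so here is a minimal sketch built on threading.Condition to show the idea: traversals of the transaction and lock lists would take the lock in shared mode, while modifications take it exclusively. The RWLock class and its writer-preference policy are assumptions of this sketch; InnoDB has its own rw-lock implementation, which this does not reproduce.

```python
import threading


class RWLock:
    """Many concurrent readers or a single writer (writer preference)."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._readers = 0
        self._writer = False
        self._writers_waiting = 0

    def acquire_read(self):
        with self._cond:
            # New readers step aside while a writer holds or waits for the lock.
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._writers_waiting += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


kernel_lock = RWLock()   # hypothetical stand-in for a read-write kernel_mutex
```

Writer preference is one design choice among several; without it, a steady stream of list traversals could starve the threads that need to modify the lists.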

Another solution that will also reduce the problem is
to use thread pools. Thread pools ensure that fewer
threads are active at a time. However, there can still be
as many transactions active in parallel as there are
connections (although InnoDB has a limit of 1024
concurrent active transactions). So the thread pool needs
to prioritize connections with active transactions in cases
where too many threads are active at a time.

This type of load regulation is often used in telecom systems,
where it is more important to give priority to those that have
already invested time in running the activity. Newcomers are
admitted only when there are free slots not taken by
already running activities.
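A minimal sketch of that admission policy, assuming a hypothetical AdmissionQueue with a fixed number of execution slots: waiting requests from connections already inside a transaction are admitted before requests from connections that are not, and FIFO order is kept within each class. This illustrates the prioritization idea only; it is not an existing MySQL thread pool API.

```python
import heapq
import itertools


class AdmissionQueue:
    """Admit at most max_active connections; prefer those already in a transaction."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = set()
        self._heap = []
        self._seq = itertools.count()   # FIFO tie-breaker within a priority class

    def request(self, conn_id, in_transaction):
        priority = 0 if in_transaction else 1   # lower value is admitted first
        heapq.heappush(self._heap, (priority, next(self._seq), conn_id))
        self._dispatch()

    def finish(self, conn_id):
        self.active.discard(conn_id)
        self._dispatch()

    def _dispatch(self):
        while self._heap and len(self.active) < self.max_active:
            _, _, conn_id = heapq.heappop(self._heap)
            self.active.add(conn_id)
```

For example, with a single slot, a connection in the middle of a transaction that arrives after a brand-new connection still gets the next free slot first.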


Tuesday, June 02, 2009

Increasing log file size increases performance

I have been trying to analyse a number of new patches we've
developed for MySQL to see their scalability. However, I have
gotten very strange results that didn't match my old results
at all, and most of the changes had a negative impact :(
Not so nice.

As part of debugging the issues with sysbench, I decided to go
back to the original version I used previously (sysbench 0.4.8).
Interestingly, even then I saw a difference at 16 and 32 threads,
whereas at 1-8 threads and 64+ threads the results were the same
as usual.

So I checked my configuration, and it turned out that I had changed
the log file size from 1300M to 200M and also used 8 read and write
threads instead of 4. A quick check showed that the parameter
affecting the sysbench results was the log file size: increasing
the log file size from 200M to 1300M raised the top result at
32 threads from 3300 to 3750, a nice increase of almost 14%.
The number of read and write threads had no significant impact
on performance.

This is obviously part of the problem currently being
researched by both Mark Callaghan and Dimitri.
Coincidentally, Dimitri has just blogged about this and
provided a number of more detailed comparisons of the
performance of various settings of the log file size in InnoDB.


Wednesday, May 20, 2009

MySQL 5.4 Webinar

The quality of MySQL 5.4.0 is very high for a beta product.
Four weeks after we released it as a beta, we have not had
any really serious bugs reported. There are some issues
related to deprecated features and version numbers, a
bug in the SHOW INNODB STATUS printout, and some concerns
about the new defaults when running on low-end machines.
As usual, it's also important to read the documentation
before upgrading; it contains some instructions needed to
make an upgrade successful. The upgrade issue comes from
the changed defaults for the InnoDB log file sizes.

For those of you who want to know more about MySQL 5.4.0,
its characteristics and why you should use it, please
join this webinar, where Allan Packer will explain what
has been done in MySQL 5.4.0.


Tuesday, May 19, 2009

Patches ready for buf page hash split shootout

Today I created a patch that builds on the Google v3
patch, adding some ideas of my own and some ideas
from the Percona patches. The patch is here.

Here is a reference to the patch derived from the Google
v3 patch.

Here is a reference to my original patch (this is likely to
contain a bug somewhere so usage for other than benchmarking
isn't recommended).

So it will be interesting to see a comparison of all those
variants directly against each other on a number of benchmarks.


Analysis of split flush list from buffer pool

In the Google v3 patch the buffer pool mutex has been
split into an array of buffer page hash mutexes and a
buffer flush list mutex, while the buffer pool mutex
itself also remains.

I derived the patch splitting out the buffer flush list
mutex from the Google v3 patch against the MySQL 5.4.0
tree. The patch is here.

I derived many prototype patches based on MySQL 5.4.0
and Dimitri tried them out. This particular patch seems
to be the most successful of the patches we tested;
it had a consistently positive impact.

The main contribution of this patch is twofold. First, it
decreases the pressure on the buffer pool mutex by
splitting out the critical part where the oldest dirty
pages are flushed to disk. Second, it decreases the
pressure on the log_sys mutex by releasing the log_sys
mutex earlier in mini-transaction commit. It also removes
the interaction between the buffer pool mutex and the
log_sys mutex: previously both mutexes had to be held
together for a while, which is no longer necessary since
only the flush list mutex is needed, not the buffer pool
mutex.
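The following Python sketch illustrates the kind of locking order this enables at mini-transaction commit. It assumes (an assumption of this sketch, not taken from the patch) that the flush list mutex is acquired before the log_sys mutex is released, so dirty pages still enter the flush list in LSN order, and that flushing takes only the flush list mutex. The names mtr_commit(), flush_oldest() and Page are hypothetical simplifications of the C code.

```python
import threading

log_sys_mutex = threading.Lock()
flush_list_mutex = threading.Lock()   # split out of the buffer pool mutex

current_lsn = 0
flush_list = []                       # newest dirty page first, oldest last


class Page:
    def __init__(self, page_no):
        self.page_no = page_no
        self.oldest_modification = None   # None means the page is clean


def mtr_commit(log_bytes, pages):
    global current_lsn
    log_sys_mutex.acquire()
    start_lsn = current_lsn
    current_lsn += log_bytes              # stand-in for writing redo records
    end_lsn = current_lsn
    # Assumed ordering: take the flush list mutex before releasing log_sys,
    # so pages are added in LSN order without the buffer pool mutex.
    flush_list_mutex.acquire()
    log_sys_mutex.release()               # log_sys is released early
    for page in pages:
        if page.oldest_modification is None:
            page.oldest_modification = start_lsn
            flush_list.insert(0, page)
    flush_list_mutex.release()
    return end_lsn


def flush_oldest(n):
    """Remove up to n of the oldest dirty pages; only the flush list mutex is taken."""
    with flush_list_mutex:
        victims = [flush_list.pop() for _ in range(min(n, len(flush_list)))]
    for page in victims:
        page.oldest_modification = None   # stand-in for writing the page to disk
    return victims
```

A page already on the flush list keeps its original oldest_modification when it is modified again, so the tail of the list is always the page with the oldest unflushed change.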

The new patch is the b11 variant which is red in
the comparison graphs.



As we can see, the read-write tests get a significant boost
from this patch: it improves top performance by 5%, and by 10-20%
at higher thread counts. It also moves the maximum from 16 to
32 threads.



Even on read-only there are some improvements, although
it is quite possible that they are mostly random variation.



Finally, the graph above shows that this patch also raises the
optimal InnoDB thread concurrency from 16 to 24, since it
allows more concurrency inside InnoDB. This is also visible
in the numbers with InnoDB thread concurrency set to 0,
as seen below.
