Thursday, October 28, 2010

Finding the optimum configuration of MySQL 5.5 running Sysbench

Sysbench is a commonly used benchmark tool to discover ways to improve MySQL performance. It is certainly not representative for every application, but it's still a useful tool for finding bottlenecks in the MySQL code.

In MySQL 5.5 a great number of new scalability improvements have been developed. Some of these will always be active and some of them requires using new configuration parameters.

In order to assist users of MySQL I am performing a fairly extensive benchmark series where I test the various configuration parameters that have an effect on running MySQL/InnoDB using Sysbench.

The parameters can be categorized into:
1) Changes of run-time environment
2) Compile time parameters (including choice of compiler)
3) Configuration parameters for MySQL
4) Configuration parameters for InnoDB

Finally there is also a set of parameters one can use to affect the execution of sysbench itself to have it behave differently. It's possible to have Sysbench use secondary index instead of primary key index, it's possible to let the table be partitioned and it's possible to run sysbench using multiple tables instead of only one (it's very syntetic to only have one table in a system which gets all the queries).

The parameters that I have found to be important to consider for performance of MySQL 5.5 when running sysbench are:
1.1) Use of tcmalloc
1.2) Use of taskset (lock MySQL to certain CPUs)
1.3) Affecting memory allocation by use of numactl
1.4) Connecting to MySQL through socket or using TCP/IP
2.1) Use of gcc compiler with platform-specific optimisations
2.2) Use --with-fast-mutexes flag when compiling MySQL
2.3) Choice of compiler (gcc, Open64, icc, SunStudio, ..)
2.4) Using feedback compilation in compilation
2.5) Compiling sysbench itself with platform specific optimisations
3.1) Choice of transaction isolation (READ COMMITTED or REPEATABLE READ)
3.2) Use of large pages through setting --large-pages=ON
3.3) Compiling with Performance Schema activated
3.4) Running with Performance Schema activated through --perfschema
3.5) Deactivating Query Cache completely through --query_cache_size=0
and --query_cache_type=0
4.1) Using new file format Barracuda or old format Antelope through
--innodb_file_format=x
4.2) Setting --innodb_stats_on_metadata to ON/OFF
4.3) Deactivating InnoDB doublewrite through --skip-innodb_doublewrite
4.4) Setting innodb-change-buffering to none/insert/all and so forth
4.5) Setting number of buffer pool instances using innodb-buffer-pool-instances=x
4.6) Setting InnoDB log file size using innodb-log-file-size=x
4.7) Setting InnoDB log buffer size using innodb-log-buffer-size=x
4.8) Setting InnoDB flush method (default fsync, O_DIRECT, O_DSYNC) using
--innodb_flush_method=x
4.9) Setting InnoDB to use file per table using --innodb_file_per_table
4.10) Deactivating InnoDB adaptive hash index using
--skip-innodb-adaptive-hash-index
4.11) Adapting read-ahead using --innodb_read_ahead=x
4.12) Using InnoDB max purge lag through --innodb_max_purge_lag=x
4.13) Using InnoDB purge thread through --innodb_purge_thread=1
4.14) Changing behaviour of InnoDB spin loops by changing
--innodb_sync_spin_loops=x and --innodb_spin_wait_delay=x
4.15) Changing InnoDB IO capacity by setting --innodb-io-capacity=x
4.16) Setting InnoDB buffer pool size through --innodb-buffer-pool-size=x
4.17) Setting InnoDB dirty pages percent through --innodb_dirty_pages_pct=x
4.18) Setting InnoDB old blocks percentage through --innodb_old_blocks_pct=x
4.19) Activating support for InnoDB XA through --innodb_support_xa=FALSE/TRUE
4.20) Activating InnoDB thread concurrency using
--innodb-thread-concurrency=x
4.21) Setting InnoDB commit mechanism through
--innodb-flush-log-at-trx-commit=x
4.22) Setting number of read and write IO threads through
--innodb-read-io-threads=x and --innodb-write-io-threads=x

This makes for a total of 36 parameters that can be tuned to affect sysbench performance. Most of these parameters have only a limited impact on performance but some of them can have a large impact and there is also a number of them where most reasonable values are ok, but if a bad value is used it can have a major impact on performance.

In addition the problem of finding the optimal values is a multi-dimensional search. So changing one parameter might very well affect the impact of other parameters. So my approach will be to start with a reasonable baseline configuration, then try each parameter and see how it affects the outcome, come to a new baseline by changing the most important parameter. Next step is to restart tests varying all the parameters which have had a measurable impact on performance. Using this method we should be able to find an optimum to a reasonable degree. Finding the absolute optimum is probably more or less practically impossible.

It's important here to understand that the MySQL defaults can sometimes have really bad values for performance. The reason is that MySQL defaults have been choosen mainly to make MySQL run on any HW platform and use very small resources. Remember that the normal use of MySQL isn't running sysbench on a 32-core server :) So it is important to consider all of these parameters to get the optimal performance of the MySQL Server.

I plan to write about these variables and how they affect sysbench performance in a number of upcoming blogs. I might also do a similar run with MySQL using the DBT2 benchmark. So hopefully after finishing these test runs I have found a reasonable optimum configuration to run sysbench and probably also the configuration parameters that needs changing when running DBT2 instead.

2 comments:

wlad said...

An idea a colleague of mine had back in the day is automation using AI methods for finding optimum configuration for benchmarks.

For example hill climbing or simulated annealing might well be applied, or genetic algorithms. I found that idea fascinating, it is pity nobody uses that in practice.

Mark Callaghan said...

This is an excellent project. I hope you have results soon.