Tuesday, November 01, 2011

MySQL Thread Pool: Storage Engines

I got a question from the NDB folks that are currently adapting MySQL Cluster to
MySQL 5.5 about whether any special developments are needed to adapt the NDB
storage engine for use with the thread pool. Then I realised there are more
people out there that write storage engines that want to know how to optimise
their storage engines for the thread pool.

So first of all any storage engine will work with the thread pool as they are today
without any modifications. It is however possible to improve the performance of the
MySQL Server when using the thread pool by adapting the storage engine to the
thread pool APIs.

The new API that has been added to the MySQL 5.5 server is the thd_wait interface.
This interface makes it possible for storage engines to report to a thread pool
plugin before starting a wait and after finishing a wait.

As an example, we have adapted the InnoDB storage engine by adding the thd_wait
interface calls around row locks in InnoDB and before file IO due to misses in
the InnoDB buffer pool. The InnoDB code have also been changed to make those
callbacks as part of the implementation of the --innodb-thread-concurrency and
when waiting for flushes of the buffer pool as part of checkpoints and other
activities where writes are required to ensure proper operation of InnoDB.

The NDB storage engine has very different reasons for the waits, the NDB storage engine
implements the actual data management in the NDB data nodes (these nodes runs in
separate processes separate from the MySQL Server), thus the only reason for waits
in the MySQL Server is when we're waiting for packets to return from the NDB data nodes.

Most third-party storage engines probably fit fairly well with InnoDB and/or NDB in how
they are integrated with the thread pool plugin. So there are storage engines that
perform all the work inside the MySQL Server. The more advanced such engines are likely
to also have a buffer pool and thus should consider calling the thd_wait interface
when doing IO, these storage engines are also likely to acquire row locks or some
similar level of data lock that sometimes will require an extended wait. There are
also other storage engines that are distributed in nature such as NDB, these
storage engines will want to make the callbacks to the new thread pool API when
waiting for responses on the network.

For storage engines that implement some data structure similar to the THD object in the
MySQL Server, there is one additional thing to consider. When using a thread pool it
makes sense to consider pooling such objects given that the thread pool
will pool threads. As an example, we have such an object called Ndb in the NDB API that
has the potential to be pooled. The benefits of pooling such objects are that it
means less time to create them, less memory usage and thus fewer CPU cache misses
due to their usage.

The thd_wait interface is really simple. It contains two calls thd_wait_begin and
thd_wait_end. Both calls have the THD object as the first parameter. Often the THD
object isn't known in the storage engine code when needed. In this case one simply
uses NULL as the THD object. The thd_wait interface can even handle the case where
the thd_wait interface is used from threads that are private to the storage engine.
The thread pool will discover that there is no THD object attached to the thread
and ignore the call.

The thd_wait_begin call also have a second parameter that specifies the type of
wait that will show up in the thread pool information schema tables. There
will be statistics on waits per type. There are currently 10 wait types.

To see an example of usage of this interface, search for thd_wait in the InnoDB
storage engine source code in the MySQL 5.5 community server.

The MyISAM storage engine does not use this API because MyISAM relies on the
MySQL Server for locking. Also, MyISAM assumes that the OS takes care of caching
of pages. This means that there is a very high probability that writes to the
file system are handled directly in the file system cache without involving any
long waits.

What is effect of not modifying a storage engine to implement the thd_wait
interface? The thread pool operates by trying to always have one thread active
per thread group. If the active thread is blocked and the thread pool is informed
of the block, then the thread pool can start another thread to ensure that the
thread group is being efficiently used. If the storage engine is not modified to
implement the thd_wait interface, the thread pool is not informed of the block.
In this case, the thread group will be blocked for a while until the wait is
completed or until the query is defined as stalled. The throughput of the system
can to some extent be handled in those cases by increasing the number of thread
groups.

So implementing the thd_wait interface means better throughput and also less
variance of the throughput and waiting times.

To use these interfaces in a file, include two header files (the thd_wait interface is
part of the plugin APIs in the MySQL 5.5 community and commercial servers).

#include "mysql/plugin.h"
#include "mysql/service_thd_wait.h"

Below is the most important information in these header files.

typedef enum _thd_wait_type_e {
THD_WAIT_SLEEP= 1,
THD_WAIT_DISKIO= 2,
THD_WAIT_ROW_LOCK= 3,
THD_WAIT_GLOBAL_LOCK= 4,
THD_WAIT_META_DATA_LOCK= 5,
THD_WAIT_TABLE_LOCK= 6,
THD_WAIT_USER_LOCK= 7,
THD_WAIT_BINLOG= 8,
THD_WAIT_GROUP_COMMIT= 9,
THD_WAIT_SYNC= 10,
THD_WAIT_LAST= 11
} thd_wait_type;
void thd_wait_begin(MYSQL_THD thd, int wait_type);
void thd_wait_end(MYSQL_THD thd);

THD_WAIT_SLEEP: For uninterrupted sleeps.
THD_WAIT_DISKIO: For file IO operations that are very likely to cause an actual
disk read.
THD_WAIT_ROW_LOCK: For row locks/page locks in the storage engine.
THD_WAIT_GLOBAL_LOCK: For global locks such as the global read lock in the MySQL
Server.
THD_WAIT_TABLE_LOCK: When waiting for a table lock.
THD_WAIT_META_DATA_LOCK: For waiting on a meta data lock which isn't a table lock.
THD_WAIT_USER_LOCK: For some type of special lock.
THD_WAIT_BINLOG: When waiting for the replication binlog.
THD_WAIT_SYNC: When waiting for an fsync operation.

It's quite likely we will introduce more wait types, such as the wait on the network.

Thursday, October 27, 2011

MySQL Thread Pool: Summary

A number of blogs have been written with the intent of describing how
the thread pool manages to solve the requirements of the thread pool.

These blogs are:
MySQL Thread Pool: Problem Definition
MySQL Thread Pool: Scalability Solution
MySQL Thread Pool: Limiting number of concurrent statement executions
Automated benchmark tool for DBT2, Sysbench and flexAsynch
MySQL Thread Pool: Limiting number of concurrent transactions
MySQL Thread Pool: When to use?
MySQL Thread Pool vs. Connection Pool
MySQL Thread Pool: Optimal configuration
MySQL Thread Pool: Benchmarking

There are some interesting discussions in the comments on the scalability solution blog
and on the blog about limiting number of concurrent statement executions
and finally also on the blog about when to use.

These discussions are around when to use it, what other features might be worth
considering and some remarks on the type of benchmarks that could be used to
evaluate solutions.

The requirements we had on the thread pool solution and the solutions were:
1) Split threads into groups individually handled to avoid making the
solution a problem in itself, aim is to manage one active thread per
group.

Solution:
Connections are put into a thread group at connect time by round robin.
Configurable number of thread groups. This ensures that the thread pool
itself isn't a scalability hog.

2) Wait for execution of a query until the MySQL Server has sufficient
CPU and memory resources to execute it.

Solution:
Each thread group tries to keep the number of executing queries to one or
zero. If a query is already executing in the thread group, put connection
in wait queue.

3) Prioritize queries on connections that have an ongoing transaction.

Solution:
Put waiting connections in high priority queue when a transaction is
already started on the connection.

4) Avoid deadlocks when queries are stalled or execute for a long time.

Solution:
Allow another query to execute when the executing query in the thread
group is declared as stalled (after a configurable time).

Wednesday, October 26, 2011

MySQL Thread Pool: Benchmarking

We have executed a number of benchmarks using the thread pool to
see how it operates in various workloads. A thorough study on this
can be found in Dimitri's blog here.

Optimal number of active connections is the number of active connections
needed to achieve the best throughput for the MySQL Server. For an InnoDB
workload this is usually around 32-128 active connections.

From all our benchmarks we've seen that the performance of the thread pool
when operated with less than the optimal number of active connections is
about 1-3% slower than without thread pool since the behaviour is the same
and the thread pool adds a little bit more overhead. More or less all of
this overhead is to handle KILL query correctly.

When operated in the region of the optimal number of active connections
the performance is very similar. We have seen though that the thread pool
benefits very much from locking the MySQL Server to a number of CPUs
equal to the setting of the thread_pool_size configuration parameter.
When not locked to CPUs the performance is similar, when locked to CPUs
the thread pool gives 10-15% higher performance when using the optimal
number of active connections. The MySQL Server operated without thread
pool and locked to CPUs have no significant change of throughput compared
to not locking to CPUs.

When operating above optimal number of connections the thread pool
provides a great benefit, we've seen numbers all the way up to 100x
better performance when operating with a few thousand concurrently
active connections.

MySQL Thread Pool: Information Schema Tables

The thread pool have three information schema tables. These are TP_THREAD_STATE,
TP_THREAD_GROUP_STATE and TP_THREAD_GROUP_STATS.

The TP_THREAD_STATE table contains one row per thread that is currently
maintained by the thread pool. This row contains only interesting information
if the thread is actively executing a statement. In this case it contains information
how many 10 milliseconds slots the query has consumed, if the thread is blocked by
some event, the event is listed. Both of those information items are current state
and will change for each new query.

The TP_THREAD_GROUP_STATE table contains one row per thread group. It contains
information about number of threads of various types. The first type is consumer
threads, this is a thread not used for the moment, at most 1 such thread will
exist at any point in time. This is the next thread to use if the current threads
used are not enough and a new thread is needed.

The second type of threads are reserved threads, these are also threads not currently
used. They will be used when there is no consumer thread and a new thread needs to be
started.

It contains information about the current number of connections handled in this thread
group. It contains current information about the number of queued low priority
statements (QUEUED_QUERIES) and queued high priority statements (QUEUED_TRANS).

It contains information about configuration, thus state of stall limit, priority
kickup timer, algorithm used. Also information about current number of threads in
the thread group, current number of threads actively executing a statement in the
thread group and current number of stalled statement executions.

Finally it contains some useful information about thread number of a possible
waiter thread (the thread that listens to incoming statements), information about
the oldest query that is still waiting to be executed.

The last table is the TP_THREAD_GROUP_STATS that contains statistics about the
thread group.

There are statistics about number of connections, number of connections closed,
number of queries executed, number of queries stalled, number of queries queued,
number of queries that was kicked up in priority from low priority to high priority.

There is also statistics on threads, how many threads have been started, how many
threads have become consumer threads, become reserve threads, become waiter threads.
How many times the thread that checks for stalled threads decided to start a thread
to handle the possibility of executing a query.

Finally there is statistics about each blocking event coming from the MySQL Server
(meta data locks, row locks, file IO, sleeps and so forth).

One of the most important information here is the number of stalled queries
(STALLED_QUERIES_EXECUTED in TP_THREAD_GROUP_STATS), this counter
gives a good idea if we have many stalled queries, if there are too many such
queries, it is a good indication that one should consider increasing the
thread_pool_stall_limit.

Another very important information is the number of priority kickups
(PRIO_KICKUPS in TP_THREAD_GROUP_STATS). If this counter
grows too quick it is an indication that the thread_pool_prio_kickup_timer
might need to be higher.

It might at times be important to check the number of threads started
(THREADS_STARTED in TP_THREAD_GROUP_STATS) as well.
If the threads are started too often, it's a good indicator that we should
not be so aggressive in stopping threads and thus set thread_pool_max_unused_threads
a bit higher.

The current oldest waiting query might also be a good idea to track to ensure that
we don't get longer waits than what is acceptable. If we get too long waits here,
one can either change some configuration variable, but it might also be an indicator
that the MySQL Server is constantly overloaded and that some action should be done
to remedy this.