Thursday, December 03, 2009

New threadpool design

In MySQL 6.0 a threadpool design was implemented based on
libevent and mutexes.

This design unfortunately had a number of deficiencies:
1) The performance under high load was constrained due to a global
mutex protecting libevent (see BUG#42288).

2) The design had no flexibility to handle cases where threads were
blocked due to either locks or latches. E.g. a thread held up by a
table lock is kept in the threadpool as an active thread until
the table lock is released. If all threads are blocked in this state,
then even the query that would release the table lock cannot be
processed, since every thread in the thread pool is blocked waiting
for that same table lock (see BUG#34797).

3) The design was intended to support very many connections but
didn't use the most efficient methods to do this on Windows.
libevent uses poll on Windows, which isn't a scalable API when
there are thousands of connections.

Also, in all of the benchmarking with MySQL it's been clear that
performance often drops significantly when there are too many
threads hitting the MySQL Server. We have seen vast improvements
here over the last year, and there are some additional improvements
in the pipeline for inclusion in the next MySQL milestone release.
However the basic problem is still there: too many waiters in the
queue can lead to various performance drop-offs. One reason for
such a drop-off can be mutex waits starting to time out in InnoDB.

So when we're looking at the threadpool design now, we're
aiming at solving two issues in one. The first is to remove this
scalability drop-off at high thread counts and the second is to
efficiently handle MySQL servers with thousands of connections.
A threadpool also gives us more control over which CPUs the
threads are scheduled to execute on. With a clever threadpool
design we can even dynamically adapt the CPU usage to optimize
for lower power consumption by the MySQL Server.

We're currently in the phase of experimenting with different
models. So far we have opted for a design based around usage of
epoll on Linux, eventports on Solaris and kqueue on FreeBSD and
Mac OS X. We will also make a poll-based variant work, mostly for
portability reasons, although its scalability won't be so great.
For Windows we're experimenting with some Windows-specific
APIs such as the IO Completion API.
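
To illustrate the idea of picking the most scalable notification
mechanism per platform with a portable fallback, here is a minimal
Python sketch. It is not the MySQL code: it relies on Python's
standard selectors module, whose DefaultSelector chooses epoll,
kqueue, /dev/poll, poll or select depending on what the platform
offers. The function names are made up for this example.

```python
import selectors
import socket


def chosen_mechanism():
    """Name of the selector class Python picks on this platform."""
    sel = selectors.DefaultSelector()
    try:
        return type(sel).__name__   # e.g. EpollSelector on Linux
    finally:
        sel.close()


def wait_for_readable(socks, timeout=1.0):
    """Return the sockets in `socks` that have data ready to read.

    Hypothetical helper: a real server would keep one long-lived
    selector instead of building a new one per call.
    """
    sel = selectors.DefaultSelector()
    try:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        return [key.fileobj for key, _events in sel.select(timeout)]
    finally:
        sel.close()


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.send(b"SELECT 1")
    print(chosen_mechanism(), wait_for_readable([a]) == [a])
    a.close()
    b.close()
```

The point of the sketch is only that the calling code stays the same
whichever mechanism ends up underneath.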

The code to support thread pooling in MySQL is actually very
small so it's easy to adapt the code for a new experiment.

Last week we found a model that seems to work very well.
The benchmarks show that the performance on 1-32 threads is
around 97-103% of one-thread-per-connection performance. When
we go beyond 32 threads the thread pool gains more and more,
getting to about 130% at 256 threads and reaching 250%
better performance on 1024 threads. However this model still
has the problem of deadlocks, so there is still some work to do
on refining it. The current approach we have fixes the deadlock
problem but removes about 10-15% of the performance at lower
numbers of threads. We do however have numerous ideas on how to
improve this.

The basic idea with our current approach is to use thread groups,
where each group works independently of the other groups in
handling a set of connections. We're experimenting with the number
of threads per group and also with how to handle the situation when
the last thread in the group is getting ready to execute a query.
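
As a rough illustration of the thread group idea, the Python sketch
below gives each group its own small set of worker threads and its
own queue, and pins every connection to one group. Everything in it
is an assumption made for the example: the class and method names,
the modulo mapping from connection id to group, and the default
group sizes. It does not address the "last thread in the group"
question mentioned above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class GroupedThreadPool:
    """Hypothetical pool made of independent thread groups."""

    def __init__(self, n_groups=4, threads_per_group=2):
        # Each group has its own queue and workers, so groups do not
        # contend on shared scheduling state.
        self.groups = [
            ThreadPoolExecutor(max_workers=threads_per_group,
                               thread_name_prefix=f"group{g}")
            for g in range(n_groups)
        ]

    def group_of(self, conn_id):
        # Assumed mapping: a connection always lands in the same group.
        return conn_id % len(self.groups)

    def submit(self, conn_id, query_fn, *args):
        """Queue query_fn on the group that owns conn_id."""
        return self.groups[self.group_of(conn_id)].submit(query_fn, *args)

    def shutdown(self):
        for group in self.groups:
            group.shutdown(wait=True)


def worker_name():
    """Stand-in for a query: report which worker thread ran it."""
    return threading.current_thread().name


if __name__ == "__main__":
    pool = GroupedThreadPool()
    print(pool.submit(7, worker_name).result())  # a thread of group3
    pool.shutdown()
```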

Compared to the maximum performance at around 32 threads, we still
reach about 67% of that performance with 1024 concurrently active
threads. The 33% drop-off is expected, since there is some
additional load when we reach an overload situation to ensure
that the proper thread is handling the task. At low numbers of
threads it's possible to immediately schedule the current worker
thread to work on the query, but in the overload situation some
queueing and context switching is needed. However the overhead at
overload is constant, so it doesn't worsen when the number of
threads goes to a very high number.

To handle the problems with blocked threads, we will implement a
new part of the storage engine API and of the API towards the
MySQL Server, through which the MySQL Server and the storage
engines can announce that they're planning to go inactive for some
reason. The threadpool will however handle the situation even if a
thread goes to sleep without announcing it. It will simply perform
better if the announcement is made in those situations.
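
The effect of such an announcement can be sketched in a few lines of
Python. This is only a toy model under stated assumptions: the names
ActiveSlots, running and announced_wait are invented here and are
not the planned API, the limit on active threads is modelled as a
semaphore, and the table lock is modelled as an event. It also does
not cover the case of a thread sleeping without announcing it.

```python
import threading
from contextlib import contextmanager


class ActiveSlots:
    """Limits how many threads may be active in a group at once."""

    def __init__(self, max_active):
        self._sem = threading.Semaphore(max_active)

    @contextmanager
    def running(self):
        self._sem.acquire()          # take an active slot
        try:
            yield
        finally:
            self._sem.release()

    @contextmanager
    def announced_wait(self):
        self._sem.release()          # "I am about to go inactive"
        try:
            yield
        finally:
            self._sem.acquire()      # become active again


def demo(max_active=1, timeout=5.0):
    slots = ActiveSlots(max_active)
    lock_released = threading.Event()    # stands in for a table lock
    waiter_running = threading.Event()
    results = {}

    def waiter():
        with slots.running():
            waiter_running.set()
            with slots.announced_wait():
                results["waiter"] = lock_released.wait(timeout)

    def releaser():
        waiter_running.wait(timeout)
        with slots.running():            # only possible after the announcement
            lock_released.set()
            results["releaser"] = True

    threads = [threading.Thread(target=waiter),
               threading.Thread(target=releaser)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout * 2)
    return results
```

With a single active slot, the releasing query can only run because
the waiting thread gave up its slot before blocking. Without the
announced_wait step the waiter would hold the only slot until its
wait timed out, which is the deadlock pattern described for the 6.0
design.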

The new MySQL development model with milestone releases is a
vital change to MySQL development. It makes it possible to deliver
new features to the MySQL community users in an efficient manner
without endangering the quality of the MySQL Server. There is a
very strict quality model that any new feature must pass before it
is approved for a milestone release. The 6.0 thread pool design
would not meet this strict quality model. The new design must meet
it before being accepted, although we have good hopes that this
will happen.