Wednesday, December 03, 2008

LOCK_open, THE mutex :)

In all my days at working at MySQL the
LOCK_open mutex have always been a
key mutex to understand, now that I'm
working on scalability improvements of
the server it's as important to change
this mutex into something less contentious.

So last week I finally decided to start
thinking about how we can resolve this
mutex which is at the heart of the MySQL
Server. In principle the idea is that
LOCK_open has been used to protect a
hash table with all the open tables in
the MySQL Server. However it has been
used for many other purposes as well.
So it's not trivial to move around it.

However the main scalability problem
with LOCK_open is the hash lock it
provides. So what to do about it?

My current thinking is that a four-thronged
approach will do the trick.

1) Divide and concquer, perform the hash calculation
outside of the mutex and divide the hash into
e.g. 16 smaller hashes. This creates one problem
which is how to prune the open table cache.
Obviously there is no longer a simple linked
list where I can find the oldest entry. This
problem I'm still contemplating, there's
probably already a number of known good solutions
to this problem since I find it popping up in
almost every similar design. So it's a problem
looking for a solution pattern.

2) Shrink the amount of data it protects
by only allowing it to protect the hash table
and nothing more. This means e.g. that some
counters need to be updated with atomic
instructions instead.

3) Shrink the time it is protected by inserting
the table share into the hash rather than the
table object (this is actually Monty's idea).

4) Use a different technique for the lock that
works better for short-term locks (usually
spinlocks are more successful here).

A combination of these techniques will hopefully
make it possible to decrese the impact of
LOCK_open on the server code.

8 comments:

Bill Karwin said...

That's great that you're giving the lock contention some attention. Have you measured that this is the bottleneck for MySQL scalability?

I mean, the scalability advice we usually hear is to increase memory to cache buffers, make sure to define indexes appropriately, move data or scratch space to an SSD device, partition tables, or use replicated slaves for reading. Things like that.

All these tips suggest that MySQL Server is typically more I/O-bound than CPU-bound. So how much benefit do you expect to see from reducing lock contention?

Antony said...

Hi Mikael,

There are actually better options available than the four that you listed: Some options exist due to interesting quirks as to how some of MySQL's internal data structures operate.

Consider instead of having one global shared hash of tables, have multiple versions of the table. When a version is committed, it is immutable. Cloning HASH is actually trivially easy and is as cheap as a memcpy operation.
When a transaction begins, the THD should take a reference to the current version and use that throughout the life of the transaction. Have a simple detached thread which waits for transactions to end and it could scan through the list of active THDs and dispose of any old versions which are no longer referenced from anywhere.
The advantages: For typical code execution which does not include DDL operations, no atomic operations would be required as there are no need for mutex nor reference counts.

Food for thought?

Feel free to email me if you want to discuss this further in more detail.

Dmitry Lenev said...

Hej Mikael!

Note that approach 3) is already implemented in MySQL 6.0.

Mikael Ronstrom said...

Bill,
Yes, we have done lots of measurements
and as usual the workload very much
defines what is the bottleneck. So
LOCK_open is one of the hot mutexes
in the case when the workload is
CPU-bound. IO bound workload has
different bottleneck which is
almost down in the storage engines,
so these bottlenecks are engine-
specific. The benefits we're
expecting to see is mainly when we run
MySQL on machines with a high number
of cores (e.g. we've now got lots
standard servers with 16,24 and even
Niagara boxes with 256 threads per
box).

Currently the easiest way to avoid
the LOCK_open mutex contention is
to increase the size of the
table_open_cache from its current
default value of 64 to a higher number.
The other scalability advices you
mention will remain to be good ideas.

Mikael Ronstrom said...

Antony,
My understanding of your idea is
that the hash table would be a
RCU to avoid having a mutex around
this. Sounds like an interesting
idea to consider.

However correct me if I'm wrong but
I gather I will still need some
LOCK-thingie on the table share. So
the RCU would provide a pointer to
a table share which is certain to
not be removed since this would
destroy the RCU characteristics. So
then the LOCK is down to the table
and here the same issue arises again
I gather.

Shlomi said...

Mikael,

You said that the LOCK_open mutex is being used for many purposed. Can you shed some light on the other uses?
Are the other uses at all related to the table cache? If not, would it not be advisable to first split the LOCK-open to several non-contending locks (one for the table cache, another for whatever else is there, etc.)?

This first separation will negate (or sadly confirm) the possibility that other operations have grown accustomed to the fact that the lock is used by the table cache.

Once this is done, and you can safely say that the lock is only used by the table cache, a locking mechanism should be easier to implement.

Regards,
Shlomi

Mikael Ronstrom said...

Shlomi,
The other purposes that LOCK_open
is used for is covered by other
work already ongoing in the
6.0 development mentioned by
Dmitri here.

I haven't looked into the details
of all that but obviously will have
to look into those details to
solve this, but I'm pretty sure
that many of those uses are
separate from protecting this
hash table. So I don't expect
any major issues due to the
other uses since they don't
present a scalability problem,
more a software engineering
problem.

Antony said...

Hi Mikael,

Simple solution, give the table share structure a version number too.

The only time when you need exclusive lock on table share data is when performing ddl operations.

Create a new version, already marked as exclusively locked, wait until previous version is collected if neccessary, then proceed with ddl operation. Release lock on success, or create new version with old share info on failure.