Thursday, April 23, 2009

Join Executor for MySQL Cluster

Jonas in the Cluster team reported on his work on executing
joins in the NDB kernel for MySQL Cluster here.

This is very interesting work that we have in progress at MySQL.
We are working on an extension of the Storage Engine API
where the MySQL Server will present an Abstract Query Tree
to the Storage Engine. The Storage Engine can then decide to
execute the query on its own or decide that the MySQL Server
should execute it in the classic manner. In the first prototype
the optimisation will be done as usual and only after the
optimisation phase will we present the join to the storage
engine. However, the specification also covers integrating this
with the optimiser and enabling the storage engine to execute
only parts of the query rather than the whole query. The
specification of this work can be found here.
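
To make the accept-or-decline idea a bit more concrete, here is a
minimal sketch in Python. It is not the actual Storage Engine API:
the names TableAccess, StorageEngine, ClusterEngine, try_execute and
server_execute are all hypothetical, and the rule that the engine
accepts a join only when every table belongs to it is an assumption
made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableAccess:
    """One table in an already optimised join (hypothetical shape)."""
    table: str
    engine: str


class StorageEngine:
    """Hypothetical engine interface; an engine overrides try_execute."""
    name = "generic"

    def try_execute(self, tree: List[TableAccess]) -> Optional[str]:
        # Returning None means: decline, let the MySQL Server run the join.
        return None


class ClusterEngine(StorageEngine):
    name = "ndb"

    def try_execute(self, tree: List[TableAccess]) -> Optional[str]:
        # Assumed rule: accept the join only if every table is in this engine.
        if tree and all(t.engine == self.name for t in tree):
            return "pushed down to " + self.name
        return None


def server_execute(tree: List[TableAccess], engine: StorageEngine) -> str:
    # The server offers the whole tree after optimisation, then falls back
    # to the classic execution path if the engine declines.
    answer = engine.try_execute(tree)
    return answer if answer is not None else "executed by server"
```

The point of the sketch is only the control flow: the server always
has a fallback, so an engine can decline any query it cannot handle.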

Jonas is working on the backend part for this interface in
MySQL Cluster.

What is interesting about pushing joins to the NDB kernel is that
it becomes very easy to parallelize the join execution. So when
this feature is ready, MySQL Cluster will shine on join
performance and deliver very good performance for all sorts of
applications using SQL.

MySQL Cluster can parallelize the execution of a join so easily
because of the software architecture of the NDB kernel. The NDB
kernel is developed entirely as a message passing architecture.
So to start a new thread of execution in the NDB kernel one
simply sends two messages while executing one message, and to
stop a thread one simply doesn't send any messages while
executing a message. The problem is then rather that one should
not parallelize so much that the system runs out of resources.
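
The fork-by-sending-two-messages idea can be shown with a toy
simulation in Python. This is not NDB kernel code: run_scan, the
fragment numbering and the max_threads cap are hypothetical names
chosen for the example. Each message in the queue stands for one
logical thread of execution, and a handler sends zero, one or two
follow-up messages.

```python
from collections import deque


def run_scan(num_fragments, max_threads):
    """Simulate scanning fragments with message-driven parallelism."""
    queue = deque([0])      # one message in flight = one logical thread
    next_fragment = 1       # next fragment nobody has been asked to scan
    scanned = []
    peak = 1                # highest number of messages in flight
    while queue:
        fragment = queue.popleft()      # execute one message
        scanned.append(fragment)
        # Send up to two messages: two forks the thread, one continues
        # it, zero ends it. The cap keeps parallelism bounded.
        for _ in range(2):
            if next_fragment < num_fragments and len(queue) < max_threads:
                queue.append(next_fragment)
                next_fragment += 1
        peak = max(peak, len(queue))
    return scanned, peak
```

Calling run_scan(8, 4) visits all eight fragments while never keeping
more than four messages in flight; with max_threads set to 1 the same
code degrades to a plain sequential scan.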

So with this development MySQL Cluster will also shine at
Data Mining in an OLTP database. MySQL Cluster is designed for
systems where you need massive amounts of read and write
bandwidth (the cost of writing your data is close to the cost
of reading the data). So with the new features it will be
possible to do Data Mining on data updated in Real-time. Most
Data Mining is performed on a specialised Data Warehousing
solution. But to achieve this you need to transfer the data to
the Data Warehouse. With MySQL Cluster it will be possible to
use the database for OLTP applications with heavy updates
always occurring while still querying the data with parallel
queries. MySQL Cluster is very efficient at
executing individual queries in the NDB kernel and can also
scale to very many machines and CPU cores.

2 comments:

Anonymous said...

Is this feature included in 7.0 release?

BTW, enjoy reading your blog. Very informative. It is the kind of performance info on mysql that I have been looking for...

Mikael Ronstrom said...

It isn't included in 7.0; it's still in the development phase. But more and more things work each week, so things are progressing nicely.