<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-14455177</id><updated>2012-02-01T08:03:25.906+01:00</updated><category term='MyISAM'/><category term='Connection Pool'/><category term='Dolphin'/><category term='innodb-thread-concurrency'/><category term='eventports'/><category term='MySQL'/><category term='crash recovery'/><category term='LOCK_open'/><category term='scalability'/><category term='ALTER TABLE'/><category term='epoll'/><category term='Partititon'/><category term='poll'/><category term='buffer pool mutex'/><category term='Windows'/><category term='iClaustron'/><category term='benchmarks'/><category term='innodb-thread-concurrency-timer-based'/><category term='DTrace'/><category term='DBT2'/><category term='kqueue'/><category term='Google'/><category term='partitioning'/><category term='InnoDB'/><category term='parallel MySQL'/><category term='IO Completion'/><category term='PARTTION BY'/><category term='Thread pool'/><category term='OpenSolaris'/><category term='cache index'/><category term='MySQL Cluster'/><category term='MySQL 5.4'/><category term='DX'/><category term='SSD'/><category term='WL#3352'/><category term='Solaris'/><category term='threadpool'/><category term='NDB'/><title type='text'>Mikael Ronstrom</title><subtitle type='html'>My name is Mikael Ronstrom and I work for Oracle as
Senior MySQL Architect. I am a member of the LDS
church.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default?start-index=101&amp;max-results=100'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>116</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-14455177.post-3465021675558954577</id><published>2011-11-01T16:28:00.000+01:00</published><updated>2011-11-01T16:28:57.252+01:00</updated><title type='text'>MySQL Thread Pool: Storage Engines</title><content type='html'>I got a question from the NDB folks that are currently adapting MySQL Cluster to&lt;br /&gt;MySQL 5.5 about whether any special developments are needed to adapt the NDB&lt;br /&gt;storage engine for use with the thread pool. Then I realised there are more&lt;br /&gt;people out there that write storage engines that want to know how to optimise&lt;br /&gt;their storage engines for the thread pool.&lt;br /&gt;&lt;br /&gt;So first of all any storage engine will work with the thread pool as they are today&lt;br /&gt;without any modifications. 
It is, however, possible to improve the performance of the&lt;br /&gt;MySQL Server when using the thread pool by adapting the storage engine to the&lt;br /&gt;thread pool APIs.&lt;br /&gt;&lt;br /&gt;The new API that has been added to the MySQL 5.5 server is the thd_wait interface.&lt;br /&gt;This interface makes it possible for storage engines to report to a thread pool&lt;br /&gt;plugin before starting a wait and after finishing a wait.&lt;br /&gt;&lt;br /&gt;As an example, we have adapted the InnoDB storage engine by adding the thd_wait&lt;br /&gt;interface calls around row locks in InnoDB and before file IO due to misses in&lt;br /&gt;the InnoDB buffer pool. The InnoDB code has also been changed to make those&lt;br /&gt;callbacks as part of the implementation of --innodb-thread-concurrency and&lt;br /&gt;when waiting for flushes of the buffer pool as part of checkpoints and other&lt;br /&gt;activities where writes are required to ensure proper operation of InnoDB.&lt;br /&gt;&lt;br /&gt;The NDB storage engine has very different reasons for its waits. The NDB storage engine&lt;br /&gt;implements the actual data management in the NDB data nodes (these nodes run in&lt;br /&gt;processes separate from the MySQL Server), so the only reason for waits&lt;br /&gt;in the MySQL Server is when we're waiting for packets to return from the NDB data nodes.&lt;br /&gt;&lt;br /&gt;Most third-party storage engines probably resemble InnoDB and/or NDB in how&lt;br /&gt;they are integrated with the thread pool plugin. Some storage engines&lt;br /&gt;perform all the work inside the MySQL Server. The more advanced such engines are likely&lt;br /&gt;to also have a buffer pool and thus should consider calling the thd_wait interface&lt;br /&gt;when doing IO. These storage engines are also likely to acquire row locks or some&lt;br /&gt;similar level of data lock that sometimes will require an extended wait. 
There are&lt;br /&gt;also other storage engines that are distributed in nature, such as NDB. These&lt;br /&gt;storage engines will want to make the callbacks to the new thread pool API when&lt;br /&gt;waiting for responses on the network.&lt;br /&gt;&lt;br /&gt;For storage engines that implement some data structure similar to the THD object in the&lt;br /&gt;MySQL Server, there is one additional thing to consider. When using a thread pool it&lt;br /&gt;makes sense to consider pooling such objects given that the thread pool&lt;br /&gt;will pool threads. As an example, we have such an object called Ndb in the NDB API that&lt;br /&gt;has the potential to be pooled. The benefits of pooling such objects are&lt;br /&gt;less time spent creating them, less memory usage and thus fewer CPU cache misses&lt;br /&gt;due to their usage.&lt;br /&gt;&lt;br /&gt;The thd_wait interface is really simple. It contains two calls, thd_wait_begin and&lt;br /&gt;thd_wait_end. Both calls have the THD object as the first parameter. Often the THD&lt;br /&gt;object isn't known in the storage engine code when needed. In this case one simply&lt;br /&gt;uses NULL as the THD object. The thd_wait interface can even handle the case where&lt;br /&gt;it is used from threads that are private to the storage engine.&lt;br /&gt;The thread pool will discover that there is no THD object attached to the thread&lt;br /&gt;and ignore the call.&lt;br /&gt;&lt;br /&gt;The thd_wait_begin call also has a second parameter that specifies the type of&lt;br /&gt;wait that will show up in the thread pool information schema tables. There&lt;br /&gt;will be statistics on waits per type. 
There are currently 10 wait types.&lt;br /&gt;&lt;br /&gt;To see an example of usage of this interface, search for thd_wait in the InnoDB&lt;br /&gt;storage engine source code in the MySQL 5.5 community server.&lt;br /&gt;&lt;br /&gt;The MyISAM storage engine does not use this API because MyISAM relies on the&lt;br /&gt;MySQL Server for locking. Also, MyISAM assumes that the OS takes care of caching&lt;br /&gt;of pages. This means that there is a very high probability that writes to the&lt;br /&gt;file system are handled directly in the file system cache without involving any&lt;br /&gt;long waits.&lt;br /&gt;&lt;br /&gt;What is the effect of not modifying a storage engine to implement the thd_wait&lt;br /&gt;interface? The thread pool operates by trying to always have one thread active&lt;br /&gt;per thread group. If the active thread is blocked and the thread pool is informed&lt;br /&gt;of the block, then the thread pool can start another thread to ensure that the&lt;br /&gt;thread group is being efficiently used. If the storage engine is not modified to&lt;br /&gt;implement the thd_wait interface, the thread pool is not informed of the block.&lt;br /&gt;In this case, the thread group will be blocked for a while until the wait is&lt;br /&gt;completed or until the query is defined as stalled. 
The throughput of the system&lt;br /&gt;can to some extent be handled in those cases by increasing the number of thread&lt;br /&gt;groups.&lt;br /&gt;&lt;br /&gt;So implementing the thd_wait interface means better throughput and also less&lt;br /&gt;variance of the throughput and waiting times.&lt;br /&gt;&lt;br /&gt;To use these interfaces in a file, include two header files (the thd_wait interface is&lt;br /&gt;part of the plugin APIs in the MySQL 5.5 community and commercial servers).&lt;br /&gt;&lt;br /&gt;#include "mysql/plugin.h"&lt;br /&gt;#include "mysql/service_thd_wait.h"&lt;br /&gt;&lt;br /&gt;Below is the most important information in these header files.&lt;br /&gt;&lt;br /&gt;typedef enum _thd_wait_type_e {&lt;br /&gt;  THD_WAIT_SLEEP= 1,&lt;br /&gt;  THD_WAIT_DISKIO= 2,&lt;br /&gt;  THD_WAIT_ROW_LOCK= 3,&lt;br /&gt;  THD_WAIT_GLOBAL_LOCK= 4,&lt;br /&gt;  THD_WAIT_META_DATA_LOCK= 5,&lt;br /&gt;  THD_WAIT_TABLE_LOCK= 6,&lt;br /&gt;  THD_WAIT_USER_LOCK= 7,&lt;br /&gt;  THD_WAIT_BINLOG= 8,&lt;br /&gt;  THD_WAIT_GROUP_COMMIT= 9,&lt;br /&gt;  THD_WAIT_SYNC= 10,&lt;br /&gt;  THD_WAIT_LAST= 11&lt;br /&gt;} thd_wait_type;&lt;br /&gt;void thd_wait_begin(MYSQL_THD thd, int wait_type);&lt;br /&gt;void thd_wait_end(MYSQL_THD thd);&lt;br /&gt;&lt;br /&gt;THD_WAIT_SLEEP: For uninterrupted sleeps.&lt;br /&gt;THD_WAIT_DISKIO: For file IO operations that are very likely to cause an actual&lt;br /&gt;disk read.&lt;br /&gt;THD_WAIT_ROW_LOCK: For row locks/page locks in the storage engine.&lt;br /&gt;THD_WAIT_GLOBAL_LOCK: For global locks such as the global read lock in the MySQL&lt;br /&gt;Server.&lt;br /&gt;THD_WAIT_TABLE_LOCK: When waiting for a table lock.&lt;br /&gt;THD_WAIT_META_DATA_LOCK: For waiting on a meta data lock which isn't a table lock.&lt;br /&gt;THD_WAIT_USER_LOCK: For some type of special lock.&lt;br /&gt;THD_WAIT_BINLOG: When waiting for the replication binlog.&lt;br /&gt;THD_WAIT_SYNC: When waiting for an fsync operation.&lt;br /&gt;&lt;br /&gt;It's 
quite likely we will introduce more wait types, such as the wait on the network.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3465021675558954577?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3465021675558954577/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3465021675558954577' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3465021675558954577'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3465021675558954577'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/11/mysql-thread-pool-storage-engines.html' title='MySQL Thread Pool: Storage Engines'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-5620026616702588228</id><published>2011-10-27T16:45:00.000+02:00</published><updated>2011-10-27T16:45:07.521+02:00</updated><title type='text'>MySQL Thread Pool: Summary</title><content type='html'>A number of blogs have been written with the intent of describing how&lt;br /&gt;the thread pool manages to solve the requirements of the thread pool.&lt;br /&gt;&lt;br /&gt;These blogs are:&lt;br /&gt;&lt;a href="http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-problem-definition.html"&gt;MySQL Thread Pool: Problem Definition&lt;/a&gt;&lt;br /&gt;&lt;a 
href="http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-scalability-solution.html"&gt;MySQL Thread Pool: Scalability Solution&lt;/a&gt;&lt;br /&gt;&lt;a href="http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-limiting-number-of.html"&gt;MySQL Thread Pool: Limiting number of concurrent statement executions&lt;/a&gt;&lt;br /&gt;&lt;a href="http://mikaelronstrom.blogspot.com/2011/10/automated-benchmark-tool-for-dbt2.html"&gt;Automated benchmark tool for DBT2, Sysbench and flexAsynch&lt;/a&gt;&lt;br /&gt;&lt;a href="http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-limiting-number-of_21.html"&gt;MySQL Thread Pool: Limiting number of concurrent transactions&lt;/a&gt;&lt;br /&gt;&lt;a href="http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-when-to-use.html"&gt;MySQL Thread Pool: When to use?&lt;/a&gt;&lt;br /&gt;&lt;a href="http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-vs-mysql-connection.html"&gt;MySQL Thread Pool vs. Connection Pool&lt;/a&gt;&lt;br /&gt;&lt;a href="http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-optimal-configuration.html"&gt;MySQL Thread Pool: Optimal configuration&lt;/a&gt;&lt;br /&gt;&lt;a href="http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-benchmarking.html"&gt;MySQL Thread Pool: Benchmarking&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;There are some interesting discussions in the comments on the &lt;a href="http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-scalability-solution.html"&gt;scalability solution blog&lt;/a&gt;&lt;br /&gt;and on the blog about &lt;a href="http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-limiting-number-of.html"&gt;limiting number of concurrent statement executions&lt;/a&gt;&lt;br /&gt;and finally also on the blog about &lt;a href="http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-when-to-use.html"&gt;when to use&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;These discussions are around when to use it, what other features 
might be worth&lt;br /&gt;considering and some remarks on the type of benchmarks that could be used to&lt;br /&gt;evaluate solutions.&lt;br /&gt;&lt;br /&gt;The requirements we had on the thread pool solution and the solutions were:&lt;br /&gt;1) Split threads into groups individually handled to avoid making the&lt;br /&gt;solution a problem in itself, aim is to manage one active thread per&lt;br /&gt;group.&lt;br /&gt;&lt;br /&gt;Solution:&lt;br /&gt;Connections are put into a thread group at connect time by round robin.&lt;br /&gt;Configurable number of thread groups. This ensures that the thread pool&lt;br /&gt;itself isn't a scalability hog.&lt;br /&gt;&lt;br /&gt;2) Wait for execution of a query until the MySQL Server has sufficient&lt;br /&gt;CPU and memory resources to execute it.&lt;br /&gt;&lt;br /&gt;Solution:&lt;br /&gt;Each thread group tries to keep the number of executing queries to one or&lt;br /&gt;zero. If a query is already executing in the thread group, put connection&lt;br /&gt;in wait queue.&lt;br /&gt;&lt;br /&gt;3) Prioritize queries on connections that have an ongoing transaction.&lt;br /&gt;&lt;br /&gt;Solution:&lt;br /&gt;Put waiting connections in high priority queue when a transaction is&lt;br /&gt;already started on the connection.&lt;br /&gt;&lt;br /&gt;4) Avoid deadlocks when queries are stalled or execute for a long time.&lt;br /&gt;&lt;br /&gt;Solution:&lt;br /&gt;Allow another query to execute when the executing query in the thread&lt;br /&gt;group is declared as stalled (after a configurable time).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-5620026616702588228?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/5620026616702588228/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=5620026616702588228' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5620026616702588228'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5620026616702588228'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-summary.html' title='MySQL Thread Pool: Summary'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-1562421322955901260</id><published>2011-10-26T15:04:00.000+02:00</published><updated>2011-10-26T15:04:43.003+02:00</updated><title type='text'>MySQL Thread Pool: Benchmarking</title><content type='html'>We have executed a number of benchmarks using the thread pool to&lt;br /&gt;see how it operates in various workloads. A thorough study on this&lt;br /&gt;can be found in Dimitri's blog &lt;a href="http://dimitrik.free.fr/blog/archives/2011/09/mysql-performance-high-load-new-thread-pool-in-55.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Optimal number of active connections is the number of active connections&lt;br /&gt;needed to achieve the best throughput for the MySQL Server. For an InnoDB&lt;br /&gt;workload this is usually around 32-128 active connections.&lt;br /&gt;&lt;br /&gt;From all our benchmarks we've seen that the performance of the thread pool &lt;br /&gt;when operated with less than the optimal number of active connections is&lt;br /&gt;about 1-3% slower than without thread pool since the behaviour is the same&lt;br /&gt;and the thread pool adds a little bit more overhead. 
More or less all of&lt;br /&gt;this overhead is to handle KILL query correctly.&lt;br /&gt;&lt;br /&gt;When operated in the region of the optimal number of active connections&lt;br /&gt;the performance is very similar. We have seen, though, that the thread pool&lt;br /&gt;benefits very much from locking the MySQL Server to a number of CPUs&lt;br /&gt;equal to the setting of the thread_pool_size configuration parameter.&lt;br /&gt;When not locked to CPUs the performance is similar; when locked to CPUs&lt;br /&gt;the thread pool gives 10-15% higher performance when using the optimal&lt;br /&gt;number of active connections. The MySQL Server operated without thread&lt;br /&gt;pool and locked to CPUs shows no significant change of throughput compared&lt;br /&gt;to not locking to CPUs.&lt;br /&gt;&lt;br /&gt;When operating above the optimal number of connections the thread pool&lt;br /&gt;provides a great benefit. We've seen numbers all the way up to 100x&lt;br /&gt;better performance when operating with a few thousand concurrently&lt;br /&gt;active connections.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-1562421322955901260?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/1562421322955901260/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=1562421322955901260' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1562421322955901260'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1562421322955901260'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-benchmarking.html' title='MySQL Thread Pool: 
Benchmarking'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-992229636366396390</id><published>2011-10-26T10:55:00.000+02:00</published><updated>2011-10-26T10:55:40.008+02:00</updated><title type='text'>MySQL Thread Pool: Information Schema Tables</title><content type='html'>The thread pool has three information schema tables. These are TP_THREAD_STATE,&lt;br /&gt;TP_THREAD_GROUP_STATE and TP_THREAD_GROUP_STATS.&lt;br /&gt;&lt;br /&gt;The TP_THREAD_STATE table contains one row per thread that is currently&lt;br /&gt;maintained by the thread pool. This row contains interesting information only&lt;br /&gt;if the thread is actively executing a statement. In this case it shows&lt;br /&gt;how many 10-millisecond slots the query has consumed and, if the thread is blocked by&lt;br /&gt;some event, which event that is. Both of these items reflect the current state&lt;br /&gt;and will change for each new query.&lt;br /&gt;&lt;br /&gt;The TP_THREAD_GROUP_STATE table contains one row per thread group. It contains&lt;br /&gt;information about the number of threads of various types. The first type is the consumer&lt;br /&gt;thread. This is a thread not used for the moment; at most 1 such thread will&lt;br /&gt;exist at any point in time. This is the next thread to use if the current threads&lt;br /&gt;used are not enough and a new thread is needed.&lt;br /&gt;&lt;br /&gt;The second type of threads are reserved threads. These are also threads not currently&lt;br /&gt;used. 
They will be used when there is no consumer thread and a new thread needs to be&lt;br /&gt;started.&lt;br /&gt;&lt;br /&gt;The table also contains the current number of connections handled in this thread&lt;br /&gt;group, as well as the current number of queued low priority&lt;br /&gt;statements (QUEUED_QUERIES) and queued high priority statements (QUEUED_TRANS).&lt;br /&gt;&lt;br /&gt;It contains information about the configuration: the stall limit, the priority&lt;br /&gt;kickup timer and the algorithm used. It also shows the current number of threads in&lt;br /&gt;the thread group, the current number of threads actively executing a statement in the&lt;br /&gt;thread group and the current number of stalled statement executions.&lt;br /&gt;&lt;br /&gt;Finally it contains some useful information about the thread number of a possible&lt;br /&gt;waiter thread (the thread that listens for incoming statements) and information about&lt;br /&gt;the oldest query that is still waiting to be executed.&lt;br /&gt;&lt;br /&gt;The last table is TP_THREAD_GROUP_STATS, which contains statistics about the&lt;br /&gt;thread group.&lt;br /&gt;&lt;br /&gt;There are statistics about the number of connections, number of connections closed,&lt;br /&gt;number of queries executed, number of queries stalled, number of queries queued and&lt;br /&gt;number of queries that were kicked up in priority from low priority to high priority.&lt;br /&gt;&lt;br /&gt;There are also statistics on threads: how many threads have been started and how many&lt;br /&gt;threads have become consumer threads, reserve threads or waiter threads.&lt;br /&gt;They also record how many times the thread that checks for stalled threads decided to start a thread&lt;br /&gt;to handle the possibility of executing a query.&lt;br /&gt;&lt;br /&gt;Finally there are statistics about each blocking event coming from the MySQL Server&lt;br /&gt;(meta data locks, row locks, file IO, sleeps and so forth).&lt;br /&gt;&lt;br /&gt;One of 
the most important pieces of information here is the number of stalled queries&lt;br /&gt;(STALLED_QUERIES_EXECUTED in TP_THREAD_GROUP_STATS). This counter&lt;br /&gt;gives a good idea of whether we have many stalled queries. If there are too many such&lt;br /&gt;queries, it is a good indication that one should consider increasing&lt;br /&gt;thread_pool_stall_limit.&lt;br /&gt;&lt;br /&gt;Another very important piece of information is the number of priority kickups&lt;br /&gt;(PRIO_KICKUPS in TP_THREAD_GROUP_STATS). If this counter&lt;br /&gt;grows too quickly, it is an indication that thread_pool_prio_kickup_timer&lt;br /&gt;might need to be higher.&lt;br /&gt;&lt;br /&gt;It might at times be important to check the number of threads started&lt;br /&gt;(THREADS_STARTED in TP_THREAD_GROUP_STATS) as well.&lt;br /&gt;If threads are started too often, it's a good indicator that we should&lt;br /&gt;not be so aggressive in stopping threads and thus set thread_pool_max_unused_threads&lt;br /&gt;a bit higher.&lt;br /&gt;&lt;br /&gt;It might also be a good idea to track the current oldest waiting query to ensure that&lt;br /&gt;we don't get longer waits than what is acceptable. 
If the waits here get too long,&lt;br /&gt;one can change some configuration variable, but it might also be an indicator&lt;br /&gt;that the MySQL Server is constantly overloaded and that some action should be taken&lt;br /&gt;to remedy this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-992229636366396390?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/992229636366396390/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=992229636366396390' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/992229636366396390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/992229636366396390'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-information-schema.html' title='MySQL Thread Pool: Information Schema Tables'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-2711061687642015823</id><published>2011-10-25T13:09:00.000+02:00</published><updated>2011-10-25T13:09:11.969+02:00</updated><title type='text'>MySQL Thread Pool: Optimal configuration</title><content type='html'>The thread pool plugin has a number of configuration parameters that will affect&lt;br /&gt;its performance. 
These are documented in the MySQL manual &lt;a href="http://dev.mysql.com/doc/refman/5.5/en/thread-pool-plugin.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;To configure the thread pool for optimal operation the most important parameter is&lt;br /&gt;--thread_pool_size. This parameter specifies the number of thread groups that&lt;br /&gt;the thread pool will create.&lt;br /&gt;&lt;br /&gt;The default value of thread_pool_size=16 is very often a good starting point. We have&lt;br /&gt;seen that for InnoDB read workloads it is sometimes possible to achieve even better&lt;br /&gt;results when it is set to around 30-40. For write-intensive InnoDB workloads the&lt;br /&gt;optimum can be in the range 12-30. MyISAM workloads usually have an optimum a bit&lt;br /&gt;lower, in the range 6-8. The default value of 16 will also work well for most&lt;br /&gt;MyISAM workloads.&lt;br /&gt;&lt;br /&gt;The next parameter to consider for optimum operation is --thread_pool_stall_limit.&lt;br /&gt;This is set to 6 (=60ms) by default. This number is set very low for good operation&lt;br /&gt;in most cases. With workloads that don't have very many long queries&lt;br /&gt;it is ok to set this much higher. Setting it to 100 (=1 second) should be ok in most&lt;br /&gt;cases.&lt;br /&gt;&lt;br /&gt;In the information schema one can see how many queries are stalled. If there are too&lt;br /&gt;many queries stalled, it is a good idea to increase this parameter since stalled&lt;br /&gt;queries lead to increased context switching activity and more threads to manage for the&lt;br /&gt;operating system.&lt;br /&gt;&lt;br /&gt;The next parameter, --thread_pool_prio_kickup_timer, is set rather high, to 1000&lt;br /&gt;(=1 second). This setting should be ok for most cases. In extremely loaded environments&lt;br /&gt;where thousands of connections want to execute at the same time it's necessary to&lt;br /&gt;increase this variable to ensure that queries aren't moved too early. 
At the same time,&lt;br /&gt;setting it too high means that long-running transactions can block out short transactions&lt;br /&gt;too much. But settings up to 10000 (=10 seconds) should in most cases be ok.&lt;br /&gt;&lt;br /&gt;There is one parameter that isn't supported: --thread_pool_algorithm. This parameter&lt;br /&gt;makes it possible to use a somewhat more aggressive scheduling algorithm in the thread pool.&lt;br /&gt;In most cases this gives no benefit, although in some cases it achieves better results.&lt;br /&gt;It has been left accessible in case someone wants to experiment with it and give us feedback&lt;br /&gt;about it.&lt;br /&gt;&lt;br /&gt;The last parameter is --thread_pool_max_unused_threads. This parameter specifies the&lt;br /&gt;maximum number of unused threads we will keep per thread group. It's possible to have&lt;br /&gt;quite a few unused threads, and one can use this parameter to ensure that we give memory back to the operating&lt;br /&gt;system. By default it's 0, which means that threads are never&lt;br /&gt;released but are kept around for future use. 
Setting it to a nonzero value means that the server&lt;br /&gt;will use less memory but can also contribute to a higher CPU overhead to create new&lt;br /&gt;threads again later on.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-2711061687642015823?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/2711061687642015823/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=2711061687642015823' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2711061687642015823'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2711061687642015823'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-optimal-configuration.html' title='MySQL Thread Pool: Optimal configuration'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-5483029293482408115</id><published>2011-10-24T17:31:00.001+02:00</published><updated>2011-10-24T19:05:54.503+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Connection Pool'/><category scheme='http://www.blogger.com/atom/ns#' term='Thread pool'/><title type='text'>MySQL Thread Pool vs. 
Connection Pool</title><content type='html'>Given that threads and connections in the MySQL Server&lt;br /&gt;have been so intertwined, it is easy to confuse the&lt;br /&gt;purpose of the MySQL Thread Pool with the purpose of&lt;br /&gt;a Connection Pool.&lt;br /&gt;&lt;br /&gt;The aim of a Connection Pool is that the MySQL&lt;br /&gt;clients should not be forced to constantly connect and&lt;br /&gt;disconnect. Thus it is possible to cache a connection in&lt;br /&gt;the MySQL client when a user of the connection no longer&lt;br /&gt;needs it. Another user that needs a connection to the&lt;br /&gt;same MySQL Server can then reuse this cached connection later on.&lt;br /&gt;&lt;br /&gt;This saves execution time in both the client and the server.&lt;br /&gt;It does not, however, change the dynamics of how many queries&lt;br /&gt;are executed in parallel in the MySQL Server. This means that&lt;br /&gt;the likelihood of too many queries executing concurrently in&lt;br /&gt;the MySQL Server is the same with or without a Connection&lt;br /&gt;Pool.&lt;br /&gt;&lt;br /&gt;Also, a Connection Pool operates on the client side. This&lt;br /&gt;means that it doesn't see the state of the MySQL Server when&lt;br /&gt;deciding whether to send a query to the MySQL Server or not. Thus&lt;br /&gt;it doesn't have the information required to decide whether to&lt;br /&gt;queue a query or not. 
Only the MySQL Server has this information,&lt;br /&gt;and thus the MySQL Thread Pool has to operate in the MySQL Server.&lt;br /&gt;It cannot perform its task on the client side.&lt;br /&gt;&lt;br /&gt;Thus it is easy to see that the MySQL Thread Pool and a&lt;br /&gt;Connection Pool are orthogonal and can be used independently of&lt;br /&gt;each other.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-5483029293482408115?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/5483029293482408115/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=5483029293482408115' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5483029293482408115'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5483029293482408115'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-vs-mysql-connection.html' title='MySQL Thread Pool vs. Connection Pool'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-2009462092266278123</id><published>2011-10-24T13:07:00.000+02:00</published><updated>2011-10-24T13:07:16.144+02:00</updated><title type='text'>MySQL Thread Pool: When to use?</title><content type='html'>The most important variable to monitor is threads_running. 
This&lt;br /&gt;variable keeps track of the number of concurrent statements&lt;br /&gt;currently executing in the MySQL Server.&lt;br /&gt;&lt;br /&gt;If this variable has spikes that put it in a region where the&lt;br /&gt;server won't operate optimally (usually beyond 40 for&lt;br /&gt;InnoDB workloads), and in particular if it goes well beyond&lt;br /&gt;this into the hundreds or even thousands of concurrent&lt;br /&gt;statements, then the thread pool will help&lt;br /&gt;protect the MySQL Server from trouble in overload&lt;br /&gt;situations.&lt;br /&gt;&lt;br /&gt;Another indicator that you will benefit from use of the thread&lt;br /&gt;pool is that you already use the --innodb-thread-concurrency&lt;br /&gt;variable. This variable tries to solve a similar problem; the&lt;br /&gt;thread pool solves it at a better place, before query execution&lt;br /&gt;has even started, and also provides additional benefits.&lt;br /&gt;&lt;br /&gt;Also, if your workload is mainly short queries then the thread&lt;br /&gt;pool will be beneficial. Long queries aren't bad for the thread&lt;br /&gt;pool, but they will decrease its positive impact.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-2009462092266278123?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/2009462092266278123/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=2009462092266278123' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2009462092266278123'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2009462092266278123'/><link rel='alternate' type='text/html' 
href='http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-when-to-use.html' title='MySQL Thread Pool: When to use?'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-2819856994320359829</id><published>2011-10-21T19:39:00.000+02:00</published><updated>2011-10-21T19:39:22.375+02:00</updated><title type='text'>MySQL Thread Pool: Limiting number of concurrent transactions</title><content type='html'>There are hot spots in the MySQL Server that become hotter when many&lt;br /&gt;transactions are handled concurrently. This means that it is imperative&lt;br /&gt;to avoid having too many transactions executing in parallel.&lt;br /&gt;&lt;br /&gt;The thread pool handles this by prioritizing queued queries according&lt;br /&gt;to whether they have already started executing a transaction or not.&lt;br /&gt;It is also possible for the user to decide that a connection will be of&lt;br /&gt;high priority independent of whether a transaction is started or not.&lt;br /&gt;&lt;br /&gt;Such a prioritization can have issues with livelock if there are transactions&lt;br /&gt;that are very long. To avoid this problem a query will be moved to the high&lt;br /&gt;priority queue after a configurable time has expired. 
This time is set in the&lt;br /&gt;configuration parameter --thread_pool_prio_kickup_timer&lt;br /&gt;(number of milliseconds before a query is kicked up).&lt;br /&gt;&lt;br /&gt;However, to avoid too many movements in a short time, the thread pool will&lt;br /&gt;move at most one query per 10 milliseconds per thread group.&lt;br /&gt;&lt;br /&gt;It is possible for users to define their connection as always being of&lt;br /&gt;high priority, to ensure that queries from that connection always move faster&lt;br /&gt;through the server.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-2819856994320359829?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/2819856994320359829/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=2819856994320359829' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2819856994320359829'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2819856994320359829'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-limiting-number-of_21.html' title='MySQL Thread Pool: Limiting number of concurrent transactions'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-1461405391621086864</id><published>2011-10-21T17:50:00.000+02:00</published><updated>2011-10-21T17:50:08.835+02:00</updated><title 
type='text'>Automated benchmark tool for DBT2, Sysbench and flexAsynch</title><content type='html'>A new benchmark tool is available &lt;a href="http://dev.mysql.com/downloads/benchmarks.html"&gt;here&lt;/a&gt; to enable automated&lt;br /&gt;benchmark runs of DBT2, Sysbench and flexAsynch using MySQL&lt;br /&gt;and MySQL Cluster.&lt;br /&gt;&lt;br /&gt;The benchmark tool is based on dbt2-0.37 for DBT2, sysbench-0.4.12&lt;br /&gt;for sysbench benchmarks and a flexAsynch program that is available in&lt;br /&gt;MySQL Cluster source releases. (The automated&lt;br /&gt;flexAsynch tests require an updated version of flexAsynch.cpp which&lt;br /&gt;hasn't been included in a MySQL Cluster release yet; a blog post&lt;br /&gt;will announce when it arrives.)&lt;br /&gt;&lt;br /&gt;The automation scripts are part of the dbt2-0.37.50.tar.gz package.&lt;br /&gt;This package is needed to run all benchmarks. In addition a gzipped&lt;br /&gt;source or binary tarball of MySQL or MySQL Cluster is required to&lt;br /&gt;run the benchmarks. Finally, to run sysbench benchmarks one also needs&lt;br /&gt;to download the sysbench-0.4.12.5 tarball.&lt;br /&gt;&lt;br /&gt;So assuming you have downloaded all those tarballs, how do you&lt;br /&gt;run a sysbench benchmark on your local machine?&lt;br /&gt;&lt;br /&gt;The first step is to create a benchmark directory; I usually use&lt;br /&gt;$HOME/bench or /data1/bench. In this directory create a directory&lt;br /&gt;called tarballs. Place all three tarballs in this directory. 
Go into this&lt;br /&gt;directory and unpack the dbt2 tarball through tar xfz dbt2-0.37.50.tar.gz.&lt;br /&gt;Then copy the benchmark start script into the $HOME/bench directory&lt;br /&gt;through the command:&lt;br /&gt;cp $HOME/bench/tarballs/dbt2-0.37.50/scripts/bench_prepare.sh $HOME/bench/.&lt;br /&gt;&lt;br /&gt;Then copy the example configuration file in the same manner using the&lt;br /&gt;command:&lt;br /&gt;cp $HOME/bench/tarballs/dbt2-0.37.50/examples/autobench.conf $HOME/bench/.&lt;br /&gt;&lt;br /&gt;Edit autobench.conf to match your file system environment.&lt;br /&gt;The example configuration file assumes the use of /data1/bench as the&lt;br /&gt;directory to use.&lt;br /&gt;&lt;br /&gt;Now it is time to prepare to run a benchmark. Create a directory under&lt;br /&gt;$HOME/bench for the test run. So for example if you want to call it&lt;br /&gt;test_sysbench then run the command:&lt;br /&gt;mkdir $HOME/bench/test_sysbench&lt;br /&gt;&lt;br /&gt;The next step is to copy the autobench.conf file into this directory and&lt;br /&gt;edit it.&lt;br /&gt;cd $HOME/bench&lt;br /&gt;cp autobench.conf test_sysbench/.&lt;br /&gt;&lt;br /&gt;Now there are two ways to go about editing this configuration file. If&lt;br /&gt;you want to go fast and unsafe, then go ahead and edit the file&lt;br /&gt;directly; there is a fair amount of explanation of the various&lt;br /&gt;parameters in this file. If you want more help, then read&lt;br /&gt;dbt2-0.37.50/README-AUTOMATED to get more directions about how to&lt;br /&gt;set up the configuration file properly.&lt;br /&gt;&lt;br /&gt;Now everything is ready to run the benchmark. This is done through&lt;br /&gt;the commands:&lt;br /&gt;cd $HOME/bench&lt;br /&gt;./bench_prepare.sh --default-directory $HOME/bench/test_sysbench&lt;br /&gt;&lt;br /&gt;If you want to follow the progress of the benchmark in real time&lt;br /&gt;you can do this by issuing tail -f on the proper file. 
For sysbench&lt;br /&gt;RO benchmarks there will be a file called&lt;br /&gt;$HOME/bench/test_sysbench/sysbench_results/oltp_complex_ro_1.res&lt;br /&gt;for the first test run (you can tell the benchmark to do several&lt;br /&gt;runs). Do tail -f on this while the benchmark is running and you'll&lt;br /&gt;get printouts from the sysbench program written on your console.&lt;br /&gt;Among other things you'll get a string like this:&lt;br /&gt;Intermediate results: 128 threads, 3564 tps&lt;br /&gt;if the currently running test uses 128 concurrent connections. By&lt;br /&gt;default the intermediate results are reported every 3 seconds.&lt;br /&gt;&lt;br /&gt;The final result will be reported in the file&lt;br /&gt;$HOME/bench/test_sysbench/final_result.txt&lt;br /&gt;&lt;br /&gt;An additional note is that if you want to run the flexAsynch&lt;br /&gt;tests, then it is necessary to use a source tarball of the&lt;br /&gt;MySQL Cluster 7.x series. This is simply because the flexAsynch&lt;br /&gt;program isn't distributed in binary tarballs.&lt;br /&gt;&lt;br /&gt;The benchmark script will take care of the build process for all&lt;br /&gt;source tarballs; all the important parameters you need to handle&lt;br /&gt;are part of the autobench.conf file. You will however need to&lt;br /&gt;install the proper compilers and build tools to enable builds of&lt;br /&gt;MySQL, sysbench and DBT2 programs.&lt;br /&gt;&lt;br /&gt;If you want to benchmark a MySQL Server using the thread pool, then&lt;br /&gt;it is necessary to download a MySQL Enterprise Edition of the MySQL&lt;br /&gt;Server. If you already have a commercial license with Oracle, then&lt;br /&gt;simply use this to download the MySQL binary tarball through&lt;br /&gt;edelivery.oracle.com. If you don't have a commercial license, you&lt;br /&gt;can use the Oracle Software Delivery Cloud Trial License Agreement&lt;br /&gt;which gives you a 30-day trial license. 
So to get the binary tarball&lt;br /&gt;go to edelivery.oracle.com, register if necessary, log in and accept&lt;br /&gt;all required license agreements.&lt;br /&gt;&lt;br /&gt;The next step is to select MySQL Database as the Product Pack and&lt;br /&gt;Linux x86-64 as the Platform. Finally download the TAR file for&lt;br /&gt;generic Linux 2.6 x86-64 platforms. When the download has completed,&lt;br /&gt;unzip the file and you'll get the gzipped tarball you need&lt;br /&gt;to run the thread pool benchmark.&lt;br /&gt;&lt;br /&gt;The sysbench version contains a few extra features compared to the&lt;br /&gt;sysbench-0.4.12 version. It contains support for intermediate&lt;br /&gt;result reporting, support for multiple tables in the sysbench&lt;br /&gt;benchmark, support for partitioned tables, support for using&lt;br /&gt;secondary indexes, support for using HANDLER statements instead&lt;br /&gt;of SELECT statements, and also support for running sysbench at&lt;br /&gt;fixed transaction rates with a certain jitter.&lt;br /&gt;&lt;br /&gt;DBT2 can, in addition to running with a single MySQL Server, also&lt;br /&gt;run with multiple MySQL Servers when used with MySQL Cluster.&lt;br /&gt;It contains a few new features here to control partitioning,&lt;br /&gt;the possibility to place the ITEM table in each MySQL Server and&lt;br /&gt;so forth.&lt;br /&gt;&lt;br /&gt;All scripts and many programs have updated parameters and all&lt;br /&gt;scripts have extensive help outputs to make it easy to understand&lt;br /&gt;what they can do.&lt;br /&gt;&lt;br /&gt;It is fairly easy to extend the benchmark scripts. As an example,&lt;br /&gt;if you need to change a parameter which isn't included in&lt;br /&gt;autobench.conf then first add it in bench_prepare.sh and then&lt;br /&gt;add code to handle it in start_ndb.sh. Also update the&lt;br /&gt;autobench.conf example file if you want to keep the feature&lt;br /&gt;for a longer time. 
If you want to suggest changes to the&lt;br /&gt;scripts please report them in My Oracle Support (support.oracle.com)&lt;br /&gt;or in bugs.mysql.com and assign them to Mikael Ronstrom.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-1461405391621086864?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/1461405391621086864/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=1461405391621086864' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1461405391621086864'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1461405391621086864'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/10/automated-benchmark-tool-for-dbt2.html' title='Automated benchmark tool for DBT2, Sysbench and flexAsynch'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-6733213820112746169</id><published>2011-10-21T13:56:00.000+02:00</published><updated>2011-10-21T13:56:41.285+02:00</updated><title type='text'>MySQL Thread Pool: Limiting number of concurrent statement executions</title><content type='html'>The main task of the thread pool is to limit the number of&lt;br /&gt;concurrent statement executions. 
The thread pool achieves&lt;br /&gt;this by trying to always operate a thread group such that&lt;br /&gt;only one or zero queries are concurrently executed per&lt;br /&gt;thread group.&lt;br /&gt;&lt;br /&gt;There is however a livelock issue to consider. A long-running&lt;br /&gt;query in a thread group will in this manner block out all&lt;br /&gt;other queries in this thread group until the query is completed.&lt;br /&gt;&lt;br /&gt;To resolve this issue there is a configurable timer that&lt;br /&gt;decides when a statement execution is declared as stalled. When&lt;br /&gt;a query is declared as stalled, it is allowed to continue&lt;br /&gt;executing until completed. The thread group will treat the&lt;br /&gt;connection as stalled and not count it as an active connection.&lt;br /&gt;Thus new queries can be executed in the thread group again when a&lt;br /&gt;query has been declared as stalled.&lt;br /&gt;&lt;br /&gt;Another issue is when a statement execution is blocked for some&lt;br /&gt;reason. Queries can be blocked e.g. by Row Locks, File IO, Table&lt;br /&gt;Locks, Global Read Locks and so forth. If it is likely that the&lt;br /&gt;blockage will continue for at least a millisecond or so, then it&lt;br /&gt;makes sense to start up another statement execution in the thread&lt;br /&gt;group to ensure that we continue to keep the number of concurrent&lt;br /&gt;active connections at the right level.&lt;br /&gt;&lt;br /&gt;To enable this the MySQL Server will make callbacks to the thread&lt;br /&gt;pool stating when a block begins and when it ends. 
The thread&lt;br /&gt;pool uses these callbacks to keep track of the number of active statement&lt;br /&gt;executions, which in turn is used to decide when to start a new query&lt;br /&gt;and when to allow an incoming query to start.&lt;br /&gt;&lt;br /&gt;It is important that the wait is sufficiently long, since the&lt;br /&gt;blocked query must immediately continue executing when the&lt;br /&gt;blockage ends.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-6733213820112746169?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/6733213820112746169/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=6733213820112746169' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6733213820112746169'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6733213820112746169'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-limiting-number-of.html' title='MySQL Thread Pool: Limiting number of concurrent statement executions'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7663275461280342791</id><published>2011-10-21T11:08:00.000+02:00</published><updated>2011-10-21T11:08:52.758+02:00</updated><title type='text'>MySQL Thread Pool: Scalability solution</title><content type='html'>When implementing a thread pool or any other means of 
limiting concurrency in the MySQL Server, careful thought is required about how to divide the problem to ensure that we don't create any unnecessary new hot spots. It is very easy to make a design that manages all connections and threads in one pool. This design does however very quickly run into scalability issues due to the need to lock the common data structures every time a connection or thread needs to change its state.&lt;br /&gt;&lt;br /&gt;To avoid this issue we decided to implement the thread pool using a set of thread groups. Each of those thread groups is independent of the other thread groups. Each thread group manages a set of connections and threads. It also handles a set of queues and other data structures required to implement the thread group operations. Each thread group will contain a minimum of one thread. Connections are bound to a thread group at connect time using a simple round robin assignment. The thread pool aims to ensure that each thread group has either zero or one thread actively executing a statement. This means that the interactions between threads within one thread group are extremely limited. Also, the interactions won't grow as the MySQL Server gets more statements to process. 
Thus it is very hard to see this model becoming a scalability issue in itself.&lt;br /&gt;&lt;br /&gt;So we solved the scalability problem using a Divide-and-Conquer technique.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7663275461280342791?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7663275461280342791/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7663275461280342791' title='18 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7663275461280342791'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7663275461280342791'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-scalability-solution.html' title='MySQL Thread Pool: Scalability solution'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>18</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-8276542742455467792</id><published>2011-10-18T16:03:00.001+02:00</published><updated>2011-10-18T16:07:03.049+02:00</updated><title type='text'>MySQL Thread Pool: Problem Definition</title><content type='html'>A new thread pool plugin is now a part of the MySQL Enterprise Edition.&lt;br /&gt;In this blog post we will cover the problem that the thread pool solves&lt;br /&gt;and give a high-level description of how it solves this problem.&lt;br /&gt;&lt;br /&gt;In the traditional MySQL server model there is a one-to-one 
mapping between&lt;br /&gt;thread and connection. The MySQL server even has lots of code where thread&lt;br /&gt;or some abbreviation of thread actually represents a connection.&lt;br /&gt;Obviously this mapping has served MySQL very well over the years, but there&lt;br /&gt;are some cases where this model doesn't work so well.&lt;br /&gt;&lt;br /&gt;One such case is where there are many more connections executing queries&lt;br /&gt;simultaneously than there are CPUs available in the server. The&lt;br /&gt;MySQL Server also has scalability bottlenecks where performance suffers&lt;br /&gt;when too many connections execute in parallel.&lt;br /&gt;&lt;br /&gt;So effectively there are two reasons that can make performance suffer in&lt;br /&gt;the original MySQL Server model.&lt;br /&gt;&lt;br /&gt;The first is that many connections executing in parallel means that the&lt;br /&gt;amount of data that the CPUs work on increases. This will decrease the&lt;br /&gt;CPU cache hit rates. Lowering the CPU cache hit rate can have a significant&lt;br /&gt;negative impact on server performance. In some cases the amount&lt;br /&gt;of memory allocated by the connections executing in parallel could&lt;br /&gt;even exceed the memory available in the server. 
In this case we enter a&lt;br /&gt;state called swapping which is very detrimental to performance.&lt;br /&gt;&lt;br /&gt;The second problem is that the number of parallel queries and transactions&lt;br /&gt;can have a negative impact on the throughput through the "critical sections"&lt;br /&gt;of the MySQL Server (a critical section is a region where mutexes are applied to&lt;br /&gt;ensure that only one CPU changes a certain data structure at a time; when such&lt;br /&gt;a critical section becomes a scalability problem we call it a hot spot).&lt;br /&gt;Statements that write are more affected since they use more critical&lt;br /&gt;sections.&lt;br /&gt;&lt;br /&gt;Neither of those problems can be solved in the operating system scheduler.&lt;br /&gt;However, there are some operating systems that have attempted to solve this&lt;br /&gt;problem for generic applications on a higher level in the operating system.&lt;br /&gt;&lt;br /&gt;Both of those problems have the impact that performance suffers more and&lt;br /&gt;more as the number of statements executed in parallel increases.&lt;br /&gt;&lt;br /&gt;In addition there are hot spots where the mutex is held for a longer time&lt;br /&gt;when many concurrent statements and/or transactions are executed in&lt;br /&gt;parallel. One such example is the transaction list in InnoDB where each&lt;br /&gt;transaction is listed in a linked list. Thus when the number of concurrent&lt;br /&gt;transactions increases, the time to scan the list increases and the time&lt;br /&gt;holding the lock increases, and thus the hot spot becomes even hotter&lt;br /&gt;as the concurrency increases.&lt;br /&gt;&lt;br /&gt;A current solution to these issues exists in InnoDB through use of the&lt;br /&gt;configuration parameter --innodb-thread-concurrency. When this parameter&lt;br /&gt;is set to a nonzero value, it indicates how many threads are&lt;br /&gt;able to run through InnoDB code concurrently. This solution has its&lt;br /&gt;use cases where it works well. 
It does however have the drawback that&lt;br /&gt;the solution itself contains a hot spot that limits the MySQL server&lt;br /&gt;scalability. It also doesn't contain any solution for limiting the&lt;br /&gt;number of concurrent transactions.&lt;br /&gt;&lt;br /&gt;In a previous alpha version of the MySQL Server (MySQL 6.0) a thread&lt;br /&gt;pool was developed. This thread pool solved the problem of limiting&lt;br /&gt;the number of concurrent threads executing. It did nothing to solve&lt;br /&gt;the problem of limiting the number of concurrent transactions.&lt;br /&gt;It was also a scalability bottleneck in itself. Finally, it didn't&lt;br /&gt;solve all issues regarding long queries and blocked queries.&lt;br /&gt;This made it possible for the MySQL Server to become completely&lt;br /&gt;blocked.&lt;br /&gt;&lt;br /&gt;When developing the thread pool extension now available in the MySQL&lt;br /&gt;Enterprise Edition we decided to start from a clean slate with the&lt;br /&gt;following requirements:&lt;br /&gt;&lt;br /&gt;1) Limit the number of concurrently executing statements to ensure&lt;br /&gt;that each statement execution has sufficient CPU and memory resources&lt;br /&gt;to fulfill its task.&lt;br /&gt;&lt;br /&gt;2) Split threads and connections into thread groups that are&lt;br /&gt;independently managed. This is to ensure that the thread pool&lt;br /&gt;plugin itself doesn't become a scalability bottleneck. 
The&lt;br /&gt;aim is that each thread group has one or zero active threads&lt;br /&gt;at any point in time.&lt;br /&gt;&lt;br /&gt;3) Limit the number of concurrently executing transactions&lt;br /&gt;by prioritizing queued connections depending on whether&lt;br /&gt;they have started a transaction or not.&lt;br /&gt;&lt;br /&gt;4) Avoid deadlocks when a statement execution becomes long or&lt;br /&gt;when the statement is blocked for some reason for an extended&lt;br /&gt;time.&lt;br /&gt;&lt;br /&gt;If you are interested in knowing more details of how the new&lt;br /&gt;thread pool solves these requirements there will be a&lt;br /&gt;webinar on Thursday 20 Oct 2011 at 9.00 PDT. Check &lt;a href="http://www.mysql.com/news-and-events/web-seminars/display-666.html"&gt;here&lt;/a&gt;&lt;br /&gt;for details on how to access it.&lt;br /&gt;&lt;br /&gt;If you want to try out the thread pool go &lt;a href="https://edelivery.oracle.com/"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-8276542742455467792?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/8276542742455467792/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=8276542742455467792' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8276542742455467792'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8276542742455467792'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/10/mysql-thread-pool-problem-definition.html' title='MySQL Thread Pool: Problem Definition'/><author><name>Mikael 
Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3156425594840884941</id><published>2011-05-26T11:39:00.003+02:00</published><updated>2011-10-21T18:44:00.649+02:00</updated><title type='text'>Better than linear scaling is possible</title><content type='html'>As part of my research for my Ph.D. thesis, I spent a lot of time&lt;br /&gt;understanding the impact of CPU caches on the performance of a DBMS.&lt;br /&gt;I concluded that in a parallel data server it is actually possible&lt;br /&gt;to get better than linear scaling in certain workloads.&lt;br /&gt;&lt;br /&gt;Consider a benchmark with 2 machines, each with 8 cores that&lt;br /&gt;share a 2 MByte cache; the system then has a total of 4 MByte of CPU cache.&lt;br /&gt;Assuming that the benchmark executes with a data set of 2 GByte, then&lt;br /&gt;about 0.2% of the data fits in the CPU cache. As the number of machines grows,&lt;br /&gt;the available CPU cache also grows, so when we have&lt;br /&gt;32 machines, we have 64 MByte of cache available. This means that we can&lt;br /&gt;now store about 3.1% of the data set in the CPU cache.&lt;br /&gt;&lt;br /&gt;For benchmarks one mostly tries to scale the data set size when increasing&lt;br /&gt;the number of nodes in the system. This is however not necessarily true in&lt;br /&gt;real-life applications. For real-life applications the working set is&lt;br /&gt;independent of the cluster size; it can grow over time as more customers join the service&lt;br /&gt;or for other reasons. 
But the working set of a real-life application doesn't&lt;br /&gt;grow when you grow the number of machines in the database cluster.&lt;br /&gt;&lt;br /&gt;It's very well known that there are many things that drive sublinear scaling;&lt;br /&gt;the most important of those is the extra cost of communication in a larger&lt;br /&gt;cluster. The number of communication lanes in a fully connected cluster is&lt;br /&gt;n * (n - 1) / 2. This means that the number of communication lanes grows with&lt;br /&gt;the square of the number of machines, O(n^2). The total communication only&lt;br /&gt;increases linearly with the number of machines, which means that each lane gets&lt;br /&gt;proportionally fewer bytes to communicate in a larger cluster. Given that&lt;br /&gt;communication cost is fixed_cost + #bytes * cost_per_byte, this means&lt;br /&gt;that the cost per byte sent will increase in a larger cluster since there&lt;br /&gt;will be smaller packets and thus fewer bytes to pay for the fixed cost.&lt;br /&gt;&lt;br /&gt;The above is one reason why sharding is a good idea: we&lt;br /&gt;partition the problem, so we only use a subset of the communication lanes&lt;br /&gt;and thus we avoid the increased cost of communication as the number of&lt;br /&gt;machines grows. Obviously sharding also imposes limitations on the type of&lt;br /&gt;queries you can handle efficiently.&lt;br /&gt;&lt;br /&gt;Now to some specific facts about MySQL Cluster and why we can obtain&lt;br /&gt;better than linear scaling here (reported in earlier blogs &lt;a href="http://mikaelronstrom.blogspot.com/2011/04/mysql-cluster-doing-682m-reads-per.html"&gt;here&lt;/a&gt; and &lt;a href="http://mikaelronstrom.blogspot.com/2011/04/mysql-cluster-running-246m-updates-per.html"&gt;here&lt;/a&gt;).&lt;br /&gt;For reads here we got 1.13M on 8 nodes, 2.13M on 16 nodes and 4.33M&lt;br /&gt;reads on 32 nodes. For updates we got 687k on 4 nodes, 987k on 8 nodes&lt;br /&gt;and finally 2.46M on 16 nodes. 
All the data in this benchmark was also&lt;br /&gt;replicated.&lt;br /&gt;&lt;br /&gt;The data nodes in MySQL Cluster use an architecture where we have up to&lt;br /&gt;4 threads that handle the local database handling. These 4 threads handle&lt;br /&gt;their own partitions. Next we have one thread that handles the transaction&lt;br /&gt;coordinator role, and one thread that takes care of the receive&lt;br /&gt;part of the communication. Finally we have a set of threads taking care&lt;br /&gt;of file system communication. What this effectively means is that as we&lt;br /&gt;grow the cluster size and the cost of communication grows, each data node&lt;br /&gt;will consume more CPU power. However the architecture of MySQL Cluster&lt;br /&gt;is designed in such a way that this extra CPU power is spent in its own&lt;br /&gt;CPU cores. Thus we simply use a bit more of the CPU cores for&lt;br /&gt;communication when the cluster size grows.&lt;br /&gt;&lt;br /&gt;The benefit of this approach is that it is easy to scale the number&lt;br /&gt;of CPU cores used for communication. Given that modern machines often&lt;br /&gt;come with quite a high number of CPU cores, this means that as machines&lt;br /&gt;get beefier, we can actually deliver better than linear scaling of&lt;br /&gt;the workload one can achieve by growing the number of data nodes in&lt;br /&gt;MySQL Cluster.&lt;br /&gt;&lt;br /&gt;In MySQL Cluster each execution thread has its own scheduler.&lt;br /&gt;This scheduler becomes more and more efficient as load grows for&lt;br /&gt;two reasons. The first is that as the load grows, the queue is&lt;br /&gt;longer and thus we need to refill the queue fewer times.&lt;br /&gt;This means that we spend more time executing the same code over&lt;br /&gt;and over again, so the instruction cache for that&lt;br /&gt;code will be very hot and we will train the branch predictor&lt;br /&gt;subsystem in the CPUs very well. 
We get this benefit both in&lt;br /&gt;the code refilling the queue and the code to execute the actual&lt;br /&gt;database workload. The second is that when the load is high we also avoid&lt;br /&gt;running code that checks for messages when there are no messages&lt;br /&gt;around. Thus as load increases the efficiency increases and&lt;br /&gt;the actual number of instructions to execute per message also&lt;br /&gt;decreases.&lt;br /&gt;&lt;br /&gt;So when I presented this theory at the presentation of my&lt;br /&gt;Ph.D. thesis this was only a theory. In the real world it's&lt;br /&gt;very uncommon to see the effect of CPU caches and other effects&lt;br /&gt;being greater than the added burden of a larger cluster. However&lt;br /&gt;I have seen it twice in my career. The first was a benchmark&lt;br /&gt;performed in 2002 on a very large computer where we hosted 32 nodes&lt;br /&gt;(single CPU nodes in those days) and 23 benchmark applications.&lt;br /&gt;Here we scaled from 0.5 million to 1.5 million going from&lt;br /&gt;16 to 32 nodes. Now also in the results presented at the&lt;br /&gt;MySQL Users conference and in my previous blogs we achieved better&lt;br /&gt;than linear scaling, in particular for the write benchmark, but also&lt;br /&gt;to some extent for read benchmarks. 
I am sure the above isn't the&lt;br /&gt;entire explanation of these effects, but the things&lt;br /&gt;explained above certainly play a role in it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3156425594840884941?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3156425594840884941/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3156425594840884941' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3156425594840884941'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3156425594840884941'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/05/better-than-linear-scaling-is-possible.html' title='Better than linear scaling is possible'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7142137582548170235</id><published>2011-04-12T13:17:00.000+02:00</published><updated>2011-04-12T13:17:18.121+02:00</updated><title type='text'>MySQL Cluster: Designed for high-scale SQL and NoSQL web applications</title><content type='html'>As shown in a number of blogs, the MySQL Cluster SW already uses the type of features found in many NoSQL products. It has an extremely efficient API through which it is possible to shoot millions of reads and writes towards a Cluster per second. 
It partitions its data similarly to shards in NoSQL and supports both high availability of those partitions and repartitioning of the data when new nodes are added to the Cluster. Advanced replication solutions, providing replication both inside a Cluster and between Clusters, make it possible to use MySQL Cluster in a very large number of replication configurations, even scaling across multiple global data centers.&lt;br /&gt;&lt;br /&gt;Finally MySQL Cluster makes it possible for you to choose to stay with your data in relational tables while still using NoSQL-like APIs, supporting on-line changes of partitioning and also adding new fields to tables while still reading and writing data in the tables. Using MySQL Cluster you can use MySQL APIs, the NDB API, Cluster/J, JPA and the LDAP API, and even more APIs are being worked on and will soon be announced.&lt;br /&gt;&lt;br /&gt;Most web data is used heavily for generation of web pages, where the use is mostly simple queries, but very many of them. Most of the web data also requires analysis to make intelligent business decisions based on the web generated data. A prototype of parallel query for MySQL Cluster was displayed at the MySQL Users Conference 2010. Tools such as this will also make it possible to analyse data efficiently in MySQL Cluster. 
Thus MySQL Cluster is a very efficient tool for working with many sorts of web data while retaining ACID compliance and a rich set of tools, expertise and best practices.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7142137582548170235?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7142137582548170235/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7142137582548170235' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7142137582548170235'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7142137582548170235'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/04/mysql-cluster-designed-for-high-scale.html' title='MySQL Cluster: Designed for high-scale SQL and NoSQL web applications'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-1182464440877638164</id><published>2011-04-12T13:15:00.002+02:00</published><updated>2011-04-12T13:15:50.004+02:00</updated><title type='text'>MySQL Cluster running 2.46M updates per second!</title><content type='html'>In a previous blog post we showed how MySQL Cluster achieved 6.82M reads per second. This is a high number. However what is also very interesting to see is how efficient MySQL Cluster is at executing updating transactions as well. 
We were able to push through the 1M transactions per second wall and even past the 2M transactions per second and all the way up to 2.46M transactions per second.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-1182464440877638164?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/1182464440877638164/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=1182464440877638164' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1182464440877638164'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1182464440877638164'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/04/mysql-cluster-running-246m-updates-per.html' title='MySQL Cluster running 2.46M updates per second!'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-2110591113257383823</id><published>2011-04-11T21:18:00.000+02:00</published><updated>2011-04-11T21:18:17.173+02:00</updated><title type='text'>MySQL Cluster doing 6.82M reads per second</title><content type='html'>We ran a number of tests to see how many reads per second we could get from MySQL Cluster. 
We used a modified version of flexAsynch (as shown in previous blog), where each record read was 100 bytes in size.&lt;br /&gt;&lt;br /&gt;With a cluster of 4 data nodes operating on 2 machines we were able to process 1.15M reads per second. On a cluster consisting of 8 data nodes executing on 4 machines we were able to process 2.13M reads per second. On a 16-data node cluster with 8 machines used for data nodes, we were able to process 4.33M reads per second and finally a cluster with 32 data nodes distributed on 16 machines we executed 6.82M reads per second. The tests were run on MySQL Cluster 7.1, we're confident that similar numbers can be achieved with MySQL Cluster 7.0 and also with the new beta version MySQL Cluster 7.2.&lt;br /&gt;&lt;br /&gt;This benchmark will give you a good idea what can be achieved with direct usage of the NDB API, and using other APIs like Cluster/J, mod-ndb, NDB-memcached.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-2110591113257383823?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/2110591113257383823/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=2110591113257383823' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2110591113257383823'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2110591113257383823'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/04/mysql-cluster-doing-682m-reads-per.html' title='MySQL Cluster doing 6.82M reads per second'/><author><name>Mikael 
Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-8894856453543115655</id><published>2011-04-11T14:59:00.000+02:00</published><updated>2011-04-11T14:59:42.631+02:00</updated><title type='text'>MySQL Cluster Benchmark</title><content type='html'>We had the opportunity to use a fair number of machines to run a benchmark to see what throughput MySQL Cluster can achieve on somewhat bigger clusters. The benchmark we use is one we developed for internal testing many years ago, and it shows very well the performance aspects of MySQL Cluster as discussed in some previous blogs of mine.&lt;br /&gt;&lt;br /&gt;The benchmark is called flexAsynch, and it's part of an internal series of benchmarks we call the flex-series. Its first member was flexBench, which consisted of the following simple set of operations. First create the table with the set of attributes and the size of the attributes as specified by the startup options. Next create a set of threads as specified by the startup options. Then each thread executes a number of transactions; the number is configurable, and each transaction can also run one or more operations as configured (one operation is either an insert of one record, update of one record, read of one record or delete of one record). The flexBench benchmark always starts by doing a set of inserts, then reading those, updating each record, reading it again and finally deleting all records. 
The flexBench benchmark also included a verify phase so that we could verify that the cluster actually read and updated the records as it should.&lt;br /&gt;&lt;br /&gt;The flexAsynch benchmark was a further development of this benchmark. flexBench uses the synchronous NDB API, where each transaction is sent and executed per thread. This means that we can have as many outstanding transactions to the cluster as we have threads. flexAsynch uses the asynchronous NDB API, which provides the possibility to define multiple transactions and send and execute those all at once. This means that we can get tremendous parallelism in the application using this API. Because of the way MySQL Cluster is designed, it is actually no more expensive to update 10 records in 10 different transactions compared to updating 10 records in 1 transaction using this API. Jonas Oreland showed in his &lt;a href="http://jonasoreland.blogspot.com/2008/11/950k-reads-per-second-on-1-datanode.html"&gt;blog post&lt;/a&gt; how one API process using this API can handle 1 million operations per second.&lt;br /&gt;&lt;br /&gt;The main limitation to how many operations can be executed per second is the processing in the data nodes of MySQL Cluster for this benchmark. Thus we wanted to see how well the cluster scales for this benchmark as we add more and more data nodes.&lt;br /&gt;&lt;br /&gt;A data node in MySQL Cluster operates best when threads are locked to CPUs as shown in a previous &lt;a href="http://mikaelronstrom.blogspot.com/2010/09/how-to-speed-up-sysbench-on-mysql.html"&gt;blog&lt;/a&gt; of mine. Currently the main threads that operate in a data node are the threads handling local database operations (there are up to four of those threads), the thread doing the transaction synchronisation and finally the thread handling receive of messages on sockets connected to other data nodes or API nodes. Thus to achieve best operation one needs at least 6 CPUs to execute a data node. 
Personally I often configure 8 CPUs to allow for the other threads to perform their action without inhibiting query performance. Other threads handle replication, file system interaction and cluster control.&lt;br /&gt;&lt;br /&gt;At our disposal when running this benchmark we had machines with dual Intel Xeon 5670 @2.93 GHz. This means 12 CPUs per socket. One thing to consider when running a benchmark like this is that the networking is an important part of the infrastructure. We had access to an Infiniband network here and used IP-over-Infiniband as the communication medium. It's most likely even better to use the Sockets Direct Protocol (SDP) but we had limited time to set things up and the bandwidth of IPoIB was quite sufficient. This made it possible to have more than one data node per machine.&lt;br /&gt;&lt;br /&gt;In order to run flexAsynch on bigger clusters we also needed to handle multiple instances of flexAsynch running in parallel. In order to handle this I changed flexAsynch a little bit to enable one process to only create a table or only delete a table. I also made it possible to run flexAsynch doing only inserts, only reads or only updates. To make it easier to get proper numbers I used a set of timers for read and update benchmarks. The first timer specifies the warmup time, during which operations are executed but not counted, since we're still in the phase where multiple APIs are starting up. The next timer specifies the actual time to execute the benchmark and finally a third timer specifies the cooldown time, where again transactions are run but not counted since not all APIs start and stop at exactly the same time. In this manner we get accurate numbers of read and update operations. 
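&lt;br /&gt;&lt;br /&gt;The three-timer scheme described above can be sketched roughly as follows. This is only an illustrative Python sketch and not the flexAsynch source: the names phase and run_benchmark, and the idea of passing in the clock as a function, are assumptions made for this example.&lt;br /&gt;&lt;pre&gt;
```python
import time


def phase(elapsed, warmup, measure, cooldown):
    """Return which benchmark phase a point in time belongs to."""
    if elapsed >= warmup + measure + cooldown:
        return "done"
    if elapsed >= warmup + measure:
        return "cooldown"
    if elapsed >= warmup:
        return "measure"
    return "warmup"


def run_benchmark(operation, warmup, measure, cooldown, clock=time.monotonic):
    """Run operation() through all three phases, counting only the measure phase."""
    start = clock()
    counted = 0
    while True:
        current = phase(clock() - start, warmup, measure, cooldown)
        if current == "done":
            return counted / measure  # operations per second in the measured window
        operation()  # executed in every phase, so the cluster stays loaded
        if current == "measure":
            counted += 1
```
&lt;/pre&gt;The point of the sketch is that the operation runs in all three phases, so every API process keeps the cluster loaded, while only operations in the middle window contribute to the reported rate.&lt;br /&gt;&lt;br /&gt;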
For inserts we don't use timers and thus the insert numbers are less accurate.&lt;br /&gt;&lt;br /&gt;The results of those benchmarks will be posted in blogs soon coming out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-8894856453543115655?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/8894856453543115655/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=8894856453543115655' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8894856453543115655'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8894856453543115655'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/04/mysql-cluster-benchmark.html' title='MySQL Cluster Benchmark'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-4411878861783096492</id><published>2011-04-11T13:56:00.000+02:00</published><updated>2011-04-11T13:56:38.293+02:00</updated><title type='text'>MySQL Cluster and Sharding</title><content type='html'>Sharding is here defined as the ability to partition the data into partitions defined by a condition on a set of fields. This ability is central to the workings of MySQL Cluster. Within a Cluster we automatically partition the tables into fragments (shards in the internet world). By default there is a fixed amount of fragments per node. 
As mentioned we also use replication inside a Cluster, the replication happens per fragment. We define the number of replicas we want in the Cluster and then the MySQL Cluster SW maintains this number of fragment replicas per fragment. These fragment replicas are all kept in synch. Thus for MySQL Cluster the sharding is automatic and happens inside the Cluster even using commodity hardware.&lt;br /&gt;&lt;br /&gt;One of the defining features of MySQL Cluster is to keep the fragments up and running at all times and that they are restored after a Cluster crash. However MySQL Cluster also supports adding nodes to the Cluster while it is operational, this means that we can add nodes on a running Cluster and repartition the tables during normal operation. This is part of the normal MySQL Cluster and is used in operation by many users and customers to increase the size of the Clusters in production clusters.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-4411878861783096492?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/4411878861783096492/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=4411878861783096492' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4411878861783096492'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4411878861783096492'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/04/mysql-cluster-and-sharding.html' title='MySQL Cluster and Sharding'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-6777009858607190633</id><published>2011-04-11T09:26:00.000+02:00</published><updated>2011-04-11T09:26:52.986+02:00</updated><title type='text'>MySQL Cluster API, the NDB API</title><content type='html'>As mentioned in a previous blog the programming API on the client side is a very important part of the performance of MySQL Cluster. Every API that is used to access the Data Server in MySQL Cluster uses the NDB API. The NDB API is used in the NDB storage handler to make it possible to access data residing in MySQL Cluster from MySQL APIs.&lt;br /&gt;&lt;br /&gt;The base of the good performance of the programming API is the ability to batch operations in various manners. In early MySQL Cluster history the MySQL Storage Engine API had very few interfaces that allowed for handling multiple records at a time. As we progressed, the Storage Engine API has added several interfaces that can handle multiple records at a time. There is even some development work, presented at the UC 2010, where the Storage Engine API can now push entire queries down to the storage engine, even join queries. This has also been presented at a recent &lt;a href="http://www.mysql.com/news-and-events/on-demand-webinars/display-od-583.html"&gt;webinar&lt;/a&gt; with engineers.&lt;br /&gt;&lt;br /&gt;The NDB API uses a model where one first defines the operation to issue towards the database. The calls to build an operation don't interact with the actual database. The actual message is sent to the data node only after the execute method has been called. The NDB API is designed to handle batching of operations on two levels. The first level is that it is possible to batch inside one thread. 
This means that one can open several transactions in parallel within the same thread and execute them in parallel with one execute call. In addition it is also possible to have several threads working in parallel, and every one of those threads can also be executing multiple transactions in parallel.&lt;br /&gt;&lt;br /&gt;So the possibilities for parallelism using the NDB API are tremendous. Much of the cost of accessing a database is paid in the networking, so by using the parallel transactions inside a thread (called Asynchronous NDB API) and by using the multithreaded capabilities of the NDB API, it is possible to decrease the networking cost greatly by making TCP/IP packets larger. Mostly the cost of sending a TCP/IP packet is Fixed_cost + #Bytes * Byte_cost. The fixed cost was in the past about the same cost as sending 60 bytes. This extra cost of small messages has to be paid both in the server part and in the client part. Thus it pays off very well to send larger messages. 
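&lt;br /&gt;&lt;br /&gt;As a rough illustration of this cost model, here is a minimal Python sketch. It assumes that the cost per byte is normalised to 1, so that the fixed cost equals the cost of 60 bytes as stated above. The names FIXED_COST_BYTES and overhead_percent are made up for this example and are not part of any MySQL Cluster API.&lt;br /&gt;&lt;pre&gt;
```python
FIXED_COST_BYTES = 60  # fixed per-packet cost, expressed as the cost of sending 60 bytes


def overhead_percent(message_bytes):
    """Extra cost of a message, in percent, compared to an infinitely large message."""
    # total cost   = FIXED_COST_BYTES + message_bytes  (with Byte_cost = 1)
    # payload cost = message_bytes
    return 100 * FIXED_COST_BYTES / message_bytes


for size in (200, 1000):
    print(size, "bytes:", overhead_percent(size), "% extra cost")
```
&lt;/pre&gt;Under these assumptions the overhead is simply 60 divided by the message size, which is why batching more operations into each packet pays off so quickly.&lt;br /&gt;&lt;br /&gt;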
When the message size goes towards 1 kByte, the extra cost is down in the range of 6-7% compared to infinite-sized messages, whereas a 200 byte message has an additional 30% cost.&lt;br /&gt;&lt;br /&gt;An additional benefit of batching is that there will be fewer context switches, since several messages can be handled in parallel without context switches.&lt;br /&gt;&lt;br /&gt;You can learn more about performance optimization of your own applications by reading this &lt;a href="http://www.mysql.com/why-mysql/white-papers/mysql_wp_cluster_perfomance.php"&gt;whitepaper&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-6777009858607190633?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/6777009858607190633/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=6777009858607190633' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6777009858607190633'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6777009858607190633'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/04/mysql-cluster-api-ndb-api.html' title='MySQL Cluster API, the NDB API'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-1531370462422583014</id><published>2011-04-09T08:51:00.000+02:00</published><updated>2011-04-09T08:51:39.760+02:00</updated><title type='text'>MySQL Cluster performance aspects</title><content type='html'>MySQL Cluster was designed for high performance from the very beginning. To achieve high performance one has to understand many aspects of computing. As an example the protocol is very important. In the original design work in 1994 we had a master thesis student build a prototype using a protocol which was based on BER encoding and other standard parts of many telecom protocols. After seeing the code in this prototype, which was several thousand lines of code just to handle the protocol, I realised that this type of protocol would simply cost too much on both the client side and the server side. So this type of prototype in early design work is extremely useful, since it would have been very difficult to change this protocol once we started down the path of developing the Data Server.&lt;br /&gt;&lt;br /&gt;Based on this work we instead opted for a protocol where almost everything in the protocol was of fixed size and entirely based on sending 32-bit words. We didn't want a protocol which transferred bytes, to avoid the extra computational complexity this would require. So the NDB protocol which is used for query processing uses a message called TCKEYREQ. This message has about 10 32-bit words describing various fixed parameters such as TableId, ConnectionId, PartitionId and so forth. There is also a 32-bit word that contains a set of bits that is used to interpret the message. Reading this protocol can actually be done while completely avoiding branches, since the bits can be used to address the proper words in the protocol message through some arithmetic. 
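&lt;br /&gt;&lt;br /&gt;To give a feeling for what addressing words through arithmetic can look like, here is a small Python sketch. The layout is purely hypothetical and is not the real TCKEYREQ format: it assumes one word of presence bits, then FIXED_WORDS fixed words, then only those optional words whose bit is set. The names FIXED_WORDS and optional_word_index are invented for this example.&lt;br /&gt;&lt;pre&gt;
```python
# Hypothetical layout, for illustration only (not the real TCKEYREQ format):
# word 0 holds presence bits, followed by FIXED_WORDS fixed 32-bit words,
# followed by the optional words that are present, in bit order.
FIXED_WORDS = 3


def optional_word_index(flags, k):
    """Index of optional word k, assuming presence bit k is set in flags."""
    lower_bits = flags % (2 ** k)              # presence bits below bit k
    words_before = bin(lower_bits).count("1")  # optional words stored ahead of word k
    return 1 + FIXED_WORDS + words_before      # skip the flag word and the fixed words


# flags word, 3 fixed words, then optional words 0 and 2 (bit 1 is not set)
message = [0b101, 7, 42, 3, 1111, 2222]
print(message[optional_word_index(message[0], 2)])
```
&lt;/pre&gt;The position of an optional word is computed from the flag bits alone, without testing each bit in turn. In C or C++ the same idea would typically use a mask and a population count.&lt;br /&gt;&lt;br /&gt;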
The only branching needed happens in taking care of keys and the actual query information, which is of variable size.&lt;br /&gt;&lt;br /&gt;The next important component of performance is the execution model. The MySQL Cluster Data nodes use an execution model which is extremely well suited for modern CPUs. The Data nodes use a set of threads, where each thread implements its own little OS with a scheduler. All communication inside the data nodes is based on messages. From a SW point of view the code to receive internal messages is exactly the same as the handling of messages arriving over the network. When sending a message it's the address which defines the type of message. The address contains three parts: the node id, the thread id and the module id (block number in the code). If the message is sent to a module with the same node id and thread id as the sending thread, then the message is an internal message and it will be sent by putting the message in the local message buffer. If the node id is the same but the thread id differs, then the message will be sent to another thread. The communication between threads is extremely efficient, based on shared memory communication, and this code uses the most efficient ways to communicate based on the HW and the OS. Finally when the node id differs, the message is sent as a network packet over to another data node or client node. There is a TCP/IP link between all nodes (fully connected mesh) and the data node will use mechanisms to ensure that the packets sent contain as many messages as possible without sacrificing latency (the user can affect the acceptable latency through a config parameter).&lt;br /&gt;&lt;br /&gt;Given this model, a thread can be actively executing thousands of queries without any need for context switches. 
This is one reason why MySQL Cluster benefits greatly when threads are locked to certain CPU cores and there is no contention from other programs to use these CPU cores. The data nodes have their own local OS and thus work extremely efficiently when the OS scheduler stays out of the way.&lt;br /&gt;&lt;br /&gt;This particular model of execution, where each thread executes until it decides to send a message (the unit of execution is always the execution of a message), was very popular in the 70s because of its efficiency. It was replaced by the time-sharing model because the time-sharing model is simpler. When designing MySQL Cluster we decided that for a Data Server that has to handle millions of queries per second, efficiency of execution matters more than simplicity of design. Another great benefit of this execution model is that as the load on the Data Server increases, the throughput also grows. The reason is that the execution threads run for a longer time before they look at the sockets for incoming traffic, so more messages are gathered each time and the cost of each received message byte decreases. The same happens when sending: as the number of messages executed per round grows, more data is sent on each send call, which decreases the cost of each sent message byte.&lt;br /&gt;&lt;br /&gt;The design is extremely modular even though it's using a more complex execution model. Each module can only communicate with other modules using messages and the modules share no data. Thus if an error occurs in a module it's either due to bugs in this module or due to bad input data to the module. To debug the data node we trace every important branch and every message executed with its data. 
This means that if a crash occurs we have very detailed information about how the crash occurred, including the last thousand or so branches taken in the code and a few thousand of the last messages executed in the data node.&lt;br /&gt;&lt;br /&gt;The final aspect of performance is the actual implementation of the database algorithms. Covering this in one blog message is obviously not possible, but it includes an efficient design of data structures (we implement a hash based index and an ordered index) and an efficient implementation of the actual record storage with an efficient data structure to contain the record (this includes capabilities to handle variable sized data, to handle NULLable fields in a storage efficient manner and even to add fields to a record by usage of dynamic fields which are NULL when not present in the record). It includes an efficient model for recovery and finally an efficient model for transaction handling. In all of those aspects MySQL Cluster has added additional innovation to the world of databases with a particular focus on the performance aspects.&lt;br /&gt;&lt;br /&gt;There is actually one more important part of the performance of MySQL Cluster and this is the programming API on the client side. 
I will discuss this in my next blog.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-1531370462422583014?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/1531370462422583014/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=1531370462422583014' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1531370462422583014'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1531370462422583014'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/04/mysql-cluster-performance-aspects.html' title='MySQL Cluster performance aspects'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-8977325221386009767</id><published>2011-04-08T21:09:00.000+02:00</published><updated>2011-04-08T21:09:24.382+02:00</updated><title type='text'>MySQL Cluster - NoSQL access with some SQL</title><content type='html'>As someone noted in a blog, the NDB API is a NoSQL API that was designed 15 years ago. When I wrote my Ph.D thesis (which is the design document that NDB Cluster is based on) I called it Design and Modelling of a Parallel Data Server for Telecom Applications. The important name I used here is Data Server. It was never intended as a pure SQL DBMS. It was always intended for any needs of Data Storage. 
The requirements on this Data Server were also written up in my &lt;a href="http://www.google.com/url?sa=t&amp;source=web&amp;cd=2&amp;ved=0CBoQFjAB&amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.48.884%26rep%3Drep1%26type%3Dpdf&amp;rct=j&amp;q=mikael%20ronstr%C3%B6m%20thesis&amp;ei=SFyfTZ-cIMb5sgbg7enrAQ&amp;usg=AFQjCNH30_pH8gUD_fx-Ix0ctmxqJhekOg&amp;cad=rja"&gt;thesis&lt;/a&gt; for those who care to read it and included HLRs (the telecom database used to keep track of your mobile phone), News-on-Demand, Multimedia Email, Event Data Services (logging of events in the telco and web applications used for charging, billing and understanding the customers) and a genealogy application.&lt;br /&gt;&lt;br /&gt;MySQL Cluster has been very successful in the telecom space and chances are very high that a MySQL Cluster solution is used whenever you place a mobile phone call. Also many ISPs use MySQL Cluster to handle DNS lookups, authentication and many other internet services. As an example, the ISP I use every day and through which I post this blog message is using MySQL Cluster for this type of service. So I invoke services of the MySQL Cluster every time I access the web from my home. In addition, we have seen MySQL Cluster adopted into eCommerce, session management, content delivery, user profile management and on-line gaming applications.&lt;br /&gt;&lt;br /&gt;MySQL Cluster was from the very start designed to handle many other applications as well in the web space. Today the internet environment contains quite a few different APIs to use for handling web data. MySQL Cluster already has a plethora of different APIs that can be used to access the basic Data Server. MySQL Cluster can be used with every possible API that can be used to access a MySQL Server. In addition we have the Cluster/J API which is a low-level Java API with similar characteristics to the NDB API. 
Based on the Cluster/J API we have a standard JPA interface to MySQL Cluster. We even have an LDAP interface which means that the same data can be accessed through LDAP, SQL, Cluster/J, JPA, NDB API and many other interfaces based upon these of which I am sure I don't know every one. Another interesting interface is mod-ndb which makes it possible to query MySQL Cluster using a REST API and get results in JSON.&lt;br /&gt;&lt;br /&gt;We are however not satisfied with the set of APIs we have towards MySQL Cluster so we'll be adding even more as we go to make the Data Server capabilities available to you from even more surroundings, these will be including additional APIs commonly used in the web space. Stay tuned for Tomas Ulin's keynote at the UC and Collaborate next week.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-8977325221386009767?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/8977325221386009767/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=8977325221386009767' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8977325221386009767'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8977325221386009767'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2011/04/mysql-cluster-nosql-access-with-some.html' title='MySQL Cluster - NoSQL access with some SQL'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7185798642766184320</id><published>2010-12-21T13:42:00.000+01:00</published><updated>2010-12-21T13:42:06.220+01:00</updated><title type='text'>MySQL Server and NUMA architectures</title><content type='html'>When you run MySQL on a large NUMA box it is possible to control the memory placement and the use of CPUs through the use of numactl. Most modern servers are NUMA boxes nowadays.&lt;br /&gt;&lt;br /&gt;numactl works with a concept called NUMA nodes. One NUMA node contains CPUs and memory where all the CPUs can access the memory in this NUMA node at an equal delay. Accessing memory in a different NUMA node will typically be slower, often 50% or even 100% slower than accessing memory in the local NUMA node. One NUMA node is typically one chip with a memory bus shared by all CPU cores in the chip. There can be multiple chips in one socket.&lt;br /&gt;&lt;br /&gt;The default policy is to allocate memory from the NUMA node of the CPU the thread is currently running on. There is also an option to interleave memory allocation on the different parts of the machine by using the interleave option.&lt;br /&gt;&lt;br /&gt;Memory allocation actually happens in two steps. The first step is the one that makes a call to malloc. This invokes a library linked with your application. This could be e.g. the libc library or a library containing tcmalloc or jemalloc or some other malloc implementation. The malloc implementation is very important for performance of the MySQL Server, but in most cases the malloc library doesn't control the placement of the allocated memory.&lt;br /&gt;&lt;br /&gt;The allocation of physical memory happens when the memory area is touched, either the first time or after the memory has been swapped out and a page fault happens. 
This is the time that we assign memory to the actual NUMA node it's going to be allocated on. To control how the Linux OS decides on this memory allocation one can use the numactl program.&lt;br /&gt;&lt;br /&gt;numactl provides options to decide on whether to use interleaved memory allocation or local memory. The problem with local memory can be easily seen if we consider that the first thing that happens in the MySQL Server is InnoDB recovery. This recovery is single-threaded, so it will cause a large piece of the memory in the buffer pool to be attached to the NUMA node where the recovery took place. Using interleaved allocation means that we get a better spread of the memory allocation.&lt;br /&gt;&lt;br /&gt;We can also use the interleave option to specify which NUMA nodes the memory should be chosen from. Thus the interleave option acts both as a way of binding the memory of the MySQL Server to NUMA nodes and as a way of interleaving memory allocation on all the NUMA nodes the server is bound to.&lt;br /&gt;&lt;br /&gt;Finally, numactl also provides an ability to bind the MySQL Server to specific CPUs in the computer. This can be done either by locking to NUMA nodes, or by locking to individual CPU cores.&lt;br /&gt;&lt;br /&gt;So e.g. on a machine with 8 NUMA nodes one might start the MySQL Server like this:&lt;br /&gt;numactl --interleave=2-7 --cpunodebind=2-7 mysqld ....&lt;br /&gt;This will allow the benchmark program to use NUMA node 0 and 1 without interfering with the MySQL Server program. If we want to use the normal local memory allocation it should more or less be sufficient to remove the interleave option: since we have bound the MySQL Server to NUMA nodes 2-7 there should be very slim risk that the memory is allocated from elsewhere. We could however also use&lt;br /&gt;--membind=2-7 to ensure that the memory allocation happens in the desired NUMA nodes.&lt;br /&gt;&lt;br /&gt;So how effective is numactl compared to e.g. using taskset? 
From a benchmark performance point of view there is not much difference unless you get memory very unbalanced through a long recovery at the start of the MySQL Server. Given that taskset allows the server to be bound to certain CPU cores, it also effectively means that the memory is bound to the NUMA nodes of the CPUs the MySQL Server was bound to by taskset.&lt;br /&gt;&lt;br /&gt;However binding to a subset of the NUMA nodes or CPUs in the computer is definitely a good idea. On a large NUMA box one can gain at least 10% performance by locking to a subset of the machine compared to allowing the MySQL Server to freely use the entire machine.&lt;br /&gt;&lt;br /&gt;Binding the MySQL Server also improves the stability of the performance. Binding to certain CPUs can also be an instrument in ensuring that different applications running on the same computer don't interfere with each other. Naturally this can also be done by using virtual machines.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7185798642766184320?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7185798642766184320/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7185798642766184320' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7185798642766184320'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7185798642766184320'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2010/12/mysql-server-and-numa-architectures.html' title='MySQL Server and NUMA architectures'/><author><name>Mikael 
Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-1296514883088467156</id><published>2010-11-30T12:52:00.000+01:00</published><updated>2010-11-30T12:52:37.572+01:00</updated><title type='text'>The King is dead, long live the King</title><content type='html'>In MySQL 5.5 we introduced a possibility to use alternative malloc implementations for MySQL. In Solaris we have found mtmalloc to be the optimal malloc implementation. For Linux we've previously found tcmalloc to be the optimal malloc implementation. However recently when working on a new MySQL feature I discovered a case where tcmalloc had a performance regression after running a very tough benchmark for about an hour. Actually I found a similar issue with the standard libc malloc implementation. So it seems that many malloc implementations gets into fragmentation issues when running for an extended period at very high load.&lt;br /&gt;&lt;br /&gt;So I decided to contact Mark Callaghan to see if he had seen similar issues. He hadn't, but he pointed me towards an alternative malloc implementation which is jemalloc. It turns out that jemalloc is the malloc implementation used in FreeBSD among other things. I found a downloadable tarball of jemalloc, downloaded it and installed it on my benchmark computers. Given that MySQL already supports any generic malloc implementation it was a simple matter of pointing LD_PRELOAD towards jemalloc instead of towards tcmalloc to make this experiment.&lt;br /&gt;&lt;br /&gt;The background is that tcmalloc gave about +5-10% better performance than libc malloc on Linux. Both libc malloc and tcmalloc have had performance regressions in certain situations. 
So the new results for jemalloc was very exciting. I got +15% compared to libc malloc and so far after using it for about a month I have found no performance regressions using jemalloc.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-1296514883088467156?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/1296514883088467156/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=1296514883088467156' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1296514883088467156'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1296514883088467156'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2010/11/king-is-dead-long-live-king.html' title='The King is dead, long live the King'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-1725362331784485988</id><published>2010-10-28T11:49:00.000+02:00</published><updated>2010-10-28T11:49:21.199+02:00</updated><title type='text'>Impact of changes in the run-time environment of MySQL 5.5</title><content type='html'>When experimenting with the Google patches a few years ago I found that tcmalloc had a fairly large impact on performance of the MySQL Server. 
So the question I asked myself was obviously whether the libc malloc has regained some of the lost territory (it has also had regressions with major drops in performance in certain libc versions). Using tcmalloc used to have a 5-10% positive impact on performance, and the fact of the matter is that this gain remains. I lost 8-10% in performance on all thread counts tested (16, 32, 64, 128 and 256) by not using tcmalloc when running Sysbench RW.&lt;br /&gt;&lt;br /&gt;The experiments are performed on a fairly high-end x86 box with 4 sockets. I run the sysbench program on the same machine as the MySQL Server runs on. This makes it interesting to check whether I get better performance by locking the MySQL Server to 3 of the 4 sockets and letting sysbench use its own socket, compared to not controlling CPU usage at all.&lt;br /&gt;&lt;br /&gt;What I discovered is a mixed picture. Performance when locking to CPUs was much more stable, although top performance was better without locking. Without locking, performance was 3% better at 16 threads and 7% better at 32 threads. But at higher thread counts the performance was better for the locked scenario: 10% better at 64 threads and 4% better at 256 threads. I used the Linux feature taskset to lock the MySQL Server and Sysbench to certain CPUs.&lt;br /&gt;&lt;br /&gt;So the conclusion is that locking to CPUs gives a more stable environment. When the number of threads increases the scheduler is allowed to use more CPUs than is beneficial for MySQL execution. I've also seen in other experiments that making sure that MySQL reuses the CPU caches as much as possible is very important for performance. Thus when MySQL competes with other programs for the use of CPUs and there are many concurrent MySQL threads, performance usually suffers since the CPU caches will be too cold.&lt;br /&gt;&lt;br /&gt;Using Unix sockets instead of TCP/IP sockets is still very beneficial for MySQL performance. 
I haven't made any recent experiments in this area but the difference is definitely significant. I have also seen OS bottlenecks sometimes appear when using TCP/IP sockets. This is an area for further investigation which I have had on my TODO list for a while. It's also interesting to experiment with different communication mechanisms when the Sysbench program and MySQL runs on different computers. However this is for future testing.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-1725362331784485988?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/1725362331784485988/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=1725362331784485988' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1725362331784485988'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1725362331784485988'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2010/10/impact-of-changes-in-run-time.html' title='Impact of changes in the run-time environment of MySQL 5.5'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-8590072594284732321</id><published>2010-10-28T10:24:00.000+02:00</published><updated>2010-10-28T10:24:09.357+02:00</updated><title type='text'>Finding the optimum configuration of MySQL 5.5 running Sysbench</title><content type='html'>Sysbench is a 
commonly used benchmark tool to discover ways to improve MySQL performance. It is certainly not representative of every application, but it's still a useful tool for finding bottlenecks in the MySQL code.&lt;br /&gt;&lt;br /&gt;In MySQL 5.5 a great number of new scalability improvements have been developed. Some of these will always be active and some of them require using new configuration parameters.&lt;br /&gt;&lt;br /&gt;In order to assist users of MySQL I am performing a fairly extensive benchmark series where I test the various configuration parameters that have an effect on running MySQL/InnoDB using Sysbench.&lt;br /&gt;&lt;br /&gt;The parameters can be categorized into:&lt;br /&gt;1) Changes of run-time environment&lt;br /&gt;2) Compile time parameters (including choice of compiler)&lt;br /&gt;3) Configuration parameters for MySQL&lt;br /&gt;4) Configuration parameters for InnoDB&lt;br /&gt;&lt;br /&gt;Finally there is also a set of parameters one can use to affect the execution of sysbench itself to have it behave differently. 
It's possible to have Sysbench use a secondary index instead of the primary key index, it's possible to let the table be partitioned and it's possible to run sysbench using multiple tables instead of only one (it's very synthetic to only have one table in a system which gets all the queries).&lt;br /&gt;&lt;br /&gt;The parameters that I have found to be important to consider for performance of MySQL 5.5 when running sysbench are:&lt;br /&gt;1.1) Use of tcmalloc&lt;br /&gt;1.2) Use of taskset (lock MySQL to certain CPUs)&lt;br /&gt;1.3) Affecting memory allocation by use of numactl&lt;br /&gt;1.4) Connecting to MySQL through socket or using TCP/IP&lt;br /&gt;2.1) Use of gcc compiler with platform-specific optimisations&lt;br /&gt;2.2) Use of the --with-fast-mutexes flag when compiling MySQL&lt;br /&gt;2.3) Choice of compiler (gcc, Open64, icc, SunStudio, ..)&lt;br /&gt;2.4) Using feedback compilation&lt;br /&gt;2.5) Compiling sysbench itself with platform-specific optimisations&lt;br /&gt;3.1) Choice of transaction isolation (READ COMMITTED or REPEATABLE READ)&lt;br /&gt;3.2) Use of large pages through setting --large-pages=ON&lt;br /&gt;3.3) Compiling with Performance Schema activated&lt;br /&gt;3.4) Running with Performance Schema activated through --performance_schema&lt;br /&gt;3.5) Deactivating Query Cache completely through --query_cache_size=0&lt;br /&gt;     and --query_cache_type=0&lt;br /&gt;4.1) Using new file format Barracuda or old format Antelope through &lt;br /&gt;     --innodb_file_format=x&lt;br /&gt;4.2) Setting --innodb_stats_on_metadata to ON/OFF&lt;br /&gt;4.3) Deactivating InnoDB doublewrite through --skip-innodb_doublewrite&lt;br /&gt;4.4) Setting innodb-change-buffering to none/insert/all and so forth&lt;br /&gt;4.5) Setting number of buffer pool instances using innodb-buffer-pool-instances=x&lt;br /&gt;4.6) Setting InnoDB log file size using innodb-log-file-size=x&lt;br /&gt;4.7) Setting InnoDB log buffer size using innodb-log-buffer-size=x&lt;br 
/&gt;4.8) Setting InnoDB flush method (default fsync, O_DIRECT, O_DSYNC) using&lt;br /&gt;     --innodb_flush_method=x&lt;br /&gt;4.9) Setting InnoDB to use file per table using --innodb_file_per_table &lt;br /&gt;4.10) Deactivating InnoDB adaptive hash index using&lt;br /&gt;     --skip-innodb-adaptive-hash-index&lt;br /&gt;4.11) Adapting read-ahead using --innodb_read_ahead_threshold=x&lt;br /&gt;4.12) Using InnoDB max purge lag through --innodb_max_purge_lag=x&lt;br /&gt;4.13) Using InnoDB purge thread through --innodb_purge_threads=1&lt;br /&gt;4.14) Changing behaviour of InnoDB spin loops by changing&lt;br /&gt;     --innodb_sync_spin_loops=x and --innodb_spin_wait_delay=x&lt;br /&gt;4.15) Changing InnoDB IO capacity by setting --innodb-io-capacity=x&lt;br /&gt;4.16) Setting InnoDB buffer pool size through --innodb-buffer-pool-size=x&lt;br /&gt;4.17) Setting InnoDB dirty pages percent through --innodb_max_dirty_pages_pct=x&lt;br /&gt;4.18) Setting InnoDB old blocks percentage through --innodb_old_blocks_pct=x&lt;br /&gt;4.19) Activating support for InnoDB XA through --innodb_support_xa=FALSE/TRUE&lt;br /&gt;4.20) Activating InnoDB thread concurrency using&lt;br /&gt;     --innodb-thread-concurrency=x&lt;br /&gt;4.21) Setting InnoDB commit mechanism through&lt;br /&gt;     --innodb-flush-log-at-trx-commit=x&lt;br /&gt;4.22) Setting number of read and write IO threads through&lt;br /&gt;     --innodb-read-io-threads=x and --innodb-write-io-threads=x&lt;br /&gt;&lt;br /&gt;This makes for a total of 36 parameters that can be tuned to affect sysbench performance. Most of these parameters have only a limited impact on performance, but some of them can have a large impact. There are also a number of them where most reasonable values are ok, but where a bad value can have a major impact on performance.&lt;br /&gt;&lt;br /&gt;In addition the problem of finding the optimal values is a multi-dimensional search. 
So changing one parameter might very well affect the impact of other parameters. My approach will be to start with a reasonable baseline configuration, then try each parameter and see how it affects the outcome, and then arrive at a new baseline by changing the most important parameter. The next step is to restart the tests, varying all the parameters which have had a measurable impact on performance. Using this method we should be able to find an optimum to a reasonable degree. Finding the absolute optimum is probably more or less practically impossible.&lt;br /&gt;&lt;br /&gt;It's important here to understand that the MySQL defaults can sometimes have really bad values for performance. The reason is that MySQL defaults have been chosen mainly to make MySQL run on any HW platform and use very few resources. Remember that the normal use of MySQL isn't running sysbench on a 32-core server :) So it is important to consider all of these parameters to get the optimal performance of the MySQL Server.&lt;br /&gt;&lt;br /&gt;I plan to write about these variables and how they affect sysbench performance in a number of upcoming blogs. I might also do a similar run with MySQL using the DBT2 benchmark. 
So hopefully after finishing these test runs I have found a reasonable optimum configuration to run sysbench and probably also the configuration parameters that needs changing when running DBT2 instead.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-8590072594284732321?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/8590072594284732321/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=8590072594284732321' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8590072594284732321'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8590072594284732321'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2010/10/finding-optimum-configuration-of-mysql.html' title='Finding the optimum configuration of MySQL 5.5 running Sysbench'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3244125666243319868</id><published>2010-09-22T14:23:00.000+02:00</published><updated>2010-09-22T14:23:49.036+02:00</updated><title type='text'>How to speed up Sysbench on MySQL Cluster by 14x</title><content type='html'>The time period up to the 2010 MySQL Users conference was as usual packed with hard work. The last two conferences have been very focused on getting scalability of the MySQL Server and InnoDB improved. 
This year had a nasty surprise after the conference in the form of an ash cloud from an Icelandic volcano. When you're tired from hard work and just want to go home and get some rest, then the concept of staying in a hotel room with absolutely no idea of when one could return home is no fun at all. So when I finally returned home I was happy that summer was close by and vacation days were available (Swedes have a lot of vacation :)).&lt;br /&gt;&lt;br /&gt;Now the summer is gone and I am rested up again, ready to take on new challenges and we've had some really interesting meet-ups in the MySQL team to discuss future developments. The renewed energy is also sufficient to write up some of the stories from the work I did during the summer :)&lt;br /&gt;&lt;br /&gt;During the summer I also had the opportunity to get some work done on scalability of the MySQL Cluster product. Given that I once was the founder of this product it was nice to return again and check where it stands in scalability terms.&lt;br /&gt;&lt;br /&gt;The objective was to compare MySQL Cluster to the Memory engine. The result of the exercise seemed almost obvious from the start. The Memory engine, having a table lock, will have very limited scalability on any workload that contains writes. It will however have very good scalability on read-only workloads as this isn't limited by the table lock since readers don't contend with each other. The Cluster engine should have good and fairly even results on read and write workloads.&lt;br /&gt;&lt;br /&gt;Much to my surprise the early results showed a completely different story. The Memory engine gave me a performance of about 1-2 tps to start with. The early results of MySQL Cluster were also very dismaying. 
I covered the Memory engine in a previous blog post, so in this post I will focus on the MySQL Cluster benchmarks.&lt;br /&gt;&lt;br /&gt;So the simple task of benchmarking turned, as usual, into debugging where the performance problems come from.&lt;br /&gt;&lt;br /&gt;In the first experiment I used the default configuration of the Fedora Linux OS and the default set-up of the MySQL Cluster storage engine. It turned out that there is a lot to gain from adapting those defaults.&lt;br /&gt;&lt;br /&gt;First, Fedora has a feature called cpuspeed, which is activated by default. The feature provides power management by scaling down the CPU frequency on an entire socket. The problem is that when you run the MySQL Server with few threads, cpuspeed doesn't react to the workload and scales down the frequency although there is a lot of work to do. So for the MySQL Server in general this means about half the throughput on up to 16 threads. However the impact on MySQL Cluster is even worse. The performance drops severely on all thread counts. Most likely this impact comes from the very short execution times of the NDB data nodes. It's possible that these execution times are too short to even reach the radar of the power management tools in Linux.&lt;br /&gt;&lt;br /&gt;So a simple sudo /etc/init.d/cpuspeed stop generated a major jump in performance of MySQL Cluster in a simple Sysbench benchmark (all the benchmarks discussed here used 1 data node with everything running on one machine unless otherwise stated).&lt;br /&gt;&lt;br /&gt;The next simple step was to add the MySQL Cluster configuration parameter MaxNoOfExecutionThreads to the configuration scripts and set it to the maximum, which is 8. This means that one thread will handle receive on the sockets, one thread will handle transaction coordination and four threads will handle local database handling. 
There will also be a couple of other threads which are of no importance to a benchmark.&lt;br /&gt;&lt;br /&gt;These two configuration changes together added about ~3.5x in increased performance. Off to a good start, but still performance isn't at all where I want it to be.&lt;br /&gt;&lt;br /&gt;In NDB there is a major scalability bottleneck in the mutex protecting all socket traffic. It's the NDB API's variant of the big kernel mutex. There is however one method of decreasing the impact of this mutex: turning the MySQL Server into several client nodes from an NDB perspective. This is done by adding the --ndb-cluster-connection-pool parameter when starting the MySQL Server. We achieved the best performance when setting this to 8. In a bigger cluster it would probably make more sense to set it to 2 or 3, since this resolves most of the scalability issues without using up so many nodes in the NDB cluster.&lt;br /&gt;&lt;br /&gt;Changing this from 1 to 8 added another ~2x in performance. So now the performance is up a decent 8x from the first experiments. No wonder I was dismayed by the early results.&lt;br /&gt;&lt;br /&gt;However the story isn't done yet :)&lt;br /&gt;&lt;br /&gt;MySQL Cluster has another feature whereby the various threads can be locked to CPUs. By using this feature we can achieve two things. The first is that the NDB data nodes don't share CPUs with the MySQL Server. This has some obvious benefits from a CPU cache point of view for both node types. We can also avoid the data node threads being moved from CPU to CPU, which is greatly advantageous in busy workloads. So we locked the data nodes to 6 cores. 
The configuration variable we used to achieve this is LockExecuteThreadToCpu, which is set to a comma-separated list of CPU ids.&lt;br /&gt;&lt;br /&gt;I also locked the MySQL Server and Sysbench to different sets of CPUs using the taskset program available in Linux.&lt;br /&gt;&lt;br /&gt;Locking the NDB data node threads to CPUs in this way achieved another 80% boost in performance. So the final result gave us a decent 14x performance improvement.&lt;br /&gt;&lt;br /&gt;So in summary, these are the things that matter greatly to the performance of MySQL Cluster for Sysbench with a single data node:&lt;br /&gt;&lt;br /&gt;1) Ensure the Linux cpuspeed isn't activated&lt;br /&gt;&lt;br /&gt;2) Make sure to set MaxNoOfExecutionThreads to 8&lt;br /&gt;&lt;br /&gt;3) Make sure to set the --ndb-cluster-connection-pool parameter of the MySQL Server, using around 8 nodes per MySQL Server&lt;br /&gt;&lt;br /&gt;4) Lock NDB Data node threads to CPUs by using LockExecuteThreadToCpu.&lt;br /&gt;&lt;br /&gt;5) Lock MySQL Server and Sysbench processes to different sets of CPUs from NDB Data nodes and each other.&lt;br /&gt;&lt;br /&gt;Doing these experiments also generated a lot of interesting ideas on how to improve things even further.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3244125666243319868?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3244125666243319868/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3244125666243319868' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3244125666243319868'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3244125666243319868'/><link 
rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2010/09/how-to-speed-up-sysbench-on-mysql.html' title='How to speed up Sysbench on MySQL Cluster by 14x'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3954458660848291123</id><published>2010-09-22T14:15:00.001+02:00</published><updated>2010-09-22T14:15:37.292+02:00</updated><title type='text'>How to get Sysbench on Memory engine to perform</title><content type='html'>I had the opportunity to test the Memory engine during the summer. What I expected to be a very simple exercise in running the Sysbench benchmark turned out to be a lot more difficult.&lt;br /&gt;&lt;br /&gt;My first experiments with the Memory engine gave very dismaying results. I got 0-2 TPS, which is ridiculously bad. I couldn't really believe these were proper results, so I started searching for problems in my benchmarking environment. Eventually I set up a normal MySQL client session to the MySQL Server while the benchmark was running and issued some of the queries in the benchmark by hand, and I was surprised to see some simple queries take seconds.&lt;br /&gt;&lt;br /&gt;EXPLAIN came to my rescue. EXPLAIN showed that the range scans in Sysbench were in fact turned into full table scans. Now this was surprising given that the primary key index in most engines is always ordered, so a range scan should normally be translated to a simple ordered index scan.&lt;br /&gt;&lt;br /&gt;What I found is that the Memory engine has a different default compared to the other storage engines. 
When we designed the NDB handler we decided that any SQL user expects to have an ordered index when they create an index on a table. Since primary key indexes in NDB are always hash-based, this meant that in NDB the default primary key index is actually two indexes, one primary hash index and one secondary ordered index on the same fields.&lt;br /&gt;&lt;br /&gt;Not so in the Memory engine. The Memory engine also uses a hash-based index by default. So when you create a new index on a Memory engine table and specify no type, the index will become a hash index. Hash indexes are not very good at range queries :) and thus my surprise when benchmarking the Memory engine in Sysbench.&lt;br /&gt;&lt;br /&gt;Fixing this issue required some code changes in the sysbench test. It required the proposed fix from Mark Callaghan to add secondary indexes to sysbench. This was however not sufficient, since this patch only added an index by adding KEY xid (id) and didn't specify that this index had to be an ordered index. So I changed this to KEY xid (id) USING BTREE and off the performance went.&lt;br /&gt;&lt;br /&gt;The Memory engine is still more or less a single-threaded engine for any write workload like Sysbench RW, which limits performance very much.&lt;br /&gt;&lt;br /&gt;However the Sysbench Readonly benchmark for the Memory engine was interesting since it used no limiting locks (concurrent readers don't contend with each other). I decided to see how much scalability was achievable for a storage engine without any concurrency issues internally.&lt;br /&gt;&lt;br /&gt;I found that performance of the Memory engine was limited to 8% more than the InnoDB engine in the same benchmark. 
So my interpretation of this result is that for read-only workloads, the main limiting factor on scalability is the MySQL Server scalability issues and not the InnoDB ones.&lt;br /&gt;&lt;br /&gt;So going forward, when working on further improving the scalability of the MySQL Server parts, there is a perfect benchmark for seeing how scalable the MySQL Server part is: Sysbench using the Memory engine.&lt;br /&gt;&lt;br /&gt;I also plan on adding a feature to sysbench making it possible to use more than one Sysbench table, to see how much added scalability we get when there are several tables involved in the query mix. Sysbench is a very synthetic benchmark in the sense that it only uses one table.&lt;br /&gt;&lt;br /&gt;Using only one table provokes more bottlenecks than are found in most normal workloads, e.g. LOCK_open in the MySQL Server, the metadata locks introduced in MySQL 5.5 and the index mutex in InnoDB. Even some imbalance in the usage of multiple buffer pools comes from only using one table in the benchmark.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3954458660848291123?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3954458660848291123/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3954458660848291123' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3954458660848291123'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3954458660848291123'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2010/09/how-to-get-sysbench-on-memory-engine-to.html' title='How to get Sysbench on Memory engine to 
perform'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-1598001367939401096</id><published>2010-09-18T20:43:00.000+02:00</published><updated>2010-09-18T20:43:22.526+02:00</updated><title type='text'>Multiple Buffer Pools in MySQL 5.5</title><content type='html'>In our work to improve MySQL scalability we tested many opportunities to scale the MySQL Server and the InnoDB storage engine. The InnoDB buffer pool was often one of the hottest contention points in the server. This is very natural since every access to a data page, UNDO page, index page uses the buffer pool and even more so when those pages are read from disk and written to disk.&lt;br /&gt;&lt;br /&gt;As usual there were two ways to split the buffer pool mutex, one way is to split it functionally into different mutexes protecting different parts. There have been experiments in particular on splitting out the buffer pool page hash table, the flush list. Other parts that have been broken out in experiments are the LRU list, the free list and other data structures internally in the buffer pool. Additionally it is as usual possible to split the buffer pool into multiple buffer pools. Interestingly one can also combine using multiple buffer pools with splitting the buffer pool mutex into smaller parts. 
The advantage of using multiple buffer pools is that it is very rarely necessary to grab multiple mutexes for a buffer pool operation, which quickly becomes the case when splitting the buffer pool into multiple mutex protection areas.&lt;br /&gt;&lt;br /&gt;After working on scalability improvements in MySQL and InnoDB I noted that all the discussion was around how to split the buffer pool mutex and no discussion centered around how to make multiple buffer pools out of the buffer pool. I decided to investigate how difficult it would be to make this change. I quickly realised that it needed a thorough walk-through of the code, which meant checking about 150 methods and their interaction. This sounds like a very big task, but fortunately the InnoDB code is well structured and has fairly simple dependencies between its methods. This walk-through of the buffer pool code quickly showed that there were 3 different ways of getting hold of the buffer pool. One method was to calculate it using the space id and page id. This is the normal method in most methods used in the external buffer pool interface. However there were numerous occasions where we only had access to the block or page data structure, and it would be a bit wasteful to recalculate the hash value in every method that needed access to the buffer pool data structure. So it was decided to leave a reference to the buffer pool in every page data structure. There were also a few occasions where one needed to access all buffer pools.&lt;br /&gt;&lt;br /&gt;The analysis proved that most of the accesses to the buffer pool were completely independent of other accesses to the buffer pool for other pages. InnoDB uses read-ahead and neighbour writes in the IO operations that are started from the buffer pool. These always operate on an extent of 64 pages. 
Thus it made sense to map all 64 pages of an extent into one buffer pool to avoid having to operate on multiple buffer pools on every IO operation.&lt;br /&gt;&lt;br /&gt;With these design ideas there were only a few occasions where it was necessary to operate on all buffer pools. One such operation was when the log required knowledge of the page with the oldest LSN of the buffer pool. Now this operation requires looping over all buffer pools and checking the minimum LSN of each buffer pool instance. This is a fairly rare operation so it isn't a scalability issue.&lt;br /&gt;&lt;br /&gt;The other operation that needs to loop over all buffer pools needed a bit more care. This is the background operation flushing buffer pool pages to disk. A couple of problems need consideration here. First, it is necessary to flush pages regularly from all buffer pool instances. Secondly, it's still important to flush neighbours. Given that many disks are fairly slow, it can be problematic to spread the load in this manner over many buffer pools. This is an important consideration when deciding how many buffer pool instances to configure.&lt;br /&gt;&lt;br /&gt;The default number of buffer pools is one, and for most small configurations with fewer than 8 cores it's mostly a good idea not to increase this value. If you have an installation that uses 8 cores or more, you should also pay attention to the disk subsystem that is used. Given that InnoDB often writes up to 64 neighbours in each operation and that the flushing should happen each second, it makes sense to have a disk subsystem capable of 500 IO operations per second to use 8 buffer pool instances. The IO capacity can be set in the innodb_io_capacity configuration variable. One SSD drive should be capable of handling this, as should two fast hard drives or 3 slow ones.&lt;br /&gt;&lt;br /&gt;In our experiments we have mostly used 8 buffer pools, but more buffer pools can be useful at times. 
The main problem with many buffer pools is related to the IO operations. It is important to have a balanced IO load in the MySQL server.&lt;br /&gt;&lt;br /&gt;Our analysis of using multiple buffer pool instances has shown some interesting facts. First, the accesses to the buffer pools are in no way evenly spread out. This is not surprising given that e.g. the root page of an index is a very frequently accessed page. So using sysbench with only one table, there will obviously be many more accesses to certain buffer pool instances. Our experiments show that in sysbench using 8 buffer pools, the hottest buffer pool receives about one third of all accesses. Given that sysbench is a worst case scenario for the multiple buffer pool case, this means that most applications, which tend to use more tables and more indexes, should have a much more even load on the buffer pools.&lt;br /&gt;&lt;br /&gt;So how much do multiple buffer pools improve the scalability of the MySQL Server? The answer is as usual dependent on application, OS, HW and so forth. But some general ideas can be found from our experiments. In sysbench using a load which is entirely held in main memory, so that the disk is only used for flushing data pages and logging, multiple buffer pools can provide up to 10% improvement of the throughput in the system. In dbStress, the benchmark &lt;a href="http://dimitrik.free.fr"&gt; Dimitri&lt;/a&gt; uses, we have seen all the way up to 30% improvement. The reason here is most likely that dbStress uses more tables and has avoided many other bottlenecks in the MySQL Server, and thus the buffer pool was a worse bottleneck in dbStress compared to sysbench. From the code it is also easy to see that the more IO operations the buffer pool performs, the more the buffer pool mutex will be acquired and also often held for a longer time. 
One such example is the search for a free page on the LRU list every time a read is performed into the buffer pool from the disk.&lt;br /&gt;&lt;br /&gt;Furthermore the use of multiple buffer pools opens up for many more improvements, and it doesn't remove the possibility to split the buffer pool mutex even more.&lt;br /&gt;&lt;br /&gt;Another way of showing the importance of using multiple buffer pools is the mutex statistics on the buffer pool mutex. With one buffer pool, the buffer pool mutex had about 750k accesses per second in a sysbench test where the MySQL Server had access to 16 cores. 50% of those accesses met a mutex already held, so it's obvious that the InnoDB mutex subsystem is very well aligned with the buffer pool mutex, which has a very short hold time, making spinning while waiting for it very fruitful. Still, a mutex which is held 50% of the time makes the buffer pool mutex a limiting factor of the MySQL Server. Quite a few threads will often spend time in the queue waiting for the buffer pool mutex. So splitting the buffer pool into 8 instances even in sysbench means that the hottest buffer pool receives about one third of the 750k accesses, so its mutex should be held about 17% of the time (one third of 50%). Our later experiments show that the hottest buffer pool mutexes are now held up to about 14-15% of the time. So the theory matches the real world fairly well. This means that the buffer pool is still a major factor in the MySQL scalability equation but is now more on par with the other bottlenecks in the MySQL Server.&lt;br /&gt;&lt;br /&gt;The development project of multiple buffer pools happened at a time when the MySQL and InnoDB teams could start working together. I was impressed by the willingness to cooperate and the competence in the InnoDB team that made it possible to introduce multiple buffer pools into MySQL 5.5. Our cooperation has continued since then and this has led to improvements in productivity on both sides. 
So for you as a MySQL user this spells good times going forward.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-1598001367939401096?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/1598001367939401096/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=1598001367939401096' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1598001367939401096'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1598001367939401096'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2010/09/multiple-buffer-pools-in-mysql-55.html' title='Multiple Buffer Pools in MySQL 5.5'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-2518607042447214127</id><published>2010-09-18T20:35:00.000+02:00</published><updated>2010-09-18T20:35:55.893+02:00</updated><title type='text'>Split log_sys mutex in MySQL 5.5</title><content type='html'>One important bottleneck in the MySQL Server is the log_sys mutex in InnoDB. Experiments using mutex statistics showed that this mutex was accessed about 250k times per second and that about 75% of those accesses had to queue up to get the mutex. 
One particular nuisance is that while holding the log_sys mutex it is necessary to grab the buffer pool mutex to put the changed pages at the start of the flush list, indicating that they are now the youngest dirty pages in the buffer pool (this happens as part of the mini-transaction commit functionality in InnoDB). To some extent this contention point is decreased by splitting out the buffer flush list from the buffer pool mutex.&lt;br /&gt;&lt;br /&gt;We found a simple improvement for this particular problem. The solution is to introduce a new mutex, the log_flush_order mutex. This mutex is acquired while still holding the log_sys mutex, and as soon as it is acquired we can release the log_sys mutex. This gives us the property that the log_sys mutex is available for other operations, such as starting a new log write, while we still serialise the insertion of the dirty pages into the buffer pool flush list.&lt;br /&gt;&lt;br /&gt;As can easily be seen, this solution decreases the hold time of the log_sys mutex while not decreasing the frequency with which it is acquired.&lt;br /&gt;In our experiments we saw that this very simple solution improved a Sysbench RW test by a few percent.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-2518607042447214127?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/2518607042447214127/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=2518607042447214127' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2518607042447214127'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2518607042447214127'/><link rel='alternate' type='text/html' 
href='http://mikaelronstrom.blogspot.com/2010/09/split-logsys-mutex-in-mysql-55.html' title='Split log_sys mutex in MySQL 5.5'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-4143084484159046356</id><published>2010-04-13T23:46:00.004+02:00</published><updated>2010-04-13T23:52:07.219+02:00</updated><title type='text'>Pointers to my presentations of MySQL 5.5 Scalability enhancements</title><content type='html'>Here are some pointers to my MySQL conference slides.&lt;br /&gt;The presentation on MySQL 5.5 Performance and&lt;br /&gt;Scalability improvements is&lt;br /&gt;&lt;a href="http://en.oreilly.com/mysql2010/public/schedule/detail/13363"&gt;here&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The presentation on the MySQL 5.5 Performance&lt;br /&gt;and Scalability benchmarks is&lt;br /&gt;&lt;a href="http://en.oreilly.com/mysql2010/public/schedule/detail/14298"&gt;here&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-4143084484159046356?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/4143084484159046356/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=4143084484159046356' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4143084484159046356'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4143084484159046356'/><link rel='alternate' type='text/html' 
href='http://mikaelronstrom.blogspot.com/2010/04/pointers-to-my-presentations-of-mysql.html' title='Pointers to my presentations of MySQL 5.5 Scalability enhancements'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3873240331399463662</id><published>2010-04-13T17:30:00.007+02:00</published><updated>2010-04-13T17:49:22.325+02:00</updated><title type='text'>Scalability enhancements of MySQL 5.5.4-m3</title><content type='html'>The MySQL 5.5.4-m3 beta version contains a number of&lt;br /&gt;interesting new scalability features.&lt;br /&gt;&lt;br /&gt;It contains the following InnoDB improvements:&lt;br /&gt;Multiple Buffer Pool instances&lt;br /&gt; - For example if the buffer pool is 8 GByte in size&lt;br /&gt;   the buffer pool can be split into 4 buffer pools&lt;br /&gt;   each containing 2 GBytes. Each page is mapped into&lt;br /&gt;   one and only one of these buffer pools.&lt;br /&gt;Split Log_sys mutex&lt;br /&gt; - We have ensured that the Log mutex and the buffer&lt;br /&gt;   pool mutex is more independent of each other. 
Also&lt;br /&gt;   the log_sys mutex is through this split less&lt;br /&gt;   contended.&lt;br /&gt;Split out flush list from buffer pool mutex&lt;br /&gt;Split Rollback Segment mutex into 128 instances&lt;br /&gt;Separate Purge Thread from Master Thread&lt;br /&gt; - Splitting out the purge thread from the master thread&lt;br /&gt;   is very important to ensure that performance is stable.&lt;br /&gt;Extended Change buffering, now also Deletes and purges are&lt;br /&gt;possible to buffer.&lt;br /&gt;&lt;br /&gt;It contains the following MySQL Server improvements:&lt;br /&gt;Split LOCK_open into&lt;br /&gt; - MDL hash mutex&lt;br /&gt; - MDL table lock mutex&lt;br /&gt; - atomic variable refresh_version&lt;br /&gt; - LOCK_open&lt;br /&gt;We also removed some parts not needing mutex&lt;br /&gt;protection from LOCK_open. All these  mutexes are taken&lt;br /&gt;independently of each other in almost all places.&lt;br /&gt;&lt;br /&gt;Remove LOCK_alarm (used in network handling)&lt;br /&gt;Remove LOCK_thread_count as scalability bottleneck&lt;br /&gt;&lt;br /&gt;In addition the InnoDB recovery has been improved by&lt;br /&gt;decreasing recovery time by 10x.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3873240331399463662?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3873240331399463662/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3873240331399463662' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3873240331399463662'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3873240331399463662'/><link rel='alternate' type='text/html' 
href='http://mikaelronstrom.blogspot.com/2010/04/scalability-enhancements-of-mysql-554.html' title='Scalability enhancements of MySQL 5.5.4-m3'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-5895180319301139491</id><published>2010-04-13T14:01:00.009+02:00</published><updated>2010-04-13T17:48:40.755+02:00</updated><title type='text'>MySQL 5.5.4-m3 scales to 32 cores</title><content type='html'>The newly released MySQL 5.5 beta version MySQL 5.5.4-m3&lt;br /&gt;has a large number of significant performance improvements.&lt;br /&gt;These improvements make it possible for MySQL to scale&lt;br /&gt;well even on 32-core servers. The graph below shows how&lt;br /&gt;MySQL 5.5.4-m3 scales from 12 cores to 32 cores using a&lt;br /&gt;single thread per core. The benchmark used here is&lt;br /&gt;dbStress. 
dbStress uses a number of tables, which spreads&lt;br /&gt;the impact of mutexes and improves scalability.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_iUr9qDslPzg/S8RkwxRknXI/AAAAAAAAADg/uicvWeEBkzs/s1600/Scale32core.JPG"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 600px; height: 229px;" src="http://2.bp.blogspot.com/_iUr9qDslPzg/S8RkwxRknXI/AAAAAAAAADg/uicvWeEBkzs/s400/Scale32core.JPG" border="0" alt=""id="BLOGGER_PHOTO_ID_5459599437303422322" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The graph below shows a similar scalability analysis on a&lt;br /&gt;smaller server where the benchmark used was Sysbench RW.&lt;br /&gt;The red line shows the scalability of MySQL 5.1.45, the&lt;br /&gt;green line shows the scalability of MySQL 5.5.3-m3 and the&lt;br /&gt;blue line shows MySQL 5.5.4-m3. So this graph shows that&lt;br /&gt;even with a single table in Sysbench RW we are able to&lt;br /&gt;scale very well to 16 cores. The graph also displays how&lt;br /&gt;our work on scaling MySQL since the release of MySQL 5.1&lt;br /&gt;as GA in December 2008 is paying off in a significant&lt;br /&gt;manner. So MySQL is keeping up very well with the development&lt;br /&gt;of new multi-core CPUs. The major performance enhancement&lt;br /&gt;in MySQL 5.5.3-m3 is the use of the InnoDB plugin with&lt;br /&gt;its inclusion of Google patches and other significant&lt;br /&gt;enhancements to InnoDB. 
MySQL 5.5.4-m3 contains a large&lt;br /&gt;number of new scalability enhancements that will be&lt;br /&gt;explained further on this blog.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_iUr9qDslPzg/S8RnqpfDcTI/AAAAAAAAADo/dnl8Yz6PNss/s1600/sb_rw_16core.JPG"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 600px; height: 226px;" src="http://4.bp.blogspot.com/_iUr9qDslPzg/S8RnqpfDcTI/AAAAAAAAADo/dnl8Yz6PNss/s400/sb_rw_16core.JPG" border="0" alt="" id="BLOGGER_PHOTO_ID_5459602630668153138" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-5895180319301139491?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/5895180319301139491/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=5895180319301139491' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5895180319301139491'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5895180319301139491'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2010/04/mysql-554-m3-scales-to-32-cores.html' title='MySQL 5.5.4-m3 scales to 32 cores'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://2.bp.blogspot.com/_iUr9qDslPzg/S8RkwxRknXI/AAAAAAAAADg/uicvWeEBkzs/s72-c/Scale32core.JPG' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3402346743723763626</id><published>2009-12-03T11:41:00.000+01:00</published><updated>2009-12-03T11:42:36.068+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='threadpool'/><category scheme='http://www.blogger.com/atom/ns#' term='eventports'/><category scheme='http://www.blogger.com/atom/ns#' term='epoll'/><category scheme='http://www.blogger.com/atom/ns#' term='Windows'/><category scheme='http://www.blogger.com/atom/ns#' term='kqueue'/><category scheme='http://www.blogger.com/atom/ns#' term='scalability'/><title type='text'>New threadpool design</title><content type='html'>In MySQL 6.0 a threadpool design was implemented based on&lt;br /&gt;libevent and mutexes.&lt;br /&gt;&lt;br /&gt;This design unfortunately had a number of deficiencies:&lt;br /&gt;1) The performance under high load was constrained due to a global&lt;br /&gt;mutex protecting libevent (see BUG#42288).&lt;br /&gt;&lt;br /&gt;2) The design had no flexibility to handle cases where threads were&lt;br /&gt;blocked due to either locking or latches. E.g. a thread held up by a&lt;br /&gt;table lock will be kept in the threadpool as an active thread until&lt;br /&gt;the table lock is released. 
If all threads are blocked in this state,&lt;br /&gt;it's easy to see that any query that wants to release the table&lt;br /&gt;lock cannot be processed either, since all threads in the thread pool are&lt;br /&gt;blocked waiting for the table lock (see BUG#34797).&lt;br /&gt;&lt;br /&gt;3) The design is intended to support very many connections but&lt;br /&gt;didn't use the most efficient methods to do this on Windows.&lt;br /&gt;libevent uses poll on Windows which isn't a scalable API when&lt;br /&gt;there are thousands of connections.&lt;br /&gt;&lt;br /&gt;Also in all of the benchmarking with MySQL it's been clear that&lt;br /&gt;performance of MySQL often drops significantly when there are too&lt;br /&gt;many threads hitting the MySQL Server. We have seen vast&lt;br /&gt;improvements in this area over the last year and there are some additional&lt;br /&gt;improvements in the pipeline for inclusion into the next&lt;br /&gt;MySQL milestone release. However the basic problem is still there:&lt;br /&gt;too many waiters in the queue can lead to various performance&lt;br /&gt;drop-offs. One reason for such drop-offs can be when mutex waits&lt;br /&gt;start to time out in InnoDB.&lt;br /&gt;&lt;br /&gt;So actually when we're looking at the threadpool design now, we're&lt;br /&gt;aiming at solving two issues in one. The first is to remove this&lt;br /&gt;scalability drop-off at high thread counts and the second is to&lt;br /&gt;efficiently handle MySQL servers with thousands of connections.&lt;br /&gt;The threadpool also gives us more control over which&lt;br /&gt;CPU threads are scheduled to execute on. 
We can even dynamically&lt;br /&gt;adapt the CPU usage to optimize for lower power consumption by&lt;br /&gt;the MySQL Server with a clever threadpool design.&lt;br /&gt;&lt;br /&gt;We're currently in the phase of experimenting with different&lt;br /&gt;models, however we opted for a design based around usage of epoll&lt;br /&gt;on Linux, eventports on Solaris and kqueue for FreeBSD and&lt;br /&gt;Mac OS X. We will also make a poll-based variant work mostly for&lt;br /&gt;portability reasons although its scalability won't be so great.&lt;br /&gt;For Windows we're experimenting with some Windows specific&lt;br /&gt;APIs such as the IO Completion API.&lt;br /&gt;&lt;br /&gt;The code to support thread pooling in MySQL is actually very&lt;br /&gt;small so it's easy to adapt the code for a new experiment.&lt;br /&gt;&lt;br /&gt;Last week we found a model that seems to work very well.&lt;br /&gt;The benchmarks show that the performance on 1-32 threads is&lt;br /&gt;around 97-103% of one thread per connection performance. When&lt;br /&gt;we go beyond 32 threads the thread pool gains more and more,&lt;br /&gt;it's getting to about 130% at 256 threads and reaches 250%&lt;br /&gt;better performance on 1024 threads. However this model still&lt;br /&gt;has the problem of deadlocks, so there is still some work on&lt;br /&gt;refining this model. The current approach we have fixes&lt;br /&gt;the deadlock problem but removes about 10-15% of the&lt;br /&gt;performance on lower number of threads. We have however&lt;br /&gt;numerous ideas on how to improve this.&lt;br /&gt;&lt;br /&gt;The basic idea with our current approach is to use thread groups,&lt;br /&gt;where each group works independently of other groups in handling a&lt;br /&gt;set of connections. 
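&lt;br /&gt;&lt;br /&gt;As a rough illustration of the thread group idea, the Python sketch below splits a set of connections into independent groups. The function name assign_to_groups and the modulo rule on the connection id are assumptions made for this example only; the actual mapping used by the threadpool may well differ.&lt;br /&gt;&lt;pre&gt;
```python
from collections import defaultdict


def assign_to_groups(connection_ids, num_groups):
    """Map each connection id to one thread group.

    The modulo rule is a hypothetical choice for illustration;
    it is not taken from the MySQL threadpool code.
    """
    groups = defaultdict(list)
    for conn_id in connection_ids:
        # Each connection belongs to exactly one group, so the
        # groups never need to coordinate with each other.
        groups[conn_id % num_groups].append(conn_id)
    return dict(groups)


# Eight connections spread over four thread groups.
print(assign_to_groups(range(8), 4))
```
&lt;/pre&gt;Each group would then serve only its own connections with its own small set of threads.&lt;br /&gt;&lt;br /&gt;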
We're experimenting with the number of&lt;br /&gt;threads per group and also how to handle the situation when the&lt;br /&gt;last thread in the group is getting ready to execute a query.&lt;br /&gt;&lt;br /&gt;Compared to maximum performance around 32 threads we reach&lt;br /&gt;about 67% of this performance also on 1024 concurrently active&lt;br /&gt;threads. The drop-off of 33% is expected since there is some&lt;br /&gt;additional load when we reach an overload situation to ensure&lt;br /&gt;that the proper thread is handling the task. At low numbers of&lt;br /&gt;threads it's possible to immediately schedule the current worker&lt;br /&gt;thread to work on the query, but in the overload situation there&lt;br /&gt;is some queueing and context switching needed to handle the&lt;br /&gt;situation. However the overhead at overload is constant, so it&lt;br /&gt;doesn't worsen when the number of threads goes to a very high&lt;br /&gt;number.&lt;br /&gt;&lt;br /&gt;To handle the problems with blocked threads, we will implement a&lt;br /&gt;new part of the storage engine API and API towards the MySQL&lt;br /&gt;Server where the MySQL Server and the storage engines can&lt;br /&gt;announce that they're planning to go inactive for some reason.&lt;br /&gt;The threadpool will however handle the situation even if a thread&lt;br /&gt;goes to sleep without announcing it, it will simply be more&lt;br /&gt;performant if the announcement comes in those situations.&lt;br /&gt;&lt;br /&gt;The new MySQL development model with milestone releases is a&lt;br /&gt;vital new injection to the MySQL development leading to the&lt;br /&gt;possibility of making new features available to the MySQL&lt;br /&gt;community users in an efficient manner without endangering the&lt;br /&gt;quality of the MySQL Server. 
There is a very strict quality model&lt;br /&gt;before approving any new feature into a milestone release.&lt;br /&gt;The 6.0 thread pool design would not meet this strict quality&lt;br /&gt;model. The new design must meet this strict quality model before&lt;br /&gt;being accepted although we have good hopes for this to happen.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3402346743723763626?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3402346743723763626/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3402346743723763626' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3402346743723763626'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3402346743723763626'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/12/new-threadpool-design.html' title='New threadpool design'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-2097422431286092032</id><published>2009-11-20T08:23:00.002+01:00</published><updated>2009-11-20T08:40:19.687+01:00</updated><title type='text'>Partitioning as performance booster</title><content type='html'>When I developed partitioning for MySQL the main goal was&lt;br /&gt;to make it easier for MySQL users to manage large tables&lt;br /&gt;by enabling them to easily add and drop partitions.&lt;br 
/&gt;&lt;br /&gt;It turns out that partitioning can also be used as a way&lt;br /&gt;to make MySQL more scalable. The reason is that in some&lt;br /&gt;cases the storage engine has internal locks per table or&lt;br /&gt;per index (one such example is the btr_search_latch in InnoDB).&lt;br /&gt;&lt;br /&gt;So in this case adding a&lt;br /&gt;PARTITION BY KEY (key_part)&lt;br /&gt;PARTITIONS 4&lt;br /&gt;to the table definition makes a very hot table into 4 tables&lt;br /&gt;from the storage engine point of view.&lt;br /&gt;&lt;br /&gt;This would mostly be beneficial in cases where the main&lt;br /&gt;operation is primary key lookups on the table. Dividing the&lt;br /&gt;indexes in cases of scans can be both positive and negative.&lt;br /&gt;So this solution is definitely not a winner for all situations.&lt;br /&gt;&lt;br /&gt;I haven't tried this out yet myself in my benchmark suites,&lt;br /&gt;but I plan to make some experiments in this area. It is usable&lt;br /&gt;in sysbench, it's possible to use for DBT2 (have used partitioning&lt;br /&gt;for DBT2 in MySQL Cluster benchmarks a lot already) and it's&lt;br /&gt;possible to use in Dimitri's dbStress benchmark.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-2097422431286092032?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/2097422431286092032/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=2097422431286092032' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2097422431286092032'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2097422431286092032'/><link rel='alternate' type='text/html' 
href='http://mikaelronstrom.blogspot.com/2009/11/partitioning-as-performance-booster.html' title='Partitioning as performance booster'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-149368189943814219</id><published>2009-11-20T08:18:00.002+01:00</published><updated>2009-11-20T08:23:11.409+01:00</updated><title type='text'>Full automation of DBT2 test runs in benchmark scripts</title><content type='html'>My benchmark scripts were upgraded once more today.&lt;br /&gt;I fixed all issues pertaining to full automation&lt;br /&gt;of DBT2 runs. So now it is as easy to start a&lt;br /&gt;DBT2 test run as it previously was to start a&lt;br /&gt;sysbench run. Obviously DBT2 was already earlier&lt;br /&gt;supported in the benchmark scripts, so what I did&lt;br /&gt;now was add the final steps to make it fully&lt;br /&gt;automated. 
This includes also generating the DBT2&lt;br /&gt;load files needed to load the DBT2 database.&lt;br /&gt;&lt;br /&gt;See the download section on www.iclaustron.com&lt;br /&gt;for the tarball including some description of&lt;br /&gt;how to configure sysbench and DBT2 test runs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-149368189943814219?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/149368189943814219/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=149368189943814219' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/149368189943814219'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/149368189943814219'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/11/full-automation-of-dbt2-test-runs-in.html' title='Full automation of DBT2 test runs in benchmark scripts'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-6936738247882504971</id><published>2009-11-17T14:00:00.003+01:00</published><updated>2009-11-17T14:09:44.792+01:00</updated><title type='text'>New version of benchmark scripts also supporting Drizzle</title><content type='html'>I updated my benchmark scripts this week. 
These scripts can now&lt;br /&gt;run:&lt;br /&gt;&lt;br /&gt;- Sysbench benchmarks for MySQL and Drizzle&lt;br /&gt;- DBT2 benchmarks for MySQL and MySQL Cluster&lt;br /&gt;- TPC-W benchmark for MySQL Cluster&lt;br /&gt;&lt;br /&gt;There are also a number of scripts to start and stop&lt;br /&gt;MySQL, Drizzle and MySQL Cluster nodes.&lt;br /&gt;&lt;br /&gt;In this version I added Drizzle support for sysbench and also&lt;br /&gt;added a README-AUTOMATED file that describes the steps needed&lt;br /&gt;to set up a completely automated sysbench run for MySQL&lt;br /&gt;and Drizzle.&lt;br /&gt;&lt;br /&gt;To run a MySQL sysbench benchmark one needs the DBT2 tarball,&lt;br /&gt;the sysbench tarball and a MySQL tarball (gzipped tarballs).&lt;br /&gt;&lt;br /&gt;The tarball is found on www.iclaustron.com in the downloads&lt;br /&gt;section and this version is named dbt2-0.37.47.tar.gz.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-6936738247882504971?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/6936738247882504971/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=6936738247882504971' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6936738247882504971'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6936738247882504971'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/11/new-version-of-benchmark-scripts-also.html' title='New version of benchmark scripts also supporting Drizzle'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7607739465409331713</id><published>2009-11-12T12:32:00.005+01:00</published><updated>2009-11-12T15:41:30.982+01:00</updated><title type='text'>Improvements by LOCK_* patches</title><content type='html'>I have done a long series of tests to verify that&lt;br /&gt;the impact of the LOCK_alarm removal, removing two&lt;br /&gt;variables from LOCK_threadcount protection to being&lt;br /&gt;atomic increments instead and decreasing hold time&lt;br /&gt;of LOCK_open is positive in most if not all cases.&lt;br /&gt;&lt;br /&gt;There are a number of test cases needed:&lt;br /&gt;1) With and Without cpuspeed activated&lt;br /&gt;2) With sysbench local and with sysbench on another&lt;br /&gt;server&lt;br /&gt;3) With MySQL Server limited to 2,4,6,8,12,16 cores.&lt;br /&gt;4) With number of threads going from 1 to 256 threads&lt;br /&gt;in fair sized steps.&lt;br /&gt;&lt;br /&gt;I've done most of those tests for Sysbench Readonly&lt;br /&gt;and Sysbench Readwrite. 
The results are positive in&lt;br /&gt;almost all cases compared to the baseline which is based&lt;br /&gt;off the MySQL 5.4.3 tree (not exactly 5.4.3 but close&lt;br /&gt;enough).&lt;br /&gt;&lt;br /&gt;In the case of networked benchmark with cpuspeed&lt;br /&gt;activated the gain is biggest, top performance goes up&lt;br /&gt;about 10% and also top performance moves from 64 to 128&lt;br /&gt;threads, for 256 threads the performance increases&lt;br /&gt;by about 50%.&lt;br /&gt;&lt;br /&gt;When cpuspeed is deactivated and we use networked&lt;br /&gt;benchmarks the network handling becomes a bottleneck, so&lt;br /&gt;the numbers here are less interesting since we need to&lt;br /&gt;resolve the network bottleneck first.&lt;br /&gt;&lt;br /&gt;With cpuspeed activated and local communication the top&lt;br /&gt;performance increases by about 8% and there is a gain&lt;br /&gt;for all thread counts. The gain is a bit higher&lt;br /&gt;on sysbench readonly than on sysbench readwrite.&lt;br /&gt;&lt;br /&gt;With cpuspeed deactivated and local communication we&lt;br /&gt;naturally get the best numbers but also the&lt;br /&gt;smallest gains. Top performance of Sysbench&lt;br /&gt;Readonly increased by 2.5% and for Readwrite&lt;br /&gt;it increased by 4%, the top performance for &lt;br /&gt;sysbench readwrite also moved from 16 threads&lt;br /&gt;to 32 threads. 
The improvement is slightly&lt;br /&gt;better on more threads.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7607739465409331713?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7607739465409331713/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7607739465409331713' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7607739465409331713'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7607739465409331713'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/11/improvements-by-lock-patches.html' title='Improvements by LOCK_* patches'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7053468885640059890</id><published>2009-11-11T14:14:00.003+01:00</published><updated>2009-11-11T14:45:17.085+01:00</updated><title type='text'>Analysis of InnoDB adaptive hash index parameter for sysbench</title><content type='html'>As I mentioned in a previous blog post I was suspicious that the&lt;br /&gt;adaptive hash index in InnoDB added to the scalability issues in&lt;br /&gt;the MySQL Server. So I decided to run a test where I disabled the&lt;br /&gt;adaptive hash index.&lt;br /&gt;&lt;br /&gt;The results were fairly much as expected. 
The adaptive hash&lt;br /&gt;index usage improves performance on low thread counts &lt;br /&gt;(up to about 16) by a few percent. However at 32 threads and&lt;br /&gt;beyond the performance is better without the adaptive hash&lt;br /&gt;index and equal in some cases. In particular the top performance&lt;br /&gt;goes up by about 3% when this is disabled.&lt;br /&gt;&lt;br /&gt;This confirms the documentation in the InnoDB manual that the&lt;br /&gt;adaptive hash index improves performance as long as the lock&lt;br /&gt;around it doesn't hurt performance. So for the majority of users&lt;br /&gt;it's a good idea to have it turned on, but for users with high-end&lt;br /&gt;servers it's a good idea to test if turning it off will improve&lt;br /&gt;performance.&lt;br /&gt;&lt;br /&gt;For sysbench benchmarks it's clearly a good idea to turn it off.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7053468885640059890?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7053468885640059890/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7053468885640059890' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7053468885640059890'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7053468885640059890'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/11/analysis-of-innodb-adaptive-hash-index.html' title='Analysis of InnoDB adaptive hash index parameter for sysbench'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-8193836772607831770</id><published>2009-11-11T12:05:00.005+01:00</published><updated>2009-11-11T12:49:18.910+01:00</updated><title type='text'>245% improvement of MySQL performance in 1 year</title><content type='html'>When I did sysbench benchmarks 1 year ago I used a&lt;br /&gt;4-socket server, a Linux kernel based on 2.6.18 and&lt;br /&gt;MySQL 5.1. The sysbench readwrite numbers I got then&lt;br /&gt;were around 2700. When I run the same benchmarks&lt;br /&gt;now the numbers I get are 9300.&lt;br /&gt;&lt;br /&gt;These improvements obviously come from a mixture&lt;br /&gt;of HW development, OS development (now using&lt;br /&gt;a 2.6.31 based kernel) and MySQL development.&lt;br /&gt;&lt;br /&gt;The machine is still a 4-socket server, the operating&lt;br /&gt;system is still Linux and the database is still MySQL,&lt;br /&gt;but the performance has improved by 245%. 
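&lt;br /&gt;&lt;br /&gt;The percentage can be checked from the two sysbench readwrite numbers quoted above. The small Python sketch below is only an illustration; the helper name improvement_percent is made up for this example, and since the older number is given as "around 2700" the result is approximate.&lt;br /&gt;&lt;pre&gt;
```python
def improvement_percent(old, new):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100.0


# Sysbench readwrite numbers quoted in the text: about 2700 then, 9300 now.
print(round(improvement_percent(2700, 9300), 1))  # prints 244.4
```
&lt;/pre&gt;So the gain works out to roughly 244%, in line with the rounded 245% figure.&lt;br /&gt;&lt;br /&gt;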
Needless&lt;br /&gt;to say this is an extraordinary performance&lt;br /&gt;improvement in just one year and clearly shows that&lt;br /&gt;both the HW industry and the open source SW&lt;br /&gt;industry are quickly picking up on how to improve&lt;br /&gt;performance using multi-core multi-socket servers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-8193836772607831770?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/8193836772607831770/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=8193836772607831770' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8193836772607831770'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8193836772607831770'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/11/245-improvement-of-mysql-performance-in.html' title='245% improvement of MySQL performance in 1 year'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3394377591524048085</id><published>2009-11-11T11:32:00.006+01:00</published><updated>2009-11-11T14:08:04.148+01:00</updated><title type='text'>Effect of CPU Powersave mode on Sysbench benchmarks</title><content type='html'>When I started analysing the various patches that I had made&lt;br /&gt;for improving the MySQL Server performance I mistakenly&lt;br /&gt;forgot to turn 
cpuspeed off in Linux. This is a feature that&lt;br /&gt;makes it possible to run the CPUs at a much lower frequency&lt;br /&gt;in cases when they aren't so heavily used.&lt;br /&gt;&lt;br /&gt;So at first I considered simply turning it off and forgetting the&lt;br /&gt;data I had produced. Then I realised that actually to have&lt;br /&gt;cpuspeed activated is the default behaviour and for many&lt;br /&gt;servers out there in the world it is actually the best mode since&lt;br /&gt;most servers go from high load to low load frequently.&lt;br /&gt;&lt;br /&gt;So I decided that it would be worthwhile to analyse behaviour&lt;br /&gt;both with and without this feature turned on.&lt;br /&gt;&lt;br /&gt;The cpuspeed feature was particularly involved when running&lt;br /&gt;sysbench on a different server, thus using socket-based&lt;br /&gt;communication. In this case the performance drop-off at&lt;br /&gt;64, 128 and 256 threads was fairly significant. However the&lt;br /&gt;performance gain when I added fixes for LOCK_open,&lt;br /&gt;LOCK_alarm and LOCK_threadcount was very significant. 
I&lt;br /&gt;got 50% better performance with these patches when&lt;br /&gt;cpuspeed was activated and I was running sysbench over the&lt;br /&gt;network.&lt;br /&gt;&lt;br /&gt;When I ran sysbench and mysqld on the same host the impact&lt;br /&gt;of the patches and cpuspeed was smaller but still significant.&lt;br /&gt;Turning cpuspeed on decreases the performance of sysbench&lt;br /&gt;readwrite by almost 10% for the baseline (MySQL 5.4.3) whereas&lt;br /&gt;with the patches that fix the LOCK_* problems the drop from&lt;br /&gt;using cpuspeed is only 1%.&lt;br /&gt;&lt;br /&gt;So it seems like having many extra mutexes to pass through&lt;br /&gt;doesn't hurt performance so much when running at full CPU&lt;br /&gt;speed all the time, but as soon as the CPU power save mode&lt;br /&gt;is activated these many extra mutexes to pass through have a&lt;br /&gt;significant negative effect.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3394377591524048085?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3394377591524048085/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3394377591524048085' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3394377591524048085'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3394377591524048085'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/11/effect-of-cpu-powersave-mode-on.html' title='Effect of CPU Powersave mode on Sysbench benchmarks'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7992322630718016507</id><published>2009-11-11T10:48:00.008+01:00</published><updated>2009-11-11T13:56:45.302+01:00</updated><title type='text'>GDB stack trace analysis of Sysbench benchmark</title><content type='html'>After many unsuccessful attempts to get MySQL to run faster I got&lt;br /&gt;a very simple but effective gdb script courtesy of Intel. The script&lt;br /&gt;attaches to the mysqld process, does a&lt;br /&gt;backtrace on all threads in the process and then gathers&lt;br /&gt;statistics on all stacktraces.&lt;br /&gt;&lt;br /&gt;With this script I did some analysis on what goes on in sysbench&lt;br /&gt;readonly and sysbench readwrite.&lt;br /&gt;&lt;br /&gt;Starting with sysbench readonly I discovered a lot of things I&lt;br /&gt;already knew, such as that LOCK_open is a major bottleneck.&lt;br /&gt;There were also many other things that popped up such as:&lt;br /&gt;LOCK_alarm, LOCK_threadcount, LOCK_grant,&lt;br /&gt;btr_search_latch and the mutex on the table object which is&lt;br /&gt;used to discover TABLE level locks on the MySQL level. This&lt;br /&gt;last lock is contended mostly because sysbench is only&lt;br /&gt;operating on one table. So most normal benchmarks will&lt;br /&gt;not have any major problems with this mutex since applications rarely&lt;br /&gt;put such a heavy load on a single table.&lt;br /&gt;&lt;br /&gt;Interestingly the kernel_mutex in InnoDB wasn't so prevalent&lt;br /&gt;in sysbench readonly. Also I was a bit surprised to find the&lt;br /&gt;btr_search_latch there since it's a RW-lock, but it seemed&lt;br /&gt;like every now and then someone took an X-lock on the&lt;br /&gt;btr_search_latch even in readonly queries. 
Probably has&lt;br /&gt;something to do with InnoDB adaptive hash index.&lt;br /&gt;&lt;br /&gt;One surprising lock here is the LOCK_grant which is also&lt;br /&gt;a RW-lock and this is never taken in anything other than&lt;br /&gt;the Read mode unless one changes the grants which&lt;br /&gt;doesn't happen in a sysbench run. Some discussions&lt;br /&gt;concluded that pthread_rwlock is actually implemented&lt;br /&gt;by using a mutex and this is the cause of the contention&lt;br /&gt;on LOCK_grant. So to resolve that a read-optimised&lt;br /&gt;RW-lock is needed for the MySQL Server code.&lt;br /&gt;&lt;br /&gt;To remove LOCK_alarm there is already code in the MySQL&lt;br /&gt;Server prepared to do that so the patch to remove&lt;br /&gt;LOCK_alarm is fairly straightforward.&lt;br /&gt;&lt;br /&gt;Removing LOCK_threadcount isn't necessary; it's sufficient&lt;br /&gt;to stop protecting two variables, thread_running and global_query_id,&lt;br /&gt;with this mutex and instead use&lt;br /&gt;atomic variables. To handle this one can use the my_atomic&lt;br /&gt;framework and add 64-bit support to it. Then the fix of&lt;br /&gt;LOCK_threadcount is straightforward.&lt;br /&gt;&lt;br /&gt;Resolving LOCK_open is obviously a bigger problem but a&lt;br /&gt;first step is to simply remove the hash calculation from&lt;br /&gt;LOCK_open.&lt;br /&gt;&lt;br /&gt;The MySQL runtime team is working on a metadata locking&lt;br /&gt;infrastructure that was in MySQL 6.0 but still has some&lt;br /&gt;quality issues. But when this code is ready it will also make&lt;br /&gt;it possible to resolve the LOCK_open problem. Actually the&lt;br /&gt;problem then is both about the new lock LOCK_mdl added&lt;br /&gt;by the metadata locking code and LOCK_open. 
But the new&lt;br /&gt;structure makes it possible to have more aggressive&lt;br /&gt;solutions on both LOCK_open and LOCK_mdl code.&lt;br /&gt;&lt;br /&gt;In analysing the sysbench readwrite benchmark the main&lt;br /&gt;contender together with LOCK_open was, not very&lt;br /&gt;surprisingly, the kernel_mutex in InnoDB. There are some&lt;br /&gt;ideas on how to improve this as well but first things first.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7992322630718016507?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7992322630718016507/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7992322630718016507' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7992322630718016507'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7992322630718016507'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/11/gdb-stack-trace-analysis-of-sysbench.html' title='GDB stack trace analysis of Sysbench benchmark'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-8793022171497929956</id><published>2009-11-11T09:36:00.005+01:00</published><updated>2009-11-11T10:22:45.033+01:00</updated><title type='text'>New partitioning SQL syntax added in next MySQL milestone release tree</title><content type='html'>This blog gives some insights into the 
new SQL syntax added in WL#3352&lt;br /&gt;and WL#4444 and WL#4571 which all have been included in the&lt;br /&gt;mysql-next-mr tree which is the base for the next milestone release&lt;br /&gt;codenamed Betony.&lt;br /&gt;&lt;br /&gt;The purpose of these new changes is to enable improved partition pruning, and&lt;br /&gt;also to make it possible to partition on strings. Supporting TRUNCATE on&lt;br /&gt;partitions will improve partition management and the ability to use separate&lt;br /&gt;key caches for different partitions makes MyISAM partitioned tables more&lt;br /&gt;useful. Finally we added a new function TO_SECONDS which makes it&lt;br /&gt;possible to get more fine-grained dates for partitions and subpartitions.&lt;br /&gt;&lt;br /&gt;This information will soon find its way into documentation but if you want&lt;br /&gt;to get started right away here it comes.&lt;br /&gt;&lt;br /&gt;There are 5 additions effectively:&lt;br /&gt;1) The ability to RANGE partition by column list instead of by function&lt;br /&gt;2) The ability to LIST partition by column list instead of by function&lt;br /&gt;3) A new function TO_SECONDS which can be used in partition functions&lt;br /&gt;and where partition pruning will be used also on ranges.&lt;br /&gt;4) The ability to TRUNCATE a partition&lt;br /&gt;5) The ability to use a key cache in MyISAM per partition&lt;br /&gt;&lt;br /&gt;Here are a few examples of how to use these new additions:&lt;br /&gt;&lt;br /&gt;1)&lt;br /&gt;CREATE TABLE t1 (a varchar(5), b int)&lt;br /&gt;PARTITION BY RANGE COLUMNS (a,b)&lt;br /&gt;( PARTITION p0 VALUES LESS THAN ("abc", 1),&lt;br /&gt;  PARTITION p1 VALUES LESS THAN ("def", 2));&lt;br /&gt;&lt;br /&gt;Some things noteworthy here:&lt;br /&gt;The checks for the constants are fairly strict. Thus using "1" for a constant&lt;br /&gt;to b isn't allowed; the constant must be of the same type as the field it&lt;br /&gt;maps to. 
Also sql_mode will be ignored for those partition constants.&lt;br /&gt;Thus e.g. even if sql_mode specifies that non-existing dates are allowed&lt;br /&gt;they will not be allowed in the partition constants since these constants&lt;br /&gt;will be a part of the table and need to live for longer than the current&lt;br /&gt;session.&lt;br /&gt;&lt;br /&gt;Character sets are allowed and the string constants will be interpreted&lt;br /&gt;in the character set their field belongs to. Also character set strings&lt;br /&gt;without mapping are allowed. If one tries to use SHOW CREATE TABLE&lt;br /&gt;on the partitioned table and the mapping of the partition constants from&lt;br /&gt;field charset to UTF8 fails or if mapping to client charset fails, then the&lt;br /&gt;partition constants will be written in hex string format.&lt;br /&gt;&lt;br /&gt;A partition constant can be MAXVALUE; NULL is however not allowed.&lt;br /&gt;There were some considerations to also be able to use MINVALUE&lt;br /&gt;which effectively would make it possible to create partitions where only&lt;br /&gt;the NULL values can go in. However this is still possible if one knows&lt;br /&gt;the minimum value of the field.&lt;br /&gt;&lt;br /&gt;It's possible to partition on integer fields, string fields and date fields.&lt;br /&gt;It's not possible to partition on BLOBs, SETs, ENUMs, GEOMETRY fields or&lt;br /&gt;BIT fields.&lt;br /&gt;&lt;br /&gt;2)&lt;br /&gt;CREATE TABLE t1 (a varchar(1))&lt;br /&gt;PARTITION BY LIST COLUMNS (a)&lt;br /&gt;( PARTITION p0 VALUES IN ("a","b","c"));&lt;br /&gt;&lt;br /&gt;CREATE TABLE t1 (a varchar(1), b int)&lt;br /&gt;PARTITION BY LIST COLUMNS (a,b)&lt;br /&gt;( PARTITION p0 VALUES IN (("a",1),("b",2),("c",3)));&lt;br /&gt;&lt;br /&gt;Noteworthy here is that parentheses are required when more than one&lt;br /&gt;field is in the list of columns partitioned on. 
It's required to not use&lt;br /&gt;parentheses when there is only one field.&lt;br /&gt;&lt;br /&gt;NULL values are allowed as in MySQL 5.1 but not MAXVALUE.&lt;br /&gt;&lt;br /&gt;3)&lt;br /&gt;CREATE TABLE t1 (a datetime)&lt;br /&gt;PARTITION BY RANGE (TO_SECONDS(a))&lt;br /&gt;( PARTITION p0 VALUES LESS THAN (TO_SECONDS("2009-11-11 08:00:00")),&lt;br /&gt;  PARTITION p1 VALUES LESS THAN (MAXVALUE));&lt;br /&gt;&lt;br /&gt;Same syntax as in MySQL 5.1 but it is also possible to use TO_SECONDS as a&lt;br /&gt;partition function.&lt;br /&gt;&lt;br /&gt;4)&lt;br /&gt;ALTER TABLE t1 TRUNCATE PARTITION p0;&lt;br /&gt;ALTER TABLE t1 TRUNCATE PARTITION p1;&lt;br /&gt;ALTER TABLE t1 TRUNCATE PARTITION ALL;&lt;br /&gt;&lt;br /&gt;This deletes all rows in the given partitions and resets the given partitions'&lt;br /&gt;auto_increment values (if any) to 0. &lt;br /&gt;&lt;br /&gt;The syntax works in the same manner as for ANALYZE, OPTIMIZE and other&lt;br /&gt;commands that can be applied on partitions already in MySQL 5.1.&lt;br /&gt;&lt;br /&gt;5) &lt;br /&gt;&lt;br /&gt;CACHE INDEX t1 PARTITION p0 IN keycache_fast;&lt;br /&gt;CACHE INDEX t1 PARTITION p1, p2 IN keycache_slow;&lt;br /&gt;LOAD INDEX INTO CACHE t1 PARTITION p0;&lt;br /&gt;LOAD INDEX INTO CACHE t1 PARTITION p0, p1;&lt;br /&gt;&lt;br /&gt;This new syntax makes it possible to have separate key caches for different&lt;br /&gt;partitions in a partitioned table using the MyISAM storage engine.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-8793022171497929956?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/8793022171497929956/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=8793022171497929956' title='0 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8793022171497929956'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8793022171497929956'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/11/new-partitioning-sql-syntax-added-in.html' title='New partitioning SQL syntax added in next MySQL milestone release tree'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3834543859505234267</id><published>2009-11-10T16:47:00.013+01:00</published><updated>2009-11-10T17:34:44.195+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='partitioning'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='WL#3352'/><title type='text'>Improvement of MySQL partitioning included in MySQL's next milestone release tree</title><content type='html'>It has been quite some time since I last blogged. It's not due to&lt;br /&gt;inactivity. Those of you who have followed my blog might&lt;br /&gt;have seen earlier blog posts about a new partitioning feature.&lt;br /&gt;&lt;br /&gt;This new partitioning I first blogged about in July 2006 and&lt;br /&gt;that blog is still the 3rd most popular blog of my blogs, even&lt;br /&gt;when looking at the last month's views. The work on this started&lt;br /&gt;out in 2005 and so it's nice to now get it in a state where its&lt;br /&gt;quality is ready for more heavy testing. For those interested&lt;br /&gt;in partitioning I think this feature will enlarge the number of&lt;br /&gt;cases where partitioning is applicable. 
It's now possible to&lt;br /&gt;partition on many more field types and also on multiple fields&lt;br /&gt;in an efficient manner.&lt;br /&gt;&lt;br /&gt;This feature described by &lt;a href="http://forge.mysql.com/worklog/task.php?id=3352"&gt;WL#3352&lt;/a&gt; has now been pushed&lt;br /&gt;to the mysql-next-mr tree. For those of you new to our new&lt;br /&gt;milestone release model this tree is where we push new features&lt;br /&gt;before clone off. After clone off this tree is merged with&lt;br /&gt;the mysql-trunk tree and after about 6 months of bug fixing a Milestone&lt;br /&gt;Release is performed. The current mysql-trunk tree is the tree from&lt;br /&gt;which the current MySQL 5.4 releases are produced. A milestone&lt;br /&gt;release is of beta quality and some milestone releases will be&lt;br /&gt;continued towards a GA release. There will be a new milestone&lt;br /&gt;release at intervals of about 3-6 months. For more information on&lt;br /&gt;the release model see &lt;a href="http://dev.mysql.com/tech-resources/articles/mysql-release-model.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;WL#3352 was pushed into the Milestone which has the&lt;br /&gt;codename Betony in the article referred to above. The current&lt;br /&gt;MySQL 5.4 release series is called Azalea (actually 5.4.0 and&lt;br /&gt;5.4.1 belonged to Summit and 5.4.2 was the first Azalea release).&lt;br /&gt;&lt;br /&gt;The major advantage of this new feature is that it makes it&lt;br /&gt;possible to partition on string fields and also to partition on&lt;br /&gt;multiple fields and still get good partition pruning. 
Previous&lt;br /&gt;partitioning required a partition function that delivered an&lt;br /&gt;integer result and only a couple of functions could deliver good&lt;br /&gt;partition pruning.&lt;br /&gt;&lt;br /&gt;Now it is possible to partition on most fields and even a set&lt;br /&gt;of them and always get good partition pruning.&lt;br /&gt;&lt;br /&gt;The final result of the new syntax is the following:&lt;br /&gt;&lt;br /&gt;CREATE TABLE t1 (a varchar(5) character set ucs2, b int)&lt;br /&gt;PARTITION BY RANGE COLUMNS (a,b)&lt;br /&gt;( PARTITION p0 VALUES LESS THAN (_ucs2 0x2020, 1),&lt;br /&gt;  PARTITION p1 VALUES LESS THAN (MAXVALUE, MAXVALUE));&lt;br /&gt;&lt;br /&gt;So the keyword COLUMNS indicates that a list of fields is&lt;br /&gt;used to partition on instead of a function. The new&lt;br /&gt;partitioning applies to RANGE and LIST partitioning.&lt;br /&gt;&lt;br /&gt;All the management functions for partitioning still apply.&lt;br /&gt;However the major difference comes when you do a query like:&lt;br /&gt;&lt;br /&gt;select * from t1 WHERE a &gt; _ucs2 0x2020;&lt;br /&gt;&lt;br /&gt;In this case the partition pruning will discover that only&lt;br /&gt;the partition p1 can contain matching records and will thus&lt;br /&gt;prune away partition p0. In MySQL 5.1 it's only possible&lt;br /&gt;to perform pruning on intervals for single fields and&lt;br /&gt;the partition function must also be of a type that is&lt;br /&gt;safe to always increase such as YEAR or TO_DAYS (actually&lt;br /&gt;this new feature also added the function TO_SECONDS to this&lt;br /&gt;list of functions that can be pruned on efficiently).&lt;br /&gt;&lt;br /&gt;So partition pruning on this works very much like an&lt;br /&gt;index. 
Not surprisingly the partition pruning code&lt;br /&gt;reuses the code for the range optimiser which looks&lt;br /&gt;at what indexes can be used for a certain query.&lt;br /&gt;&lt;br /&gt;If you wonder why I am using these _ucs2 constants&lt;br /&gt;as examples it's because I had to learn a lot about the&lt;br /&gt;character set code in MySQL to get everything right with&lt;br /&gt;this feature. Actually even found and fixed a few MySQL&lt;br /&gt;character set bugs in the process :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3834543859505234267?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3834543859505234267/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3834543859505234267' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3834543859505234267'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3834543859505234267'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/11/improvement-of-mysql-partitioning.html' title='Improvement of MySQL partitioning included in MySQL&apos;s next milestone release tree'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-894724118608911746</id><published>2009-09-15T17:46:00.008+02:00</published><updated>2009-09-15T18:03:06.470+02:00</updated><title type='text'>New launchpad tree for 
Column List Partitioning</title><content type='html'>I have added a new Launchpad tree for an improved&lt;br /&gt;&lt;a href="https://code.launchpad.net/~mikael-ronstrom/+junk/mysql-trunk-wl3352"&gt;partitioning feature&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This new tree is based off mysql-trunk which is the base for&lt;br /&gt;the next generation MySQL Server. The tree is now entering QA&lt;br /&gt;and has been extensively tested by development and thus it is&lt;br /&gt;very interesting to get feedback on usability of the feature and&lt;br /&gt;feedback on quality issues. This will speed up the delivery of&lt;br /&gt;this new feature.&lt;br /&gt;&lt;br /&gt;You can find more description of the feature in a previous blog:&lt;br /&gt;&lt;a href="http://mikaelronstrom.blogspot.com/2009/08/partition-by-columnlist-ready-for-alpha.html"&gt;Description of feature&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-894724118608911746?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/894724118608911746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=894724118608911746' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/894724118608911746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/894724118608911746'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/09/new-launchpad-tree-for-column-list.html' title='New launchpad tree for Column List Partitioning'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-5489785241860213545</id><published>2009-08-04T15:23:00.005+02:00</published><updated>2009-08-04T15:49:58.740+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='partitioning'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Partititon'/><title type='text'>Partition by column_list ready for alpha testers</title><content type='html'>I got time to spend on a really old worklog whose coding I completed&lt;br /&gt;back in October 2005. I blogged about it in July 2006 and&lt;br /&gt;interestingly enough it's still the second most read blog&lt;br /&gt;entry on my blog (probably related to search engines in some&lt;br /&gt;way).&lt;br /&gt;&lt;br /&gt;I have merged it with the azalea tree (this is an internal code&lt;br /&gt;name for our development tree, the name is likely to change). This&lt;br /&gt;tree contains subquery optimisations, Batched join and some more&lt;br /&gt;optimisations.&lt;br /&gt;&lt;br /&gt;I have fixed a whole bunch of bugs that always show up in early&lt;br /&gt;code. 
The code quality is still alpha but at least you won't find&lt;br /&gt;10 bugs per hour :)&lt;br /&gt;&lt;br /&gt;&lt;a href="https://code.launchpad.net/~mikael-ronstrom/mysql-server/mysql-5.1-wl3352"&gt;&lt;br /&gt;Here&lt;/a&gt; you can find the launch pad tree for this code.&lt;br /&gt;&lt;br /&gt;There are two important additions made possible by this tree.&lt;br /&gt;1) New function to_seconds that is recognized by range optimiser&lt;br /&gt;to enable partition pruning when partitioning like:&lt;br /&gt;partition by range (to_seconds(time))&lt;br /&gt;2) New partitioning functionality that makes it possible to&lt;br /&gt;perform partition pruning over multiple fields.&lt;br /&gt;&lt;br /&gt;Most of the bugs I have fixed had to do with this partition pruning&lt;br /&gt;of multiple fields. The routine to discover which partitions are&lt;br /&gt;needed is called find_used_partitions (in sql/opt_range.cc) and this&lt;br /&gt;function is called recursively over a key tree. A key tree can be&lt;br /&gt;very complex and more or less have AND of key parts using next_key_part&lt;br /&gt;pointer and OR condition using left and right pointers. These left and&lt;br /&gt;right pointers can however show up a little here and there in the tree&lt;br /&gt;so one has to be very careful about how variables are assigned, saved&lt;br /&gt;and restored. 
I haven't worked so much with recursive functions so this&lt;br /&gt;is an interesting adventure.&lt;br /&gt;&lt;br /&gt;Here's my latest addition of a test case to give you an idea of how it&lt;br /&gt;works and also what works right now.&lt;br /&gt;&lt;br /&gt;create table t1 (a int, b char(10), c varchar(5), d int)&lt;br /&gt;partition by range column_list(a,b,c)&lt;br /&gt;subpartition by key (c,d)&lt;br /&gt;subpartitions 3&lt;br /&gt;( partition p0 values less than (column_list(1,'abc','abc')),&lt;br /&gt;  partition p1 values less than (column_list(2,'abc','abc')),&lt;br /&gt;  partition p2 values less than (column_list(3,'abc','abc')),&lt;br /&gt;  partition p3 values less than (column_list(4,'abc','abc')));&lt;br /&gt;&lt;br /&gt;insert into t1 values (1,'a','b',1),(2,'a','b',2),(3,'a','b',3);&lt;br /&gt;insert into t1 values (1,'b','c',1),(2,'b','c',2),(3,'b','c',3);&lt;br /&gt;insert into t1 values (1,'c','d',1),(2,'c','d',2),(3,'c','d',3);&lt;br /&gt;insert into t1 values (1,'d','e',1),(2,'d','e',2),(3,'d','e',3);&lt;br /&gt;select * from t1 where (a = 1 AND b &lt; 'd' AND (c = 'b' OR (c = 'c' AND d = 1)) OR&lt;br /&gt;                       (a = 1 AND b &gt;= 'a' AND (c = 'c' OR (c = 'd' AND d = 2))));&lt;br /&gt;&lt;br /&gt;So in the above select statement we are performing partition&lt;br /&gt;pruning over 3 fields and subpartition pruning over 2 fields&lt;br /&gt;and there are 5 different ranges in the query.&lt;br /&gt;&lt;br /&gt;So please go ahead and try this new tree out and see if it&lt;br /&gt;works for you.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-5489785241860213545?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/5489785241860213545/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=5489785241860213545' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5489785241860213545'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5489785241860213545'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/08/partition-by-columnlist-ready-for-alpha.html' title='Partition by column_list ready for alpha testers'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-8466465928681634071</id><published>2009-07-08T20:03:00.007+02:00</published><updated>2009-07-08T20:24:33.002+02:00</updated><title type='text'>New update to DBT2 clone with automated sysbench runs</title><content type='html'>I have made an update to the DBT2 clone where I packed in all my&lt;br /&gt;benchmarking support scripts.&lt;br /&gt;&lt;br /&gt;This update adds a new script bench_prepare.sh that should be run&lt;br /&gt;from the benchmark server and uses the input of 3 tarballs, the&lt;br /&gt;DBT2 tarball, the sysbench tarball and a MySQL tarball. 
It will&lt;br /&gt;automatically build all needed binaries on both the benchmark&lt;br /&gt;server and on the MySQL Server machine (they could be on the same&lt;br /&gt;machine or on different machines).&lt;br /&gt;&lt;br /&gt;The script only requires one parameter --default-directory where&lt;br /&gt;one configuration file called autobench.conf should be placed.&lt;br /&gt;This directory will also be used to house all result files,&lt;br /&gt;builds and generated configuration files for all involved scripts.&lt;br /&gt;&lt;br /&gt;The aim is to continue developing so that we can also benchmark&lt;br /&gt;easily using different Linux versions.&lt;br /&gt;&lt;br /&gt;The tarball can be downloaded from &lt;a href="http://www.iclaustron.com/downloads.html"&gt;here&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The script can also handle a MySQL Server which is Windows-based,&lt;br /&gt;but the benchmark server cannot run Windows for the moment.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-8466465928681634071?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/8466465928681634071/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=8466465928681634071' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8466465928681634071'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8466465928681634071'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/07/new-update-to-dbt2-clone-with-automated.html' title='New update to DBT2 clone with automated sysbench runs'/><author><name>Mikael 
Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-2149071227686817080</id><published>2009-06-05T12:08:00.003+02:00</published><updated>2009-06-05T13:34:57.614+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>Follow-up Analysis of Split Rollback Segment Mutex</title><content type='html'>I performed a new set of tests of the patch to split the&lt;br /&gt;rollback segment mutex on Linux. All these tests gave&lt;br /&gt;positive results with improvements in the order of 2%.&lt;br /&gt;&lt;br /&gt;One could also derive from the results some conclusions.&lt;br /&gt;The first conclusion is that this split mainly improves&lt;br /&gt;things when the number of threads is high and thus&lt;br /&gt;contention of mutexes is higher. At 256 threads a number&lt;br /&gt;of results improved up to 15%.&lt;br /&gt;&lt;br /&gt;The numbers on lower numbers of threads were more modest&lt;br /&gt;although in many cases an improvement was still seen.&lt;br /&gt;&lt;br /&gt;What was also noticeable was that on the sysbench read-write&lt;br /&gt;test with fewer reads, which makes the transactions much shorter,&lt;br /&gt;the positive impact was much greater, whereas the positive&lt;br /&gt;impact on long transactions was much smaller (+0.4%&lt;br /&gt;versus +2.5%). 
The impact on the short transaction test&lt;br /&gt;with less reads was very positive also on lower number&lt;br /&gt;of threads, the result on 32 threads improved 7%.&lt;br /&gt;&lt;br /&gt;So the conclusion is that this patch is a useful contribution&lt;br /&gt;to improvements and in particular improves matters on high&lt;br /&gt;number of threads and with short transactions. According to&lt;br /&gt;a comment on the previous blog it is also very positive in&lt;br /&gt;insert benchmarks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-2149071227686817080?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/2149071227686817080/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=2149071227686817080' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2149071227686817080'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2149071227686817080'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/06/follow-up-analysis-of-split-rollback.html' title='Follow-up Analysis of Split Rollback Segment Mutex'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-6740389914155855011</id><published>2009-06-04T18:06:00.007+02:00</published><updated>2009-06-04T18:38:47.748+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>Results of shootout on split page hash in InnoDB</title><content type='html'>I have now tried out the buffer split page hash patches on&lt;br /&gt;both a Linux/x86 box and a SPARC/Solaris server (tests done&lt;br /&gt;by Dimitri).&lt;br /&gt;&lt;br /&gt;The three variants in short description are:&lt;br /&gt;1) The Google v3 derived patch. This introduces a new array&lt;br /&gt;of mutexes that only protect the buffer page hash. Thus some&lt;br /&gt;extra checking is needed to ensure the page hasn't been&lt;br /&gt;removed from the hash before using it. This is a very simple&lt;br /&gt;and attractive patch from that point of view. The patch uses&lt;br /&gt;an array of 64 mutexes.&lt;br /&gt;&lt;br /&gt;2) A variant I developed with some inspiration from the Percona&lt;br /&gt;patches. This patch uses an array of page hashes which each has&lt;br /&gt;its own read-write lock. I've tried this with 1, 4 and 16 page&lt;br /&gt;hashes and 4 is the optimum number. The rw-lock protects the&lt;br /&gt;page hash long enough to ensure that the block hasn't been&lt;br /&gt;possible to remove from the hash before the mutex is acquired.&lt;br /&gt;&lt;br /&gt;3) The last variant is a mix of the two first which uses the&lt;br /&gt;simplicity of the Google patch, uses a rw-lock instead and&lt;br /&gt;separate page hashes (to ensure read ahead doesn't have to&lt;br /&gt;go into all mutexes). Used an array of 4 page hashes here.&lt;br /&gt;&lt;br /&gt;The conclusion is that the only version that has consistently&lt;br /&gt;improved the MySQL 5.4.0 numbers is the version I originally&lt;br /&gt;developed (2 above).&lt;br /&gt;&lt;br /&gt;On sysbench read-write all versions improve numbers compared to&lt;br /&gt;MySQL 5.4.0. 
Variants 2) and 3) improve by 2% whereas the original Google&lt;br /&gt;patch improved by 1%.&lt;br /&gt;&lt;br /&gt;On sysbench read-only on Linux it was much harder to beat the&lt;br /&gt;MySQL 5.4.0 version. Only 2) did so and only by 0.5%. This is&lt;br /&gt;not so surprising since this mutex is not a blocker for read-only&lt;br /&gt;workloads. 1) gave -1% and 3) gave -0.3%.&lt;br /&gt;&lt;br /&gt;On a write-intensive workload on Linux 1) and 3) performed 0.5%&lt;br /&gt;better than MySQL 5.4.0 whereas 2) gave a 2% improvement.&lt;br /&gt;&lt;br /&gt;Finally on a sysbench read-write with fewer reads on Linux, all&lt;br /&gt;variants lost to MySQL 5.4.0: 1) by 2%, 2) by 0.1% and 3) by&lt;br /&gt;1%.&lt;br /&gt;&lt;br /&gt;The numbers from SPARC/Solaris show a similar pattern. The major&lt;br /&gt;difference is that the positive impact on SPARC servers is much&lt;br /&gt;bigger, all the way up to 30% improvement in some cases. The&lt;br /&gt;most likely reason for this is that SPARC servers&lt;br /&gt;have bigger CPU caches and are thus more held back by lack of&lt;br /&gt;concurrency and not so much by an increased working set. 
The x86&lt;br /&gt;box had 512kB of cache per core and a 2MB L3 cache and is likely&lt;br /&gt;to be very sensitive to any increase of the working set.&lt;br /&gt;&lt;br /&gt;So the likely explanation for worse numbers in some cases is that&lt;br /&gt;more mutexes or rw-locks give more cache misses.&lt;br /&gt;&lt;br /&gt;So given the outcome I will continue to see if I can keep the&lt;br /&gt;simplicity of the Google patch and still maintain the improved&lt;br /&gt;performance of my patch.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-6740389914155855011?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/6740389914155855011/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=6740389914155855011' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6740389914155855011'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6740389914155855011'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/06/results-of-shootout-on-split-page-hash.html' title='Results of shootout on split page hash in InnoDB'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3420319795406507803</id><published>2009-06-03T14:00:00.004+02:00</published><updated>2009-06-04T18:38:33.349+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>Some ideas on InnoDB kernel_mutex</title><content type='html'>I've noted that one reason that InnoDB runs into difficulties&lt;br /&gt;when there are many concurrent transactions in the MySQL Server&lt;br /&gt;is that the lock time of the kernel_mutex often increases&lt;br /&gt;linearly with the number of active transactions. One such&lt;br /&gt;example is in trx_assign_read_view where each transaction&lt;br /&gt;that does a consistent read creates a copy of the transaction&lt;br /&gt;list to be able to deduce the read view of the transaction or&lt;br /&gt;statement.&lt;br /&gt;&lt;br /&gt;This means that each transaction is copied to the local transaction&lt;br /&gt;list while holding the critical kernel_mutex.&lt;br /&gt;&lt;br /&gt;Another such case is that most operations will set some kind of&lt;br /&gt;intention lock on the table. This lock code will walk through&lt;br /&gt;all locks on the table to check for compatible locks and the&lt;br /&gt;first time it will even do so twice. Thus if all threads use the&lt;br /&gt;same table (as they do in e.g. sysbench) then the number of locks&lt;br /&gt;on the table will be more or less equal to the number of active&lt;br /&gt;transactions.&lt;br /&gt;&lt;br /&gt;Thus as an example when running with 256 threads compared to 16&lt;br /&gt;threads the kernel_mutex lock will be held 16 times longer&lt;br /&gt;and possibly even more since with more contention the mutex is&lt;br /&gt;needed even longer to start up waiting transactions.&lt;br /&gt;&lt;br /&gt;This is an obvious problem, so what is the solution?&lt;br /&gt;It is not easy, but one thing one can do is to turn the&lt;br /&gt;kernel_mutex into a read-write lock instead of a mutex. Then&lt;br /&gt;many threads can traverse those lists in parallel. 
It will&lt;br /&gt;still block others needing write access to the kernel_mutex&lt;br /&gt;but it should hopefully improve things.&lt;br /&gt;&lt;br /&gt;Another solution that is also going to ease the problem is&lt;br /&gt;to use thread pools. Thread pools ensure that not as many&lt;br /&gt;threads are active at a time. However we still have the problem&lt;br /&gt;that there can be as many transactions active in parallel as&lt;br /&gt;there are connections (although InnoDB has a limit of 1024&lt;br /&gt;concurrent active transactions). So the thread pool needs&lt;br /&gt;to prioritize connections with active transactions in cases&lt;br /&gt;where there are too many threads active at a time.&lt;br /&gt;&lt;br /&gt;This type of load regulation is often used in telecom systems&lt;br /&gt;where it is more important to give priority to those that have&lt;br /&gt;already invested time in running the activity. Newcomers&lt;br /&gt;are admitted when there are empty slots not taken by&lt;br /&gt;already running activities.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3420319795406507803?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3420319795406507803/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3420319795406507803' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3420319795406507803'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3420319795406507803'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/06/some-ideas-on-innodb-kernelmutex.html' title='Some ideas on InnoDB kernel_mutex'/><author><name>Mikael 
Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7291145936549599925</id><published>2009-06-02T21:07:00.004+02:00</published><updated>2009-06-04T18:38:13.238+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>Increasing log file size increases performance</title><content type='html'>I have been trying to analyse a number of new patches we've&lt;br /&gt;developed for MySQL to see their scalability. However I've&lt;br /&gt;gotten very strange results which didn't compare at all&lt;br /&gt;with my old results and most of the changes had a negative impact :(&lt;br /&gt;Not so nice.&lt;br /&gt;&lt;br /&gt;As part of debugging the issues with sysbench I decided to go&lt;br /&gt;back to the original version I used previously (sysbench 0.4.8).&lt;br /&gt;Interestingly even then I saw a difference on 16 and 32 threads&lt;br /&gt;whereas on 1-8 threads and 64+ threads the results were the same&lt;br /&gt;as usual.&lt;br /&gt;&lt;br /&gt;So I checked my configuration and it turned out that I had changed&lt;br /&gt;log file size to 200M from 1300M and also used 8 read and write&lt;br /&gt;threads instead of 4. 
I checked quickly and discovered that the&lt;br /&gt;parameter that affected the sysbench results was the log file size.&lt;br /&gt;So increasing the log file size from 200M to 1300M increased the&lt;br /&gt;top result at 32 threads from 3300 to 3750, a nice 15% increase.&lt;br /&gt;The setting of the number of read and write threads had no&lt;br /&gt;significant impact on performance.&lt;br /&gt;&lt;br /&gt;This is obviously part of the problem which is currently being&lt;br /&gt;researched by both &lt;a href="http://mysqlha.blogspot.com/index.html"&gt;Mark Callaghan&lt;/a&gt; and &lt;a href="http://dimitrik.free.fr/blog/"&gt;Dimitri&lt;/a&gt;.&lt;br /&gt;Coincidentally Dimitri has just recently blogged about this and&lt;br /&gt;provided a number of more detailed comparisons of the&lt;br /&gt;performance of various settings of the log file size in InnoDB.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7291145936549599925?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7291145936549599925/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7291145936549599925' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7291145936549599925'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7291145936549599925'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/06/increasing-log-file-size-increases.html' title='Increasing log file size increases performance'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-6095385905720193771</id><published>2009-05-20T08:14:00.004+02:00</published><updated>2009-05-20T08:34:05.210+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><title type='text'>MySQL 5.4 Webinar</title><content type='html'>The quality of MySQL 5.4.0 is very high for a beta product.&lt;br /&gt;Four weeks after we released it as beta we have not had&lt;br /&gt;any real serious bugs reported yet. There are some issues&lt;br /&gt;due to deprecation of features, version numbers and a&lt;br /&gt;bug in the SHOW INNODB STATUS printout and some concerns&lt;br /&gt;with the new defaults when running on low-end machines.&lt;br /&gt;It's also important as usual to read the documentation&lt;br /&gt;before upgrading, it contains some instructions needed to&lt;br /&gt;make an upgrade successful. 
The upgrade issue comes from&lt;br /&gt;changing the defaults of the InnoDB log file sizes.&lt;br /&gt;&lt;br /&gt;For those of you who want to know more about MySQL 5.4.0&lt;br /&gt;and its characteristics and why you should use it, please&lt;br /&gt;join this &lt;a href="http://www.mysql.com/news-and-events/web-seminars/display-343.html"&gt;webinar&lt;/a&gt; where Allan Packer will explain what&lt;br /&gt;has been done in MySQL 5.4.0.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-6095385905720193771?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/6095385905720193771/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=6095385905720193771' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6095385905720193771'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6095385905720193771'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/05/mysql-54-webinar.html' title='MySQL 5.4 Webinar'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-8352107937638382854</id><published>2009-05-19T14:37:00.006+02:00</published><updated>2009-05-19T14:58:33.368+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='scalability'/><category 
scheme='http://www.blogger.com/atom/ns#' term='benchmarks'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>Patches ready for buf page hash split shootout</title><content type='html'>Today I created a patch that builds on the Google v3&lt;br /&gt;patch where I added some ideas of my own and some ideas&lt;br /&gt;from the Percona patches. The patch is &lt;a href="http://lists.mysql.com/commits/74473"&gt;here.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here is a reference to the patch derived from the Google&lt;br /&gt;v3 &lt;a href="http://lists.mysql.com/commits/73859"&gt;patch.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lists.mysql.com/commits/74475"&gt;Here&lt;/a&gt; is a reference to my original patch (this is likely to&lt;br /&gt;contain a bug somewhere so usage for other than benchmarking&lt;br /&gt;isn't recommended).&lt;br /&gt;&lt;br /&gt;So it will be interesting to see a comparison of all those&lt;br /&gt;variants directly against each other on a number of benchmarks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-8352107937638382854?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/8352107937638382854/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=8352107937638382854' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8352107937638382854'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8352107937638382854'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/05/patches-ready-for-buf-page-hash-split.html' title='Patches ready for buf page hash split shootout'/><author><name>Mikael 
Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-537317119967716080</id><published>2009-05-19T13:15:00.012+02:00</published><updated>2009-05-19T14:59:19.653+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='buffer pool mutex'/><category scheme='http://www.blogger.com/atom/ns#' term='scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>Analysis of split flush list from buffer pool</title><content type='html'>In the Google v3 patch the buffer pool mutex has been&lt;br /&gt;split into an array of buffer page hash mutexes and a&lt;br /&gt;buffer flush list mutex; the buffer pool mutex also&lt;br /&gt;remains.&lt;br /&gt;&lt;br /&gt;I derived the patch splitting out the buffer flush list&lt;br /&gt;mutex from the Google v3 patch against the MySQL 5.4.0&lt;br /&gt;tree. The patch is &lt;a href="http://lists.mysql.com/commits/73739"&gt;here.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I derived a lot of prototype patches based on MySQL 5.4.0&lt;br /&gt;and Dimitri tried them out. This particular patch seems&lt;br /&gt;to be the most successful in the pack of patches we&lt;br /&gt;tested. It had a consistently positive impact.&lt;br /&gt;&lt;br /&gt;The main contribution of this patch is twofold. It&lt;br /&gt;decreases the pressure on the buffer pool mutex by&lt;br /&gt;splitting out a critical part where the oldest dirty&lt;br /&gt;pages are flushed out to disk. 
In addition this patch&lt;br /&gt;also decreases the pressure on the log_sys mutex by&lt;br /&gt;releasing the log_sys mutex earlier for the&lt;br /&gt;mini-transactions. It also removes interaction&lt;br /&gt;between the buffer pool mutex and the log_sys mutex.&lt;br /&gt;Previously both mutexes had to be held for a&lt;br /&gt;while; this is no longer necessary since only the&lt;br /&gt;flush list mutex is needed, not the buffer pool&lt;br /&gt;mutex.&lt;br /&gt;&lt;br /&gt;The new patch is the b11 variant which is shown in red in&lt;br /&gt;the comparison graphs.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_iUr9qDslPzg/ShKhGgnKsmI/AAAAAAAAACg/Rczbmg4rPeI/s1600-h/Hist_ALL_b11_RW1.16cores2.ccr24-tps_avg-1.gif"&gt;&lt;img style="float:center; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 500px; height: 175px;" src="http://1.bp.blogspot.com/_iUr9qDslPzg/ShKhGgnKsmI/AAAAAAAAACg/Rczbmg4rPeI/s400/Hist_ALL_b11_RW1.16cores2.ccr24-tps_avg-1.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5337505641592959586" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;As we can see the read-write tests get a pretty significant boost&lt;br /&gt;from this patch: it improves top performance by 5% and by 10-20%&lt;br /&gt;at higher thread counts. 
It also moves the maximum from 16 to&lt;br /&gt;32 threads.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_iUr9qDslPzg/ShKhb8WDmgI/AAAAAAAAACo/Hyj8sli7r_g/s1600-h/Hist_ALL_b9_RW0.16cores2.ccr24-tps_avg-1.gif"&gt;&lt;img style="float:center; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 145px;" src="http://3.bp.blogspot.com/_iUr9qDslPzg/ShKhb8WDmgI/AAAAAAAAACo/Hyj8sli7r_g/s400/Hist_ALL_b9_RW0.16cores2.ccr24-tps_avg-1.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5337506009814637058" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Even on read-only there are some improvements although&lt;br /&gt;it is quite possible that those are random variation.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_iUr9qDslPzg/ShKfKqsVSfI/AAAAAAAAACY/j9TFAsj35F0/s1600-h/Hist_ccrALL_RW1.16cores2.MySQL-5.Perfb11-gcc43-tps_avg-1.gif"&gt;&lt;img style="float:center; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 500px; height: 175px;" src="http://4.bp.blogspot.com/_iUr9qDslPzg/ShKfKqsVSfI/AAAAAAAAACY/j9TFAsj35F0/s400/Hist_ccrALL_RW1.16cores2.MySQL-5.Perfb11-gcc43-tps_avg-1.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5337503513995201010" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Finally the above picture shows that this patch also moves the&lt;br /&gt;optimal InnoDB thread concurrency up to 24 from 16 since it&lt;br /&gt;allows for more concurrency inside InnoDB. 
This is also visible&lt;br /&gt;by looking at the numbers for InnoDB Thread Concurrency set to 0&lt;br /&gt;as seen below.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_iUr9qDslPzg/ShKjbgcztlI/AAAAAAAAACw/hwYOFwDAJ6k/s1600-h/Hist_ALL_b9_RW1.16cores2.ccr0-tps_avg-1.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 500px; height: 175px;" src="http://2.bp.blogspot.com/_iUr9qDslPzg/ShKjbgcztlI/AAAAAAAAACw/hwYOFwDAJ6k/s400/Hist_ALL_b9_RW1.16cores2.ccr0-tps_avg-1.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5337508201350018642" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-537317119967716080?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/537317119967716080/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=537317119967716080' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/537317119967716080'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/537317119967716080'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/05/analysis-of-split-flush-list-from.html' title='Analysis of split flush list from buffer pool'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://1.bp.blogspot.com/_iUr9qDslPzg/ShKhGgnKsmI/AAAAAAAAACg/Rczbmg4rPeI/s72-c/Hist_ALL_b11_RW1.16cores2.ccr24-tps_avg-1.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-1526399927829744332</id><published>2009-05-15T09:56:00.006+02:00</published><updated>2009-05-19T14:59:51.445+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='benchmarks'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>Shootout of split page hash from InnoDB buffer pool mutex</title><content type='html'>One of the hot mutexes in InnoDB is the buffer pool mutex.&lt;br /&gt;Among other things this mutex protects the page hash where&lt;br /&gt;pages reside when they are in the cache.&lt;br /&gt;&lt;br /&gt;There are already a number of variants of how to split out&lt;br /&gt;this mutex. Here follows a short description of the various&lt;br /&gt;approaches.&lt;br /&gt;&lt;br /&gt;1) Google v3 approach&lt;br /&gt;Ben Hardy at Google took the approach of using an array of&lt;br /&gt;64 mutexes, where each mutex only protects the&lt;br /&gt;actual read, insert and delete from the page hash table.&lt;br /&gt;This results in a very simple patch, but it also means&lt;br /&gt;that once the block has been locked one has to check&lt;br /&gt;that the owner of the block hasn't changed, since the block&lt;br /&gt;is not protected between the read of the hash and the&lt;br /&gt;locking of the block. Someone could thus come in&lt;br /&gt;between and grab the block for another page before we&lt;br /&gt;get to lock the block. 
In addition this patch focuses&lt;br /&gt;mainly on optimising the path in buf_page_get_gen,&lt;br /&gt;which is the routine used to get a page from the page&lt;br /&gt;cache and thus the hot-spot.&lt;br /&gt;&lt;br /&gt;2) Percona approaches&lt;br /&gt;Percona has tried a series of approaches, where the first&lt;br /&gt;only split out the page hash under one mutex, still protecting&lt;br /&gt;the blocks from being changed while holding this mutex.&lt;br /&gt;The next step was to change the mutex into a read-write lock.&lt;br /&gt;&lt;br /&gt;3) My approach&lt;br /&gt;My approach was inspired by Percona but added two main&lt;br /&gt;things. First it split the page hash into a number of&lt;br /&gt;page hashes and had one RW-lock per page hash (this&lt;br /&gt;number has been tested with 4, 8 and 16 and 4 was the&lt;br /&gt;optimal on Linux at least). In addition, to avoid having&lt;br /&gt;to lock and unlock multiple pages while going through&lt;br /&gt;the read ahead code, the hash function that picks the&lt;br /&gt;page hash chooses the same page hash for all&lt;br /&gt;pages within 1 MByte (which is the unit of read ahead&lt;br /&gt;in InnoDB).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Pros and Cons&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The Google patch is the simplest one, and by&lt;br /&gt;only focusing on buf_page_get_gen it also&lt;br /&gt;avoids a lot of possible extra traps&lt;br /&gt;that are likely if one tries to solve too much of the&lt;br /&gt;problem.&lt;br /&gt;&lt;br /&gt;Using a RW-lock instead of a mutex should at least&lt;br /&gt;improve concurrency but could of&lt;br /&gt;course impose a higher overhead as well, so&lt;br /&gt;benchmarking should show which is best.&lt;br /&gt;&lt;br /&gt;When using an array of locks it makes sense to optimise&lt;br /&gt;for read ahead functionality since this is a hot-spot&lt;br /&gt;in the code as has been shown in some blogs 
lately.&lt;br /&gt;&lt;br /&gt;4) Mixed approach&lt;br /&gt;So a natural solution is then to also try a mix of the&lt;br /&gt;Google variant with my approach. So still using an&lt;br /&gt;array of locks (either mutex or RW-locks, whatever&lt;br /&gt;has the optimal performance) but ensuring that the&lt;br /&gt;pages within a read ahead area are protected by the same&lt;br /&gt;lock.&lt;br /&gt;&lt;br /&gt;This approach combines the simplicity of the Google&lt;br /&gt;approach and its total lack of deadlock problems&lt;br /&gt;with the optimised layout from&lt;br /&gt;my approach and the idea of RW-locks from Percona.&lt;br /&gt;&lt;br /&gt;We don't have any results of this shootout yet.&lt;br /&gt;This shootout should also discover the optimum number&lt;br /&gt;of areas to split the page cache into. Google has&lt;br /&gt;used 64, but my results so far indicate that&lt;br /&gt;4 seems more appropriate.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-1526399927829744332?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/1526399927829744332/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=1526399927829744332' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1526399927829744332'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1526399927829744332'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/05/shootout-of-split-page-hash-from-innodb.html' title='Shootout of split page hash from InnoDB buffer pool mutex'/><author><name>Mikael 
Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-8541777099020232489</id><published>2009-05-14T14:53:00.010+02:00</published><updated>2009-05-19T15:00:18.823+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb-thread-concurrency-timer-based'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='benchmarks'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb-thread-concurrency'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>More analysis of InnoDB Thread Concurrency</title><content type='html'>When I worked with &lt;a href="http://dimitrik.free.fr/blog/"&gt;Dimitri&lt;/a&gt; on the analysis of the&lt;br /&gt;Split Rollback Segment Mutex he came up with numbers&lt;br /&gt;on InnoDB Thread Concurrency set to 16 and 32 and I was curious&lt;br /&gt;to see if 24 was the optimal setting. 
So he made some new runs and&lt;br /&gt;some new graphs that I found interesting.&lt;br /&gt;&lt;br /&gt;The first two graphs analyse the behaviour of MySQL 5.4.0 on a SPARC&lt;br /&gt;Server using InnoDB Thread Concurrency set to 0, 16, 24 and 32.&lt;br /&gt;Interestingly for both readonly and readwrite benchmarks the&lt;br /&gt;optimal setting for concurrency is 16 whereas the top numbers&lt;br /&gt;(at 32 threads) are achieved with concurrency set to 24 or 32.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_iUr9qDslPzg/SgwWBb28cCI/AAAAAAAAAB4/iKa-0Axxii4/s1600-h/Hist_ccrALL_RW0.16cores2.MySQL-5.4.0-gcc43-tps_avg-1.gif"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 500px; height: 175px;" src="http://4.bp.blogspot.com/_iUr9qDslPzg/SgwWBb28cCI/AAAAAAAAAB4/iKa-0Axxii4/s400/Hist_ccrALL_RW0.16cores2.MySQL-5.4.0-gcc43-tps_avg-1.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5335663872441085986" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_iUr9qDslPzg/SgwWMEtlayI/AAAAAAAAACA/LfGxqOqR_98/s1600-h/Hist_ccrALL_RW1.16cores2.MySQL-5.4.0-gcc43-tps_avg-1.gif"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 500px; height: 175px;" src="http://4.bp.blogspot.com/_iUr9qDslPzg/SgwWMEtlayI/AAAAAAAAACA/LfGxqOqR_98/s400/Hist_ccrALL_RW1.16cores2.MySQL-5.4.0-gcc43-tps_avg-1.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5335664055206374178" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;So on the current MySQL 5.4.0 on this particular benchmark and&lt;br /&gt;platform it seems that 16 is the optimal setting. 
However Dimitri&lt;br /&gt;also analysed the same thing using the new patch for Splitting the&lt;br /&gt;Rollback Segment Mutex and now the story changes.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://lh3.ggpht.com/_iUr9qDslPzg/SgwXXM365oI/AAAAAAAAACI/3qK6xfqZwUg/s1600-h/Hist_ccrALL_RW1.16cores2.MySQL-5.Perfb9-gcc43-tps_avg-1.gif"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 500px; height: 175px;" src="http://lh3.ggpht.com/_iUr9qDslPzg/SgwXXM365oI/AAAAAAAAACI/3qK6xfqZwUg/s400/Hist_ccrALL_RW1.16cores2.MySQL-5.Perfb9-gcc43-tps_avg-1.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5335665345887397506" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_iUr9qDslPzg/SgwXiUegtCI/AAAAAAAAACQ/EAqS05Dg4jQ/s1600-h/Hist_ccrALL_RW0.16cores2.MySQL-5.Perfb9-gcc43-tps_avg-1.gif"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 500px; height: 175px;" src="http://4.bp.blogspot.com/_iUr9qDslPzg/SgwXiUegtCI/AAAAAAAAACQ/EAqS05Dg4jQ/s400/Hist_ccrALL_RW0.16cores2.MySQL-5.Perfb9-gcc43-tps_avg-1.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5335665536906867746" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;So with this patch setting InnoDB Thread Concurrency to 24&lt;br /&gt;is now the optimum setting. So it's clear that as we get more&lt;br /&gt;and more improvements to the scalability of the MySQL Server and&lt;br /&gt;InnoDB it will be optimal with more and more parallel threads&lt;br /&gt;inside InnoDB as well. So this means that this setting is quite&lt;br /&gt;likely to change as development proceeds but for MySQL 5.4.0 a&lt;br /&gt;setting of around 16-24 is often a good one. 
To actually change&lt;br /&gt;the default setting requires much more testing of various&lt;br /&gt;workloads on many different computer architectures.&lt;br /&gt;&lt;br /&gt;Similar testing I have performed on Linux using sysbench implies&lt;br /&gt;that the optimal setting is around 24-28. Also the difference&lt;br /&gt;between setting it to 0 and 24 is much smaller on Linux (15%&lt;br /&gt;on 256 threads as shown in blog yesterday). We haven't analysed&lt;br /&gt;the big difference on these SPARC Servers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-8541777099020232489?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/8541777099020232489/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=8541777099020232489' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8541777099020232489'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8541777099020232489'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/05/more-analysis-of-innodb-thread.html' title='More analysis of InnoDB Thread Concurrency'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_iUr9qDslPzg/SgwWBb28cCI/AAAAAAAAAB4/iKa-0Axxii4/s72-c/Hist_ccrALL_RW0.16cores2.MySQL-5.4.0-gcc43-tps_avg-1.gif' height='72' 
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-2392934125826279954</id><published>2009-05-14T11:33:00.009+02:00</published><updated>2009-05-19T15:00:49.477+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='benchmarks'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>Analysis of Split of Rollback Segment Mutex</title><content type='html'>When I read the blog about &lt;a href="http://www.mysqlperformanceblog.com/2009/01/18/partial-fix-of-innodb-scalability-rollback-segments"&gt;Split Rollback Segment Mutex&lt;/a&gt;,&lt;br /&gt;I was interested to verify those results in the context of MySQL 5.4.0.&lt;br /&gt;&lt;br /&gt;The patch can be found &lt;a href="http://lists.mysql.com/commits/74018"&gt;here.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;We've analysed this patch both on a large SPARC system and on my&lt;br /&gt;benchmark x86/Linux machine. Our results tend to be positive for&lt;br /&gt;readwrite benchmarks but sometimes negative for readonly&lt;br /&gt;benchmarks. Also the gain is much smaller than reported in the&lt;br /&gt;blog.&lt;br /&gt;&lt;br /&gt;This patch also has two negative effects. The first is that it&lt;br /&gt;creates an upgrade problem; this can probably be handled in the&lt;br /&gt;InnoDB code, but requires quite some digging. The other is that&lt;br /&gt;instead of writing UNDO results to one UNDO log, we write them to&lt;br /&gt;several UNDO logs, thus decreasing the buffering effect to the&lt;br /&gt;file system.&lt;br /&gt;&lt;br /&gt;On Linux I found on readwrite benchmarks up to 7-8%&lt;br /&gt;improvements of the top results. On readonly it sometimes dropped&lt;br /&gt;about 1-3%. 
I also tried with varying numbers of rollback&lt;br /&gt;segments and found 4 and 8 to be better than 16. So from the&lt;br /&gt;above point of view setting the number of rollback segments to 4 is&lt;br /&gt;probably best. The patch uses 8 (it's actually set to 9 since&lt;br /&gt;the system rollback segment is a bit special).&lt;br /&gt;&lt;br /&gt;Here are some graphs from Dimitri running it on a fat SPARC&lt;br /&gt;server (MySQL-5.Perfb9-gcc43 is 5.4.0 plus the above patch).&lt;br /&gt;&lt;br /&gt;The first graph shows the behaviour when InnoDB Thread Concurrency&lt;br /&gt;is 0; here we see a speedup in the range of 3-5%.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_iUr9qDslPzg/SgwDMwj90zI/AAAAAAAAABo/DOqqzvrQjIs/s1600-h/Hist_ALL_b9_RW1.16cores2.ccr0-tps_avg-1.gif"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 500px; height: 175px;" src="http://3.bp.blogspot.com/_iUr9qDslPzg/SgwDMwj90zI/AAAAAAAAABo/DOqqzvrQjIs/s400/Hist_ALL_b9_RW1.16cores2.ccr0-tps_avg-1.gif" border="0" alt=""id="BLOGGER_PHOTO_ID_5335643176256262962" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The same runs for the readonly benchmark show positive results as well.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_iUr9qDslPzg/SgwD8kt7XkI/AAAAAAAAABw/bpwnGVDFLq8/s1600-h/Hist_ALL_b9_RW0.16cores2.ccr0-tps_avg-1.gif"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 500px; height: 175px;" src="http://4.bp.blogspot.com/_iUr9qDslPzg/SgwD8kt7XkI/AAAAAAAAABw/bpwnGVDFLq8/s400/Hist_ALL_b9_RW0.16cores2.ccr0-tps_avg-1.gif" border="0" alt=""id="BLOGGER_PHOTO_ID_5335643997710540354" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;When one sets InnoDB Thread Concurrency equal to 16, 24 or 32&lt;br /&gt;the behaviour is different. 
It turns out that we get worse&lt;br /&gt;performance using 16 but get more positive impact using 24 and&lt;br /&gt;even more using 32. So it seems that this patch requires fewer&lt;br /&gt;limits on parallelism to get the best behaviour.&lt;br /&gt;&lt;br /&gt;So one impact of this patch is that it can sustain a higher&lt;br /&gt;number of concurrent threads and there is a small positive impact&lt;br /&gt;on the performance as well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-2392934125826279954?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/2392934125826279954/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=2392934125826279954' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2392934125826279954'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2392934125826279954'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/05/analysis-of-split-of-rollback-segment.html' title='Analysis of Split of Rollback Segment Mutex'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_iUr9qDslPzg/SgwDMwj90zI/AAAAAAAAABo/DOqqzvrQjIs/s72-c/Hist_ALL_b9_RW1.16cores2.ccr0-tps_avg-1.gif' height='72' 
width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-718702013476234847</id><published>2009-05-13T18:00:00.006+02:00</published><updated>2009-05-19T15:01:06.239+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb-thread-concurrency-timer-based'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='benchmarks'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb-thread-concurrency'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>More data on InnoDB Thread Concurrency</title><content type='html'>Here is the performance graph comparing using&lt;br /&gt;InnoDB Thread Concurrency equal to 0 and&lt;br /&gt;InnoDB Thread Concurrency equal to 24 using&lt;br /&gt;sysbench readwrite with the new InnoDB&lt;br /&gt;Thread concurrency algorithm as introduced&lt;br /&gt;in MySQL 5.4.0.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_iUr9qDslPzg/SgrvjWfFZhI/AAAAAAAAABg/GcpWHgBXYgU/s1600-h/image001.gif"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 400px; height: 244px;" src="http://3.bp.blogspot.com/_iUr9qDslPzg/SgrvjWfFZhI/AAAAAAAAABg/GcpWHgBXYgU/s400/image001.gif" border="0" alt=""id="BLOGGER_PHOTO_ID_5335340099184190994" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-718702013476234847?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/718702013476234847/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=718702013476234847' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/718702013476234847'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/718702013476234847'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/05/more-data-on-innodb-thread-concurrency.html' title='More data on InnoDB Thread Concurrency'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_iUr9qDslPzg/SgrvjWfFZhI/AAAAAAAAABg/GcpWHgBXYgU/s72-c/image001.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-9030950933264969421</id><published>2009-05-13T17:35:00.009+02:00</published><updated>2009-05-19T15:01:30.635+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='benchmarks'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>Analysis of Google patches on 4,8 and 12 cores</title><content type='html'>One of the goals we had originally with the MySQL 5.4&lt;br /&gt;development was to improve scaling from 4 cores to&lt;br /&gt;8 cores. 
So in my early testing I ran comparisons of&lt;br /&gt;the Google SMP + IO + tcmalloc patches on 4, 8 and 12&lt;br /&gt;cores to see how they behaved compared with a stock&lt;br /&gt;MySQL 5.1.28 version (note that the comparison here was&lt;br /&gt;done on a very early version of 5.4; 5.4.0 has a&lt;br /&gt;set of additional patches applied to it).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_iUr9qDslPzg/SgrsEWI9ZUI/AAAAAAAAABY/2KfTblHP3BA/s1600-h/image001.gif"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 500px; height: 274px;" src="http://3.bp.blogspot.com/_iUr9qDslPzg/SgrsEWI9ZUI/AAAAAAAAABY/2KfTblHP3BA/s400/image001.gif" border="0" alt=""id="BLOGGER_PHOTO_ID_5335336267980563778" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;What we can see here is that the Google SMP patch and use&lt;br /&gt;of tcmalloc make a difference already on a 4-core server&lt;br /&gt;using 4 threads. On 1 and 2 threads the difference is only&lt;br /&gt;on the order of 1-2%, so of little significance.&lt;br /&gt;&lt;br /&gt;An interesting note in the graph is that 8-core numbers using&lt;br /&gt;the Google improvements outperform the 12-core stock MySQL&lt;br /&gt;5.1.28.&lt;br /&gt;&lt;br /&gt;So what we concluded from those graphs is that the scaling from 4 cores&lt;br /&gt;to 8 cores had improved greatly and that there also was good scaling&lt;br /&gt;from 8 cores to 12 cores. This improvement increased even more with&lt;br /&gt;the 5.4 release. 
The main purpose of showing these numbers is to show&lt;br /&gt;the difference between 4, 8 and 12 cores.&lt;br /&gt;&lt;br /&gt;All benchmarks were executed on a 16-core x86 box with 4 cores&lt;br /&gt;dedicated to running sysbench.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-9030950933264969421?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/9030950933264969421/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=9030950933264969421' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/9030950933264969421'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/9030950933264969421'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/05/analysis-of-google-patches-on-48-and-12.html' title='Analysis of Google patches on 4,8 and 12 cores'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_iUr9qDslPzg/SgrsEWI9ZUI/AAAAAAAAABY/2KfTblHP3BA/s72-c/image001.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-358421570173479692</id><published>2009-05-13T13:32:00.017+02:00</published><updated>2009-05-13T15:41:10.877+02:00</updated><title type='text'>Analysis of Google patches in MySQL 5.4</title><content type='html'>Early on in the MySQL 5.4 development we tried out the&lt;br 
/&gt;impact of the Google SMP patch and the Google IO patch.&lt;br /&gt;At first we wanted to see which of the patches&lt;br /&gt;made the most impact. The Google patches in MySQL 5.4&lt;br /&gt;have at least 3 components that impact the performance.&lt;br /&gt;1) Replace InnoDB memory manager by a malloc variant&lt;br /&gt;2) Replace InnoDB RW-lock implementation&lt;br /&gt;3) Make InnoDB use more IO threads&lt;br /&gt;&lt;br /&gt;When disabling the InnoDB memory manager one opens up a whole array&lt;br /&gt;of potential candidates for malloc. Our work concluded&lt;br /&gt;that tcmalloc behaved best on Linux and mtmalloc was&lt;br /&gt;best on Solaris, see blog posts on Solaris below.&lt;br /&gt;&lt;a href="http://blogs.sun.com/timc/entry/scalability_and_stability_for_sysbench"&gt;&lt;br /&gt;Malloc on Solaris investigation&lt;/a&gt;&lt;br /&gt;&lt;a href="http://blogs.sun.com/timc/entry/mysql_5_1_memory_allocator"&gt;&lt;br /&gt;Battle of the Mallocators on Solaris&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I also did some testing on Linux where I compared 4 different&lt;br /&gt;cases (all variants were based on MySQL 5.1.28):&lt;br /&gt;1) Using the Google SMP patch, Google IO patch (with 4 read and&lt;br /&gt;4 write threads) and using tcmalloc&lt;br /&gt;2) Using tcmalloc and no other Google patches&lt;br /&gt;3) Using plain malloc from libc&lt;br /&gt;4) Using plain MySQL 5.1.28 using InnoDB memory manager&lt;br /&gt;&lt;br /&gt;Here are the results:&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_iUr9qDslPzg/SgrHqe-_gqI/AAAAAAAAABQ/A5y4N5fWNPg/s1600-h/image001.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:left;cursor:pointer; cursor:hand;width: 500px; height: 344px;" src="http://1.bp.blogspot.com/_iUr9qDslPzg/SgrHqe-_gqI/AAAAAAAAABQ/A5y4N5fWNPg/s400/image001.gif" border="0" alt=""id="BLOGGER_PHOTO_ID_5335296241259479714" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;So as we can see 
here the replacement of the InnoDB memory manager&lt;br /&gt;by standard malloc had no benefits whereas replacing it with&lt;br /&gt;tcmalloc gave 10% extra performance. The Google SMP patch added&lt;br /&gt;another 10% performance in sysbench readwrite. We have also&lt;br /&gt;tested other OLTP benchmarks where the Google SMP patch added&lt;br /&gt;about 5-10% performance improvement. As shown by Mark Callaghan&lt;br /&gt;there are however other benchmarks where the Google SMP patch&lt;br /&gt;provides much greater improvements.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-358421570173479692?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/358421570173479692/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=358421570173479692' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/358421570173479692'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/358421570173479692'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/05/analysis-of-google-patches-in-mysql-54.html' title='Analysis of Google patches in MySQL 5.4'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_iUr9qDslPzg/SgrHqe-_gqI/AAAAAAAAABQ/A5y4N5fWNPg/s72-c/image001.gif' height='72' 
width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-4971583817594841453</id><published>2009-05-12T19:08:00.003+02:00</published><updated>2009-05-12T19:40:54.990+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='innodb-thread-concurrency-timer-based'/><category scheme='http://www.blogger.com/atom/ns#' term='innodb-thread-concurrency'/><title type='text'>MySQL 5.4 Patches: InnoDB Thread Concurrency</title><content type='html'>When benchmarking MySQL with InnoDB we quickly discovered&lt;br /&gt;that using InnoDB Thread Concurrency set to 0 was an&lt;br /&gt;improvement to performance since the implementation of&lt;br /&gt;InnoDB Thread Concurrency used a mutex which in itself was&lt;br /&gt;a scalability bottleneck.&lt;br /&gt;&lt;br /&gt;Given that InnoDB Thread Concurrency is a nice feature that&lt;br /&gt;ensures that one gets good performance also on an overloaded&lt;br /&gt;server I was hoping to find a way to make the implementation&lt;br /&gt;of this more scalable.&lt;br /&gt;&lt;br /&gt;I tried out many different techniques using a combination of&lt;br /&gt;mutexes and atomic variables. However every technique fell&lt;br /&gt;flat and was less performant than setting it to 0 and not&lt;br /&gt;using the InnoDB Thread Concurrency implementation. So I was&lt;br /&gt;ready to give up the effort and move on to other ideas.&lt;br /&gt;&lt;br /&gt;However after sleeping on it an inspirational idea came up.&lt;br /&gt;Why use a mutex at all? Let's see how it works to use the&lt;br /&gt;OS scheduler to queue the threads that need to be blocked. This&lt;br /&gt;should be more scalable than a mutex-based approach.&lt;br /&gt;There is obviously one bad thing about this approach, which&lt;br /&gt;is that new arrivals can enter before old waiters. 
To&lt;br /&gt;ensure we don't suffer too much from this a limit on the wait&lt;br /&gt;was necessary.&lt;br /&gt;&lt;br /&gt;So I quickly put together a solution that called yield once&lt;br /&gt;and slept for 10 milliseconds twice at most, and every time it&lt;br /&gt;woke up it checked an atomic variable to see if it was ok&lt;br /&gt;to enter. After those three attempts it would enter without&lt;br /&gt;checking.&lt;br /&gt;&lt;br /&gt;I tried it and saw a 1% decrease on low concurrency and 5%&lt;br /&gt;improvement on 32 threads and 10% on 64 threads and 15% on 128&lt;br /&gt;threads. Voila, it worked. Now I decided to search for the&lt;br /&gt;optimal solution to see how many yields and sleeps would be best.&lt;br /&gt;It turned out I had found the optimal number at the first attempt.&lt;br /&gt;&lt;br /&gt;The implementation still has corner cases where it provides fewer&lt;br /&gt;benefits so I kept the possibility to use the old implementation by&lt;br /&gt;adding a new variable here.&lt;br /&gt;&lt;br /&gt;So currently the default in MySQL 5.4 is still 0 for InnoDB Thread&lt;br /&gt;Concurrency. However we generally see optimal behaviour using&lt;br /&gt;InnoDB Thread Concurrency set to around 24; setting it higher does&lt;br /&gt;not bring any real value to MySQL 5.4.0 and setting it lower&lt;br /&gt;decreases the possible performance one can achieve. 
This seems&lt;br /&gt;to be a fairly generic set-up that should work well in most cases.&lt;br /&gt;We might change the defaults for this later.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-4971583817594841453?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/4971583817594841453/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=4971583817594841453' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4971583817594841453'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4971583817594841453'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/05/mysql-54-patches-innodb-thread.html' title='MySQL 5.4 Patches: InnoDB Thread Concurrency'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-1681195242542923427</id><published>2009-04-23T16:19:00.005+02:00</published><updated>2009-05-19T15:02:08.490+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL Cluster'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel MySQL'/><title type='text'>Join Executor for MySQL Cluster</title><content type='html'>Jonas in the Cluster team reported on his work on executing&lt;br /&gt;joins in the NDB kernel for MySQL Cluster &lt;a href="http://jonasoreland.blogspot.com"&gt;here.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This 
is very interesting work we have in progress at MySQL.&lt;br /&gt;We are working on an extension of the Storage Engine API&lt;br /&gt;where the MySQL Server will present an Abstract Query Tree&lt;br /&gt;to the Storage Engine. The Storage Engine can then decide to&lt;br /&gt;execute the query on its own or decide that the MySQL Server&lt;br /&gt;should execute it in the classic manner. In the first prototype&lt;br /&gt;the optimisation will be done as usual and only after the&lt;br /&gt;optimisation phase will we present the join to the storage&lt;br /&gt;engine. However the specification also covers work on&lt;br /&gt;integrating this with the optimiser and also enabling the&lt;br /&gt;possibility for the storage engine to execute parts of the&lt;br /&gt;query and not the entire one. The specification of this&lt;br /&gt;work can be found &lt;a href="http://forge.mysql.com/worklog/task.php?id=4292"&gt;here.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Jonas is working on the backend part for this interface in&lt;br /&gt;MySQL Cluster.&lt;br /&gt;&lt;br /&gt;What is interesting with pushing joins to the NDB kernel is that&lt;br /&gt;it becomes very easy to parallelize the join execution. So what&lt;br /&gt;will happen when this feature is ready is that MySQL Cluster&lt;br /&gt;will shine on join performance and enable very good&lt;br /&gt;performance on all sorts of applications using SQL.&lt;br /&gt;&lt;br /&gt;The reason that MySQL Cluster can so easily parallelize the query&lt;br /&gt;execution of the join is the software architecture of the&lt;br /&gt;NDB kernel. The NDB kernel is entirely developed as a message&lt;br /&gt;passing architecture. So to start a thread of execution in the&lt;br /&gt;NDB kernel one simply sends two messages when executing one&lt;br /&gt;message and to stop a thread one simply doesn't send any messages&lt;br /&gt;when executing a message. 
The problem then is rather that one&lt;br /&gt;should not parallelize so much that the system runs out of&lt;br /&gt;resources.&lt;br /&gt;&lt;br /&gt;So with this development MySQL Cluster will also be shining at&lt;br /&gt;Data Mining in an OLTP database. MySQL Cluster is designed for&lt;br /&gt;systems where you need massive amounts of read and write&lt;br /&gt;bandwidth (the cost of writing your data is close to the cost&lt;br /&gt;of reading the data). So with the new features it will be&lt;br /&gt;possible to do Data Mining on data updated in Real-time. Most&lt;br /&gt;Data Mining is performed on a specialised Data Warehousing&lt;br /&gt;solution. But to achieve this you need to transfer the data to&lt;br /&gt;the Data Warehouse. With MySQL Cluster it will be possible to&lt;br /&gt;both use the database for OLTP applications with heavy updates&lt;br /&gt;always occurring while still querying the data with parallel&lt;br /&gt;queries. MySQL Cluster is very efficient at&lt;br /&gt;executing individual queries in the NDB kernel and can also&lt;br /&gt;scale to very many machines and CPU cores.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-1681195242542923427?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/1681195242542923427/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=1681195242542923427' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1681195242542923427'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1681195242542923427'/><link rel='alternate' type='text/html' 
href='http://mikaelronstrom.blogspot.com/2009/04/join-executor-for-mysql-cluster.html' title='Join Executor for MySQL Cluster'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-6579912069600093405</id><published>2009-04-23T15:55:00.004+02:00</published><updated>2009-04-23T16:26:41.786+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL Cluster'/><category scheme='http://www.blogger.com/atom/ns#' term='benchmarks'/><category scheme='http://www.blogger.com/atom/ns#' term='NDB'/><title type='text'>Data on MySQL Performance</title><content type='html'>If you like to sift through tons of benchmark data about various&lt;br /&gt;MySQL versions, Dimitri at the Sun Benchmark Labs has published&lt;br /&gt;a serious amount of benchmark data in a report published &lt;a href="http://dimitrik.free.fr/db_STRESS_MySQL_540_and_others_Apr2009.html"&gt;here.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The report shows that the new MySQL 5.4.0 release&lt;br /&gt;has very good performance. The report also shows what a day looks&lt;br /&gt;like for a developer of performance improvements, and the massive amount&lt;br /&gt;of benchmark data that needs to be analysed and sifted through&lt;br /&gt;to understand the impact of new performance improvements.&lt;br /&gt;&lt;br /&gt;I personally met Dimitri the first time in 2002 when I was working&lt;br /&gt;together with him for a couple of weeks on a benchmark on NDB Cluster&lt;br /&gt;(the storage engine of MySQL Cluster). 
Our goal then was to perform&lt;br /&gt;1 million reads per second on a 72-cpu SPARC box with UltraSparc-III&lt;br /&gt;CPUs @900MHz. We struggled a lot at the time but finally we managed&lt;br /&gt;to achieve the numbers we were hoping for. We actually surpassed the&lt;br /&gt;goal and reached 1.5 million reads per second and we also tried an&lt;br /&gt;update benchmark where we managed to do 340.000 update transactions&lt;br /&gt;per second (generating a disk write bandwidth of 250 MByte per second).&lt;br /&gt;&lt;br /&gt;This benchmark was interesting from a scientific point of view. When&lt;br /&gt;I defended my Ph.D thesis I claimed that one could get superlinear&lt;br /&gt;performance increases when adding more CPUs to a problem in the&lt;br /&gt;database world. To achieve this the workload needs to be constant and the&lt;br /&gt;number of CPUs increased. By increasing the number of CPUs and keeping&lt;br /&gt;the workload constant more CPU cache memory is used on the problem.&lt;br /&gt;This means that each CPU will execute more efficiently.&lt;br /&gt;&lt;br /&gt;In the above benchmark we managed to verify the claim I made when&lt;br /&gt;defending my Ph.D thesis, which I found very positive. 
The results we&lt;br /&gt;achieved on a 16-node cluster were 500.000 reads per second and on a&lt;br /&gt;32-node cluster we reached 1.500.000 reads per second.&lt;br /&gt;&lt;br /&gt;Dimitri has a background in developing his own&lt;br /&gt;homegrown database, so we have had many interesting discussions both&lt;br /&gt;then and now on how to achieve the best performance of NDB and&lt;br /&gt;the MySQL Server.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-6579912069600093405?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/6579912069600093405/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=6579912069600093405' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6579912069600093405'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6579912069600093405'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/04/data-on-mysql-performance.html' title='Data on MySQL Performance'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-4503367989389010935</id><published>2009-04-22T15:37:00.003+02:00</published><updated>2009-05-19T15:02:31.426+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category 
scheme='http://www.blogger.com/atom/ns#' term='scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='benchmarks'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>MySQL 5.4 Patches: Improvements to spin-loop</title><content type='html'>In InnoDB there is an implementation of both mutexes&lt;br /&gt;and RW-locks. The RW-lock implementation has been&lt;br /&gt;improved by the Google SMP patches. Both of these&lt;br /&gt;implementations rely on spin-loops as part of their&lt;br /&gt;implementation. The default in InnoDB is to check&lt;br /&gt;the condition and, if it's not ok to enter, to spin for&lt;br /&gt;about 5 microseconds and then come back to check the&lt;br /&gt;condition again.&lt;br /&gt;&lt;br /&gt;If one reads the Intel manual on how to do spin-loops,&lt;br /&gt;it proposes using a PAUSE instruction and then&lt;br /&gt;checking the condition again, so a much more active&lt;br /&gt;checking of the condition. When we tried this out&lt;br /&gt;using the sysbench benchmark we found that using&lt;br /&gt;the Intel approach worsened performance. So instead&lt;br /&gt;we tried an approach of putting the PAUSE instruction&lt;br /&gt;into the InnoDB spinloop.&lt;br /&gt;&lt;br /&gt;This approach turned out to be a success. Even on&lt;br /&gt;machines with only one thread per core we were able&lt;br /&gt;to get a 3-4% increase in throughput. 
We also tried&lt;br /&gt;various settings of the defaults of the time of&lt;br /&gt;spinning in the spinloop and found that the original&lt;br /&gt;default values were very close to the optimum values.&lt;br /&gt;We found the optimum about 20% from the old default&lt;br /&gt;values and made this slight change to the default&lt;br /&gt;values of the spinloop.&lt;br /&gt;&lt;br /&gt;It's my expectation that as we remove locks and the&lt;br /&gt;mutexes and RW-locks get less contended and there&lt;br /&gt;are more locks where the threads are waiting that&lt;br /&gt;this optimum value will change. The current best&lt;br /&gt;setting is very likely to be governed by the fact&lt;br /&gt;that the most waiting happens on very hot locks.&lt;br /&gt;So with improvements of the mutexes and RW-locks&lt;br /&gt;we should expect to see better performance with&lt;br /&gt;a shorter time in the spinloop.&lt;br /&gt;&lt;br /&gt;On the new SPARC CPUs that Sun has developed, the&lt;br /&gt;CMT boxes, we used the results from the paper:&lt;br /&gt;www.ideal.ece.ufl.edu/workshops/wiosca08/paper2.pdf&lt;br /&gt;which stated that the optimum instruction to use&lt;br /&gt;is a cache miss instruction. However, as I don't&lt;br /&gt;know how to program a cache miss instruction we&lt;br /&gt;opted for the second best instruction which was a&lt;br /&gt;dummy test-and-set instruction. 
So the PAUSE&lt;br /&gt;instruction is replaced by a test-and-set instruction&lt;br /&gt;on SPARC CPUs.&lt;br /&gt;&lt;br /&gt;We expect that the improvements due to this small&lt;br /&gt;change are even bigger when there are multiple&lt;br /&gt;threads per core since the contention on the&lt;br /&gt;CPU pipeline is higher in those cases and it is&lt;br /&gt;important that the spinloop stays away as much&lt;br /&gt;as possible from being active executing&lt;br /&gt;instructions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-4503367989389010935?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/4503367989389010935/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=4503367989389010935' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4503367989389010935'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4503367989389010935'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/04/mysql-54-patches-improvements-to-spin.html' title='MySQL 5.4 Patches: Improvements to spin-loop'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-2801205697891913316</id><published>2009-04-21T16:46:00.004+02:00</published><updated>2009-04-21T18:07:25.112+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='Solaris'/><category scheme='http://www.blogger.com/atom/ns#' term='InnoDB'/><title type='text'>MySQL 5.4 Scaling to 16 way x86 and 64-way CMT Servers</title><content type='html'>The release of MySQL 5.4 contains patches which&lt;br /&gt;increase the scalability of the MySQL Server. I am planning to blog&lt;br /&gt;about those changes in some detail over the next few days. This blog&lt;br /&gt;will give an introduction and show the overall results we have&lt;br /&gt;achieved.&lt;br /&gt;&lt;br /&gt;The changes we have done in MySQL 5.4 to improve scalability and&lt;br /&gt;the ability to monitor the MySQL Server are:&lt;br /&gt;&lt;br /&gt;1) Google SMP patch&lt;br /&gt;2) Google IO patches&lt;br /&gt;3) Update of many antiquated defaults in the MySQL Server&lt;br /&gt;4) New InnoDB Thread Concurrency algorithm&lt;br /&gt;5) Improved Spinloop in InnoDB mutexes and RW-locks&lt;br /&gt;6) A couple of performance fixes backported from 6.0&lt;br /&gt;7) Operating system specific optimisations&lt;br /&gt;8) Ported the Google SMP patch to Solaris x86 and SPARC and work&lt;br /&gt;underway for Windows and Intel compiler as well&lt;br /&gt;9) Introducing DTrace probes in the MySQL Server&lt;br /&gt;10) A build script to make it easier for community to build an efficient&lt;br /&gt;MySQL Server based on source code&lt;br /&gt;&lt;br /&gt;As an example of the improvements made available through this work we&lt;br /&gt;have some benchmarks using sysbench readwrite and readonly.&lt;br /&gt;&lt;br /&gt;We have consistently seen improvements in the order of 30-40% of&lt;br /&gt;sysbench top numbers and with a large number of threads 5.4.0 drops&lt;br /&gt;much less in performance than 5.1. 
The new InnoDB Thread Concurrency&lt;br /&gt;patch makes the results on high number of threads even more&lt;br /&gt;impressive where the results have gone up by another 5-15% at the&lt;br /&gt;expense of 1% less on the top results (there are even some DBT2&lt;br /&gt;runs that gave 200% improvement with the new algorithm).&lt;br /&gt;&lt;br /&gt;There is also a benchmark on EAStress which shows a 59% increase in&lt;br /&gt;performance from 5.1 to 5.4 using the new 16-way x86 Nehalem servers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-2801205697891913316?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/2801205697891913316/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=2801205697891913316' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2801205697891913316'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2801205697891913316'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/04/mysql-54-scaling-to-16-way-x86-and-64.html' title='MySQL 5.4 Scaling to 16 way x86 and 64-way CMT Servers'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3089686053719229836</id><published>2009-04-21T16:45:00.001+02:00</published><updated>2009-04-21T18:07:41.723+02:00</updated><title type='text'>MySQL Cluster 7.0 scales linearly in two 
dimensions</title><content type='html'>As recently reported on my blog we have managed to get MySQL Cluster CGE 6.3 to scale linearly with the addition of more nodes into the system.&lt;br /&gt;In MySQL Cluster CGE 6.3 each node has a single thread handling most of&lt;br /&gt;the work together with a set of file system threads.&lt;br /&gt;&lt;br /&gt;In MySQL Cluster 7.0 the data nodes are now multithreaded. The design in&lt;br /&gt;7.0 follows the very efficient design of 6.3 where each thread has absolutely no lock contention with other threads. All communication&lt;br /&gt;between threads happens through messages. This means that scalability&lt;br /&gt;of the data nodes is excellent. The single thread has been split into&lt;br /&gt;up to four local data threads, one transaction handling thread,&lt;br /&gt;and one socket communication thread plus the already existing file&lt;br /&gt;system threads. With this set-up each data node can process 4.6X more&lt;br /&gt;DBT2 transactions compared to 6.3.&lt;br /&gt;&lt;br /&gt;This means that a 2-node cluster in 7.0 has the same performance as a&lt;br /&gt;10-node cluster in 6.3 and a 4-node cluster similar performance to a&lt;br /&gt;20-node cluster in 6.3. As earlier blogged each data node can handle&lt;br /&gt;many hundreds of thousands of operations per second, so a cluster of&lt;br /&gt;such nodes can handle many millions of operations per second.&lt;br /&gt;&lt;br /&gt;The efficiency of the data node is such that one data node can handle&lt;br /&gt;the traffic from a set of MySQL Servers residing on a 24-core&lt;br /&gt;server. So an example of a basic set-up for the MySQL Cluster 7.0 is&lt;br /&gt;to use 2 8-core boxes with lots of memory and lots of disk bandwidth&lt;br /&gt;for the data nodes. 
Then use 2 24-core servers for the MySQL Servers, which&lt;br /&gt;mostly require CPU and networking bandwidth.&lt;br /&gt;&lt;br /&gt;An important consideration for setting up a MySQL Cluster 7.0 is to&lt;br /&gt;ensure that interrupts from the network stack don't kill performance&lt;br /&gt;and also to have separate network infrastructure between the data&lt;br /&gt;nodes in the cluster since it is very easy to overload the network&lt;br /&gt;given the capabilities of the MySQL Cluster software.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3089686053719229836?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3089686053719229836/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3089686053719229836' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3089686053719229836'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3089686053719229836'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/04/mysql-cluster-70-scales-linearly-in-two.html' title='MySQL Cluster 7.0 scales linearly in two dimensions'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3320573581955884458</id><published>2009-04-21T16:42:00.004+02:00</published><updated>2009-04-21T18:07:55.920+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><title type='text'>MySQL 5.4 Acknowledgements</title><content type='html'>The work that started when MySQL was acquired by Sun has now started to bear&lt;br /&gt;fruit. Very soon after the acquisition a Sun team was formed to assist&lt;br /&gt;the MySQL performance team on improving the scalability of the MySQL&lt;br /&gt;server. At the same time Google has also been very active in improving&lt;br /&gt;scalability of InnoDB. The MySQL 5.4 scalability improvements are very much&lt;br /&gt;the result of the efforts from the MySQL Performance team, the Sun&lt;br /&gt;performance team and the Google efforts.&lt;br /&gt;&lt;br /&gt;It's extremely fruitful to work with such a competent set of people. The&lt;br /&gt;Sun team has experience from scaling Oracle, DB2, Informix and so forth&lt;br /&gt;and knows extremely well how the interaction of software and hardware&lt;br /&gt;affects performance. The Google patches have shown themselves to be of&lt;br /&gt;excellent quality. From our internal testing we found two bugs in the&lt;br /&gt;early testing and both those had already been fixed by the Google team&lt;br /&gt;and so turnaround time was a day or two. For the last months we haven't&lt;br /&gt;found any issues. The MySQL performance team has also been able to add&lt;br /&gt;a few small but effective improvements on top of the Google patches.&lt;br /&gt;&lt;br /&gt;MySQL 5.4 also introduces DTrace support in the MySQL Server. This code&lt;br /&gt;is a result of a cooperation with the MySQL 6.0 development team, the&lt;br /&gt;original patch was developed for 6.0. We have spent quite some time&lt;br /&gt;on getting the DTrace support working on all variants of Solaris and&lt;br /&gt;Mac OS X platforms. 
For anyone interested in getting DTrace probes into&lt;br /&gt;their application I think the MySQL example is probably the most&lt;br /&gt;advanced example currently available on user-level DTrace probes and&lt;br /&gt;building such DTrace probes into a complex build system.&lt;br /&gt;&lt;br /&gt;Working with competent and motivated people is always great fun, so this&lt;br /&gt;has been a very rewarding project for me personally. I have always liked&lt;br /&gt;to work on performance improvements, in my work on founding&lt;br /&gt;MySQL Cluster we were involved in many such efforts and so far they have&lt;br /&gt;almost always been successful. Actually we're releasing a new version&lt;br /&gt;of MySQL Cluster 7.0 now as well with its own set of extreme performance&lt;br /&gt;improvements which I will mention in a separate blog.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3320573581955884458?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3320573581955884458/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3320573581955884458' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3320573581955884458'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3320573581955884458'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/04/mysql-54-acknowledgements.html' title='MySQL 5.4 Acknowledgements'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3836239033689543477</id><published>2009-01-17T19:51:00.003+01:00</published><updated>2009-05-19T15:02:52.863+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='iClaustron'/><category scheme='http://www.blogger.com/atom/ns#' term='DBT2'/><title type='text'>New DBT2 version uploaded with more documentation of new scripts</title><content type='html'>I have had a number of requests for help on how to use the DBT2&lt;br /&gt;tree I'm maintaining on www.iclaustron.com. There is an extensive&lt;br /&gt;set of scripts used to make it very easy to run DBT2 runs and&lt;br /&gt;to start and stop cluster nodes and MySQL Servers. I personally&lt;br /&gt;use it to start MySQL Servers and clusters also when not&lt;br /&gt;using DBT2.&lt;br /&gt;&lt;br /&gt;However these scripts haven't had an overall description yet&lt;br /&gt;although each component is very thoroughly documented by&lt;br /&gt;using --help on the scripts (I tend to document these things very&lt;br /&gt;heavily since I otherwise forget them myself).&lt;br /&gt;&lt;br /&gt;Now I added a new README file README-ICLAUSTRON which&lt;br /&gt;explains which scripts are used and their relation&lt;br /&gt;and which configuration files to set-up and a&lt;br /&gt;pointer to example configuration files.&lt;br /&gt;&lt;br /&gt;Hopefully this will make it easier to use DBT2,&lt;br /&gt;particularly DBT2 for MySQL Cluster.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3836239033689543477?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3836239033689543477/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3836239033689543477' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3836239033689543477'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3836239033689543477'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2009/01/new-dbt2-version-uploaded-with-more.html' title='New DBT2 version uploaded with more documentation of new scripts'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7551605923341831064</id><published>2008-12-03T23:43:00.004+01:00</published><updated>2008-12-03T23:57:52.140+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='LOCK_open'/><category scheme='http://www.blogger.com/atom/ns#' term='scalability'/><title type='text'>LOCK_open, THE mutex :)</title><content type='html'>In all my days working at MySQL the&lt;br /&gt;LOCK_open mutex has always been a&lt;br /&gt;key mutex to understand. Now that I'm&lt;br /&gt;working on scalability improvements of&lt;br /&gt;the server it's as important to change&lt;br /&gt;this mutex into something less contentious.&lt;br /&gt;&lt;br /&gt;So last week I finally decided to start&lt;br /&gt;thinking about how we can resolve this&lt;br /&gt;mutex which is at the heart of the MySQL&lt;br /&gt;Server. In principle the idea is that&lt;br /&gt;LOCK_open has been used to protect a&lt;br /&gt;hash table with all the open tables in&lt;br /&gt;the MySQL Server. 
However it has been&lt;br /&gt;used for many other purposes as well.&lt;br /&gt;So it's not trivial to move around it.&lt;br /&gt;&lt;br /&gt;However the main scalability problem&lt;br /&gt;with LOCK_open is the hash lock it&lt;br /&gt;provides. So what to do about it?&lt;br /&gt;&lt;br /&gt;My current thinking is that a four-pronged&lt;br /&gt;approach will do the trick.&lt;br /&gt;&lt;br /&gt;1) Divide and conquer, perform the hash calculation&lt;br /&gt;outside of the mutex and divide the hash into&lt;br /&gt;e.g. 16 smaller hashes. This creates one problem&lt;br /&gt;which is how to prune the open table cache.&lt;br /&gt;Obviously there is no longer a simple linked&lt;br /&gt;list where I can find the oldest entry. This&lt;br /&gt;problem I'm still contemplating, there's&lt;br /&gt;probably already a number of known good solutions&lt;br /&gt;to this problem since I find it popping up in&lt;br /&gt;almost every similar design. So it's a problem&lt;br /&gt;looking for a solution pattern.&lt;br /&gt;&lt;br /&gt;2) Shrink the amount of data it protects&lt;br /&gt;by only allowing it to protect the hash table&lt;br /&gt;and nothing more. This means e.g. 
that some&lt;br /&gt;counters need to be updated with atomic&lt;br /&gt;instructions instead.&lt;br /&gt;&lt;br /&gt;3) Shrink the time it is protected by inserting&lt;br /&gt;the table share into the hash rather than the&lt;br /&gt;table object (this is actually Monty's idea).&lt;br /&gt;&lt;br /&gt;4) Use a different technique for the lock that&lt;br /&gt;works better for short-term locks (usually&lt;br /&gt;spinlocks are more successful here).&lt;br /&gt;&lt;br /&gt;A combination of these techniques will hopefully&lt;br /&gt;make it possible to decrease the impact of&lt;br /&gt;LOCK_open on the server code.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7551605923341831064?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7551605923341831064/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7551605923341831064' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7551605923341831064'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7551605923341831064'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/12/lockopen-mutex.html' title='LOCK_open, THE mutex :)'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7212456625841680302</id><published>2008-12-03T21:12:00.002+01:00</published><updated>2008-12-03T21:40:54.225+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='partitioning'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='crash recovery'/><category scheme='http://www.blogger.com/atom/ns#' term='ALTER TABLE'/><title type='text'>Recovery features for ALTER TABLE of partitioned tables</title><content type='html'>A feature of the partitioning implementation which hasn't been&lt;br /&gt;much publicised is the support for atomicity of many ALTER TABLE&lt;br /&gt;statements using partitioned tables.&lt;br /&gt;&lt;br /&gt;This atomicity exists for&lt;br /&gt;ALTER TABLE ADD PARTITION ....&lt;br /&gt;ALTER TABLE REORGANIZE PARTITION ...&lt;br /&gt;ALTER TABLE DROP PARTITION ...&lt;br /&gt;ALTER TABLE COALESCE PARTITION&lt;br /&gt;&lt;br /&gt;Given that partitioning often works with very large tables it&lt;br /&gt;was desirable to have a higher level of security for ALTER TABLE&lt;br /&gt;of partitioned tables. To support this a DDL log was implemented.&lt;br /&gt;This DDL log will in future versions be used also for many other&lt;br /&gt;meta data statements. The DDL log will record all files added,&lt;br /&gt;renamed and dropped during an ALTER TABLE command as above.&lt;br /&gt;&lt;br /&gt;The design is done in such a way that the ALTER TABLE will either&lt;br /&gt;fail, in which case all temporary files will be removed (even in the&lt;br /&gt;presence of crashes of MySQL Server), or the ALTER TABLE&lt;br /&gt;will succeed even if not all old files have been removed at&lt;br /&gt;the time of crash. 
The DDL log will be checked at restart of&lt;br /&gt;MySQL Server and will REDO or UNDO all necessary changes to&lt;br /&gt;complete the ALTER TABLE statement.&lt;br /&gt;&lt;br /&gt;Given that MySQL Server crashes aren't likely to happen very often&lt;br /&gt;in customer environments it was also desirable to add error&lt;br /&gt;injection to the MySQL Server for testing purposes.&lt;br /&gt;&lt;br /&gt;Here is a short cut from the file sql_partition.cc that displays&lt;br /&gt;what happens here:&lt;br /&gt;&lt;br /&gt;    if (write_log_drop_shadow_frm(lpt) ||&lt;br /&gt;        ERROR_INJECT_CRASH("crash_drop_partition_1") ||&lt;br /&gt;        mysql_write_frm(lpt, WFRM_WRITE_SHADOW) ||&lt;br /&gt;        ERROR_INJECT_CRASH("crash_drop_partition_2") ||&lt;br /&gt;        write_log_drop_partition(lpt) ||&lt;br /&gt;        ERROR_INJECT_CRASH("crash_drop_partition_3") ||&lt;br /&gt;&lt;br /&gt;At each ERROR_INJECT_CRASH it is possible to prepare&lt;br /&gt;MySQL Server such that it will crash at this point in&lt;br /&gt;the next statement using dbug statements that can&lt;br /&gt;be issued also as SQL statements now.&lt;br /&gt;&lt;br /&gt;So here one can see that we first log preparatory&lt;br /&gt;actions, insert a test point, continue with the&lt;br /&gt;next step of ALTER TABLE, insert a new test point,&lt;br /&gt;write the next log entry, insert new test point,&lt;br /&gt;and so forth.&lt;br /&gt;&lt;br /&gt;With this recovery mechanism the new ALTER TABLE&lt;br /&gt;statements should not cause problems with the&lt;br /&gt;partitioned table after the ALTER TABLE even in&lt;br /&gt;the presence of crashes in the middle of the&lt;br /&gt;ALTER TABLE statement.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7212456625841680302?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://mikaelronstrom.blogspot.com/feeds/7212456625841680302/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7212456625841680302' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7212456625841680302'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7212456625841680302'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/12/recovery-features-for-alter-table-of.html' title='Recovery features for ALTER TABLE of partitioned tables'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-820828376576741830</id><published>2008-11-25T22:26:00.006+01:00</published><updated>2008-11-25T23:35:26.103+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='DX'/><category scheme='http://www.blogger.com/atom/ns#' term='Dolphin'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL Cluster'/><category scheme='http://www.blogger.com/atom/ns#' term='SSD'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel MySQL'/><title type='text'>Impressive numbers of Next Gen MySQL Cluster</title><content type='html'>I had a very interesting conversation on the phone with Jonas&lt;br /&gt;Oreland today (he also blogged about it on his blog at&lt;br /&gt;&lt;a href="http://jonasoreland.blogspot.com"&gt;http://jonasoreland.blogspot.com&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;There are a lot of interesting features coming up in MySQL 
Cluster&lt;br /&gt;version 6.4. Online Add Node is one of those, which can be done&lt;br /&gt;without any downtime and even with almost no additional memory&lt;br /&gt;needed other than the memory in the new machines added into the&lt;br /&gt;cluster. This is a feature I started thinking about almost 10 years ago&lt;br /&gt;so it's nice to see the fourth version of the solution actually be&lt;br /&gt;implemented and it's a really neat solution to the problem,&lt;br /&gt;definitely fitting the word innovative.&lt;br /&gt;&lt;br /&gt;The next interesting feature is to use a more efficient protocol&lt;br /&gt;for handling large operations towards the data nodes. This makes it&lt;br /&gt;use fewer bits on the wire, but even more it saves a number of copy&lt;br /&gt;stages internally in the NDB data nodes. So this has a dramatic&lt;br /&gt;effect on performance of reads and writes of large records. It&lt;br /&gt;doubles the throughput for large records.&lt;br /&gt;&lt;br /&gt;In addition the new 6.4 version also adds multithreading to the&lt;br /&gt;data nodes. Previously each data node was a very efficient single&lt;br /&gt;thread which handled all the code blocks and also the send and&lt;br /&gt;receive handling. In the new 6.4 version the data nodes are split&lt;br /&gt;into at least 4 threads for database handling, one thread for send&lt;br /&gt;and receive and the usual assistance threads for file writes and&lt;br /&gt;so forth. This means that a data node will fit nicely into an 8-core&lt;br /&gt;server since also 1-2 CPUs are required for interrupt handling and&lt;br /&gt;other operating system activity.&lt;br /&gt;&lt;br /&gt;Jonas benchmarked using a benchmark from our popular flex-series&lt;br /&gt;of benchmarks. It started when I developed flexBench more&lt;br /&gt;than 10 years ago, it's been followed by flexAsynch, flexTT and a&lt;br /&gt;lot more variants of the same type. 
It can vary the number of&lt;br /&gt;threads, the size of the records, the number of operations per&lt;br /&gt;batch per thread and a number of other things. flexAsynch is&lt;br /&gt;really good at generating extremely high loads to the database&lt;br /&gt;without doing anything useful itself :)&lt;br /&gt;&lt;br /&gt;So what Jonas demonstrated today was a flexAsynch run where he&lt;br /&gt;managed to do more than 1 million reads per second using only&lt;br /&gt;one data node. MySQL Cluster is a clustered system so you can&lt;br /&gt;guess what happens when we have 16, 32 or 48 of those nodes&lt;br /&gt;tied together. It will do many tens of millions of reads per&lt;br /&gt;second. An interesting take on this is an article in&lt;br /&gt;Datateknik 3.0 (a magazine no longer around) where I was&lt;br /&gt;discussing how we had reached or were about to reach 1 million&lt;br /&gt;reads per second. I think&lt;br /&gt;this was sometime in 2001 or 2002. I was asked where we&lt;br /&gt;were going next and I said that 100 million reads per&lt;br /&gt;second was the next goal. We're actually in range of&lt;br /&gt;achieving this now since I also have a patch lying&lt;br /&gt;around which can increase the number of data nodes&lt;br /&gt;in the cluster to 128 data nodes whereby with good&lt;br /&gt;scalability 100 million reads per second per&lt;br /&gt;cluster is achievable.&lt;br /&gt;&lt;br /&gt;When Jonas called he had achieved 950k reads and then I told&lt;br /&gt;him to try out using the Dolphin DX cards which were also&lt;br /&gt;available on the machines. Then we managed to increase the&lt;br /&gt;performance to inch over 1 million, up to 1.070.000.&lt;br /&gt;Quite nice. Maybe even more impressive is that it also was&lt;br /&gt;possible to do more than 600.000 write operations per second&lt;br /&gt;(these are all transactional).&lt;br /&gt;&lt;br /&gt;This run of flexAsynch was focused on seeing how many operations&lt;br /&gt;per second one could get through. 
I then decided I was&lt;br /&gt;interested in seeing also how much bandwidth we could handle&lt;br /&gt;in the system. So we changed the record size from 8 bytes to&lt;br /&gt;2000 bytes. When trying it out with Gigabit Ethernet we reached&lt;br /&gt;60,000 reads and 55,000 inserts/updates per second. A quick&lt;br /&gt;calculation shows that we're doing almost 120 MBytes of reads&lt;br /&gt;and 110 MBytes of writes per second to the data node. This is&lt;br /&gt;obviously the limit of Gigabit Ethernet, so the bottleneck was&lt;br /&gt;easy to spot.&lt;br /&gt;&lt;br /&gt;Then we tried the same thing using the Dolphin DX cards. We got&lt;br /&gt;250,000 reads per second and more than 200,000 writes per&lt;br /&gt;second. This corresponds to almost 500 MBytes per second of&lt;br /&gt;reads from the database and more than 400 MBytes per second of&lt;br /&gt;writes to the data nodes.&lt;br /&gt;&lt;br /&gt;I had to check whether this was actually the limit of the set-up&lt;br /&gt;I had for the Dolphin cards (they can be set up to use either&lt;br /&gt;x4 or x8 on the PCI Express). Interestingly enough after working&lt;br /&gt;in various ways with Dolphin cards for 15 years it's the first&lt;br /&gt;time I really cared about the bandwidth they could push through.&lt;br /&gt;The performance of MySQL Cluster has never been close to&lt;br /&gt;saturating the Dolphin links in the past.&lt;br /&gt;&lt;br /&gt;However today we managed to saturate the links. The maximum&lt;br /&gt;bandwidth achievable by a microbenchmark with a single process&lt;br /&gt;was 510 MBytes per second and we achieved almost 95% of this&lt;br /&gt;number. Very impressive indeed I think. 
What's even more&lt;br /&gt;interesting is that the Dolphin card used the x4 configuration&lt;br /&gt;so it can actually do 2x the bandwidth in the x8 setting and&lt;br /&gt;the CPUs were fairly lightly loaded on the system so it's&lt;br /&gt;likely that we could come very close to saturating the links&lt;br /&gt;even using an x8 configuration of the Dolphin cards. So that's&lt;br /&gt;a milestone to me, that MySQL Cluster has managed to&lt;br /&gt;saturate even the bandwidth of a cluster interconnect with&lt;br /&gt;very decent bandwidth.&lt;br /&gt;&lt;br /&gt;This actually poses an interesting database recovery&lt;br /&gt;problem for the MySQL Cluster architecture. How&lt;br /&gt;does one handle 1 GByte per second of writes to each data&lt;br /&gt;node in the system when used with persistent tables which&lt;br /&gt;have to be checkpointed and logged to disk? This requires&lt;br /&gt;bandwidth to the disk subsystem in multiple GBytes per&lt;br /&gt;second. It's only reasonable to even consider doing this&lt;br /&gt;with the upcoming new high-performance SSD drives. I&lt;br /&gt;heard an old colleague nowadays working for a disk&lt;br /&gt;company mention that he had demonstrated 6 GBytes&lt;br /&gt;per second to local disks, so this actually is a&lt;br /&gt;very nice fit. Turns out that this problem can also be&lt;br /&gt;solved.&lt;br /&gt;&lt;br /&gt;Actually SSD drives are also a very nice fit for&lt;br /&gt;the disk data part of MySQL Cluster. Here it makes all&lt;br /&gt;the sense in the world to use SSD drives as the place&lt;br /&gt;to put the tablespaces for the disk part of MySQL&lt;br /&gt;Cluster. This way also the disk data becomes part of&lt;br /&gt;the real-time system and you can fairly easily build a&lt;br /&gt;terabyte database with an exceedingly high&lt;br /&gt;performance. 
Maybe this is to some extent a reply to&lt;br /&gt;Mark Callaghan's request for a data warehouse based&lt;br /&gt;on MySQL Cluster &lt;a href="http://mysqlha.blogspot.com/2008/10/mysql-conference-proposals-that-i-want.html"&gt;link&lt;/a&gt;. Not that we really focused so&lt;br /&gt;much on it, but the parallelism and performance&lt;br /&gt;available in a large MySQL Cluster based on 6.4 will&lt;br /&gt;be breathtaking even to me with 15 years of thinking&lt;br /&gt;about this behind me. A final word on this is that&lt;br /&gt;we are actually also working on a parallel query&lt;br /&gt;capability towards MySQL Cluster. This is going to be&lt;br /&gt;based on some new advanced additions to the storage&lt;br /&gt;engine interface we're currently working on&lt;br /&gt;(Pushdown Query Fragment for those that joined the&lt;br /&gt;storage engine summit at Google in April this year).&lt;br /&gt;&lt;br /&gt;A nice thing with being part of Sun is that they're&lt;br /&gt;building the HW which is required to build these&lt;br /&gt;very large systems and are very interested in doing&lt;br /&gt;showcases for them. So all the technology to do&lt;br /&gt;what has been discussed above is available within&lt;br /&gt;Sun.&lt;br /&gt;&lt;br /&gt;Sorry for writing a very long blog. 
I know it's&lt;br /&gt;better to write short and to the point blogs,&lt;br /&gt;however I found so many interesting tilts on the&lt;br /&gt;subject.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-820828376576741830?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/820828376576741830/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=820828376576741830' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/820828376576741830'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/820828376576741830'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/11/impressive-numbers-of-next-gen-mysql.html' title='Impressive numbers of Next Gen MySQL Cluster'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-4599467540372418892</id><published>2008-11-24T11:57:00.003+01:00</published><updated>2008-11-24T12:38:54.559+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='poll'/><category scheme='http://www.blogger.com/atom/ns#' term='eventports'/><category scheme='http://www.blogger.com/atom/ns#' term='epoll'/><category scheme='http://www.blogger.com/atom/ns#' term='kqueue'/><category scheme='http://www.blogger.com/atom/ns#' term='IO Completion'/><title type='text'>Poll set to handle poll, eventports, epoll, kqueue and Windows IO 
Completion</title><content type='html'>This blog describes the background and implementation of a poll&lt;br /&gt;set to monitor many sockets in one or several receive&lt;br /&gt;threads. The blog is intended as a description but also to&lt;br /&gt;enable some feedback on the design.&lt;br /&gt;&lt;br /&gt;I've been spending some time working out all the gory details&lt;br /&gt;of starting and stopping lots of send threads, connection&lt;br /&gt;threads and receive threads over the last few months.&lt;br /&gt;&lt;br /&gt;The receive threads will be monitoring a number of socket&lt;br /&gt;connections and as soon as data is available ensure that&lt;br /&gt;the data is received and forwarded to the proper user&lt;br /&gt;thread for execution.&lt;br /&gt;&lt;br /&gt;However listening on many sockets is a problem which needs&lt;br /&gt;a scalable solution. Almost every operating system on the&lt;br /&gt;planet has some solution to this problem. The problem is&lt;br /&gt;that they all have different solutions. So I decided to&lt;br /&gt;make a simple interface to those socket monitoring solutions.&lt;br /&gt;&lt;br /&gt;First my requirements. I will handle a great number of socket&lt;br /&gt;connections from each thread, and I will have the ability to&lt;br /&gt;move socket connections from one receive thread to another&lt;br /&gt;to dynamically adapt to the usage scenarios. However mostly&lt;br /&gt;the receive thread will wait for events on a set of socket&lt;br /&gt;connections, handle them as they arrive and go back to waiting&lt;br /&gt;for more events. On the operating system side I aim at&lt;br /&gt;supporting Linux, OpenSolaris, Mac OS X, FreeBSD and Windows.&lt;br /&gt;&lt;br /&gt;One receive thread might be required to listen to socket&lt;br /&gt;connections from many clusters. So there is no real limit&lt;br /&gt;to the number of sockets a receive thread may need to handle. I&lt;br /&gt;decided however to put a compile time limit in there since&lt;br /&gt;e.g. 
epoll requires this at create time. This is currently&lt;br /&gt;set to 1024. So if more sockets are needed another receive&lt;br /&gt;thread is needed even if not needed from a performance&lt;br /&gt;point of view.&lt;br /&gt;&lt;br /&gt;The implementation aims to cover 5 different implementations.&lt;br /&gt;&lt;br /&gt;epoll&lt;br /&gt;-----&lt;br /&gt;epoll interface is a Linux-only interface which uses&lt;br /&gt;epoll_create to create an epoll file descriptor, then&lt;br /&gt;epoll_ctl is used to add/drop file descriptors to the epoll&lt;br /&gt;set. Finally epoll_wait is used to wait on the events to&lt;br /&gt;arrive. Socket connections remain in the epoll set as&lt;br /&gt;long as they are not explicitly removed or closed.&lt;br /&gt;&lt;br /&gt;poll&lt;br /&gt;----&lt;br /&gt;Poll is the standard which is there simply to make sure it&lt;br /&gt;works also on older platforms that have none of the other&lt;br /&gt;mechanisms supported. Here there is only one system call,&lt;br /&gt;the poll-call and all the state of the poll set needs to&lt;br /&gt;be taken care of by this implementation.&lt;br /&gt;&lt;br /&gt;kqueue&lt;br /&gt;------&lt;br /&gt;kqueue achieves more or less the same thing as epoll, it does&lt;br /&gt;so however with a more complex interface that can support a&lt;br /&gt;lot more things such as polling for completed processes and&lt;br /&gt;i-nodes and so forth. 
It has a kqueue call to create the&lt;br /&gt;kqueue file descriptor and then a kevent call which is used&lt;br /&gt;to add, drop and listen for events on the kqueue descriptor.&lt;br /&gt;kqueue exists in BSD-derived OSes such as FreeBSD and Mac OS X.&lt;br /&gt;&lt;br /&gt;eventports&lt;br /&gt;----------&lt;br /&gt;eventports is again very similar to epoll.&lt;br /&gt;It has a port_create call to create the eventport file descriptor.&lt;br /&gt;It has a port_associate call to add a socket to the eventport set.&lt;br /&gt;It has a port_dissociate call to drop a socket from the set.&lt;br /&gt;It has a port_getn call to wait on the events arriving. There is&lt;br /&gt;however a major difference in that after an event arrives in a&lt;br /&gt;port_getn call the socket is removed from the set and has to be&lt;br /&gt;added back. From an implementation point of view this mainly&lt;br /&gt;complicated my design of error handling.&lt;br /&gt;&lt;br /&gt;Windows IO Completion&lt;br /&gt;---------------------&lt;br /&gt;I have only skimmed this so far. It differs mainly in being a tad&lt;br /&gt;complex and also in that events from the set can be distributed to&lt;br /&gt;more than one thread. However this feature will not be used in this&lt;br /&gt;design. I have not implemented this yet; I first need to get all&lt;br /&gt;the bits together for building on Windows.&lt;br /&gt;&lt;br /&gt;Implementation&lt;br /&gt;--------------&lt;br /&gt;The implementation is done in C but I still wanted to have&lt;br /&gt;a clear object-oriented interface. To achieve this I&lt;br /&gt;created two header files: ic_poll_set.h which declares all&lt;br /&gt;the public parts and ic_poll_set_int.h which defines the&lt;br /&gt;private and the public data structures used. 
This means that&lt;br /&gt;the internals of the IC_POLL_SET object are hidden from the&lt;br /&gt;user of this interface.&lt;br /&gt;&lt;br /&gt;Here is the public part of the interface (the code is GPL'ed&lt;br /&gt;but it isn't released yet):&lt;br /&gt;&lt;br /&gt;/* Copyright (C) 2008 iClaustron AB, All rights reserved */&lt;br /&gt;struct ic_poll_connection&lt;br /&gt;{&lt;br /&gt;  int fd;&lt;br /&gt;  guint32 index;&lt;br /&gt;  void *user_obj;&lt;br /&gt;  int ret_code;&lt;br /&gt;};&lt;br /&gt;typedef struct ic_poll_connection IC_POLL_CONNECTION;&lt;br /&gt;&lt;br /&gt;struct ic_poll_set;&lt;br /&gt;typedef struct ic_poll_set IC_POLL_SET;&lt;br /&gt;struct ic_poll_operations&lt;br /&gt;{&lt;br /&gt;  /*&lt;br /&gt;    The poll set implementation isn't multi-thread safe. It's intended to be&lt;br /&gt;    used within one thread, the intention is that one can have several&lt;br /&gt;    poll sets, but only one per thread. Thus no mutexes are needed to&lt;br /&gt;    protect the poll set.&lt;br /&gt;&lt;br /&gt;    ic_poll_set_add_connection is used to add a socket connection to the&lt;br /&gt;    poll set, it requires only the file descriptor and a user object of&lt;br /&gt;    any kind. The poll set implementation will ensure that this file&lt;br /&gt;    descriptor is checked together with the other file descriptors in&lt;br /&gt;    the poll set independent of the implementation in the underlying OS.&lt;br /&gt;&lt;br /&gt;    ic_poll_set_remove_connection is used to remove the file descriptor&lt;br /&gt;    from the poll set.&lt;br /&gt;&lt;br /&gt;    ic_check_poll_set is the method that goes to check which socket&lt;br /&gt;    connections are ready to receive.&lt;br /&gt;&lt;br /&gt;    ic_get_next_connection is used in a loop where it is called until it&lt;br /&gt;    returns NULL after an ic_check_poll_set call, the output from&lt;br /&gt;    ic_get_next_connection is prepared already at the time of the&lt;br /&gt;    ic_check_poll_set call. 
ic_get_next_connection will return a&lt;br /&gt;    IC_POLL_CONNECTION object. It is possible that ic_check_poll_set&lt;br /&gt;    can return without error whereas the IC_POLL_CONNECTION can still&lt;br /&gt;    have an error in the ret_code in the object. So it is important to&lt;br /&gt;    both check this return code as well as the return code from the&lt;br /&gt;    call to ic_check_poll_set (this is due to the implementation using&lt;br /&gt;    eventports on Solaris).&lt;br /&gt;&lt;br /&gt;    ic_free_poll_set is used to free the poll set, it will also if the&lt;br /&gt;    implementation so requires close any file descriptor of the poll&lt;br /&gt;    set.&lt;br /&gt;&lt;br /&gt;    ic_is_poll_set_full can be used to check if there is room for more&lt;br /&gt;    socket connections in the poll set. The poll set has a limited size&lt;br /&gt;    (currently set to 1024) set by a compile time parameter.&lt;br /&gt;  */&lt;br /&gt;  int (*ic_poll_set_add_connection)    (IC_POLL_SET *poll_set,&lt;br /&gt;                                        int fd,&lt;br /&gt;                                        void *user_obj);&lt;br /&gt;  int (*ic_poll_set_remove_connection) (IC_POLL_SET *poll_set,&lt;br /&gt;                                        int fd);&lt;br /&gt;  int (*ic_check_poll_set)             (IC_POLL_SET *poll_set,&lt;br /&gt;                                        int ms_time);&lt;br /&gt;  const IC_POLL_CONNECTION*&lt;br /&gt;      (*ic_get_next_connection)        (IC_POLL_SET *poll_set);&lt;br /&gt;  void (*ic_free_poll_set)             (IC_POLL_SET *poll_set);&lt;br /&gt;  gboolean (*ic_is_poll_set_full)      (IC_POLL_SET *poll_set);&lt;br /&gt;};&lt;br /&gt;typedef struct ic_poll_operations IC_POLL_OPERATIONS;&lt;br /&gt;&lt;br /&gt;/* Creates a new poll set */&lt;br /&gt;IC_POLL_SET* ic_create_poll_set();&lt;br /&gt;&lt;br /&gt;struct ic_poll_set&lt;br /&gt;{&lt;br /&gt;  IC_POLL_OPERATIONS poll_ops;&lt;br /&gt;};&lt;div 
class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-4599467540372418892?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/4599467540372418892/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=4599467540372418892' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4599467540372418892'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4599467540372418892'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/11/poll-set-to-handle-poll-eventports.html' title='Poll set to handle poll, eventports, epoll, kqueue and Windows IO Completion'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-6397405444525754218</id><published>2008-11-21T02:31:00.002+01:00</published><updated>2009-05-19T15:03:29.875+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 5.4'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenSolaris'/><category scheme='http://www.blogger.com/atom/ns#' term='Solaris'/><category scheme='http://www.blogger.com/atom/ns#' term='DTrace'/><title type='text'>DTrace, opensolaris and MySQL Performance</title><content type='html'>Currently I'm working hard to find and remove scalability&lt;br /&gt;bottlenecks in the MySQL Server. 
MySQL was acquired by Sun&lt;br /&gt;10 months ago. Many people have wondered in blogs what&lt;br /&gt;the impact of this acquisition has been. My personal&lt;br /&gt;experience is that I now have a chance to work with Sun&lt;br /&gt;experts in DBMS performance. As usual it takes time when&lt;br /&gt;working on new challenges before the inspiration&lt;br /&gt;starts flowing. However I've seen this flow of inspiration&lt;br /&gt;starting to come now, so our joint work is&lt;br /&gt;starting to bear fruit. I now have a much better understanding&lt;br /&gt;of MySQL Server performance than I used to have. I know fairly&lt;br /&gt;well where the bottlenecks are and I've started looking&lt;br /&gt;into how they can be resolved.&lt;br /&gt;&lt;br /&gt;Another interesting thing with Sun is the innovations they have&lt;br /&gt;done in a number of areas. One such area is DTrace. This is a&lt;br /&gt;really interesting tool which I already used to analyse some&lt;br /&gt;behaviour of MySQL Cluster internals with some success. However&lt;br /&gt;analysing other storage engines inside MySQL requires a bit more&lt;br /&gt;work on inserting DTrace probes at appropriate places.&lt;br /&gt;&lt;br /&gt;To work with DTrace obviously means that you need to work with&lt;br /&gt;an OS that supports DTrace. Solaris is one such OS. I actually&lt;br /&gt;developed NDB Cluster (the storage engine for MySQL Cluster) on&lt;br /&gt;Solaris the first 5-6 years. So one would expect Solaris to be&lt;br /&gt;familiar to me, but working with Linux mainly for 6-7 years means&lt;br /&gt;that most of the Solaris memory is gone.&lt;br /&gt;&lt;br /&gt;So how to go about developing on Solaris? I decided to install a&lt;br /&gt;virtual machine on my desktop. As a well-behaved Sun citizen I&lt;br /&gt;decided to opt for VirtualBox in my choice of VM. This was an&lt;br /&gt;interesting challenge, very similar to my previous experiences of&lt;br /&gt;installing a virtual machine. 
It's easy to get the VM up and running, but how&lt;br /&gt;do you communicate with it? I found some instructions on how to&lt;br /&gt;set up IP links to a virtual machine but to make life harder I&lt;br /&gt;have a fixed IP address on my desktop so this complicated life&lt;br /&gt;quite a bit. Finally I learned a lot about how to set up virtual&lt;br /&gt;IP links which I already have managed to forget about :)&lt;br /&gt;&lt;br /&gt;The next step is to get going on having a development environment&lt;br /&gt;for opensolaris. I soon discovered that there was a package&lt;br /&gt;manager in opensolaris which could be used to get all the needed&lt;br /&gt;packages. However after downloading a number of packages I&lt;br /&gt;stumbled into some serious issues. I learned from this experience&lt;br /&gt;that using Developer Previews of OSes is even worse than using&lt;br /&gt;newly released OSes, which I already know by experience isn't for&lt;br /&gt;the fainthearted.&lt;br /&gt;&lt;br /&gt;So I decided to install a released opensolaris version instead&lt;br /&gt;(the OpenSolaris 2008.05 version). After some googling I discovered&lt;br /&gt;a very helpful presentation at &lt;a href="www.suntechdays2008.com/down/1015/track3/T3S5_opensource_alex.pdf"&gt;opensolaris developer how-to&lt;/a&gt;&lt;br /&gt;which explained a lot about how to install a development&lt;br /&gt;environment for opensolaris.&lt;br /&gt;&lt;br /&gt;After installing opensolaris 2008.05 and following the&lt;br /&gt;instructions on how to install a development environment&lt;br /&gt;I am now equipped to develop DTrace probes and scripts and&lt;br /&gt;try them out on my desktop.&lt;br /&gt;&lt;br /&gt;I definitely like the idea that opensolaris is looking more&lt;br /&gt;like yet another Linux distribution since it makes it a&lt;br /&gt;lot simpler to work with it. 
I would prefer GNU developer&lt;br /&gt;tools to be there from scratch but I have the same issue&lt;br /&gt;with Ubuntu.&lt;br /&gt;&lt;br /&gt;That the system calls are different doesn't bother me as a&lt;br /&gt;programmer since different APIs to similar things are&lt;br /&gt;something every programmer encounters if he's developing&lt;br /&gt;for a multi-platform environment. I even look forward to&lt;br /&gt;trying out a lot of Solaris system calls since there are&lt;br /&gt;lots of cool features on locking to CPUs, controlling&lt;br /&gt;CPUs for interrupts, resource groups, scheduling&lt;br /&gt;algorithms and so forth. I recently noted that most of&lt;br /&gt;these things are available on Linux as well. However&lt;br /&gt;I am still missing the programming APIs to these&lt;br /&gt;features.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-6397405444525754218?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/6397405444525754218/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=6397405444525754218' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6397405444525754218'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6397405444525754218'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/11/dtrace-opensolaris-and-mysql.html' title='DTrace, opensolaris and MySQL Performance'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-741638860272613416</id><published>2008-10-30T19:45:00.004+01:00</published><updated>2008-10-31T12:27:55.660+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='PARTTION BY'/><category scheme='http://www.blogger.com/atom/ns#' term='Partititon'/><category scheme='http://www.blogger.com/atom/ns#' term='MyISAM'/><category scheme='http://www.blogger.com/atom/ns#' term='cache index'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel MySQL'/><title type='text'>CACHE INDEX per partition for MyISAM</title><content type='html'>The newest development in the partitioning code&lt;br /&gt;is &lt;a href="http://forge.mysql.com/worklog/task.php?id=4571"&gt;WL#4571&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This new feature makes it possible to tie a&lt;br /&gt;partition using MyISAM to a specific key cache.&lt;br /&gt;The syntax for how to do this is available in the&lt;br /&gt;above worklog entry.&lt;br /&gt;&lt;br /&gt;We found this feature to be useful for enabling&lt;br /&gt;higher performance of parallel ALTER TABLE&lt;br /&gt;(&lt;a href="http://forge.mysql.com/worklog/task.php?id=2550"&gt;WL#2550&lt;/a&gt;). 
When adding&lt;br /&gt;a primary key to a MyISAM table the key cache in&lt;br /&gt;MyISAM limited scalability of Parallel ALTER TABLE&lt;br /&gt;severely, so by adding several key caches, essentially&lt;br /&gt;one per partition, we can ensure that the ALTER TABLE&lt;br /&gt;can be fully parallelised (all other ALTER TABLE&lt;br /&gt;operations on MyISAM already scale perfectly).&lt;br /&gt;&lt;br /&gt;We also have some ideas on how to solve the base&lt;br /&gt;problem of making the key cache more scalable&lt;br /&gt;by dividing the mutex on the key cache into one&lt;br /&gt;mutex per range of key cache pages.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-741638860272613416?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/741638860272613416/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=741638860272613416' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/741638860272613416'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/741638860272613416'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/10/cache-index-per-partition-for-myisam.html' title='CACHE INDEX per partition for MyISAM'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7414927154347183081</id><published>2008-10-30T19:26:00.003+01:00</published><updated>2009-05-19T15:03:53.816+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='partitioning'/><category scheme='http://www.blogger.com/atom/ns#' term='PARTTION BY'/><category scheme='http://www.blogger.com/atom/ns#' term='Partititon'/><title type='text'>New launchpad tree for PARTITION BY RANGE COLUMN_LIST(a,b)</title><content type='html'>A colleague of mine at Sun/MySQL showed me how to get&lt;br /&gt;statistics from my blog. This was an interesting read&lt;br /&gt;of all statistics. I noted that there was a great&lt;br /&gt;interest in partitioning related information and that&lt;br /&gt;the new partitioning feature mentioned in my blog&lt;br /&gt;2 years ago still attracts a lot of attention.&lt;br /&gt;&lt;br /&gt;So I thought it was a good idea to blog a bit more&lt;br /&gt;about what's going on in the partitioning&lt;br /&gt;development. 
I decided to check out how easy it is&lt;br /&gt;to externalize my development trees on launchpad.&lt;br /&gt;It turned out to be really easy so I simply&lt;br /&gt;put up the development tree for the new partitioning&lt;br /&gt;feature which I described in my last blog.&lt;br /&gt;&lt;br /&gt;&lt;a href="https://code.launchpad.net/~mikael-ronstrom/mysql-server/mysql-5.1-wl3352"&gt;Launchpad tree&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I also externalized the Worklog entry for this&lt;br /&gt;development, unfortunately not a very long&lt;br /&gt;description but I'll try to work on that.&lt;br /&gt;There is a new test case in the mysql-test/t&lt;br /&gt;directory called partition_column.test which&lt;br /&gt;shows how to use these new features (it might&lt;br /&gt;take some time before this link works).&lt;br /&gt;&lt;br /&gt;&lt;a href="http://forge.mysql.com/worklog/task.php?id=3352"&gt;Worklog description&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7414927154347183081?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7414927154347183081/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7414927154347183081' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7414927154347183081'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7414927154347183081'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/10/new-launchpad-tree-for-partition-by.html' title='New launchpad tree for PARTITION BY RANGE COLUMN_LIST(a,b)'/><author><name>Mikael 
Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3802262217269715021</id><published>2008-10-07T19:17:00.005+02:00</published><updated>2008-10-07T19:32:08.689+02:00</updated><title type='text'>Further development on new partitioning feature</title><content type='html'>As mentioned in a blog 2 years ago I worked on a new&lt;br /&gt;&lt;a href="http://mikaelronstrom.blogspot.com/2006_09_01_archive.html"&gt;partitioning feature:&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I've been busy with many other things but now I've taken this&lt;br /&gt;work a step forward and will most likely set-up a preview tree&lt;br /&gt;of this feature in a short time.&lt;br /&gt;&lt;br /&gt;The new feature adds the possibility to perform partitioning&lt;br /&gt;on any type of column also for range and list partitioning&lt;br /&gt;(has been possible for KEY partitioning all the time). It also&lt;br /&gt;adds a new function to the MySQL Server and this function is&lt;br /&gt;also a monotonic function which means it gets a nice treatment&lt;br /&gt;of the partition pruning. 
This new function is TO_SECONDS which&lt;br /&gt;works very similarly to TO_DAYS.&lt;br /&gt;&lt;br /&gt;So here are a couple of new cases of what one will be able to do:&lt;br /&gt;&lt;br /&gt;create table t1 (d date)&lt;br /&gt;partition by range column_list(d)&lt;br /&gt;( partition p0 values less than (column_list('1999-01-01')),&lt;br /&gt;  partition p1 values less than (column_list('2000-01-01')));&lt;br /&gt;&lt;br /&gt;create table t1 (a date)&lt;br /&gt;partition by range(to_seconds(a))&lt;br /&gt;(partition p0 values less than (to_seconds('2004-01-01')),&lt;br /&gt; partition p1 values less than (to_seconds('2005-01-01')));&lt;br /&gt;&lt;br /&gt;select * from t1 where a &lt;= '2003-12-31';&lt;br /&gt;&lt;br /&gt;The partition pruning optimisation step will discover that&lt;br /&gt;this select can only find values in p0.&lt;br /&gt;&lt;br /&gt;create table t1 (a int, b int)&lt;br /&gt;partition by range column_list(a,b)&lt;br /&gt;(partition p0 values less than (column_list(99,99)),&lt;br /&gt; partition p1 values less than (column_list(99,999)));&lt;br /&gt;&lt;br /&gt;insert into t1 values (99,998);&lt;br /&gt;select * from t1 where a = 99 and b = 998;&lt;br /&gt;&lt;br /&gt;This select statement will discover that there can only&lt;br /&gt;be records in the p1 partition and avoid&lt;br /&gt;scanning the p0 partition. 
Thus partitioning works&lt;br /&gt;in very much the same manner as a first step index.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3802262217269715021?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3802262217269715021/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3802262217269715021' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3802262217269715021'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3802262217269715021'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/10/further-development-on-new-partitioning.html' title='Further development on new partitioning feature'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-5044220040502828863</id><published>2008-10-02T23:36:00.003+02:00</published><updated>2008-10-03T00:06:31.134+02:00</updated><title type='text'>dbt2-0.37.37 uploaded and various other stuff</title><content type='html'>There was a small bug in the dbt2-0.37.36 version I uploaded which&lt;br /&gt;I have now fixed in the new dbt2-0.37.37 version.&lt;br /&gt;&lt;br /&gt;There has also been some interesting benchmark tests done where&lt;br /&gt;we have run DBT2 on a T5220 box (Niagara II chips). We can show&lt;br /&gt;the scalable performance benefits here as well. 
We've been able&lt;br /&gt;to run with 20 data nodes on 1 box (these boxes can run up to&lt;br /&gt;64 threads at a time) with scalable performance increase from&lt;br /&gt;4 nodes.&lt;br /&gt;&lt;br /&gt;We had a developer meeting a few weeks ago and there were lots of&lt;br /&gt;activities. Personally I had most fun seeing the demo of&lt;br /&gt;Parallel ALTER TABLE. We loaded a table with 10 million 70-80 byte&lt;br /&gt;rows. We had access to a machine with 64 GB of memory and&lt;br /&gt;16 cores. It was very interesting to run one SQL command and&lt;br /&gt;see the load in top of mysqld go to 1600%. Altering a 10 million&lt;br /&gt;row table in 2.5 seconds I thought was pretty good.&lt;br /&gt;&lt;br /&gt;Another cool demo was to see the online add node in MySQL Cluster.&lt;br /&gt;This is an interesting feature which I started thinking about&lt;br /&gt;in 1999, had a first design then, changed to a second variant&lt;br /&gt;in 2001 and changed again around 2005 and the final version that&lt;br /&gt;was implemented was the fourth version of the design. The nice&lt;br /&gt;thing is that the fourth version actually contains some nice&lt;br /&gt;innovations that none of the earlier designs had. So cooking&lt;br /&gt;an idea for a long time can be really beneficial sometimes.&lt;br /&gt;For a very brief description of this work see Jonas Oreland's&lt;br /&gt;blog.&lt;br /&gt;&lt;br /&gt;Jonas and Pekka are also working on another cool optimisation&lt;br /&gt;of MySQL Cluster where the data node will become multithreaded.&lt;br /&gt;There will be up to 6 threads in the first released version of&lt;br /&gt;this. Jonas measured in a test today that one could do 370,000&lt;br /&gt;inserts per second on one 8-core box with this feature (and this&lt;br /&gt;is still a fairly unstable version where there are still some&lt;br /&gt;performance issues remaining). 
We're getting close to measuring&lt;br /&gt;computer speed in MDO (MegaDatabaseOperations per second)&lt;br /&gt;instead of in MHz.&lt;br /&gt;&lt;br /&gt;Jonas and I are also working on removing from MySQL Cluster&lt;br /&gt;"the single transporter mutex" which will improve the scalability&lt;br /&gt;of MySQL Servers using MySQL Cluster. We're working on this in&lt;br /&gt;parallel using the same basic design but with small variations&lt;br /&gt;on the details. It will be interesting to see which variant&lt;br /&gt;works best.&lt;br /&gt;&lt;br /&gt;Finally Frazer has optimised the handling of large records in&lt;br /&gt;the data node to the extent that inserts of 5k records get&lt;br /&gt;twice the speed. The interesting thing is that the benchmark&lt;br /&gt;for this hits the limit of Gigabit Ethernet already with 1&lt;br /&gt;CPU working at 80%.&lt;br /&gt;&lt;br /&gt;So as you can see there are a lot of interesting things cooking&lt;br /&gt;at MySQL and then I haven't even mentioned the work we're&lt;br /&gt;doing together with other Sun folks on optimising MySQL. 
More&lt;br /&gt;on that later.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-5044220040502828863?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/5044220040502828863/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=5044220040502828863' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5044220040502828863'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5044220040502828863'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/10/dbt2-03737-uploaded-and-various-other.html' title='dbt2-0.37.37 uploaded and various other stuff'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7453008974275529498</id><published>2008-09-09T00:38:00.003+02:00</published><updated>2008-09-09T00:53:22.557+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='DBT2'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL Cluster'/><title type='text'>Linear Scalability of MySQL Cluster using DBT2</title><content type='html'>To achieve linear scalability of MySQL Cluster using the DBT2&lt;br /&gt;benchmark has been a goal of mine for a long time now. 
Last&lt;br /&gt;week I finally found the last issue that limited the scalability.&lt;br /&gt;As usual, once the issue was discovered it turned out to be trivial (in this&lt;br /&gt;case it was fixed by inserting 3 0's in the NDB handler code).&lt;br /&gt;&lt;br /&gt;We can now achieve ~41k TPM on a 2-node cluster, ~81k on a&lt;br /&gt;4-node cluster and ~159k TPM on an 8-node cluster giving roughly&lt;br /&gt;97% improved performance by doubling the number of nodes. So there&lt;br /&gt;is nothing limiting us now from achieving all the way up to&lt;br /&gt;1M TPM except lack of hardware :)&lt;br /&gt;&lt;br /&gt;I've learned a lot about what affects scalability and what&lt;br /&gt;affects performance of MySQL Cluster by performing those&lt;br /&gt;experiments and I'll continue writing up those experiences on&lt;br /&gt;my blog here. I have also uploaded a new DBT2 version where I&lt;br /&gt;added a lot of new features to DBT2, improved performance&lt;br /&gt;of the benchmark itself and also ensured that running with many&lt;br /&gt;parallel DBT2 drivers still provides correct results when&lt;br /&gt;adding the results together. 
It can be downloaded from&lt;br /&gt;www.iclaustron.com&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7453008974275529498?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7453008974275529498/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7453008974275529498' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7453008974275529498'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7453008974275529498'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/09/linear-scalability-of-mysql-cluster.html' title='Linear Scalability of MySQL Cluster using DBT2'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3642068002685427539</id><published>2008-08-21T11:55:00.002+02:00</published><updated>2008-08-21T12:17:26.180+02:00</updated><title type='text'>Some food for thoughts: How to make use of new SSD devices</title><content type='html'>The hardware guys are presenting new storage devices called&lt;br /&gt;SSD's based on flash memory. At the moment I think they are&lt;br /&gt;about 3-4 times cheaper than DRAM memory and the gap seems&lt;br /&gt;to be increasing. 
They're still far from the price of hard&lt;br /&gt;drives but also here the gap seems to be closing.&lt;br /&gt;&lt;br /&gt;So as I'm now an employee of Sun that actually puts together&lt;br /&gt;systems with this type of HW in it I get questioned what I&lt;br /&gt;as a DBMS developer can do with those devices.&lt;br /&gt;&lt;br /&gt;First some comments on performance. These new devices will be&lt;br /&gt;able to perform reads and writes of a few kilobytes large pages&lt;br /&gt;in about 25-100 microseconds compared to hard drives which&lt;br /&gt;take about 3-10 milliseconds for the same thing.&lt;br /&gt;&lt;br /&gt;An obvious use is to speed up database&lt;br /&gt;logging, particularly in commit situations. However this&lt;br /&gt;doesn't really require any significant changes to the SW&lt;br /&gt;already out there. So I won't spend any more time on this use.&lt;br /&gt;&lt;br /&gt;Another use is for MySQL Cluster. MySQL Cluster stores most data&lt;br /&gt;in memory and can store non-indexed data on disk. So how can&lt;br /&gt;SSD devices be used to improve this?&lt;br /&gt;&lt;br /&gt;First some facts about performance of MySQL Cluster. In the data&lt;br /&gt;node where the data actually resides it takes about 10&lt;br /&gt;microseconds of processing time to perform a key lookup and a&lt;br /&gt;scan has about 20 microseconds of start-up costs whereafter each&lt;br /&gt;record takes 1-2 microseconds to fetch.&lt;br /&gt;&lt;br /&gt;So now for the idea. Let's assume we'll use an SSD device as swap&lt;br /&gt;memory. We would then purposely set the swap to be e.g. 10x&lt;br /&gt;larger than the memory. 
For this to work we need to be able to&lt;br /&gt;allocate memory from different swap pools, since memory used for&lt;br /&gt;transaction state and things like this we don't want swapped out&lt;br /&gt;(working for Sun has an advantage since we can work with the OS&lt;br /&gt;guys directly, but naturally I hope Linux developers also take the&lt;br /&gt;same opportunity).&lt;br /&gt;&lt;br /&gt;So during a key lookup we need to get one page from the hash index&lt;br /&gt;and one page with the record in it. Guesstimating a 90% hit rate in&lt;br /&gt;the hash index and 80% hit rate on the data page we find that we&lt;br /&gt;will see about 0.3 swap misses per key lookup (0.1 + 0.2). If we assume 50&lt;br /&gt;microseconds per miss it means that the mean key lookup time will increase&lt;br /&gt;from 10 microseconds to 25 microseconds (10 + 0.3 * 50). This should be&lt;br /&gt;acceptable, given that we can increase data size by a factor of&lt;br /&gt;about 10.&lt;br /&gt;&lt;br /&gt;A similar analysis can be made for scans as well, but I'm lazy so&lt;br /&gt;will leave it to you to perform :)&lt;br /&gt;&lt;br /&gt;So given today's sizes of memories and SSD's it should be possible&lt;br /&gt;to use systems with 64 GBytes of memory and 640 GB of SSD memory&lt;br /&gt;and clustering 8 of those with replication gives us a main memory&lt;br /&gt;based system for a reasonable price providing 2.5 TByte of user&lt;br /&gt;data in a highly available system with high degrees of parallelism&lt;br /&gt;in the system.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3642068002685427539?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3642068002685427539/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3642068002685427539' title='8 Comments'/><link 
rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3642068002685427539'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3642068002685427539'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/08/some-food-for-thoughts-how-to-make-use.html' title='Some food for thoughts: How to make use of new SSD devices'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-1453879727794365793</id><published>2008-08-21T11:35:00.002+02:00</published><updated>2008-08-21T11:50:39.762+02:00</updated><title type='text'>New partitioning features</title><content type='html'>As burtonator pointed out parallelism is an important&lt;br /&gt;feature that partitioning makes possible. So I thought&lt;br /&gt;it might be a good idea to mention a little bit what&lt;br /&gt;we're doing in the area of partitioning.&lt;br /&gt;&lt;br /&gt;It's quite correct that parallelism is one of the main&lt;br /&gt;advantages of partitioning (not the only one though since&lt;br /&gt;also partition pruning and dividing large indexes and&lt;br /&gt;being able to add and drop partitions efficiently are&lt;br /&gt;important as well). 
In 5.1 we focused on the maintenance&lt;br /&gt;features of partitioning but the intention to move on&lt;br /&gt;to parallelisation was more or less the main goal from&lt;br /&gt;the very start.&lt;br /&gt;&lt;br /&gt;This is why it's extra fun to actually get going on&lt;br /&gt;this when one has worked on the foundation for this work&lt;br /&gt;for almost 4 years (partitioning development started in&lt;br /&gt;2004 H2 and most of the partitioning code in 5.1 was ready&lt;br /&gt;about two years later).&lt;br /&gt;&lt;br /&gt;There are also ideas to introduce parallelism for scans of&lt;br /&gt;large partitioned tables and also a few more maintenance&lt;br /&gt;features that are still missing.&lt;br /&gt;&lt;br /&gt;Another feature in the works for partitioning is the&lt;br /&gt;ability to use partition pruning on several fields. This&lt;br /&gt;will be possible for PARTITION BY RANGE and LIST. The&lt;br /&gt;syntax will look something like this:&lt;br /&gt;&lt;br /&gt;CREATE TABLE t1 (a varchar(20), b int)&lt;br /&gt;PARTITION BY RANGE (COLUMN_LIST(a,b))&lt;br /&gt;(PARTITION p0 VALUES LESS THAN (COLUMN_LIST("a", 1)),&lt;br /&gt; PARTITION p1 VALUES LESS THAN&lt;br /&gt;            (COLUMN_LIST(MAXVALUE, 4)));&lt;br /&gt;&lt;br /&gt;In this case it is possible to partition on any field type&lt;br /&gt;and it is also possible to do partition pruning on multiple&lt;br /&gt;fields in much the same way as it is for indexes.&lt;br /&gt;&lt;br /&gt;E.g.&lt;br /&gt;select * from t1 where a = "a";&lt;br /&gt;select * from t1 where a = "a" and b = 2;&lt;br /&gt;&lt;br /&gt;will both be able to use partition pruning, with the&lt;br /&gt;second obviously able to do more pruning than the first one.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-1453879727794365793?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://mikaelronstrom.blogspot.com/feeds/1453879727794365793/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=1453879727794365793' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1453879727794365793'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1453879727794365793'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/08/new-partitioning-features.html' title='New partitioning features'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3294699787089563351</id><published>2008-08-20T23:00:00.003+02:00</published><updated>2008-08-20T23:07:26.641+02:00</updated><title type='text'>Multi-threaded ALTER TABLE</title><content type='html'>Today I achieved something which is a first in the MySQL&lt;br /&gt;server as far as I'm aware. I managed to run a query&lt;br /&gt;with multiple threads. The query was:&lt;br /&gt;ALTER TABLE t1 ADD COLUMN b int;&lt;br /&gt;and the table had 4 partitions in it. So it used 4 threads,&lt;br /&gt;each handling the copying of data from the old&lt;br /&gt;table to the new table for one partition.&lt;br /&gt;&lt;br /&gt;Currently it's designed for use by partitioned tables but&lt;br /&gt;it should be very straightforward to do minor parallelisation&lt;br /&gt;also of non-partitioned tables by e.g. 
breaking up in a scan&lt;br /&gt;thread and a write thread.&lt;br /&gt;&lt;br /&gt;It's nice to get started on this track and see how one can&lt;br /&gt;make use of modern computers with a great deal of CPU power&lt;br /&gt;if one can parallelise the applications. As an example a&lt;br /&gt;dual socket box T5220 (2 Niagara II CPU's) can handle 128&lt;br /&gt;threads in parallel.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3294699787089563351?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3294699787089563351/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3294699787089563351' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3294699787089563351'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3294699787089563351'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/08/multi-threaded-alter-table.html' title='Multi-threaded ALTER TABLE'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-5882829051760933502</id><published>2008-08-01T20:40:00.003+02:00</published><updated>2008-08-01T20:56:23.293+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='iClaustron'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL 
Cluster'/><category scheme='http://www.blogger.com/atom/ns#' term='NDB'/><title type='text'>3: Thoughts on a new NDB API: Adaptive send algorithm</title><content type='html'>I thought a bit more on the adaptive send algorithm and kind of like&lt;br /&gt;the following approach:&lt;br /&gt;&lt;br /&gt;Keep track of the maximum number of sends we are allowed to wait&lt;br /&gt;for before we send anyway. This is the state of the adaptive send&lt;br /&gt;algorithm which is adapted through the following use of statistics&lt;br /&gt;(we call this state variable max_waits):&lt;br /&gt;&lt;br /&gt;For each send we calculate how much time has passed since the&lt;br /&gt;send that was sent max_waits sends ago. We also do the same for&lt;br /&gt;max_waits + 1. At certain intervals (e.g. every 10 milliseconds) we&lt;br /&gt;calculate the mean wait that a send would incur. If this lies&lt;br /&gt;within half the desired maximum wait then we accept the current&lt;br /&gt;state; if the mean value using max_waits + 1 is also acceptable&lt;br /&gt;then we increase the state by one. If the state isn't acceptable&lt;br /&gt;we decrease it by one.&lt;br /&gt;&lt;br /&gt;In the actual decision making we will always send as soon as we&lt;br /&gt;notice that more than the maximum wait time has passed so this&lt;br /&gt;means that the above algorithm is conservative. 
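&lt;br /&gt;&lt;br /&gt;The adjustment of max_waits described above could look roughly like this. It is an illustrative Python sketch under assumptions, not the NDB API code: the class name, the history size and the microsecond timestamps are made up, and a send with too little history is simply left out of the statistics.&lt;br /&gt;
&lt;pre&gt;
```python
from collections import deque


class AdaptiveSendState:
    """Per-socket statistics used to adapt max_waits (hypothetical names)."""

    def __init__(self, max_wait_us, history=16):
        self.max_wait_us = max_wait_us        # longest wait the user accepts
        self.max_waits = 0                    # sends we may wait for before sending anyway
        self.sent_at = deque(maxlen=history)  # timestamps of the most recent sends
        self.sum_cur = 0.0                    # accumulated wait using max_waits
        self.sum_next = 0.0                   # accumulated wait using max_waits + 1
        self.samples = 0

    def _age(self, now_us, k):
        # Time since the send made k sends ago; k == 0 means this send itself.
        if k == 0:
            return 0.0
        if k > len(self.sent_at):
            return None  # not enough history yet
        return now_us - self.sent_at[-k]

    def record_send(self, now_us):
        """Called for every send: collect the wait for max_waits and max_waits + 1."""
        cur = self._age(now_us, self.max_waits)
        nxt = self._age(now_us, self.max_waits + 1)
        if cur is not None and nxt is not None:
            self.sum_cur += cur
            self.sum_next += nxt
            self.samples += 1
        self.sent_at.append(now_us)

    def adjust(self):
        """Called at intervals (e.g. every 10 ms): move max_waits by at most one."""
        if self.samples == 0:
            return
        limit = self.max_wait_us / 2  # accept a state if its mean wait is within half the maximum
        mean_cur = self.sum_cur / self.samples
        mean_next = self.sum_next / self.samples
        if mean_cur > limit:
            self.max_waits = max(0, self.max_waits - 1)
        elif limit >= mean_next and self.sent_at.maxlen > self.max_waits + 1:
            self.max_waits += 1
        self.sum_cur = self.sum_next = 0.0
        self.samples = 0
```
&lt;/pre&gt;
With a maximum wait of 100 microseconds and one send every 10 microseconds this settles at max_waits = 5; when sends become sparse it drops back to 0 one step per interval.&lt;br /&gt;&lt;br /&gt;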
However the user&lt;br /&gt;should have the ability to control how long he accepts a wait&lt;br /&gt;through a configuration variable, thus increasing or decreasing&lt;br /&gt;send buffering at the expense of extra delays.&lt;br /&gt;&lt;br /&gt;This algorithm is applied on each socket and the actual decision&lt;br /&gt;making is done within the critical section and also the statistics&lt;br /&gt;calculation and from coding this it seems like the overhead should&lt;br /&gt;be manageable.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-5882829051760933502?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/5882829051760933502/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=5882829051760933502' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5882829051760933502'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/5882829051760933502'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/08/3-thoughts-on-new-ndb-api-adaptive-send.html' title='3: Thoughts on a new NDB API: Adaptive send algorithm'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-6553661882577788978</id><published>2008-07-31T20:27:00.003+02:00</published><updated>2008-08-01T00:24:32.807+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='DBT2'/><category scheme='http://www.blogger.com/atom/ns#' term='MySQL Cluster'/><category scheme='http://www.blogger.com/atom/ns#' term='NDB'/><title type='text'>1: Making MySQL Cluster scale perfectly in the DBT2 benchmark: Initial discussion</title><content type='html'>Since 2006 H1 I've been working on benchmarking MySQL&lt;br /&gt;Cluster using the DBT2 test suite. Initially this meant&lt;br /&gt;a fair amount of work on the test suite itself and also&lt;br /&gt;a set of scripts to start and stop NDB data nodes, MySQL&lt;br /&gt;Servers and all the other processes of the DBT2 test.&lt;br /&gt;(These scripts and the DBT2 tests I'm using are available&lt;br /&gt;for download at &lt;a href&gt;www.iclaustron.com&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;Initially I worked with an early version of MySQL Cluster&lt;br /&gt;based on version 5.1 and this meant that I hit a number&lt;br /&gt;of the performance bugs that had appeared there in the&lt;br /&gt;development process. Nowadays the stability is really good&lt;br /&gt;so in most cases I've spent my time focusing on what&lt;br /&gt;is required to use in the operating system and the&lt;br /&gt;benchmark application for optimum scalability.&lt;br /&gt;&lt;br /&gt;Early on I discovered some basic features that were required&lt;br /&gt;to get optimum performance of MySQL Cluster in those cases.&lt;br /&gt;One of them is to simply use partitioning properly. In the&lt;br /&gt;case of DBT2 most tables (every one except the ITEM table) can&lt;br /&gt;be partitioned on the Warehouse id. So the new feature I&lt;br /&gt;developed as part of 5.1 came in handy here. It's possible to&lt;br /&gt;use either PARTITION BY KEY (warehouse_id) or PARTITION BY&lt;br /&gt;HASH (warehouse_id). Personally I prefer PARTITION BY HASH&lt;br /&gt;since it spreads the warehouses perfectly amongst the data&lt;br /&gt;nodes. 
However in 5.1 this isn't fully supported so one has&lt;br /&gt;to start the MySQL Server using the flag --new to use this&lt;br /&gt;feature with MySQL Cluster.&lt;br /&gt;&lt;br /&gt;The second one was the ability to use the transaction&lt;br /&gt;coordinator on the same node as the warehouse the&lt;br /&gt;transaction is handling. This was handled by a new&lt;br /&gt;feature introduced in MySQL Cluster Carrier Grade&lt;br /&gt;Edition 6.3 whereby the transaction coordinator is&lt;br /&gt;started on the node where the first query is targeted.&lt;br /&gt;This works perfectly for DBT2 and for many other&lt;br /&gt;applications and it's fairly easy to change your&lt;br /&gt;application if it doesn't fit immediately.&lt;br /&gt;&lt;br /&gt;The next feature was to ensure that sending uses as&lt;br /&gt;big buffers as possible and also to avoid wake-up&lt;br /&gt;costs. Both those features meant changes to the&lt;br /&gt;scheduler in the data nodes of the MySQL Cluster.&lt;br /&gt;These changes work very well in most cases where&lt;br /&gt;there are sufficient CPU resources for the data nodes.&lt;br /&gt;This feature was also introduced in MySQL Cluster CGE&lt;br /&gt;version 6.3.&lt;br /&gt;&lt;br /&gt;Another feature which is very important to achieve&lt;br /&gt;optimum scalability is to ensure that the MySQL Server&lt;br /&gt;starts scans only on the data nodes where it will&lt;br /&gt;actually find the data. This is done through the use&lt;br /&gt;of partition pruning as introduced in MySQL version&lt;br /&gt;5.1. Unfortunately a bug was introduced late, which&lt;br /&gt;I recently discovered, that reduced&lt;br /&gt;scalability for DBT2 (this is bug#37934 which contains&lt;br /&gt;a patch which fixes the bug, it hasn't been pushed yet&lt;br /&gt;to any 6.3 version).&lt;br /&gt;&lt;br /&gt;With these features there were still a number of scalability&lt;br /&gt;issues remaining in DBT2. 
One was the obvious one that the&lt;br /&gt;ITEM table is spread on all data nodes and thus reads of the&lt;br /&gt;ITEM table will use network sockets that aren't so "hot".&lt;br /&gt;There are two solutions to this. One is that MySQL Cluster&lt;br /&gt;implements some tables as fully replicated on all data nodes.&lt;br /&gt;This might arrive some time in the future. The other variant&lt;br /&gt;uses standard MySQL techniques. One places the table in&lt;br /&gt;another storage engine, e.g. InnoDB, and uses replication to&lt;br /&gt;spread the updates to all the MySQL Servers in the cluster.&lt;br /&gt;This technique should be applicable to&lt;br /&gt;many web applications where there are tables that need to be&lt;br /&gt;in MySQL Cluster to handle availability issues and where the&lt;br /&gt;data is required to be updated through proper transactions, but&lt;br /&gt;there are also other tables which can be updated in a lazy&lt;br /&gt;manner.&lt;br /&gt;&lt;br /&gt;Finally there is one more remaining issue and this is when the&lt;br /&gt;MySQL Server doesn't work on partitioned data. That is in the&lt;br /&gt;case of DBT2 if all MySQL Servers can access data in a certain&lt;br /&gt;node group then the data nodes will have more network sockets to&lt;br /&gt;work with which will increase cost of networking. This limits&lt;br /&gt;scalability as well.&lt;br /&gt;&lt;br /&gt;In the case of DBT2 this can be avoided by using a spread&lt;br /&gt;parameter that ensures that a certain MySQL Server only uses a&lt;br /&gt;certain node group in the MySQL Cluster. 
In a generic application&lt;br /&gt;this would be handled by an intelligent load balancer that&lt;br /&gt;ensures that MySQL Servers work on different partitions of&lt;br /&gt;the data in the application.&lt;br /&gt;&lt;br /&gt;What I will present in future blogs is some data on how much the&lt;br /&gt;effects mentioned above have on the scalability of the DBT2&lt;br /&gt;benchmark for MySQL Cluster.&lt;br /&gt;&lt;br /&gt;What is more surprising is that there are also a number of other&lt;br /&gt;issues related to the use of the operating system which aren't&lt;br /&gt;obvious at all. I will present those as well and what those mean&lt;br /&gt;in terms of scalability for MySQL Cluster using DBT2.&lt;br /&gt;&lt;br /&gt;Finally in a real application there will seldom be perfect&lt;br /&gt;scalability, so in any real application it's also&lt;br /&gt;important to minimize the impact of scalability issues. The&lt;br /&gt;main technology to use here is cluster interconnects and I&lt;br /&gt;will show how the use of cluster interconnects affects&lt;br /&gt;scalability issues in MySQL Cluster.&lt;br /&gt;&lt;br /&gt;Note that numbers from these DBT2 runs are used here merely to&lt;br /&gt;compare different configurations of MySQL Cluster.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-6553661882577788978?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/6553661882577788978/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=6553661882577788978' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6553661882577788978'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/14455177/posts/default/6553661882577788978'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/07/1-making-mysql-cluster-scale-perfectly.html' title='1: Making MySQL Cluster scale perfectly in the DBT2 benchmark: Initial discussion'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-6069209879988440301</id><published>2008-07-30T17:02:00.003+02:00</published><updated>2008-07-30T17:23:35.516+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MySQL'/><category scheme='http://www.blogger.com/atom/ns#' term='iClaustron'/><category scheme='http://www.blogger.com/atom/ns#' term='NDB'/><title type='text'>2: Thoughts on a new NDB API: Send part</title><content type='html'>In the current API when sending one takes the Transporter mutex and&lt;br /&gt;then sends all the signals generated towards one or many nodes.&lt;br /&gt;There is also some handling of adaptive sends, however this adaptive&lt;br /&gt;algorithm takes care of all nodes, thus waiting for sending is global&lt;br /&gt;on all nodes.&lt;br /&gt;&lt;br /&gt;The new design uses one mutex for the sending, however this mutex only&lt;br /&gt;controls the sending part of one socket. 
Also the time for holding the&lt;br /&gt;mutex is just enough to check the state, no send operations are done&lt;br /&gt;while holding the mutex.&lt;br /&gt;&lt;br /&gt;The new adaptive algorithm will keep track of the last sent messages on&lt;br /&gt;this socket and in principle the idea is that if there is at least a 90-99%&lt;br /&gt;probability that it is a good idea to wait, then it will wait (unless&lt;br /&gt;the application has provided the force send flag). It will do so by&lt;br /&gt;keeping track of the last few messages sent.&lt;br /&gt;&lt;br /&gt;So in principle the data structure protected by the mutex is:&lt;br /&gt;struct ic_send_node_mutex&lt;br /&gt;{&lt;br /&gt;  IC_SEND_THREAD_MUTEX *send_thread_mutex;&lt;br /&gt;  Mutex mutex;&lt;br /&gt;  boolean send_active;&lt;br /&gt;  IC_COMM_BUFFER *first_cb;&lt;br /&gt;  IC_COMM_BUFFER *last_cb;&lt;br /&gt;  uint32 queued_bytes;&lt;br /&gt;  Timer first_buffered_timer;&lt;br /&gt;  Timer last_sent_timers[8];&lt;br /&gt;  uint32 last_sent_timer_index;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;For each socket there is a specific send thread. This thread is mostly&lt;br /&gt;sleeping, waiting for someone to wake it up from its sleep. One reason&lt;br /&gt;to wake it up is if one thread has started sending and other threads&lt;br /&gt;have provided so much work that it needs to offload this sending to&lt;br /&gt;a specific thread (the idea is that the sending is normally done by&lt;br /&gt;an application thread which is involved in user activity and we cannot&lt;br /&gt;keep this thread for longer than a few sends, thus we need to make it&lt;br /&gt;possible to offload send activity to a specific send thread when a high&lt;br /&gt;load appears). 
The send thread could also be awakened to send buffered&lt;br /&gt;messages that have timed out.&lt;br /&gt;&lt;br /&gt;The flag send_active is true whenever a thread is actively sending,&lt;br /&gt;and thus a thread that needs to send when this flag is set can&lt;br /&gt;simply return immediately. If the flag is not set, the thread can set it&lt;br /&gt;and start sending.&lt;br /&gt;&lt;br /&gt;It would probably be possible to handle this without a mutex, but the&lt;br /&gt;contention on this mutex should be small enough and also there is some&lt;br /&gt;wakeup logic that makes sense for a mutex.&lt;br /&gt;&lt;br /&gt;The application thread can prepare the NDB Protocol messages completely&lt;br /&gt;before acquiring the mutex, the only activity which sometimes happens&lt;br /&gt;inside the mutex is reading the time for handling of the adaptive&lt;br /&gt;algorithm.&lt;br /&gt;&lt;br /&gt;Sends normally go to an NDB Data node but could also go to another&lt;br /&gt;Client node and could even go to another thread in the same process.&lt;br /&gt;This is important to handle parallelisation, thus to parallelise it&lt;br /&gt;is sufficient to send a number of messages to other nodes and/or&lt;br /&gt;threads. 
Each message can kick off at least one new thread.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-6069209879988440301?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/6069209879988440301/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=6069209879988440301' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6069209879988440301'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/6069209879988440301'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/07/2-thoughts-on-new-ndb-api-send-part.html' title='2: Thoughts on a new NDB API: Send part'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-4650864656496413941</id><published>2008-07-29T20:45:00.004+02:00</published><updated>2008-07-29T21:21:46.352+02:00</updated><title type='text'>1. Thoughts on a new NDB API, Baseline thoughts</title><content type='html'>I spent some time during my vacation thinking about some&lt;br /&gt;new ideas. 
I designed the first version of the NDB API&lt;br /&gt;about 10 years ago and obviously in those days the maximum&lt;br /&gt;number of CPU's in most systems was 2 so it wasn't a big&lt;br /&gt;problem having a single mutex protecting send and receive&lt;br /&gt;in the NDB API (The NDB API is the low level API used by the&lt;br /&gt;storage engine NDB which is the storage engine in MySQL&lt;br /&gt;Cluster).&lt;br /&gt;&lt;br /&gt;Another design criterion I used when designing the NDB API&lt;br /&gt;was that most developers want to use a synchronous API.&lt;br /&gt;Thus the asynchronous API was made afterwards and didn't&lt;br /&gt;cover all operations. Most developers still develop using&lt;br /&gt;synchronous API's, however most of the use for the NDB&lt;br /&gt;API is for specialised applications such as Telco servers,&lt;br /&gt;storage engine code, LDAP servers. I'm even thinking of&lt;br /&gt;using it inside an operating system kernel to design&lt;br /&gt;a clustered file system.&lt;br /&gt;&lt;br /&gt;Thus today it seems like a better idea to use an asynchronous&lt;br /&gt;API as the base and then put the synchronous API on top of&lt;br /&gt;this.&lt;br /&gt;&lt;br /&gt;When designing the original NDB API it was sufficient to think&lt;br /&gt;of simple key lookups, later it was extended to also handle&lt;br /&gt;scans of tables and indexes. However current design problems&lt;br /&gt;are related to parallelising SQL queries and also there are&lt;br /&gt;implementations of things such as BLOB's that actually require&lt;br /&gt;multiple sequential and parallel operations. Thus in the new&lt;br /&gt;design it's necessary to consider the possibility of starting&lt;br /&gt;complex operations involving multiple threads (sometimes even&lt;br /&gt;multiple processes), multiple operations in sequence and in&lt;br /&gt;parallel.&lt;br /&gt;&lt;br /&gt;These ideas will be fed into the existing NDB API. 
They will also&lt;br /&gt;be used in the iClaustron project where I aim to build something&lt;br /&gt;that can be used as a clustered file system. iClaustron is&lt;br /&gt;designed with the aim of being useful at some point in time,&lt;br /&gt;but at the same time I use it as my personal playground where I&lt;br /&gt;can test new ideas and see how my ideas turn out when turned&lt;br /&gt;into source code.&lt;br /&gt;&lt;br /&gt;The original NDB API was designed in C++ as all the rest of the&lt;br /&gt;MySQL Cluster code. Within the data nodes I think we've found&lt;br /&gt;a good compromise of what to use in the C++ language and what&lt;br /&gt;not to use. However in general I found that debates around what&lt;br /&gt;should be used in C++ tend to take a disproportionate amount of&lt;br /&gt;time compared to the value of those discussions. So for that&lt;br /&gt;reason I decided to use C as the language of choice for iClaustron.&lt;br /&gt;Actually there were more reasons for this: first, it makes it a lot easier&lt;br /&gt;to use the code inside an operating system kernel such as Linux&lt;br /&gt;or FreeBSD and second it makes it easier to write layers to other&lt;br /&gt;languages such as Python, Perl,...&lt;br /&gt;&lt;br /&gt;Most of the thoughts on this new NDB API have been in my mind for&lt;br /&gt;more than 2 years (actually some of the thoughts have already been&lt;br /&gt;implemented in NDB), however during my vacation I had&lt;br /&gt;some fun in designing all the details I hadn't considered&lt;br /&gt;previously.&lt;br /&gt;&lt;br /&gt;It's my view of a nice vacation to relax on a beach or walk in the&lt;br /&gt;mountains while inventing some new ideas based on an interesting&lt;br /&gt;problem. 
I cheated by solving a Sudoku as well this vacation but in&lt;br /&gt;general I like mind games that are related to what I do for a living.&lt;br /&gt;Inventing the idea is the fun part of innovation, then comes the&lt;br /&gt;hard part of actually working out all the details, writing code and&lt;br /&gt;testing code and selling the ideas. This is the work part.&lt;br /&gt;&lt;br /&gt;I will follow this posting with a high level view on the ideas as far&lt;br /&gt;as they've been developed so far. In parallel I'll also "dump" the ideas&lt;br /&gt;into code format. I like to think of my coding as a "brain dump", I&lt;br /&gt;have a fairly unusual way of writing code. I think about the problem&lt;br /&gt;for a long time and when I'm satisfied I write the code for it. I then&lt;br /&gt;write all the code with only a very minimal set of compilation and&lt;br /&gt;test cases. The main idea of coding in this phase is still design, so&lt;br /&gt;in principle I write a design in the form of code. This also means&lt;br /&gt;that I try to write as many comments as possible since I know that I&lt;br /&gt;will otherwise forget my ideas. 
Working for MySQL has made me much&lt;br /&gt;more aware of software engineering issues, so today I also do&lt;br /&gt;a bit of thinking on software engineering in the design.&lt;br /&gt;&lt;br /&gt;An architecture for the design is obviously very important, and the&lt;br /&gt;architecture has borrowed heavily from the way the Linux kernel is&lt;br /&gt;designed with lots of interfaces similar to the VFS interface in&lt;br /&gt;Linux using a struct containing a set of function pointers.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-4650864656496413941?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/4650864656496413941/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=4650864656496413941' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4650864656496413941'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/4650864656496413941'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/07/1-thoughts-on-new-ndb-api-baseline.html' title='1. 
Thoughts on a new NDB API, Baseline thoughts'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-1403345673164572451</id><published>2008-06-06T16:30:00.002+02:00</published><updated>2008-06-06T16:39:10.100+02:00</updated><title type='text'>DTrace probes in MySQL and MySQL Cluster</title><content type='html'>I've worked on DTrace probes for a while now. It's&lt;br /&gt;a really interesting tool. I've worked on MySQL Cluster&lt;br /&gt;code since 1996 but this is the most advanced tool&lt;br /&gt;I've used to see exactly what's going on inside the&lt;br /&gt;MySQL Server and the data nodes.&lt;br /&gt;&lt;br /&gt;I'm still at an early stage of using these DTrace probes&lt;br /&gt;and there is still some work before they are publishable&lt;br /&gt;but one can see very well what's going on inside the&lt;br /&gt;processes in real-time.&lt;br /&gt;&lt;br /&gt;My first finding was that a CPU&lt;br /&gt;percentage that is reported as 1% by prstat in Solaris&lt;br /&gt;actually means that the process uses 64% of a CPU thread. The 1% is the&lt;br /&gt;percentage of the total CPU resources, which is different&lt;br /&gt;from what I'm used to from top.&lt;br /&gt;&lt;br /&gt;The benchmark I'm analysing is the same DBT2 I've used in&lt;br /&gt;a fairly long line of analysis on MySQL Cluster performance&lt;br /&gt;over the last 2 years. This benchmark can be downloaded&lt;br /&gt;from www.iclaustron.com/downloads.html. It's a DBT2 based&lt;br /&gt;on version 0.37 with lots of additions to make it work with&lt;br /&gt;running multiple MySQL Server instances as is the case &lt;br /&gt;with MySQL Cluster. 
Currently I'm running this on Solaris so&lt;br /&gt;there will soon be a new release with some fixes needed to&lt;br /&gt;run this benchmark on Solaris.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-1403345673164572451?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/1403345673164572451/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=1403345673164572451' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1403345673164572451'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/1403345673164572451'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/06/dtrace-probes-in-mysql-and-mysql.html' title='DTrace probes in MySQL and MySQL Cluster'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7327140451849290590</id><published>2008-03-26T03:52:00.004+01:00</published><updated>2008-03-26T14:00:03.297+01:00</updated><title type='text'>Visited Hadoop Conference</title><content type='html'>NOTE: Any comments in this blog entry are based on my personal thoughts after visiting the Hadoop conference and don't represent any current plans within MySQL.&lt;br /&gt;&lt;br /&gt;I visited the Hadoop conference today which was a very interesting event. The room was filled to its limit; people were even standing for lack of chairs. 
Probably around 300 people or so.&lt;br /&gt;&lt;br /&gt;It was interesting to see the wide scope of web-scale problems that could be attacked using Hadoop. The major disruptive feature in Hadoop is the MapReduce solution to solving parallel data analysis problems.&lt;br /&gt;&lt;br /&gt;One piece that I started thinking of was how one could introduce MapReduce into SQL. One presentation of HIVE showed an interesting approach of how to solve this problem. I thought a bit on how one could integrate a MapReduce solution in MySQL and there are certainly a lot of problems to solve but I got a few interesting ideas.&lt;br /&gt;&lt;br /&gt;The concept of being able to query both business data stored in a database and web-based logs and other types of massive amounts of data is certainly an interesting problem to consider.&lt;br /&gt;&lt;br /&gt;In principle what one can add by introducing MapReduce into MySQL is the ability to handle streaming queries (queries that use dataflows as input table(s) and dataflows as output table).&lt;br /&gt;&lt;br /&gt;However the actual implementations of Hadoop and HBase were still very much in their infancy, so availability and reliability were far from always-on, and performance wasn't yet a focus.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7327140451849290590?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7327140451849290590/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7327140451849290590' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7327140451849290590'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/14455177/posts/default/7327140451849290590'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/03/visited-hadoop-conference.html' title='Visited Hadoop Conference'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3885123557270056458</id><published>2008-03-25T03:38:00.006+01:00</published><updated>2008-03-28T03:14:00.662+01:00</updated><title type='text'>MySQL Architecture Workshop</title><content type='html'>We had a workshop in Stockholm in early March to discuss what can be done to innovate MySQL in a number of areas. Most of the work here will not be useful code for another year or two, and it will take a lot longer before it is used in Enterprise Ready binaries. Obviously there is no guarantee that this early work will reach production binaries. This work is part of an aim at advancing the MySQL Architecture in the next few years.&lt;br /&gt;&lt;br /&gt;One interesting topic we discussed was Pushdown of Query Fragments to Storage Engine.&lt;br /&gt;&lt;br /&gt;A Query Fragment is a piece of an SQL query, for example in a 3-way join any join of 2 tables in this query is a Query Fragment, also the full query is a Query Fragment. As part of this interface the storage engine can decide to perform its own optimising using a new interface or it could rely on the MySQL Server to handle this optimisation. 
If the Storage Engine decides it can handle the Query Fragment and the optimiser decides to use this Query Fragment then this Query Fragment will be executed using the traditional Storage Engine API as if the Query Fragment was a normal table.&lt;br /&gt;&lt;br /&gt;There are many engines that could make use of this new interface. Another interesting use of this interface is to implement parallel query support for the MySQL Server. We hope to build a prototype of this sometime this year.&lt;br /&gt;&lt;br /&gt;Please provide comments on this development on this blog, the development is in such an early phase that input is very welcome.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3885123557270056458?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3885123557270056458/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3885123557270056458' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3885123557270056458'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3885123557270056458'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/03/mysql-architecture-workshop.html' title='MySQL Architecture Workshop'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7894229631394957227</id><published>2008-03-25T03:05:00.004+01:00</published><updated>2008-03-28T11:41:22.517+01:00</updated><title type='text'>Visiting Internal Sun Technology Conference</title><content type='html'>My first real chance to meet up with my new colleagues at Sun was an internal technology conference at Sun. It was interesting to listen to what's cooking within Sun.&lt;br /&gt;&lt;br /&gt;We got a presentation of Data Centers and their impact on the environment and it immediately triggered me to start thinking of how we can interact with power save functions from the MySQL Code. It was also interesting to see slides on how computer architecture is developing, this can be put into thinking about how the MySQL architecture should progress over the next few years.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7894229631394957227?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7894229631394957227/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7894229631394957227' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7894229631394957227'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7894229631394957227'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/03/visiting-internal-sun-technology.html' title='Visiting Internal Sun Technology Conference'/><author><name>Mikael 
Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-2244881989248288756</id><published>2008-03-25T02:48:00.002+01:00</published><updated>2008-03-28T04:46:15.178+01:00</updated><title type='text'>Visiting Family History Technology Workshop at BYU</title><content type='html'>On the 13th of March I attended an interesting workshop on technology for Genealogy. My interest in this is based on interest in genealogy itself (my family tree currently contains about 3000 persons from various parts of Sweden down to some farmers in northern Sweden born around 1400) and my interest in technology and in particular how MySQL and MySQL Cluster can be used for genealogy applications. Being an LDS myself also adds to my interest in the subject.&lt;br /&gt;&lt;br /&gt;The LDS church has developed a Web API FamilySearchAPI where genealogists through their genealogy software can work on a common database where they can add and edit information about our ancestors. The system handling this currently contains 2.2 PB of data and is going to grow significantly as images and more genealogy information are added.&lt;br /&gt;&lt;br /&gt;There were quite a few interesting discussions on how to link information between the source information (scanned images of historical documents), transcribed information from sources and derived family trees. The most complex problem in this application is the fuzziness of the base data and that different genealogists can have many different opinions about how to interpret the fuzzy base data. 
Thus in order to solve the problem one has to handle quality of genealogists somehow in the model.&lt;br /&gt;&lt;br /&gt;From a database point of view this application requires a huge system with large clusters of information, it contains one part which is the base data (the scanned images) and this is typically stored in a large clustered file system containing many petabytes of data. Then the derived data is smaller but given that all versions need to be stored will still be a really huge data set and this is a fairly traditional relational database with large amounts of relations between data.&lt;br /&gt;&lt;br /&gt;So what I take home from the workshop is ideas on what MySQL and MySQL Cluster should support in 3-5 years from now to be able to work in applications like this one.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-2244881989248288756?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/2244881989248288756/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=2244881989248288756' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2244881989248288756'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/2244881989248288756'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/03/visiting-family-history-technology.html' title='Visiting Family History Technology Workshop at BYU'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-3081678119376712029</id><published>2008-03-25T02:29:00.006+01:00</published><updated>2008-03-25T14:46:10.862+01:00</updated><title type='text'>Speeding up MySQL by 36% on the T2000</title><content type='html'>This post will focus on the performance tuning work that we've been working on since December 2007 on the Sun T2000 server. We got a nice speedup of 36% with fairly small efforts and we've got good hope we can improve performance a great deal more. This effort is part of a new effort at MySQL to improve performance both on Solaris and Linux platforms and to some extent Windows as well. This report focuses on T2000 using Solaris.&lt;br /&gt;&lt;br /&gt;T1000 and T2000 are the first CoolThreads servers from Sun with the UltraSPARC T1 processors. The T1 is very energy efficient, which is extremely important to modern datacenters. On the other hand, leveraging the massive amount of thread-level parallelism (32 concurrent threads) provided by the CoolThreads servers is the key to getting good performance. As the CoolThreads servers are used by many Sun customers to run web facing workloads, making sure that MySQL runs well on this platform is important to Sun and MySQL customers, and also to the success of the CoolThreads servers and MySQL.&lt;br /&gt;&lt;br /&gt;Note: This work was started long before it was known that MySQL was to be acquired by Sun Microsystems. The actual work done for this tuning was done by Rayson in the performance team at MySQL.&lt;br /&gt;&lt;br /&gt;The workload that we used was sysbench, which is a very simple benchmark. In particular, we only ran read-only OLTP sysbench to perform this tuning work. 
The reason behind this is that if MySQL does not scale well with a simple read-only OLTP workload, then it would not scale well with more complex workloads, yet using a more complex workload would need more time to setup and run.&lt;br /&gt;&lt;br /&gt;This is a list of things that we tried.&lt;br /&gt;&lt;br /&gt;1) Hardware setup and software versions used&lt;br /&gt;============================================&lt;br /&gt;The compiler version:&lt;br /&gt;&gt; cc -V&lt;br /&gt;cc: Sun C 5.9 SunOS_sparc Build47_dlight 2007/05/22&lt;br /&gt;usage: cc [ options] files. Use 'cc -flags' for details&lt;br /&gt;&lt;br /&gt;Solaris version:&lt;br /&gt;&gt; cat /etc/release&lt;br /&gt;Solaris 10 11/06 s10s_u3wos_10 SPARC&lt;br /&gt;Copyright 2006 Sun Microsystems, Inc. All Rights Reserved.&lt;br /&gt;Use is subject to license terms.&lt;br /&gt;Assembled 14 November 2006&lt;br /&gt;&lt;br /&gt;For each run, 5 results were collected, and we discarded the best and the worst results, and then averaged the remaining 3, and sysbench was invoked as follows:&lt;br /&gt;&gt; ./sysbench --test=oltp --num-threads=32 --max-time=60 --max-requests=0 --oltp-read-only=on run&lt;br /&gt;&lt;br /&gt;Using default configuration of MySQL 5.0.45 and read-only OLTP sysbench 0.4.8 on a Sun T2000 running at 1GHz, the throughput measured was 1209 transactions per second.&lt;br /&gt;&lt;br /&gt;2) Compiling with -fast&lt;br /&gt;=======================&lt;br /&gt;Since the workload is CPU intensive with very few I/O operations, we knew that compiler optimizations would be very beneficial to performance. As Sun used the -fast flag for compiling other CPU intensive benchmarks (e.g. 
SPEC CPU), using -fast was the first thing we tried; this was done by setting CFLAGS and CXXFLAGS to -fast before we ran the configure script.&lt;br /&gt;&lt;br /&gt;The throughput measured was 1241 transactions per second, or an improvement of 2.6%.&lt;br /&gt;&lt;br /&gt;3) Fixing headers for Sun Studio&lt;br /&gt;================================&lt;br /&gt;As using a higher optimization level gave us a small but nice improvement, we then looked for other opportunities from compiler optimizations. The first thing we noticed was that there were compiler directives that were not recognized by Sun Studio. And inlining was disabled as well.&lt;br /&gt;&lt;br /&gt;As the Sun Studio compiler supports inlining, we enabled it in InnoDB by modifying the header file: univ.i&lt;br /&gt;&lt;br /&gt;The throughput went up 3.1% to 1279 transactions per second.&lt;br /&gt;&lt;br /&gt;We also enabled prefetching by using "sparc_prefetch_read_many()" and "sparc_prefetch_write_many()". This in fact caused a small performance degradation: the throughput decreased by 0.47% to 1273 transactions per second. Since we do enable prefetching on Linux when gcc is used as the build compiler, we believe that the Niagara has enough MLP (Memory Level Parallelism) that it does not need a lot of help from prefetching. However, we will see if this could benefit other SPARC servers (UltraSPARC IV+ and SPARC64 come to mind), or x64 servers running Solaris (when Sun Studio is used as the build compiler).&lt;br /&gt;&lt;br /&gt;4) Locks in MySQL&lt;br /&gt;=================&lt;br /&gt;We then used plockstat to locate contended mutex locks in the system. Surprisingly, memory management in libc accounted for a lot of the lock contention. Since the default malloc/free is not optimized for threaded applications, we switched to mtmalloc. mtmalloc could be used without recompiling or relinking. 
We simply set the LD_PRELOAD environment variable in the shell that was used to start the MySQL server to interpose malloc/free calls.&lt;br /&gt;&lt;br /&gt;&gt; setenv LD_PRELOAD /usr/lib/libmtmalloc.so&lt;br /&gt;&lt;br /&gt;The gain was 8.1% to 1376 transactions per second.&lt;br /&gt;&lt;br /&gt;5) Caching Memory Inside MySQL&lt;br /&gt;==============================&lt;br /&gt;After we switched to mtmalloc, we still found that there were memory allocation and free patterns that were not efficient. We modified the code so that memory is cached inside MySQL instead of repeatedly allocated and freed. The idea is that we could trade memory usage for performance, but since most memory implementations cache memory when freed by the application instead of returning it to the operating system, having MySQL cache the memory would not only speed up the code but would also have no impact on memory usage.&lt;br /&gt;&lt;br /&gt;Using DTrace, we found that there were over 20 places where malloc and free were called repeatedly. We picked one of the hot spots and modified the code.&lt;br /&gt;&lt;br /&gt;The change above gave us 1.5% to 1396 transactions per second.&lt;br /&gt;&lt;br /&gt;6) Using Largepages&lt;br /&gt;===================&lt;br /&gt;Using largepages on the UltraSPARC T1 platform can be beneficial to performance, as the TLBs in the T1 processor are shared by the 32 hardware threads.&lt;br /&gt;&lt;br /&gt;We used the environment variable MPSSHEAP to tell the operating system that we wanted to use largepages for the memory heap:&lt;br /&gt;&lt;br /&gt;&gt; setenv LD_PRELOAD mpss.so.1&lt;br /&gt;&gt; setenv MPSSHEAP 4M&lt;br /&gt;&lt;br /&gt;This change gave us a gain of 4.2% in throughput to 1455 transactions per second.&lt;br /&gt;&lt;br /&gt;7) Removing strdup() calls&lt;br /&gt;==========================&lt;br /&gt;Later on, we also found that there was an unnecessary strdup/free pattern in the code in mf_cache.c. 
Since the character string was not modified in the code, we removed the strdup call and simply passed the pointer to the string instead.&lt;br /&gt;&lt;br /&gt;This change gave us a gain of 0.34% to 1460 transactions per second.&lt;br /&gt;&lt;br /&gt;8) Feedback Profile and Link Time Optimizations&lt;br /&gt;===============================================&lt;br /&gt;We then compiled the MySQL server with feedback profile compiler optimization and link time optimization. We trained MySQL in a training run, and then recompiled so that the compiler&lt;br /&gt;could use the information (execution behavior) collected during the training run. The compiler flags used: -xipo -xprofile -xlinkopt -fast&lt;br /&gt;&lt;br /&gt;The combination of the compiler flags gave us a gain of 10.5% to 1614 transactions per second.&lt;br /&gt;&lt;br /&gt;9) Configuration File Tuning&lt;br /&gt;============================&lt;br /&gt;While tuning values in the configuration file is the most common way to get higher performance for MySQL, we did not spend a lot of time on it. The reason is that we were more interested in finding the bottlenecks in the code. 
Nevertheless, we did use a few settings:&lt;br /&gt;&lt;br /&gt;&gt; cat my.cnf&lt;br /&gt;[server]&lt;br /&gt;query_cache_size=0&lt;br /&gt;innodb_thread_concurrency=0&lt;br /&gt;innodb_buffer_pool_size=100M&lt;br /&gt;innodb_additional_mem_pool_size=20M&lt;br /&gt;&lt;br /&gt;And the final throughput was 1649 transactions per second.&lt;br /&gt;&lt;br /&gt;10) Things That Did Not Work as Expected&lt;br /&gt;========================================&lt;br /&gt;We also tried to use atomic instructions and ISM (Intimate Shared Memory), but neither of them gave us a performance improvement.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Conclusion (for now)&lt;br /&gt;====================&lt;br /&gt;This was the initial work done to optimize MySQL on the Sun CoolThreads platform, and we got 36% better throughput than the default installation. As MySQL is now part of Sun, I expect that working with Sun engineers will allow MySQL to get even better performance and throughput.&lt;br /&gt;&lt;br /&gt;Currently, caching memory inside MySQL looks promising. We got a 1.5% improvement by modifying only one place inside MySQL. Since there are quite a few places where we could apply this optimization, there is still room for further performance improvement!&lt;br /&gt;&lt;br /&gt;Finally, I should mention that some of the optimizations above also improved MySQL on x64 Linux and Solaris. I will update everyone here in the near future. 
:-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-3081678119376712029?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/3081678119376712029/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=3081678119376712029' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3081678119376712029'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/3081678119376712029'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/03/speeding-up-mysql-by-36-on-t2000.html' title='Speeding up MySQL by 36% on the T2000'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-7205932230087978794</id><published>2008-03-25T02:03:00.007+01:00</published><updated>2008-03-25T04:06:27.845+01:00</updated><title type='text'>Performance Guide for MySQL Cluster@MySQL Users Conference</title><content type='html'>A new MySQL Users Conference is coming up again. MySQL was acquired recently by Sun Microsystems and thus innovation within will happen at an even faster rate than previously. The Users Conference will contain a lot of interesting presentations on how to develop your MySQL Applications. So come to Santa Clara 15-17 April to take part of the development and discuss with many MySQLers how MySQL will be developed in the next few years. 
I've prepared a set of blogs that I will publish over the next few days to give you an idea of what's cooking within MySQL and I hope some of these blogs can persuade you to come there and give your opinion on where the future development should be heading.&lt;br /&gt;&lt;br /&gt;Personally I'll contribute a talk at the MySQL Users Conference on what to think about when building a high performance application based on MySQL Cluster. MySQL Cluster technology has matured over the last few years and is being used in more and more application categories. I even visited a conference on Family History Technology at BYU where I bumped into Matt Garner from FindMyPast (&lt;a href="http://www.findmypast.com"&gt;http://www.findmypast.com&lt;/a&gt;). He told me about how they had used MySQL Cluster for their Data Mining application and sustained a continuous flow of 75,000 queries per second.&lt;br /&gt;&lt;br /&gt;In my talk I'm planning to cover how partitioning your application data can improve performance, how the use of cluster interconnects can improve response time by as much as a factor of 8, when to use the native NDB APIs and when to use SQL, and how to use some recently developed features.&lt;br /&gt;&lt;br /&gt;The MySQL Cluster development has been very focused on developing a feature set for the Telecom space for a few years. Over the last year, development has started focusing more on general features to ensure we get improved performance also on complex SQL queries. Improved support for computers with a high number of cores and execution threads (e.g. the Niagara processor from Sun) and a number of other performance improvements are also being developed.&lt;br /&gt;&lt;br /&gt;The talk will be very much focused on how you as an application developer can make use of the enormous performance capabilities a MySQL Cluster provides you with. 
I also hope to be able to present some impressing benchmark numbers using a large cluster Intel has made available to our use.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-7205932230087978794?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/7205932230087978794/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=7205932230087978794' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7205932230087978794'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/7205932230087978794'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2008/03/performance-guide-for-mysql.html' title='Performance Guide for MySQL Cluster@MySQL Users Conference'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-8269796984438540298</id><published>2007-05-05T10:07:00.000+02:00</published><updated>2007-05-05T10:28:09.621+02:00</updated><title type='text'>Performance White Papers on MySQL Cluster</title><content type='html'>I've been working on some very interesting benchmarking using the&lt;br /&gt;DBT2 test suite developed by OSDL for MySQL Cluster. As part of this&lt;br /&gt;work I've made significant additions to the DBT2 test to enable clustered&lt;br /&gt;test runs. 
I've also developed a set of scripts to enable easy start and&lt;br /&gt;stop of MySQL Cluster processes.&lt;br /&gt;&lt;br /&gt;The benchmarks include comparisons of various connect methods using&lt;br /&gt;Ethernet and Dolphin Express cards. It also discusses improvements using&lt;br /&gt;the latest version of the Intel Core2 architecture.&lt;br /&gt;&lt;br /&gt;As part of this work I discovered a couple of essential performance&lt;br /&gt;optimisations and scalability optimisations. All these improvements&lt;br /&gt;are currently being integrated in MySQL Cluster Carrier Grade Edition.&lt;br /&gt;To enable those wanting "bleeding edge"-access I've also made the&lt;br /&gt;benchmark version available on www.iclaustron.com&lt;br /&gt;&lt;br /&gt;A short white paper and the full white paper can be downloaded from&lt;br /&gt;www.dolphinics.com and a MySQL-focused version of the white paper&lt;br /&gt;can be downloaded from www.mysql.com. www.iclaustron.com contains&lt;br /&gt;the exact links for all material available from various places.&lt;br /&gt;&lt;br /&gt;The full white paper contains also recommendations of HW architectures&lt;br /&gt;to use for optimal MySQL Cluster performance and scalability.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-8269796984438540298?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/8269796984438540298/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=8269796984438540298' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8269796984438540298'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/8269796984438540298'/><link 
rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2007/05/performance-white-papers-on-mysql.html' title='Performance White Papers on MySQL Cluster'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-117615600882009561</id><published>2007-04-09T23:39:00.000+02:00</published><updated>2007-04-10T00:05:24.156+02:00</updated><title type='text'>Performance Tuning of MySQL Cluster</title><content type='html'>As you probably have noticed my blog has been a bit quiet lately. I've&lt;br /&gt;been very busy with some very interesting developments. I've been&lt;br /&gt;working very hard on benchmarking of MySQL Cluster together with&lt;br /&gt;Dolphin and Intel. There will be a lot of material coming out from this&lt;br /&gt;the next couple of weeks. I've prepared a couple of white papers on&lt;br /&gt;how MySQL Cluster can scale to new heights.&lt;br /&gt;&lt;br /&gt;I'll have a presentation at the MySQL Users Conference&lt;br /&gt;http://www.mysqlconf.com&lt;br /&gt;where I'll describe all the interesting tidbits of how to tune MySQL&lt;br /&gt;Cluster performance. This will include both choice of HW, use of&lt;br /&gt;configuration parameters, which particular new features to especially&lt;br /&gt;look out for and so forth.&lt;br /&gt;&lt;br /&gt;If you want to prepare for this then download the white papers that&lt;br /&gt;will be available from MySQL and from Dolphin&lt;br /&gt;http://dev.mysql.com&lt;br /&gt;http://www.dolphinics.com&lt;br /&gt;They should be available there in about a week or so.&lt;br /&gt;&lt;br /&gt;Then come and listen to my session at the Users conference at&lt;br /&gt;5.30 on tuesday 24 april. 
If you still have questions or want to know&lt;br /&gt;even more then come and talk to me, I'll be around at many of&lt;br /&gt;the MySQL Cluster presentations, at the MySQL and the Dolphin&lt;br /&gt;booths in the exhibition hall.&lt;br /&gt;&lt;br /&gt;It's been a really interesting project to work on and it's great to be able&lt;br /&gt;to show all these new results that show how one can make use of&lt;br /&gt;MySQL Cluster in a really scalable way.&lt;br /&gt;&lt;br /&gt;For those that wish to try the benchmarks out themselves there will&lt;br /&gt;also be a large number of scripts made available to simplify set-up&lt;br /&gt;of MySQL Cluster for large clusters and a fairly heavily revised version&lt;br /&gt;of DBT2 that can be used to run large benchmarks using MySQL&lt;br /&gt;Cluster. More on this later, check out the blog next week for more&lt;br /&gt;info on this.&lt;br /&gt;&lt;br /&gt;For those of you that want to know the latest news on partitioning&lt;br /&gt;as well I will also make a presentation of this at the Users conference.&lt;br /&gt;It will include a description of partitioning in 5.1, how to make use&lt;br /&gt;of partitioning for better scalability in MySQL Cluster and finally also&lt;br /&gt;some notes about some new cool developments that are ready to&lt;br /&gt;be put into the 5.2 version of MySQL.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-117615600882009561?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/117615600882009561/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=117615600882009561' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/117615600882009561'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/117615600882009561'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2007/04/performance-tuning-of-mysql-cluster.html' title='Performance Tuning of MySQL Cluster'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-116430998888460243</id><published>2006-11-23T20:13:00.000+01:00</published><updated>2006-11-24T15:50:41.836+01:00</updated><title type='text'>Webinar on MySQL Cluster using Dolphin SuperSockets</title><content type='html'>My blogging hasn't been so active lately. Mostly due to that I've been busy on&lt;br /&gt;other things. One of the things that have kept me busy the last few months&lt;br /&gt;is a project which I highly enjoy. I've been performing a benchmark study&lt;br /&gt;of MySQL Cluster using Dolphin SuperSockets. Performance is one of my&lt;br /&gt;favourite topics and a parallel database like MySQL Cluster has a wide array&lt;br /&gt;of performance challenges that makes it very interesting to optimize it.&lt;br /&gt;&lt;br /&gt;I will present the results in two webinars on the 30th Nov and 13 dec. The&lt;br /&gt;webinars will also provide some input to the features in Dolphin&lt;br /&gt;SuperSockets and MySQL Cluster that enables high performance and&lt;br /&gt;real-time characteristics. 
With these changes to MySQL Cluster and using&lt;br /&gt;the Dolphin SuperSockets MySQL Cluster becomes even better suited to&lt;br /&gt;all types of real-time applications.&lt;br /&gt;See:&lt;br /&gt;http://www.mysql.com/news-and-events/web-seminars/&lt;br /&gt;&lt;br /&gt;Performing this work has been an interesting enterprise in finding out how&lt;br /&gt;to best make use of the Dolphin hardware using MySQL Cluster. I found a&lt;br /&gt;number of interesting ways where 1+1 = 3, meaning I've found optimisations&lt;br /&gt;that can be done in MySQL Cluster that are especially effective if using&lt;br /&gt;Dolphin SuperSockets. So as a result of this some very interesting&lt;br /&gt;achievements have been made.&lt;br /&gt;&lt;br /&gt; - A completely interrupt-free execution of ndbd nodes in MySQL Cluster&lt;br /&gt;  using Dolphin SuperSockets.&lt;br /&gt; - Real-time features added to MySQL Cluster enabling much faster response&lt;br /&gt;    times.&lt;br /&gt; - Possibility to lock threads to CPUs in MySQL Cluster enabling a higher level&lt;br /&gt;    of control over the execution environment.&lt;br /&gt; - Possibility to lock pages in main memory removing any risk of swapping&lt;br /&gt; - Possibility to choose between polling and interrupt-driven mechanisms in&lt;br /&gt;    ndbd kernel&lt;br /&gt;&lt;br /&gt;The combination of MySQL Cluster and Dolphin SuperSockets becomes a truly&lt;br /&gt;amazing real-time machine. With those added features in place and using&lt;br /&gt;Dolphin SuperSockets I've also seen how MySQL Cluster can take yet another&lt;br /&gt;step on its on-line recovery features. Using those real-time features it is&lt;br /&gt;possible to get node failover times down to around 10 milliseconds.&lt;br /&gt;MySQL Cluster was already market leading in this respect; with this&lt;br /&gt;feature the gap to the competitors is bound to increase.&lt;br /&gt;&lt;br /&gt;Most of the benchmark work has been focused on the DBT2 benchmark. 
Most&lt;br /&gt;benchmarks I've done in the past have been focused on applications written&lt;br /&gt;directly for the NDB API. So it's been interesting to see what one needs to do&lt;br /&gt;to make the MySQL Server be really fast.&lt;br /&gt;&lt;br /&gt;In order to run DBT2 with MySQL Cluster at first I had to adapt the DBT2&lt;br /&gt;benchmark for:&lt;br /&gt; - Parallel load of data&lt;br /&gt; - Parallel MySQL Servers while running the benchmark&lt;br /&gt; - Using MySQL Cluster features such as HASH indexes, PARTITIONING and&lt;br /&gt;    Disk Data for MySQL Cluster.&lt;br /&gt;&lt;br /&gt;I also got tired of remembering all the -i -t -h and so forth in the various&lt;br /&gt;scripts and used more real names for the parameters.&lt;br /&gt;&lt;br /&gt;There was also a number of performance bugs in DBT2. DBT2 is implementing&lt;br /&gt;the TPC-C specification and in a number of places the SQL queries were made&lt;br /&gt;such that there was a large number of unnecessary extra record fetches in some&lt;br /&gt;queries.&lt;br /&gt;&lt;br /&gt;I will soon upload the changes to DBT2 to SourceForge if anyone wants to use&lt;br /&gt;the same  benchmark.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-116430998888460243?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/116430998888460243/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=116430998888460243' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/116430998888460243'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/116430998888460243'/><link rel='alternate' type='text/html' 
href='http://mikaelronstrom.blogspot.com/2006/11/webinar-on-mysql-cluster-using-dolphin.html' title='Webinar on MySQL Cluster using Dolphin SuperSockets'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-116430879111977896</id><published>2006-11-23T19:46:00.000+01:00</published><updated>2006-11-23T20:06:31.176+01:00</updated><title type='text'>Webinar on Partitioning</title><content type='html'>As mentioned in an earlier post the partitioning in 5.1 has reached a level of&lt;br /&gt;stability so that it can now be put to some heavier test. To spread further&lt;br /&gt;insights of the new partitioning feature I'll deliver two webinars next week&lt;br /&gt;and the week after that (29 nov and 5 Dec).&lt;br /&gt;&lt;br /&gt;You'll find a reference to both from the MySQL Home Page.&lt;br /&gt;http://www.mysql.com/&lt;br /&gt;&lt;br /&gt;The first one will give an introduction to partitioning in MySQL and&lt;br /&gt;describe the variants of partitioning that will be supported, which&lt;br /&gt;management variants that are possible and so forth.&lt;br /&gt;&lt;br /&gt;The second webinar is a follow-up that will do some repetition to&lt;br /&gt;ensure it can be viewed stand-alone but will mainly dive a little &lt;br /&gt;deeper into various areas of partitioning amongst other how it&lt;br /&gt;relates to MySQL Cluster.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-116430879111977896?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/116430879111977896/comments/default' title='Post 
Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=116430879111977896' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/116430879111977896'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/116430879111977896'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2006/11/webinar-on-partitioning.html' title='Webinar on Partitioning'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-115903964428950453</id><published>2006-09-23T21:08:00.000+02:00</published><updated>2006-09-23T21:27:24.300+02:00</updated><title type='text'>State of MySQL partitioning</title><content type='html'>It's been some time since I last blogged. I've been quite busy erasing all the&lt;br /&gt;bugs from the partitioning implementation in MySQL 5.1. At the moment&lt;br /&gt;there is 1 bug left in review and a few on their way into the main clone. The&lt;br /&gt;rest of the bugs are fixed and already in the 5.1 clone. So the next 5.1&lt;br /&gt;release (5.1.12) will have partitioning ready for tough tests. So if you have&lt;br /&gt;been waiting for partitioning to stabilise it's time to try it out now with your&lt;br /&gt;application and see how it works.&lt;br /&gt;&lt;br /&gt;I watched an interesting animation my daughter made about how partition&lt;br /&gt;pruning works, using dynamic PowerPoint slides. Really interesting to see what can&lt;br /&gt;be done if one knows how to handle these types of tools. 
She's quickly&lt;br /&gt;becoming our family authority on presentations.&lt;br /&gt;&lt;br /&gt;Lately we've also started working on some extensions for the partitioning,&lt;br /&gt;hopefully ready for 5.2. We have introduced a new partitioning syntax like this:&lt;br /&gt;&lt;br /&gt;CREATE TABLE t1 (a char(10), b date)&lt;br /&gt;PARTITION BY RANGE (COLUMNS(b,a))&lt;br /&gt;(PARTITION p0 VALUES LESS THAN ('1999-01-01', "abc"),&lt;br /&gt; PARTITION p1 VALUES LESS THAN ('2001-01-01', MINVALUE),&lt;br /&gt; PARTITION p2 VALUES LESS THAN ('2003-01-01', MAXVALUE));&lt;br /&gt;&lt;br /&gt;The nice thing with this syntax is that partition pruning with ranges can be&lt;br /&gt;handled in a very efficient manner. So the query:&lt;br /&gt;SELECT * FROM t1 WHERE b &lt;= '1999-06-01' AND b &gt;= '1999-02-01';&lt;br /&gt;can be optimised to only scan partition p1.&lt;br /&gt;&lt;br /&gt;We are also working on indexes that are partitioned independently of the base&lt;br /&gt;table and also a couple of other features. 
As usual what actually goes into the&lt;br /&gt;next release is uncertain.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-115903964428950453?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/115903964428950453/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=115903964428950453' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/115903964428950453'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/115903964428950453'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2006/09/state-of-mysql-partitioning.html' title='State of MySQL partitioning'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-115211159472825345</id><published>2006-07-05T15:14:00.004+02:00</published><updated>2006-07-05T16:59:54.753+02:00</updated><title type='text'>PARTITION by a date column</title><content type='html'>One of the most common uses of partitioning is where one wants to partition&lt;br /&gt;by date. One might want to have one partition per year or per month or per&lt;br /&gt;week or per day. This blog entry shows how to handle this requirement using&lt;br /&gt;MySQL 5.1.&lt;br /&gt;&lt;br /&gt;The most common method to partition in this case is by range.&lt;br /&gt;Partitioning in 5.1 uses a function on one or more fields of the table. 
In 5.1&lt;br /&gt;there are some requirements on these fields if unique indexes or primary keys&lt;br /&gt;also exist in the table. The reason is that 5.1 doesn't have support for global&lt;br /&gt;indexes. Development of support for this has however started, so it should appear in&lt;br /&gt;some future release of MySQL.&lt;br /&gt;&lt;br /&gt;In 5.1 functions have to return an integer. There are two functions that have&lt;br /&gt;special support in the MySQL server. These are TO_DAYS() and YEAR(). Both&lt;br /&gt;of these functions take a DATE or DATETIME argument and return an integer.&lt;br /&gt;YEAR() returns the year and TO_DAYS() returns the number of days passed&lt;br /&gt;since a particular start date.&lt;br /&gt;&lt;br /&gt;The MySQL optimizer has special support for these two partition functions. It&lt;br /&gt;knows that those functions are monotonically increasing and uses this knowledge to&lt;br /&gt;discover that queries such as:&lt;br /&gt;&lt;br /&gt;SELECT * from t1 WHERE a &lt;= '1991-01-31' AND a &gt;= '1991-01-01';&lt;br /&gt;with a partition function  PARTITION BY RANGE (to_days(a)) can be mapped&lt;br /&gt;to a range of partition function values starting at&lt;br /&gt;TO_DAYS('1991-01-01') and ending at TO_DAYS('1991-01-31')&lt;br /&gt;&lt;br /&gt;Thus the MySQL Server can map TO_DAYS('1991-01-01') to a starting partition&lt;br /&gt;and TO_DAYS('1991-01-31') to an ending partition. Thus we only need to scan&lt;br /&gt;partitions in a range of partitions.&lt;br /&gt;&lt;br /&gt;Most functions don't have this nice mapping from value range to partition&lt;br /&gt;range. The functions TO_DAYS(date) and YEAR(date) are known by the&lt;br /&gt;MySQL optimizer to have this attribute and they will thus be better for range&lt;br /&gt;optimisations. Also a partition function on a field which is an integer field&lt;br /&gt;where the function is the field by itself will have this characteristic. 
Other&lt;br /&gt;functions won't. Theoretically many more can be handled, but this requires&lt;br /&gt;special care of overflow handling to be correct and this will be added in&lt;br /&gt;some future MySQL release.&lt;br /&gt;&lt;br /&gt;So with this knowledge let's set up a table that is partitioned by month.&lt;br /&gt;&lt;br /&gt;CREATE TABLE t1 (a date)&lt;br /&gt;PARTITION BY RANGE(TO_DAYS(a))&lt;br /&gt;(PARTITION p3xx VALUES LESS THAN (TO_DAYS('2004-01-01')),&lt;br /&gt; PARTITION p401 VALUES LESS THAN (TO_DAYS('2004-02-01')),&lt;br /&gt; PARTITION p402 VALUES LESS THAN (TO_DAYS('2004-03-01')),&lt;br /&gt; PARTITION p403 VALUES LESS THAN (TO_DAYS('2004-04-01')),&lt;br /&gt; PARTITION p404 VALUES LESS THAN (TO_DAYS('2004-05-01')),&lt;br /&gt; PARTITION p405 VALUES LESS THAN (TO_DAYS('2004-06-01')),&lt;br /&gt; PARTITION p406 VALUES LESS THAN (TO_DAYS('2004-07-01')),&lt;br /&gt; PARTITION p407 VALUES LESS THAN (TO_DAYS('2004-08-01')),&lt;br /&gt; PARTITION p408 VALUES LESS THAN (TO_DAYS('2004-09-01')),&lt;br /&gt; PARTITION p409 VALUES LESS THAN (TO_DAYS('2004-10-01')),&lt;br /&gt; PARTITION p410 VALUES LESS THAN (TO_DAYS('2004-11-01')),&lt;br /&gt; PARTITION p411 VALUES LESS THAN (TO_DAYS('2004-12-01')),&lt;br /&gt; PARTITION p412 VALUES LESS THAN (TO_DAYS('2005-01-01')),&lt;br /&gt; PARTITION p501 VALUES LESS THAN (TO_DAYS('2005-02-01')),&lt;br /&gt; PARTITION p502 VALUES LESS THAN (TO_DAYS('2005-03-01')),&lt;br /&gt; PARTITION p503 VALUES LESS THAN (TO_DAYS('2005-04-01')),&lt;br /&gt; PARTITION p504 VALUES LESS THAN (TO_DAYS('2005-05-01')),&lt;br /&gt; PARTITION p505 VALUES LESS THAN (TO_DAYS('2005-06-01')),&lt;br /&gt; PARTITION p506 VALUES LESS THAN (TO_DAYS('2005-07-01')),&lt;br /&gt; PARTITION p507 VALUES LESS THAN (TO_DAYS('2005-08-01')),&lt;br /&gt; PARTITION p508 VALUES LESS THAN (TO_DAYS('2005-09-01')),&lt;br /&gt; PARTITION p509 VALUES LESS THAN (TO_DAYS('2005-10-01')),&lt;br /&gt; PARTITION p510 VALUES LESS THAN (TO_DAYS('2005-11-01')),&lt;br /&gt; PARTITION p511 VALUES LESS 
THAN (TO_DAYS('2005-12-01')),&lt;br /&gt; PARTITION p512 VALUES LESS THAN (TO_DAYS('2006-01-01')),&lt;br /&gt; PARTITION p601 VALUES LESS THAN (TO_DAYS('2006-02-01')),&lt;br /&gt; PARTITION p602 VALUES LESS THAN (TO_DAYS('2006-03-01')),&lt;br /&gt; PARTITION p603 VALUES LESS THAN (TO_DAYS('2006-04-01')),&lt;br /&gt; PARTITION p604 VALUES LESS THAN (TO_DAYS('2006-05-01')),&lt;br /&gt; PARTITION p605 VALUES LESS THAN (TO_DAYS('2006-06-01')),&lt;br /&gt; PARTITION p606 VALUES LESS THAN (TO_DAYS('2006-07-01')),&lt;br /&gt; PARTITION p607 VALUES LESS THAN (TO_DAYS('2006-08-01')));&lt;br /&gt;&lt;br /&gt;Then load the table with data. Now you might want to see the&lt;br /&gt;data from Q3 2004. So you issue the query:&lt;br /&gt;SELECT * from t1&lt;br /&gt;WHERE a &gt;= '2004-07-01' AND a &lt;= '2004-09-30';&lt;br /&gt;This should now only scan partitions p407, p408 and p409. You can&lt;br /&gt;check this by using EXPLAIN PARTITIONS on the query:&lt;br /&gt;EXPLAIN PARTITIONS SELECT * from t1&lt;br /&gt;WHERE a &gt;= '2004-07-01' AND a &lt;= '2004-09-30';&lt;br /&gt;&lt;br /&gt;You can also get similar results with more complicated expressions.&lt;br /&gt;Assume we want to summarize all measured Q3's so far.&lt;br /&gt;SELECT * from t1&lt;br /&gt;WHERE (a &gt;= '2004-07-01' AND a &lt;= '2004-09-30') OR&lt;br /&gt;           (a &gt;= '2005-07-01' AND a &lt;= '2005-09-30');&lt;br /&gt;&lt;br /&gt;Using EXPLAIN PARTITIONS we'll discover the expected result that this&lt;br /&gt;will only scan partitions p407, p408, p409, p507, p508 and p509.&lt;br /&gt;&lt;br /&gt;When July comes to its end it is then time to add a new partition for&lt;br /&gt;August 2006, which we do with a quick command:&lt;br /&gt;ALTER TABLE t1 ADD PARTITION&lt;br /&gt;(PARTITION p608 VALUES LESS THAN (TO_DAYS('2006-09-01')));&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-115211159472825345?l=mikaelronstrom.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/115211159472825345/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=115211159472825345' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/115211159472825345'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/115211159472825345'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2006/07/partition-by-date-column_115211159472825345.html' title='PARTITION by a date column'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-114906633153190273</id><published>2006-05-31T10:36:00.000+02:00</published><updated>2006-05-31T11:05:31.550+02:00</updated><title type='text'>EXPLAIN to understand partition pruning</title><content type='html'>As part of the partitioning development in MySQL 5.1 we've added the ability to&lt;br /&gt;check which partitions of a table are actually accessed by a particular query.&lt;br /&gt;Since partitions can in a sense act as a sort of index, this is an important feature for&lt;br /&gt;understanding the performance impact of a query.&lt;br /&gt;&lt;br /&gt;To use this feature, issue the normal EXPLAIN command with the&lt;br /&gt;added keyword PARTITIONS. 
For example:&lt;br /&gt;EXPLAIN PARTITIONS select * from t1;&lt;br /&gt;&lt;br /&gt;A slightly more useful example would be&lt;br /&gt;CREATE TABLE t1 (a int)&lt;br /&gt;PARTITION BY RANGE (a)&lt;br /&gt;(PARTITION p0 VALUES LESS THAN (10),&lt;br /&gt; PARTITION p1 VALUES LESS THAN (20),&lt;br /&gt; PARTITION p2 VALUES LESS THAN (30));&lt;br /&gt;&lt;br /&gt;Now if we do an equality query we should only need to access one partition.&lt;br /&gt;This can be verified with the command:&lt;br /&gt;EXPLAIN PARTITIONS select * from t1 WHERE a = 1;&lt;br /&gt;/* Results in p0 being displayed in the list of partitions */&lt;br /&gt;&lt;br /&gt;A range query will also be pruned nicely in this case (a is a function that is&lt;br /&gt;increasing and thus range optimisations can be performed; YEAR(date) and&lt;br /&gt;TO_DAYS(date) are two other functions that are known to be&lt;br /&gt;monotonically increasing).&lt;br /&gt;&lt;br /&gt;EXPLAIN PARTITIONS select * from t1 WHERE a &gt;= 1 AND a &lt;= 12;&lt;br /&gt;/* Results in the range being mapped to p0, p1 */&lt;br /&gt;&lt;br /&gt;LIST partitions will be range-pruned in the same cases as RANGE&lt;br /&gt;partitions.&lt;br /&gt;&lt;br /&gt;HASH partitioning has no natural concept of ranges since different values&lt;br /&gt;map more or less randomly into partitions. We do however apply an&lt;br /&gt;optimisation for short ranges such that the following will happen.&lt;br /&gt;&lt;br /&gt;CREATE TABLE t1 (a int)&lt;br /&gt;PARTITION BY HASH (a)&lt;br /&gt;PARTITIONS 10;&lt;br /&gt;&lt;br /&gt;EXPLAIN PARTITIONS select * from t1 WHERE a &lt; 3 AND a &gt; 0; &lt;br /&gt;In this case the range consists of only two values, 1 and 2. 
Thus we simply map&lt;br /&gt;the interval to a = 1 OR a = 2 and here we get p1 and p2 as the partitions to&lt;br /&gt;use.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-114906633153190273?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/114906633153190273/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=114906633153190273' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/114906633153190273'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/114906633153190273'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2006/05/explain-to-understand-partition.html' title='EXPLAIN to understand partition pruning'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-114906101331339643</id><published>2006-05-31T08:26:00.000+02:00</published><updated>2006-05-31T09:36:53.333+02:00</updated><title type='text'>Information Schemas for Partitions</title><content type='html'>As part of the work in developing partitioning support for 5.1 a new&lt;br /&gt;information schema table has been added. 
This table can be used to&lt;br /&gt;retrieve information about properties of individual partitions.&lt;br /&gt;&lt;br /&gt;To query this table you can issue a query like:&lt;br /&gt;SELECT * FROM information_schema.partitions WHERE&lt;br /&gt;table_schema = "database_name" AND table_name = "name_of_table";&lt;br /&gt;&lt;br /&gt;The result of this particular query will be one record per partition in&lt;br /&gt;the table with info about the properties of these partitions.&lt;br /&gt;&lt;br /&gt;A query on a non-partitioned table will produce a similar output&lt;br /&gt;although most fields will be NULL. The information_schema.partitions&lt;br /&gt;table is not yet implemented for MySQL Cluster so for MySQL Cluster&lt;br /&gt;tables the output will be all NULLs on the partition specific information.&lt;br /&gt;&lt;br /&gt;Below follows a short description of the fields in this information&lt;br /&gt;schema table:&lt;br /&gt;1) TABLE_CATALOG: This field is always NULL&lt;br /&gt;2) TABLE_SCHEMA: This field contains the database name of the table&lt;br /&gt;3) TABLE_NAME: Table name&lt;br /&gt;4) PARTITION_NAME: Name of the partition&lt;br /&gt;5) SUBPARTITION_NAME: Name of subpartition if one exists, otherwise&lt;br /&gt;NULL&lt;br /&gt;6) PARTITION_ORDINAL_POSITION: All partitions are ordered in the&lt;br /&gt;same order as they were defined. This order can change as partition&lt;br /&gt;management operations add, drop and reorganize partitions. 
This number is the&lt;br /&gt;current order with number 1 as the number of the first partition&lt;br /&gt;7) SUBPARTITION_ORDINAL_POSITION: Order of subpartitions within a&lt;br /&gt;partition, starts at 1&lt;br /&gt;8) PARTITION_METHOD: Any of the partitioning variants: RANGE, LIST,&lt;br /&gt;HASH, LINEAR HASH, KEY, LINEAR KEY&lt;br /&gt;9) SUBPARTITION_METHOD: Any of the subpartitioning variants: HASH,&lt;br /&gt;LINEAR HASH, KEY, LINEAR KEY&lt;br /&gt;10) PARTITION_EXPRESSION: This is the expression for the partition&lt;br /&gt;function as expressed when creating partitioning on the table through&lt;br /&gt;CREATE TABLE or ALTER TABLE.&lt;br /&gt;11) SUBPARTITION_EXPRESSION: Same for the subpartition function&lt;br /&gt;12) PARTITION_DESCRIPTION: This is used for RANGE and LIST partitions:&lt;br /&gt;RANGE: Contains the value defined in VALUES LESS THAN. This is an&lt;br /&gt;integer value, so if the CREATE TABLE contained a constant expression&lt;br /&gt;this contains the evaluated expression, thus an integer value&lt;br /&gt;LIST: The values defined in VALUES IN. This is a comma-separated list of&lt;br /&gt;integer values.&lt;br /&gt;13) TABLE_ROWS:  Although its name indicates that it is the number of&lt;br /&gt;rows in the table, it is actually the number of rows in the partition.&lt;br /&gt;14-22) AVG_ROW_LENGTH, DATA_LENGTH, MAX_DATA_LENGTH,&lt;br /&gt; INDEX_LENGTH, DATA_FREE, CREATE_TIME, UPDATE_TIME, CHECK_TIME,&lt;br /&gt; CHECKSUM:&lt;br /&gt;All these fields are to be interpreted in the same way as for a normal table&lt;br /&gt;except that the value is the value for the partition and not for the table.&lt;br /&gt;23) PARTITION_COMMENT: Comment on the partition&lt;br /&gt;24) NODEGROUP: This is the nodegroup of the partition. 
This is only&lt;br /&gt;relevant for MySQL Cluster.&lt;br /&gt;25) TABLESPACE_NAME: This is the tablespace name of the partition.&lt;br /&gt;This is currently not relevant for any storage engine.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-114906101331339643?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/114906101331339643/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=114906101331339643' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/114906101331339643'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/114906101331339643'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2006/05/information-schemas-for-partitions.html' title='Information Schemas for Partitions'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-14455177.post-114348444040113294</id><published>2006-03-27T20:25:00.000+02:00</published><updated>2006-03-27T20:34:00.403+02:00</updated><title type='text'>Partition and Scanning</title><content type='html'>If a query does a SELECT the optimizer will discover which partitions&lt;br /&gt;have to be scanned. In 5.1 these scans will still be made&lt;br /&gt;sequentially on one partition at a time. 
However, only those partitions&lt;br /&gt;actually touched will be scanned, which can improve performance on&lt;br /&gt;certain queries by an order of magnitude.&lt;br /&gt;&lt;br /&gt;The aim is to support parallel scan in a future release, and likewise to&lt;br /&gt;support parallel sort on the first table selected by the optimizer.&lt;br /&gt;An important long-term goal of partitioning is to open up the&lt;br /&gt;MySQL architecture for many performance improvements in&lt;br /&gt;handling large data sizes. In 5.1 we have achieved quite a few of&lt;br /&gt;those goals but expect to see more goals achieved as new versions&lt;br /&gt;of MySQL hit the street burning :)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/14455177-114348444040113294?l=mikaelronstrom.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://mikaelronstrom.blogspot.com/feeds/114348444040113294/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=14455177&amp;postID=114348444040113294' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/114348444040113294'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/14455177/posts/default/114348444040113294'/><link rel='alternate' type='text/html' href='http://mikaelronstrom.blogspot.com/2006/03/partition-and-scanning.html' title='Partition and Scanning'/><author><name>Mikael Ronstrom</name><uri>http://www.blogger.com/profile/07134215866292829917</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry></feed>
