My blogging hasn't been very active lately, mostly because I've been busy with
other things. One of the things that has kept me busy the last few months
is a project I highly enjoy: a benchmark study of MySQL Cluster using
Dolphin SuperSockets. Performance is one of my favourite topics, and a
parallel database like MySQL Cluster has a wide array of performance
challenges that make it very interesting to optimize.
I will present the results in two webinars on 30 Nov and 13 Dec. The
webinars will also provide some insight into the features in Dolphin
SuperSockets and MySQL Cluster that enable high performance and
real-time characteristics. With these changes to MySQL Cluster, and using
Dolphin SuperSockets, MySQL Cluster becomes even better adapted for
all types of real-time applications.
See:
http://www.mysql.com/news-and-events/web-seminars/
Performing this work has been an interesting enterprise in finding out how
to best make use of the Dolphin hardware with MySQL Cluster. I found a
number of interesting cases where 1+1 = 3, meaning I've found optimisations
in MySQL Cluster that are especially effective when using Dolphin
SuperSockets. As a result, some very interesting achievements have been
made:
- A completely interrupt-free execution of ndbd nodes in MySQL Cluster
using Dolphin SuperSockets.
- Real-time features added to MySQL Cluster enabling much faster response
times.
- Possibility to lock threads to CPUs in MySQL Cluster, enabling a higher level
of control over the execution environment.
- Possibility to lock pages in main memory, removing any risk of swapping
- Possibility to choose between polling and interrupt-driven mechanisms in
the ndbd kernel
The combination of MySQL Cluster and Dolphin SuperSockets becomes a truly
amazing real-time machine. With those added features in place, and using
Dolphin SuperSockets, I've also seen how MySQL Cluster can take yet another
step on its on-line recovery features. Using those real-time features it is
possible to get node failover times down to around 10 milliseconds.
MySQL Cluster was already market leading in this respect; with this
feature the gap to the competitors is bound to increase.
Most of the benchmark work has been focused on the DBT2 benchmark. Most
benchmarks I've done in the past have been focused on applications written
directly for the NDB API, so it's been interesting to see what one needs to do
to make the MySQL Server really fast.
In order to run DBT2 with MySQL Cluster I first had to adapt the DBT2
benchmark for:
- Parallel load of data
- Parallel MySQL Servers while running the benchmark
- MySQL Cluster features such as HASH indexes, PARTITIONING and
Disk Data
I also got tired of remembering all the -i, -t, -h options and so forth in the
various scripts, so I switched to more descriptive parameter names.
There were also a number of performance bugs in DBT2. DBT2 implements
the TPC-C specification, and in a number of places the SQL queries were
written such that some queries performed a large number of unnecessary
extra record fetches.
I will soon upload the changes to DBT2 to SourceForge if anyone wants to use
the same benchmark.
Thursday, November 23, 2006
Webinar on Partitioning
As mentioned in an earlier post, the partitioning in 5.1 has reached a level of
stability such that it can now be put to some heavier tests. To spread further
insight into the new partitioning feature I'll deliver two webinars, one next
week and one the week after that (29 Nov and 5 Dec).
You'll find a reference to both from the MySQL Home Page.
http://www.mysql.com/
The first one will give an introduction to partitioning in MySQL and
describe the variants of partitioning that will be supported, which
management operations are possible, and so forth.
The second webinar is a follow-up that will repeat some material to
ensure it can be viewed stand-alone, but will mainly dive a little
deeper into various areas of partitioning, among other things how it
relates to MySQL Cluster.
Saturday, September 23, 2006
State of MySQL partitioning
It's been some time since I last blogged. I've been quite busy erasing all the
bugs from the partitioning implementation in MySQL 5.1. At the moment
there is one bug left in review and a few on their way into the main clone. The
rest of the bugs are fixed and already in the 5.1 clone. So the next 5.1
release (5.1.12) will have partitioning ready for tough tests. If you have
been waiting for partitioning to stabilise, it's time to try it out now with your
application and see how it works.
I watched an interesting animation my daughter made about how partition
pruning works, using dynamic PowerPoint slides. Really interesting to see what
can be done if one knows how to handle these types of tools. She's quickly
becoming our family authority on presentations.
Lately we've also started working on some extensions for partitioning,
hopefully ready for 5.2. We have introduced a new partitioning syntax like this:
CREATE TABLE t1 (a char(10), b date)
PARTITION BY RANGE (COLUMNS(b,a))
(PARTITION p0 VALUES LESS THAN ('1999-01-01', 'abc'),
PARTITION p1 VALUES LESS THAN ('2001-01-01', MINVALUE),
PARTITION p2 VALUES LESS THAN ('2003-01-01', MAXVALUE));
The nice thing about this syntax is that partition pruning with ranges can be
handled in a very efficient manner. So the query:
SELECT * FROM t1 WHERE b <= '1999-06-01' AND b >= '1999-02-01';
can be optimised to only scan partition p1.
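To verify the pruning one could use EXPLAIN PARTITIONS (described in the
posts further down); this is just a sketch assuming the proposed COLUMNS
syntax prunes as described above:
EXPLAIN PARTITIONS SELECT * FROM t1
WHERE b <= '1999-06-01' AND b >= '1999-02-01';
/* Expected to show only p1 in the partitions column */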
We are also working on indexes that are partitioned independently of the base
table, and a couple of other features. As usual, what actually goes into the
next release is uncertain.
Wednesday, July 05, 2006
PARTITION by a date column
One of the most common uses of partitioning is partitioning by date. One
might want to have one partition per year, per month, per week or per day.
This blog entry shows how to handle this requirement using MySQL 5.1.
The most common method to partition in this case is by range.
Partitioning in 5.1 uses a function on one or more fields of the table. In 5.1
there are some requirements on these fields if unique indexes or primary keys
also exist in the table. The reason is that 5.1 doesn't have support for global
indexes. Development of support for this has however started, so it should
appear in some future release of MySQL.
In 5.1 partition functions have to return an integer. There are two functions
that have special support in the MySQL server: TO_DAYS() and YEAR(). Both
of these functions take a DATE or DATETIME argument and return an integer.
YEAR() returns the year and TO_DAYS() returns the number of days passed
since a particular start date.
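As a quick illustration of the two functions (a sketch; the exact TO_DAYS()
value depends on its day-zero reference):
SELECT YEAR('2004-01-01'), TO_DAYS('2004-01-01');
/* YEAR() returns 2004; TO_DAYS() returns the number of days counted
   from year 0, an integer in the region of 732000 */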
The MySQL optimizer has special support for these two partition functions. It
knows that they are strictly increasing and uses this knowledge to discover
that a query such as:
SELECT * from t1 WHERE a <= '1991-01-31' AND a >= '1991-01-01';
with a partition function PARTITION BY RANGE (TO_DAYS(a)) can be mapped
to a range of partition function values starting at
TO_DAYS('1991-01-01') and ending at TO_DAYS('1991-01-31').
Thus the MySQL Server can map TO_DAYS('1991-01-01') to a starting partition
and TO_DAYS('1991-01-31') to an ending partition, and only the partitions
in that range need to be scanned.
Most functions don't have this nice mapping from value range to partition
range. The functions TO_DAYS(date) and YEAR(date) are known by the
MySQL optimizer to have this attribute and will thus be better for range
optimisations. A partition function on an integer field where the function is
the field by itself also has this characteristic. Other functions don't;
theoretically many more could be handled, but this requires special care
with overflow handling to be correct, and it will be added in some future
MySQL release.
So with this knowledge, let's set up a table that is partitioned by month.
CREATE TABLE t1 (a date)
PARTITION BY RANGE(TO_DAYS(a))
(PARTITION p3xx VALUES LESS THAN (TO_DAYS('2004-01-01')),
PARTITION p401 VALUES LESS THAN (TO_DAYS('2004-02-01')),
PARTITION p402 VALUES LESS THAN (TO_DAYS('2004-03-01')),
PARTITION p403 VALUES LESS THAN (TO_DAYS('2004-04-01')),
PARTITION p404 VALUES LESS THAN (TO_DAYS('2004-05-01')),
PARTITION p405 VALUES LESS THAN (TO_DAYS('2004-06-01')),
PARTITION p406 VALUES LESS THAN (TO_DAYS('2004-07-01')),
PARTITION p407 VALUES LESS THAN (TO_DAYS('2004-08-01')),
PARTITION p408 VALUES LESS THAN (TO_DAYS('2004-09-01')),
PARTITION p409 VALUES LESS THAN (TO_DAYS('2004-10-01')),
PARTITION p410 VALUES LESS THAN (TO_DAYS('2004-11-01')),
PARTITION p411 VALUES LESS THAN (TO_DAYS('2004-12-01')),
PARTITION p412 VALUES LESS THAN (TO_DAYS('2005-01-01')),
PARTITION p501 VALUES LESS THAN (TO_DAYS('2005-02-01')),
PARTITION p502 VALUES LESS THAN (TO_DAYS('2005-03-01')),
PARTITION p503 VALUES LESS THAN (TO_DAYS('2005-04-01')),
PARTITION p504 VALUES LESS THAN (TO_DAYS('2005-05-01')),
PARTITION p505 VALUES LESS THAN (TO_DAYS('2005-06-01')),
PARTITION p506 VALUES LESS THAN (TO_DAYS('2005-07-01')),
PARTITION p507 VALUES LESS THAN (TO_DAYS('2005-08-01')),
PARTITION p508 VALUES LESS THAN (TO_DAYS('2005-09-01')),
PARTITION p509 VALUES LESS THAN (TO_DAYS('2005-10-01')),
PARTITION p510 VALUES LESS THAN (TO_DAYS('2005-11-01')),
PARTITION p511 VALUES LESS THAN (TO_DAYS('2005-12-01')),
PARTITION p512 VALUES LESS THAN (TO_DAYS('2006-01-01')),
PARTITION p601 VALUES LESS THAN (TO_DAYS('2006-02-01')),
PARTITION p602 VALUES LESS THAN (TO_DAYS('2006-03-01')),
PARTITION p603 VALUES LESS THAN (TO_DAYS('2006-04-01')),
PARTITION p604 VALUES LESS THAN (TO_DAYS('2006-05-01')),
PARTITION p605 VALUES LESS THAN (TO_DAYS('2006-06-01')),
PARTITION p606 VALUES LESS THAN (TO_DAYS('2006-07-01')),
PARTITION p607 VALUES LESS THAN (TO_DAYS('2006-08-01')));
Then load the table with data. Now you might want to see the
data from Q3 2004. So you issue the query:
SELECT * from t1
WHERE a >= '2004-07-01' AND a <= '2004-09-30';
This should now only scan partitions p407, p408 and p409. You can
check this by using EXPLAIN PARTITIONS on the query:
EXPLAIN PARTITIONS SELECT * from t1
WHERE a >= '2004-07-01' AND a <= '2004-09-30';
You can also get similar results with more complicated expressions.
Assume we want to summarize over all measured Q3s so far.
SELECT * from t1
WHERE (a >= '2004-07-01' AND a <= '2004-09-30') OR
(a >= '2005-07-01' AND a <= '2005-09-30');
Using EXPLAIN PARTITIONS we'll discover the expected result that this
will only scan partitions p407, p408, p409, p507, p508 and p509.
When July comes to an end it is time to add a new partition for
August 2006, which we do with a quick command:
ALTER TABLE t1 ADD PARTITION
(PARTITION p608 VALUES LESS THAN (TO_DAYS('2006-09-01')));
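Similarly, when the oldest data is no longer needed, the oldest partition can
be dropped. Note that DROP PARTITION also removes all rows stored in that
partition (a sketch using the table above):
ALTER TABLE t1 DROP PARTITION p3xx;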
Wednesday, May 31, 2006
EXPLAIN to understand partition pruning
As part of the partitioning development in MySQL 5.1 we've added the ability to
check which partitions of a table are actually accessed in a particular query.
As partitions in a sense can act as a sort of index, this is an important feature
to help understand the performance impact of a query.
The method to use this feature is the normal EXPLAIN command with an
added keyword PARTITIONS. So e.g.
EXPLAIN PARTITIONS select * from t1;
A slightly more useful example would be
CREATE TABLE t1 (a int)
PARTITION BY RANGE (a)
(PARTITION p0 VALUES LESS THAN (10),
PARTITION p1 VALUES LESS THAN (20),
PARTITION p2 VALUES LESS THAN (30));
Now if we run an equality query we should only need to access one partition.
This can be verified by the command:
EXPLAIN PARTITIONS select * from t1 WHERE a = 1;
/* Results in p0 being displayed in the list of partitions */
A range query will also be pruned nicely in this case (a is a partition function
that is increasing and thus range optimisations can be performed; YEAR(date)
and TO_DAYS(date) are two other functions that are known to be
monotonically increasing).
EXPLAIN PARTITIONS select * from t1 WHERE a >= 1 AND a <= 12;
/* Results in the range being mapped to p0, p1 */
LIST partitioning will be pruned in the same cases as RANGE partitioning when
it comes to range pruning.
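For example (a sketch using a hypothetical LIST-partitioned table):
CREATE TABLE t2 (a int)
PARTITION BY LIST (a)
(PARTITION p0 VALUES IN (1,2,3),
PARTITION p1 VALUES IN (10,11,12),
PARTITION p2 VALUES IN (20,21,22));
EXPLAIN PARTITIONS select * from t2 WHERE a >= 1 AND a <= 12;
/* Expected to show p0 and p1 in the list of partitions */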
HASH partitioning has no natural concept of ranges since different values
map more or less randomly into partitions. We do however apply an
optimisation for short ranges, such that the following will happen.
CREATE TABLE t1 (a int)
PARTITION BY HASH (a)
PARTITIONS 10;
EXPLAIN PARTITIONS select * from t1 WHERE a < 3 AND a > 0;
In this case the range consists of only two values 1 and 2. Thus we simply map
the interval to a = 1 OR a = 2 and here we get p1 and p2 as the partitions to
use.
Information Schemas for Partitions
As part of the work in developing partitioning support for 5.1 a new
information schema table has been added. This table can be used to
retrieve information about properties of individual partitions.
To query this table you can issue a query like:
SELECT * FROM information_schema.partitions WHERE
table_schema = "database_name" AND table_name = "name_of_table";
The result of this particular query will be one record per partition in
the table with info about the properties of these partitions.
A query on a non-partitioned table will produce a similar output
although most fields will be NULL. The information_schema.partitions
table is not yet implemented for MySQL Cluster, so for MySQL Cluster
tables the output will be all NULLs for the partition-specific information.
Below follows a short description of the fields in this information
schema table:
1) TABLE_CATALOG: this field is always NULL
2) TABLE_SCHEMA: This field contains the database name of the table
3) TABLE_NAME: Table name
4) PARTITION_NAME: Name of the partition
5) SUBPARTITION_NAME: Name of subpartition if one exists otherwise
NULL
6) PARTITION_ORDINAL_POSITION: All partitions are ordered in the
same order as they were defined; this order can change as partition
management adds, drops and reorganizes partitions. This number is the
current position, with 1 being the position of the first partition
7) SUBPARTITION_ORDINAL_POSITION: Order of subpartitions within a
partition, starts at 1
8) PARTITION_METHOD: Any of the partitioning variants: RANGE, LIST,
HASH, LINEAR HASH, KEY, LINEAR KEY
9) SUBPARTITION_METHOD: Any of the subpartitioning variants: HASH,
LINEAR HASH, KEY, LINEAR KEY
10) PARTITION_EXPRESSION: This is the expression for the partition
function as expressed when creating partitioning on the table through
CREATE TABLE or ALTER TABLE.
11) SUBPARTITION_EXPRESSION: Same for the subpartition function
12) PARTITION_DESCRIPTION: This is used for RANGE and LIST partitions:
RANGE: Contains the value defined in VALUES LESS THAN. This is an
integer value, so if the CREATE TABLE contained a constant expression
this contains the evaluated expression, thus an integer value
LIST: The values defined in VALUES IN. This is a comma-separated list of
integer values.
13) TABLE_ROWS: Although its name indicates that it is the number of
rows in the table, it is actually the number of rows in the partition.
14) AVG_ROW_LENGTH, DATA_LENGTH, MAX_DATA_LENGTH,
INDEX_LENGTH, DATA_FREE, CREATE_TIME, UPDATE_TIME, CHECK_TIME,
CHECKSUM:
All these fields are to be interpreted in the same way as for a normal table,
except that the value is for the partition and not for the table.
23) PARTITION_COMMENT: Comment on the partition
24) NODEGROUP: This is the nodegroup of the partition. This is only
relevant for MySQL Cluster.
25) TABLESPACE_NAME: This is the tablespace name of the partition.
This is currently not relevant for any storage engine.
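As an illustration, a query like the following (a sketch; the schema name test
and the table t1 are just placeholders) picks out a few of the fields described
above:
SELECT partition_name, partition_method, partition_expression, table_rows
FROM information_schema.partitions
WHERE table_schema = 'test' AND table_name = 't1';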
Monday, March 27, 2006
Partition and Scanning
If a query does a SELECT, the optimizer will discover which partitions
have to be scanned. In 5.1 these scans will still be made
sequentially, one partition at a time. However, only those partitions
actually touched will be scanned, which can improve performance on
certain queries by an order of magnitude.
The aim is to support parallel scans in a future release, as well as
parallel sort on the first table selected by the optimizer.
This is an important long-term goal of partitioning: to open up the
MySQL architecture for many performance improvements in
handling large data sizes. In 5.1 we have achieved quite a few of
those goals, but expect to see more goals achieved as new versions
of MySQL hit the street burning :)
Partitioning and Locking
In MySQL, tables are locked as part of the MySQL Server code during each
statement. For a SELECT statement each table in the SELECT statement is
locked using a read lock. For UPDATE/INSERT/DELETE statements a write
lock is used. For ALTER TABLE a special lock is taken that excludes writers
but allows readers to be processed.
However, what these locks actually do is dependent on the underlying
engine. For MyISAM, a read lock means no updater is allowed during the
statement, and a write lock means that no other writers are allowed, and
no other readers either.
For other engines such as NDB and InnoDB, which employ row-level locking
internally, these locks are mostly ignored except for the special ALTER
TABLE lock that ensures that no writers are allowed during the ALTER TABLE
statement.
(There is quite a lot more detail to locking in MySQL that is not covered in
this blog)
With partitioned tables, there is one lock per partition instead of one lock
per table. However, for each statement a lock is taken on all partitions, so
the behaviour is the same as if there were one lock on one table. The lock
is only held during the statement, and what it means is dependent on the
storage engine used by the partitions.
It is quite certain that there will be changes to the locking behaviour in 5.2.
It is quite likely that fewer partitions will be locked, to ensure that only those
partitions actually touched are locked.
Saturday, March 04, 2006
Partitioning at UC 2006
I've been posted as a speaker at the MySQL Users Conference at the end of
April.
http://www.mysqluc.com
I will present the partitioning development that has happened in MySQL for
5.1. There's been a lot of work to accomplish this but it's nice to see the
result now that it's nearing completion.
When I first started this development my main objective was to prepare
MySQL for parallel query support (which MySQL Cluster actually has had
since 5.0).
However, while developing support for this I've learned a lot about other
nice uses for partitioning. Other open source DBMSs have focused mainly
on the performance aspects of partitioning. I think this has been solved
very well in this 5.1 release. This particular part needed some intricate
understanding of the optimiser, which I did not possess on a sufficiently
detailed level, but with some aid and a small effort the optimiser
was turned into a partitioning optimiser.
However, most of the work on partitioning has actually been spent on the
management part: the ability to add, drop and reorganize partitions in a
fast and efficient manner. With the current implementation it is possible
to instantly add and drop partitions for range and list partitioning. It is
also possible to add and drop partitions for hash partitioning; normally
this means that the entire table needs reorganisation.
However, we've added the keyword LINEAR to all hashing variants. This
makes it possible also for HASH/KEY partitions to have partitions added
and dropped with minimal work. So if the table contains 16 partitions with
LINEAR KEY partitioning, it's only necessary to reorganise 4 of those 16
partitions to grow the number of partitions to 20. The linear scheme makes
the distribution of records unbalanced in scenarios where the number
of partitions isn't of the form 2**n, so it's a trade-off one makes when
manageability is important.
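A sketch of how that looks in SQL (table and column names are hypothetical):
CREATE TABLE t1 (a int primary key)
PARTITION BY LINEAR KEY (a)
PARTITIONS 16;
ALTER TABLE t1 ADD PARTITION PARTITIONS 4;
/* Grows the table from 16 to 20 partitions; with LINEAR KEY only the
   partitions whose rows need to move are reorganised */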
Given that MySQL is known for its simplicity and ease of use, we felt it was
more important to bring out a version with all the manageability and with
sufficient performance advantages. I'm glad to say that we achieved
both: it's got both manageability and major performance improvements.
Unfortunately we weren't able to bring all the goals I had for partition
management for MySQL Cluster into 5.1. This is especially displeasing
to me, being the founder of the cluster technology in MySQL. However, we
look forward to bringing out this work in future releases of MySQL; it's
a tremendously interesting problem to reorganise a clustered table while
users are reading, updating, taking backups and performing a whole lot
of other actions on the table simultaneously with this change.
This work will also form the foundation of the support for adding and
dropping nodes in MySQL Cluster.
One idea that I'm particularly pleased with is the concept of REORGANIZE
partitions. I was contemplating how to handle MERGE of partitions, JOIN
of partitions and also simply changing the partition boundaries, and was
very pleased to find one stone for all those birds.
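A sketch of what REORGANIZE looks like, assuming a hypothetical table sales
partitioned by RANGE with p0 VALUES LESS THAN (1000) and p1 VALUES LESS
THAN (2000):
ALTER TABLE sales REORGANIZE PARTITION p0, p1 INTO
(PARTITION p01 VALUES LESS THAN (2000));
/* The rows from p0 and p1 are redistributed into the new partition p01;
   the same statement can also be used to split a partition or to move
   partition boundaries */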
So please try out the new partitioning feature, so that we can quickly
bring it to a stable state and move on to the next set of cool features for
versions to come.
Thursday, February 16, 2006
How to define a table that uses disk data in MySQL Cluster
Before creating a table that uses non-indexed fields stored on
disk in MySQL Cluster, it is necessary to create a LOGFILE
GROUP and a TABLESPACE.
CREATE LOGFILE GROUP lg1
ADD UNDOFILE 'undofile.dat'
INITIAL_SIZE 16M
UNDO_BUFFER_SIZE = 1M
ENGINE = NDB;
This creates a file on each node for storing UNDO log records.
CREATE TABLESPACE ts1
ADD DATAFILE 'datafile.dat'
USE LOGFILE GROUP lg1
INITIAL_SIZE 12M
ENGINE = NDB;
This creates a file on each node for storing the disk data parts
of a disk-based table. All tables in the TABLESPACE are connected
to the LOGFILE GROUP previously defined.
Now we are ready to define the actual disk-based table.
CREATE TABLE t1 (a int, b int, c int, d int, e int,
primary key(a), index(a,b))
TABLESPACE ts1 STORAGE DISK
engine=ndb;
This defines a table where a and b will be in memory since they are
part of indexes. c, d and e will be stored on disk as they are not part
of any index.
If one later wants to add a new file to the tablespace, one does this
by:
ALTER TABLESPACE ts1
ADD DATAFILE 'datafile02.dat'
INITIAL_SIZE 4M
ENGINE=NDB;
and similarly for the LOGFILE GROUP:
ALTER LOGFILE GROUP lg1
ADD UNDOFILE 'undofile02.dat'
INITIAL_SIZE = 4M
ENGINE=NDB;
Tablespaces and logfile groups can be dropped as well, but only when
all objects in them have been dropped. So to drop everything after the
above changes, one does:
DROP TABLE t1;
ALTER TABLESPACE ts1
DROP DATAFILE 'datafile.dat'
ENGINE=NDB;
ALTER TABLESPACE ts1
DROP DATAFILE 'datafile02.dat'
ENGINE=NDB;
DROP TABLESPACE ts1
ENGINE=NDB;
DROP LOGFILE GROUP lg1
ENGINE=NDB;
Tuesday, February 14, 2006
5.1.6 release crammed with goodies
The new 5.1.6 release comes crammed with goodies for MySQL Cluster:
1) It is now possible to replicate between clusters.
2) Non-indexed attributes can now be stored on disk.
3) Disk-stored attributes use a new page cache; data is stored in
tablespaces and UNDO log records for disk-stored attributes are
stored in LOGFILE GROUPs.
4) The default partitioning scheme has been changed to even distribution;
the previous linear hashing scheme can still be used through
PARTITION BY LINEAR KEY().
5) Automatic partitioning can still be used, but NDB tables can also be
partitioned by the user. All the partitioning variants defined in the MySQL
manual can also be used to partition MySQL Cluster tables.
6) With user-defined partitioning it is possible to use heterogeneous machines
by using NODEGROUP when defining a partition. This defines in which
NODEGROUP a partition is to be stored (see the sketch after this list).
7) ndb_restore can handle restoring backups created in one cluster with
user-defined partitioning and later restored in a cluster with a different set
of nodegroups.
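A sketch of point 6 (table and column names are hypothetical):
CREATE TABLE t1 (a int primary key, b int)
ENGINE = NDB
PARTITION BY KEY (a)
(PARTITION p0 NODEGROUP 0,
PARTITION p1 NODEGROUP 1);
/* p0 is stored in nodegroup 0 and p1 in nodegroup 1 */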
Friday, February 10, 2006
Cluster disk data presentation at VLDB 2005
At VLDB2005 in Trondheim, Norway I did a presentation that explained
the architecture of the implementation of disk data in MySQL Cluster.
It also explains the thinking process for the new node recovery
algorithm introduced as part of this work.
Presentation:
http://www.vldb2005.org/program/slides/wed/s1108-ronstrom.pdf
Paper:
http://www.vldb2005.org/program/paper/wed/p1108-ronstrom.pdf
Partition Pruning uncovered
The partitioning development for 5.1 is close to completion (apart from the
stabilisation phase). An important part of this development is the ability to
prune away partitions not used in the query. Here follow some guidelines on
how the optimiser handles this at the moment.
For most partition functions the optimizer can only handle equality conditions,
thus:
where partition_field = 2
or
where partition_field IN (2,3,4)
In this case the optimiser can prune independently of what the partition
function is. It will do so by actually applying the partition function to the
equal value, thus:
partition_function(2) = partition 3
partition_function(3) = partition 5
partition_function(4) = partition 3
It will set a bit in the partition_used bitmap for each partition found, to tell
the executor which partitions to scan.
So this will work for all partition functions.
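A sketch of such equality pruning on a hypothetical HASH-partitioned table:
CREATE TABLE t3 (a int)
PARTITION BY HASH (a + 1)
PARTITIONS 4;
EXPLAIN PARTITIONS select * from t3 WHERE a IN (2,3,4);
/* The partition function is applied to each value:
   (2+1) mod 4 = 3 => p3, (3+1) mod 4 = 0 => p0, (4+1) mod 4 = 1 => p1,
   so p0, p1 and p3 are expected in the list of partitions */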
Range optimisations will work for the following partition functions using
RANGE or LIST partitioning,
PARTITION BY X(f(partition_field)), where f(partition_field) is:
f(partition_field) = partition_field
f(partition_field) = YEAR(partition_field)
f(partition_field) = TO_DAYS(partition_field)
(where partition_field is of a date type in the last two examples)
Range optimisations will also work for all partition functions and all
partitioning types in the following case: when the range is either a list
of values (1,3,5) or a short range, e.g. BETWEEN 3 AND 5,
which is simply converted to (3,4,5).
For Range partitions the actual range of partitions is found by applying
partition_function(start_range_value) = partition 3
partition_function(end_range_value) = partition 5
=> Scan partitions 3, 4 and 5
For LIST partitions it is done slightly differently:
partition_function(start_range_value) = x
partition_function(end_range_value) = y
(The partition function used here must be increasing or decreasing)
Then all list values from x to y are scanned (they are kept in a sorted list to
enable quick lookup of a partition given a value) and for each list value found,
its partition is marked as used.
For the range variant that works for all partitioning types (the short ranges
above), the partition function will be evaluated for each value in the interval
and its corresponding partition will be marked for use.
These optimisations also apply to subpartitions, so you can have conditions on
only the subpartition fields and get pruning for subpartitions only, and vice
versa for partitions; it also works with conditions on both at the same time.
So to use an example:
CREATE TABLE t1 (a date)
PARTITION BY RANGE(year(a))
(PARTITION p0 VALUES LESS THAN (1996), ...,
PARTITION p9 VALUES LESS THAN (2005));
SELECT * from t1 WHERE a >= '1995-01-01' AND a <= '1997-12-31';
Here we have a range-optimised function =>
year(start_range_value) = year('1995-01-01') = 1995 => p0
year(end_range_value) = year('1997-12-31') = 1997 => p2
Thus p0 -> p2 will be scanned, that is p0, p1 and p2.
If we replace RANGE by LIST and all else is kept constant, then
we will scan list values from 1995 to 1997 and find that p0, p1 and p2
are to be scanned.
For HASH-partitioned variants there will not be any optimisations in this
case, because the interval is too wide: the data type of the condition is
date and there are more than a thousand dates in the range.
However, if the user makes a slight change to the data model and instead
uses the following:
CREATE TABLE t1 (a_year int)
PARTITION BY HASH(any_function(a_year))
PARTITIONS 10;
SELECT * from t1 WHERE a_year >= 1995 AND a_year <= 1997;
Then the range [1995,1997] will always be good enough for partition pruning.
And thus a maximum of 3 partitions will be scanned in the query.