Thursday, August 21, 2008

New partitioning features

As burtonator pointed out parallelism is an important
feature that partitioning makes possible. So I thought
it might be a good idea to mention a little bit what
we're doing in the area of partitioning.

It's quite correct that parallelism is one of the main
advantages of partitioning (not the only one though since
also partition pruning and dividing large indexes and
being able to add and drop partitions efficiently are
important as well). In 5.1 we focused on the maintenance
features of partitioning but the intention to move on
to parallelisation was more or less the main goal from
the very start.

This is why it's such extra fun to actually get going on
this when one has worked on the foundation for this work
for almost 4 years (partitioning development started out
2004 H2 and most of the partitioning code in 5.1 was ready
about two years later).

There are also ideas to introduce parallelism for scans of
large partitioned tables and also a few more maintenance
features that are still missing.

Another feature in the works for partitioning is the
ability to use partition pruning on several fields. This
will be possible for PARTITION BY RANGE and LIST. The
syntax will look something like this:

CREATE TABLE t1 (a varchar(20), b int)
PARTITION BY RANGE (COLUMN_LIST(a,b))
(PARTITION p0 VALUES LESS THAN (COLUMN_LIST("a", 1)),
PARTITION p1 VALUES LESS THAN
(COLUMN_LIST(MAXVALUE, 4)));

In this case it is possible to partition on any field type
and it is also possible to do partition pruning on multiple
fields in much the same way as it is for indexes.

E.g.
select * from t1 where a = "a";
select * from t1 where a = "a" and b = 2;

will both be able to use for partition pruning with the
second obviously able to do more pruning then the first one.

2 comments:

Unknown said...

Nice Mikael.

I've found the partitioning feature in MySQL 5.1 very useful for data warehousing when handling massive amounts of data (hundreds of millions of rows).

Unknown said...

I want run a parallel inserts on a table. Then how can I implement partitioning on my table.