Tuesday, August 04, 2009

Partition by column_list ready for alpha testers

I got time to spend on a really old worklog I completed coding
already october 2005. I blogged about it in July 2006 and
interestingly enough it's still the second most read blog
entry on my blog (probably related to search engines in some
way).

I have merged it with the azalea tree (this is an internal code
name for our development tree, name is likely to change). This
tree contains subquery optimisations, Batched join and some more
optimisations.

I have fixed a whole bunch of bugs that always shows up in early
code. The code quality is still alpha but at least you won't find
10 bugs per hour :)


Here
you can find the launch pad tree for this code.

There are two important additions made possible by this tree.
1) New function to_seconds that is recognized by range optimiser
to enable partition pruning when partitioning like:
partition by range (to_seconds(time))
2) New partitioning functionality that makes it possible to
perform partition pruning over multiple fields.

Most of the bugs I have fixed had to do with this partition pruning
of multiple fields. The routine to discover which partitions are
needed is called find_used_partitions (in sql/opt_range.cc) and this
function is called recursively over a key tree. A key tree can be
very complex and more or less have AND of key parts using next_key_part
pointer and OR condition using left and right pointers. These left and
right pointers can however show up a little here and there in the tree
so one has to be very careful about how variables are assigned, saved
and restored. I havent' worked so much with recursive functions so this
is an interesting adventure.

Here's my latest addition of a test case to give you an idea of how it
works and also what works right now.

create table t1 (a int, b char(10), c varchar(5), d int)
partition by range column_list(a,b,c)
subpartition by key (c,d)
subpartitions 3
( partition p0 values less than (column_list(1,'abc','abc')),
partition p1 values less than (column_list(2,'abc','abc')),
partition p2 values less than (column_list(3,'abc','abc')),
partition p3 values less than (column_list(4,'abc','abc')));

insert into t1 values (1,'a','b',1),(2,'a','b',2),(3,'a','b',3);
insert into t1 values (1,'b','c',1),(2,'b','c',2),(3,'b','c',3);
insert into t1 values (1,'c','d',1),(2,'c','d',2),(3,'c','d',3);
insert into t1 values (1,'d','e',1),(2,'d','e',2),(3,'d','e',3);
select * from t1 where (a = 1 AND b < 'd' AND (c = 'b' OR (c = 'c' AND d = 1)) OR
(a = 1 AND b >= 'a' AND (c = 'c' OR (c = 'd' AND d = 2))));

So in the above select statement we are performing partition
pruning over 3 fields and subpartition pruning over 2 fields
and there are 5 different ranges in the query.

So please go ahead and try this new tree out and see if it
works for you.

No comments: