Thursday, October 30, 2008

CACHE INDEX per partition for MyISAM

The newest development in the partitioning code
is WL#4571.

This new feature makes it possible to tie a
MyISAM partition to a specific key cache.
The syntax for doing this is available in the
worklog entry above.
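
As a sketch of how this might look (the key cache name,
table and partition names here are made up, and the exact
syntax may differ from the final version in the worklog):

set global part_cache.key_buffer_size = 128*1024*1024;
cache index t1 partition (p0) in part_cache;
load index into cache t1 partition (p0);

The first statement creates a dedicated 128 MB key cache,
the second assigns the index blocks of partition p0 of
table t1 to it, and the third optionally preloads the
partition's indexes into that cache.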

We found this feature useful for enabling
higher performance of parallel ALTER TABLE
(WL#2550). When adding a primary key to a
MyISAM table, the single key cache in MyISAM
severely limited the scalability of parallel
ALTER TABLE. By adding several key caches,
essentially one per partition, we can ensure
that the ALTER TABLE is fully parallelised
(all other ALTER TABLE operations on MyISAM
already scale perfectly).

We also have some ideas on how to solve the
underlying problem of key cache scalability by
splitting the single mutex on the key cache into
one mutex per range of key cache pages.

New launchpad tree for PARTITION BY RANGE COLUMN_LIST(a,b)

A colleague of mine at Sun/MySQL showed me how to get
statistics from my blog, which made for interesting
reading. I noted that there is great interest in
partitioning-related information and that the new
partitioning feature mentioned on my blog two years
ago still attracts a lot of attention.

So I thought it was a good idea to blog a bit more
about what's going on in the partitioning
development. I decided to check how easy it is
to externalize my development trees on Launchpad.
It turned out to be really easy, so I simply
put up the development tree for the new partitioning
feature that I described in my last blog post.

Launchpad tree

I also externalized the worklog entry for this
development. Unfortunately it is not a very long
description, but I'll try to work on that.
There is a new test case in the mysql-test/t
directory called partition_column.test which
shows how to use these new features (it might
take some time before this link works).

Worklog description

Tuesday, October 07, 2008

Further development on new partitioning feature

As mentioned in a blog post two years ago, I have been
working on a new partitioning feature.

I've been busy with many other things, but now I've taken
this work a step forward and will most likely set up a
preview tree of this feature shortly.

The new feature adds the possibility to partition
on columns of any type for RANGE and LIST partitioning
as well (this has always been possible for KEY
partitioning). It also adds a new function to the
MySQL server, TO_SECONDS, which works very similarly
to TO_DAYS. Since TO_SECONDS is a monotonic function,
it gets nice treatment from the partition pruning.

So here are a couple of examples of what one will be able to do:

create table t1 (d date)
partition by range column_list(d)
( partition p0 values less than (column_list('1999-01-01')),
partition p1 values less than (column_list('2000-01-01')));

create table t1 (a date)
partition by range(to_seconds(a))
(partition p0 values less than (to_seconds('2004-01-01')),
partition p1 values less than (to_seconds('2005-01-01')));

select * from t1 where a <= '2003-12-31';

The partition pruning optimisation step will discover
that this SELECT only needs to look in partition p0.
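
Assuming the preview tree behaves like the mainline server
here, the pruning can be observed with EXPLAIN PARTITIONS
(available since MySQL 5.1):

explain partitions select * from t1 where a <= '2003-12-31';

The partitions column of the output should then list only p0.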

create table t1 (a int, b int)
partition by range column_list(a,b)
(partition p0 values less than (column_list(99,99)),
partition p1 values less than (column_list(99,999)));

insert into t1 values (99,998);
select * from t1 where a = 99 and b = 998;

This SELECT statement will discover that matching
records can only be in the p1 partition and will avoid
scanning the p0 partition. Thus partitioning works
in much the same manner as a first-level index.
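
The same kind of check can be made for the multi-column
case; assuming EXPLAIN PARTITIONS works as in the mainline
server, one should see that only a single partition is
touched:

explain partitions select * from t1 where a = 99 and b = 998;

The partitions column should list only the one partition
that can contain the row (99,998).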

Thursday, October 02, 2008

dbt2-0.37.37 uploaded and various other stuff

There was a small bug in the dbt2-0.37.36 version I uploaded which
I have now fixed in the new dbt2-0.37.37 version.

There have also been some interesting benchmark tests
where we ran DBT2 on a T5220 box (Niagara II chip). We
can show the scalable performance benefits here as well.
We've been able to run 20 data nodes on one box (these
boxes can run up to 64 threads at a time) with a scalable
performance increase compared to 4 nodes.

We had a developer meeting a few weeks ago and there were
lots of activities. Personally I had the most fun seeing the
demo of parallel ALTER TABLE. We loaded a table with 10
million rows of 70-80 bytes each. We had access to a machine
with 64 GB of memory and 16 cores. It was very interesting to
run one SQL command and see the load of mysqld in top go to
1600%. Altering a 10 million row table in 2.5 seconds I
thought was pretty good.

Another cool demo was the online add node in MySQL Cluster.
This is an interesting feature that I started thinking about
in 1999. I had a first design then, changed to a second
variant in 2001, changed again around 2005, and the version
that was finally implemented was the fourth version of the
design. The nice thing is that the fourth version actually
contains some nice innovations that none of the earlier
designs had. So letting an idea cook for a long time can
sometimes be really beneficial. For a very brief description
of this work, see Jonas Oreland's blog.

Jonas and Pekka are also working on another cool optimisation
of MySQL Cluster where the data node will become multithreaded.
There will be up to 6 threads in the first released version of
this. Jonas measured in a test today that one could do 370,000
inserts per second on one 8-core box with this feature (and
this is still a fairly unstable version with some performance
issues remaining). We're getting close to measuring computer
speed in MDO (MegaDatabaseOperations per second) instead of
in MHz.

Jonas and I are also working on removing "the single
transporter mutex" from MySQL Cluster, which will improve
the scalability of MySQL Servers using MySQL Cluster. We're
working on this in parallel, using the same basic design but
with small variations in the details. It will be interesting
to see which variant works best.

Finally, Frazer has optimised the handling of large records
in the data node to the extent that inserts of 5k records
get twice the speed. Interestingly, the benchmark for this
already hits the limit of Gigabit Ethernet with one CPU
working at 80%.

So as you can see there are a lot of interesting things
cooking at MySQL, and then I haven't even mentioned the work
we're doing together with other Sun folks on optimising
MySQL. More on that later.