The hardware guys are presenting new storage devices called
SSD's based on flash memory. At the moment I think they are
about 3-4 times cheaper than DRAM memory and the gap seems
to be increasing. They're still far from the price of hard
drives but also here the gap seems to be closing.
So as I'm now an employee of Sun that actually puts together
systems with this type of HW in it I get questioned what I
as a DBMS developer can do with those devices.
First some comments on performance. These new devices will be
able to perform reads and writes of a few kilobytes large pages
in about 25-100 microseconds compared to hard drives which
takes about 3-10 milliseconds for the same thing.
An obvious use is obviously to use them to speed up database
logging, particularly in commit situations. However this
doesn't really require any significant changes to the SW
already out there. So I won't spend any more time on this use.
Another use is for MySQL Cluster. MySQL Cluster stores most data
in memory and can store non-indexed data on disk. So how can
SSD devices be used to improve this.
First some facts about performance of MySQL Cluster. In the data
node where the data actually resides it takes about 10
microseconds of processing time to perform a key lookup and a
scan has about 20 microseconds of start-up costs whereafter each
record takes 1-2 microseconds to fetch.
So now for the idea. Let's assume we'll use an SSD device as swap
memory. We would then purposely set the swap to be e.g. 10x
larger than the memory. For this to work we need to be able to
allocate memory from different swap pools, memory used for
transaction state and things like this we don't want swapped out
(working for Sun has an advantage since we can work with the OS
guys directly, but naturally I hope Linux developers also take the
So during a key lookup we need to get one page from the hash index
and one page with the record in it. Guestimating a 90% hit rate in
the hash index and 80% hit rate on the data page we find that we
will about 0.3 swap misses per key lookup. If we assume 50
microseconds for this it means that mean key lookup will increase
from 10 microseconds to 25 microseconds. This should be
acceptable, given that we can increase data size by a factor of
A similar analysis can be made for scans as well, but I'm lazy so
will leave it to you to perform :)
So given todays sizes of memories and SSD's it should be possible
to use systems with 64 GBytes of memory and 640 GB of SSD memory
and clustering 8 of those with replication gives us a main memory
based system for a reasonable price providing 2.5 TByte of user
data in a highly available system with high degrees of parallelism
in the system.