Tuesday, August 01, 2023

Modernising a Distributed Hash implementation

As part of writing my Ph.D. thesis about 30 years ago, I invented a new distributed hash algorithm called LH^3. The idea is that the hashing is applied at three levels: the first level uses the hash to find the table partition where the key is stored, the second level uses it to find the page where the key is stored, and the final level uses it to find the hash bucket where the key is stored.
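
To make the three levels concrete, here is a minimal sketch of how a single 64-bit hash value could be consumed at the three levels. The simple modulo addressing, field widths and names are illustrative assumptions, not the actual RonDB code.

```cpp
#include <cstdint>

/* Illustrative sketch only: one hash value is consumed at three levels. */
struct HashAddress {
  uint32_t partition; /* which table partition owns the key */
  uint32_t page;      /* which page inside that partition's hash table */
  uint32_t bucket;    /* which hash bucket inside that page */
};

HashAddress locate(uint64_t hash,
                   uint32_t num_partitions,
                   uint32_t num_pages,
                   uint32_t buckets_per_page)
{
  HashAddress addr;
  addr.partition = uint32_t(hash % num_partitions);
  uint64_t rest  = hash / num_partitions;
  addr.page      = uint32_t(rest % num_pages);
  addr.bucket    = uint32_t((rest / num_pages) % buckets_per_page);
  return addr;
}
```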

The algorithm is based on linear hashing and on distributed linear hashing, developed by Witold Litwin, with whom I had the privilege of having many interesting discussions at the time. My professor, Tore Risch, had collaborated a lot with Witold Litwin. I also took the idea of storing the hash value in the hash bucket, to avoid having to compare every key, from Mikael Pettersson, another researcher at the University of Linköping.

The basic idea is described in my Ph.D. thesis, and the implementation in MySQL Cluster and in RonDB (a fork of MySQL Cluster) is still very similar to it. This hash table is one of the reasons for the good performance of RonDB: it ensures that a hash lookup normally incurs only one CPU cache miss during the hash search.
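
To illustrate the idea of storing the hash value in the bucket, here is a hedged sketch, with made-up names and sizes rather than the real RonDB layout, of how the stored hash lets non-matching entries be skipped without ever reading the key:

```cpp
#include <cstdint>

/* Hypothetical bucket entry layout: the reduced hash value is stored
 * next to the row reference. */
struct BucketEntry {
  uint32_t hash_value; /* reduced hash of the key, stored in the bucket */
  uint32_t row_ref;    /* reference to the row that holds the full key */
};

/* Return the index of the first entry whose stored hash matches, or -1.
 * Only for such a candidate does the caller fetch the row and compare
 * the full key; entries with a different stored hash are skipped
 * without touching the key at all. */
int find_candidate(const BucketEntry* entries, int num_entries,
                   uint32_t hash_value)
{
  for (int i = 0; i < num_entries; i++) {
    if (entries[i].hash_value == hash_value)
      return i;
  }
  return -1;
}
```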

At Hopsworks we are moving the implementation of RonDB forward with a new generation of developers; in this particular work I am collaborating with Axel Svensson. The best way to learn the code is, as usual, to rewrite it. RonDB also has access to more memory allocation interfaces than MySQL Cluster, so I thought this could be useful.

Interestingly, going through the requirements on memory allocation with a fresh mind leads to more or less the same conclusions as 30 years ago. So after 30 years of developing the product, one can rediscover the basic ideas underlying it.

The original implementation made it possible to perform scan operations using the hash index. However, this roughly tripled the complexity of the implementation. Luckily, nowadays one can also scan using the row storage, so in RonDB we have removed the possibility to scan using the hash index. This opens up rewriting the hash index with much less complexity.

A hash implementation thus consists of the following parts: a dynamic array to find the page, a method to handle the memory layout of the page, a method to handle the individual hash buckets, and finally a method to handle overflow buckets.
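
As a rough sketch, with placeholder names and types that are not the RonDB classes, the four parts could be arranged like this:

```cpp
#include <vector>

struct Bucket;          /* a fixed number of entries, laid out inside a page */
struct OverflowBucket;  /* extra storage used when a bucket overflows */

struct Page {
  /* Memory layout of one page: an array of buckets plus room for
   * overflow buckets. */
  Bucket*         buckets;
  OverflowBucket* overflow;
};

struct HashIndex {
  /* Dynamic array that maps a page number to the page itself; it must
   * grow and shrink as the hash table grows and shrinks. */
  std::vector<Page*> page_directory;
};
```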

What we found is that the dynamic array can be implemented much more efficiently using the new memory allocation interfaces. The overflow buckets can potentially be handled with other techniques than plain overflow bucket chains; one could, for example, handle them using recursive hashing.

What we have also found is that the idea of using pages, with hash buckets inside those pages, is still a very good one for a hash table that must adapt to both increasing and decreasing sizes.
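
The adaptability comes from linear hashing growing and shrinking one bucket at a time, so the table size can closely track the number of stored keys. The textbook address calculation, shown here purely for illustration and not copied from RonDB, looks like this:

```cpp
#include <cstdint>

/* Classic linear hashing addressing. */
uint64_t lh_bucket(uint64_t hash,
                   uint64_t round_size,     /* number of buckets when the
                                               current round started (2^level) */
                   uint64_t split_pointer)  /* next bucket to be split */
{
  uint64_t bucket = hash % round_size;
  if (bucket < split_pointer) {
    /* This bucket has already been split in the current round,
     * so one more bit of the hash value is used. */
    bucket = hash % (2 * round_size);
  }
  return bucket;
}
```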

Modern CPUs have new instructions for executing searches in parallel; these can be used to speed up the lookup in the hash buckets.
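
As an illustration of this idea, and not taken from RonDB, a bucket could keep a small array of reduced 32-bit hash values that is checked with one SIMD compare; a non-zero mask then tells us which entries need a full key comparison:

```cpp
#include <cstdint>
#include <emmintrin.h>  /* SSE2 */

/* Compare four stored hash values against the search hash in one step. */
int match_mask(const uint32_t stored_hashes[4], uint32_t hash_value)
{
  __m128i needle   = _mm_set1_epi32(int(hash_value));
  __m128i haystack = _mm_loadu_si128(
      reinterpret_cast<const __m128i*>(stored_hashes));
  __m128i eq       = _mm_cmpeq_epi32(needle, haystack);
  return _mm_movemask_epi8(eq);  /* 4 bits set per matching entry */
}
```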

On top of this, the hash function used in RonDB is MD5; this is being replaced with a new hash function, XX3_HASH64, which is about 30x faster.
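
Assuming the xxHash library is what backs the XX3_HASH64 name, hashing a key boils down to a single call; the wrapper name below is made up for the example:

```cpp
#include <cstddef>
#include <cstdint>
#include "xxhash.h"  /* xxHash library, provides XXH3_64bits() */

/* Hash a key with XXH3 instead of MD5. */
uint64_t hash_key(const void* key, size_t key_len)
{
  return XXH3_64bits(key, key_len);
}
```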

A new requirement in RonDB compared to MySQL Cluster is that we work with applications that constantly create and drop tables, and the number of tables can be substantial, so there can be many very small tables. This means that a small table could use an alternative, much simpler implementation to save memory.
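
One hypothetical way to do this, with made-up types and threshold, is to keep a small table's entries in a single flat array of (hash, row reference) pairs and only convert to the full paged hash index once the table grows past a limit:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

/* Illustrative small-table index, not the RonDB implementation. */
struct SmallHashIndex {
  std::vector<std::pair<uint64_t, uint32_t>> entries; /* (hash, row_ref) */
};

static const size_t SMALL_TABLE_LIMIT = 256; /* made-up threshold */

bool use_small_index(size_t estimated_rows)
{
  return estimated_rows <= SMALL_TABLE_LIMIT;
}
```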

This is work in progress and it serves a number of purposes: it is a nice way for new developers to learn the RonDB code base, it saves memory for hash indexes, it makes the implementation even more optimised, it simplifies the code and thus makes it easier to support, and it makes use of new modern CPU instructions to substantially speed up hash index lookups.