Wednesday, December 22, 2021

Merry Christmas from the RonDB team

This year we bring a new Christmas present in the form of a new release of RonDB 21.04.

It is packed with improvements: our focus has been on extending support for more platforms while at the same time increasing the quality of RonDB.

Normally RonDB 21.04.2 would have been released in October 2021. However, we had a number of support issues where we saw crashes due to running very large transactions. RonDB is designed for OLTP with small to moderately sized transactions. However, some applications make use of foreign keys with ON DELETE CASCADE or ON UPDATE CASCADE, and such transactions can easily grow to hundreds of thousands of operations.

Fixing this meant changing the handling of transactions. Since this was a rather large change in a stable release, we wanted to ensure that we didn't introduce any quality issues. We used this opportunity to make an extensive effort to fix all sorts of other bugs at the same time.

The new RonDB release has been tested with transactions of up to several million row operations in a single transaction. We still recommend keeping transaction sizes moderate, since very large transactions make heavy use of CPU and memory resources during commit and abort processing. In addition, very large transactions lock large parts of the database, making it more difficult for other transactions to proceed. Generally, an OLTP database behaves much better if transaction sizes are kept small.
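As an illustration, here is a minimal sketch using the C++ NDB API of how an application can delete a large key range in moderate batches instead of relying on one huge ON DELETE CASCADE transaction. The table handle, the column name "id" and the batch size are made up for the example.

```cpp
#include <NdbApi.hpp>

// Delete rows with primary keys in [first_key, last_key] using one
// moderately sized transaction per batch, so that no single transaction
// grows to hundreds of thousands of operations.
int delete_in_batches(Ndb *ndb, const NdbDictionary::Table *tab,
                      int first_key, int last_key, int batch_size) {
  for (int key = first_key; key <= last_key; ) {
    NdbTransaction *trans = ndb->startTransaction();
    if (trans == nullptr) return -1;
    for (int i = 0; i < batch_size && key <= last_key; i++, key++) {
      NdbOperation *op = trans->getNdbOperation(tab);
      if (op == nullptr || op->deleteTuple() != 0 ||
          op->equal("id", key) != 0) {  // "id" is a hypothetical PK column
        ndb->closeTransaction(trans);
        return -1;
      }
    }
    // Each batch commits on its own, keeping commit and abort processing
    // cheap and limiting how much of the database is locked at a time.
    if (trans->execute(NdbTransaction::Commit) != 0) {
      ndb->closeTransaction(trans);
      return -1;
    }
    ndb->closeTransaction(trans);
  }
  return 0;
}
```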

RonDB development is very much focused on supporting cloud operations. This means that our focus is on supporting Linux for production installations. Quite a few cloud vendors now support ARM64 VMs in addition to the traditional Intel and AMD x86 VMs. Apple has also released a set of new ARM64 laptops lately.

Our development platforms are Mac OS X and Linux, so it makes sense to also release RonDB on Mac OS X.

We therefore took the opportunity in RonDB 21.04.2 to add ARM64 as a new platform for RonDB. This support covers both Linux and Mac OS X. The ARM64 support is still in beta.

In addition, we test RonDB extensively on Windows using WSL 2, the Windows Subsystem for Linux. Thus our Linux tarballs should work just fine for testing on Windows platforms through WSL 2.

RonDB 21.04.2 contains a large set of bug fixes that are described in detail in the RonDB documentation at https://docs.rondb.com. With these changes, RonDB 21.04 contains around 100 bug fixes and around 15 new features on top of the stable release of MySQL NDB Cluster 8.0.23.

Further development is happening in RonDB 21.10 and upcoming versions of RonDB. These versions will be released when they are ready for more general consumption, but the development can be tracked in RonDB's git repository. If you want early access to the binary tarballs of RonDB 21.04.2, you can visit the RonDB git homepage.

Early next year we will return with benchmarks that show all four qualities of a LATS database: low L(atency), high A(vailability), high T(hroughput) and S(calable storage).

So finally a Merry Christmas and a Happy New Year from the RonDB team.

Wednesday, October 20, 2021

Running Linux in Windows 11

Most people who know me also know that I really don't like working with Windows. I actually worked with Microsoft products as early as the 1980s, so I haven't been able to stay away from them completely.

Developing MySQL on a Windows platform hasn't been easy, since building MySQL on Windows is difficult and there are a number of issues in running even the simple test suites of MySQL.

At the same time, the world is full of developers with only Windows as their development platform. So when I read about the possibility of running Linux inside Windows 11, I thought it was time to test whether RonDB development could now be done on Windows.

My 9-year-old laptop needed replacing, so I went ahead and found a nice Windows laptop at a fair price. I wanted to test the very latest developments in Windows 11. Since the laptop was delivered with Windows 10, I did an unusual thing: I upgraded the OS :)

This wasn't exactly easy and took a few hours, but after upgrading Windows 10 a few times and searching for a Windows 11 download, I eventually found one via Google on the Microsoft website. After a couple of upgrades I was on the latest Windows 11 release.

Installing Linux was quite easy; it took one simple command. I installed an Ubuntu variant.

Most of the installation went OK. The graphics installation didn't work, but the terminal software installed well enough for at least the testing part. For development I use graphical Vim, so that will have to wait for a working version of the graphical parts of Linux (or for a Windows editor, though likely not, since those tend to add extra line-ending characters).

Downloading the RonDB git tree went fine. Compiling RonDB required installing a number of extra packages, but that is normal and happens on standard Linux as well (build essentials, cmake, OpenSSL dev, ncurses, bison, ...).

Running the MTR test suite also went almost fine. I had to install zip and unzip as well for this to work.

Running the MTR test suite takes about an hour, and here I found a few challenges. First I had to find the level of parallelism it could survive. I was hoping for a parallelism of 12 (which for RonDB in practice means 6 tests running in parallel), but it was only stable with a parallelism of 6.

However, since I wasn't sitting at the Windows laptop while the tests were running, the screen saver decided to interfere (although I had configured it to go into screen-save mode only after 4 hours). Unfortunately the screen saver put the Linux VM to sleep, which meant that all test cases running when the screen saver kicked in failed. This seems to be a new issue in WSL 2 that didn't exist in WSL 1.

Still, I am happy with what I saw. Running one operating system inside another and making it feel like Linux is a part of Windows isn't an easy task, so I must give some kudos to the development team. If they continue working on this integration, I think I am going to get good use out of my new laptop.

I must admit that I don't fully understand how they have solved the problem of running Linux inside Windows, but it definitely looks like the Linux kernel makes use of Windows services to implement the Linux services. Running top in an idle system is interesting: there are only a few init processes and a bash process. So all the Linux kernel processes are missing and are presumably implemented inside Windows in some fashion.

The best part is that the Linux VM's configuration isn't static. The Linux VM can make use of all 16 CPUs in the laptop, but it can also let Windows grab most of them. So the scheduler can evidently handle both Linux and Windows programs.

Memory-wise, the Linux VM defaults to being able to grow to a maximum of 80% of the memory in the laptop. In my case, however, top in Linux consistently reported 13.5 GB of memory in a machine with 32 GB. I saw a post on the internet stating that Linux can return memory to Windows when it is no longer needed. I am not sure I observed this myself, but it is a new feature, so I feel confident it will be there eventually.

So working with RonDB on Windows 11 is at least going to be possible. Exactly how this pans out I will write about in future worklogs. It is now possible for me to do some development on Windows; it was more than 30 years ago that I last had a development machine with a Microsoft OS, so to me, Linux on Windows definitely makes Windows a lot more interesting as a platform.

My development environments have shifted a bit over the years. It started with a mix of Microsoft OSs, Unix and some proprietary OSs in the 80s. In the 90s I was mostly working in Solaris on Sun workstations. In the early 2000s I used Linux as my development machine. But since 2003 I have been working mostly on Mac OS X (and of course with lots of test machines on all sorts of platforms).

Tuesday, September 28, 2021

Memory Management in RonDB

Most of the memory allocated in RonDB is handled by the global memory manager. Exceptions are architecture objects and some fixed-size data structures. In this presentation we will focus on the parts handled by the global memory manager.

In the global memory manager we have 13 different memory regions, listed below:



- DataMemory

- DiskPageBufferCache

- RedoBuffer

- UndoBuffer

- JobBuffer

- SendBuffers

- BackupSchemaMemory

- TransactionMemory

- ReplicationMemory

- SchemaMemory

- SchemaTransactionMemory

- QueryMemory

- DiskOperationRecords

One can divide these regions into a set of qualities. Some regions are fixed in size; some are critical and cannot handle a failure to allocate memory; some have no natural upper limit and are unlimited in size; and some are flexible in size and can work together to achieve the best use of memory. We can also divide regions based on whether the memory is short-term or long-term. Each region can belong to multiple categories.

To handle these qualities, each memory region has a priority, and this priority can be affected by the amount of memory that the region has already allocated.

Fixed regions have a fixed size. This applies to database objects, the Redo log buffer, the Undo log buffer, DataMemory and the DiskPageBufferCache (the page cache for disk pages). There is code to ensure that we queue up when those resources are exhausted. DataMemory is a bit special and is described separately below.

Critical regions are regions where a failed memory allocation would cause a crash. This applies to the job buffer, which is used for internal messages inside a node, and to the send buffers, which are used for messages to other nodes. DataMemory is a critical region during recovery: if we fail to allocate memory for database objects during recovery, we cannot recover the database. Thus DataMemory is a critical region in the startup phase, but not during normal operation. DiskOperationRecords are also a critical resource, since without them we cannot maintain the disk data columns. Finally, we also treat BackupSchemaMemory as critical, since not being able to perform a backup would make it very hard to manage RonDB.

Unlimited regions have no natural upper limit; as long as memory is available at the right priority level, such a region can continue to grow. The regions in this category are BackupSchemaMemory, QueryMemory and SchemaTransactionMemory. QueryMemory is memory used to handle complex SQL queries such as large join queries. SchemaTransactionMemory can grow indefinitely, but metadata operations try to avoid growing too big.

Flexible regions can grow indefinitely, but have to set limits on their own growth to ensure that other flexible regions are also allowed to grow. Thus one flexible region isn't allowed to grab all the shared memory resources. There are limits to how much memory a region can grab before its priority is significantly lowered.

The flexible regions are TransactionMemory, ReplicationMemory, SchemaMemory, QueryMemory, SchemaTransactionMemory, SendBuffers, BackupSchemaMemory and DiskOperationRecords.

Finally, we distinguish short-term from long-term memory regions. A short-term allocation is of smaller significance than a long-term one. This particularly concerns SchemaMemory, which contains metadata about tables, indexes, columns, triggers, foreign keys and so forth. Once allocated, this memory stays around for a very long time. Thus, if we allow it to grow too far into the shared memory, we will not have space left to handle large transactions that require TransactionMemory.

Each region has a reserved space, a maximum space and a priority. In some cases a region also has a limit beyond which its priority is lowered.
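As a sketch, the qualities above could be captured in a per-region descriptor along the following lines. The field names are mine, purely for illustration; the actual RonDB source differs in naming and detail.

```cpp
#include <cstddef>

// Illustrative only: one descriptor per memory region, capturing the
// qualities discussed above (fixed, critical, unlimited, flexible,
// short-term vs long-term).
struct MemoryRegion {
  const char *name;        // e.g. "TransactionMemory"
  size_t reserved_bytes;   // reserved space, always available to the region
  size_t max_bytes;        // maximum space (effectively unbounded for
                           // unlimited regions)
  unsigned priority;       // higher priority is served first under pressure
  size_t prio_drop_limit;  // allocation level where the priority is lowered
  bool critical;           // allocation failure would crash the data node
  bool long_term;          // e.g. SchemaMemory, as opposed to short-term
  size_t allocated_bytes;  // current allocation, input to the priority rule
};
```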

4% of the shared global memory, plus half of the reserved space for job buffers and communication buffers, is only accessible to the highest-priority regions.

A further 10% of the shared global memory is only available to high-priority requesters. The remainder of the shared global memory is accessible to all memory regions that are allowed to allocate from it.
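A minimal sketch of how such an admission rule could look, assuming the 4% and 10% thresholds above. The function and its names are illustrative, not RonDB's actual code.

```cpp
#include <cstddef>

// Illustrative admission check for the shared global pool: low-priority
// requesters must leave the top 4% + 10% of the pool untouched,
// high-priority requesters must leave the top 4%, and only the
// highest-priority regions may dip into the final 4%.
enum class Prio { Low, High, Highest };

bool admit_shared(Prio prio, size_t bytes,
                  size_t shared_total, size_t shared_free) {
  if (bytes > shared_free)
    return false;
  size_t free_after = shared_free - bytes;
  size_t floor = 0;  // headroom that must remain free after this request
  if (prio == Prio::Low)
    floor = (shared_total * 14) / 100;  // protect the 4% and 10% tiers
  else if (prio == Prio::High)
    floor = (shared_total * 4) / 100;   // protect the highest 4% tier
  return free_after >= floor;
}
```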

The actual limits might change over time as we learn more about how to adapt the memory allocations.

Most regions also have access to the shared global memory. A region first uses its reserved memory, and if shared global memory is available it can allocate from that as well.

The most important regions are DataMemory and the DiskPageBufferCache. All rows stored in memory and all indexes in RonDB live in DataMemory. The DiskPageBufferCache contains the page cache for data stored on disk. To ensure that we can always handle recovery, DataMemory is fixed in size, and since recovery can sometimes grow the data size a bit, we don't allow DataMemory to be filled beyond 95% in normal operation. In recovery it can use the full DataMemory size. The extra 5% of memory is also reserved for critical operations such as growing the cluster with more nodes and reorganising the data inside RonDB. The DiskPageBufferCache is fixed in size; operations towards the disk are queued using DiskOperationRecords.
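Expressed as a sketch (the names are illustrative, not RonDB's actual code), the 95% rule amounts to:

```cpp
#include <cstddef>

// Illustrative check of the DataMemory rule described above: in normal
// operation an allocation is refused beyond 95% of the fixed size, while
// recovery and other critical operations may use the full size.
bool data_memory_admit(size_t used, size_t request, size_t total,
                       bool critical_or_recovery) {
  size_t limit = critical_or_recovery ? total : (total * 95) / 100;
  return used + request <= limit;
}
```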

Critical regions have higher priority to get memory than the rest of the regions. These are the job buffers used for sending messages between modules inside a data node, the send buffers used for sending messages between nodes in the cluster, the metadata required for handling backup operations, and finally the operation records used to access disk data.

These regions will be able to allocate memory even when all other regions fail to do so. Failure to allocate memory for these regions would lead to failure of the data node, or failure to back up the data, neither of which is acceptable in a DBMS.

We have two more regions that are fixed in size: the Redo log buffer and the Undo log buffer (the Undo log is only used for operations on disk pages). These allocate their memory at startup, and there is some functionality to handle overload by queueing operations when the buffers are full.

We will go through the remaining four regions in detail.

The first one is TransactionMemory. This memory region is used for all sorts of records: transaction records, scan records, key operation records and many more records used to handle the queries issued towards RonDB.

The TransactionMemory region has a reserved space, but it can grow up to 50% of the shared global memory beyond that. It can even grow further, but in that case it only has access to the lowest-priority part of the shared global memory. Failure to allocate memory in this region leads to aborted transactions.

The second region in this category is SchemaMemory. This region contains a lot of metadata objects representing tables, fragments, fragment replicas, columns and triggers. These are objects that stay around for a long time. Thus we want this region to be flexible in size, but we don't want it to grow so much that it diminishes the ability to execute queries. Thus we calculate a reserved part and allow the region to grow into at most 20% of the shared memory in addition to its reserved part. This region cannot access the higher-priority parts of the shared global memory.

Failure to allocate SchemaMemory causes metadata operations to be aborted.

The next region in this category is ReplicationMemory. It holds memory structures used to represent replication towards other clusters as part of Global Replication. It can also be used to replicate changes from RonDB to other systems such as ElasticSearch. The memory in this region is of a temporary nature, with memory buffers used to store the changes being replicated. The metadata for replication is stored in the SchemaMemory region.

This region has a reserved space, but it can also grow to use up to 30% of the shared global memory. Beyond that it will only have access to the lower-priority parts of the shared global memory.

Failure to allocate memory in this region leads to failed replication, and replication then has to be set up again. This is a fairly critical error, but it is something that can be handled.

The final region in this category is QueryMemory. This memory has no reserved space; it can only use the lower-priority parts of the shared global memory. It is used to handle complex SQL queries. Failure to allocate memory in this region will lead to complex queries being aborted.
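To summarize the pattern these flexible regions share, here is an illustrative sketch (parameter names are mine): inside its reserved space a region allocates freely; within its cap on the shared global memory it allocates at its normal priority; beyond the cap, at most the lowest-priority part of the shared pool is available.

```cpp
#include <cstddef>

enum class Tier { Reserved, SharedNormal, SharedLowestOnly };

// Illustrative growth rule for the flexible regions described above:
// TransactionMemory may grow 50% of the shared pool beyond its reserved
// space and ReplicationMemory 30% before dropping to the lowest-priority
// tier; for SchemaMemory the 20% cap is described as a hard limit.
Tier allocation_tier(size_t allocated_bytes, size_t reserved_bytes,
                     size_t shared_total, unsigned shared_cap_pct) {
  if (allocated_bytes < reserved_bytes)
    return Tier::Reserved;
  size_t shared_used = allocated_bytes - reserved_bytes;
  if (shared_used < (shared_total * shared_cap_pct) / 100)
    return Tier::SharedNormal;
  return Tier::SharedLowestOnly;
}
```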

This blog presents the memory management architecture of RonDB as it currently stands in a branch called schema_mem_21102. This branch is intended for RonDB 21.10.2, but could also be postponed to RonDB 22.04. The main difference in RonDB 21.04 is that SchemaMemory and ReplicationMemory are fixed in size and cannot use the shared global memory. BackupSchemaMemory is also introduced in this branch; previously it was part of TransactionMemory.

In the next blog on this topic I will discuss how one configures automatic memory management in RonDB.

Friday, September 24, 2021

Automatic Memory Management in RonDB

RonDB has now grown to the same level of memory management as you find in expensive commercial DBMSs such as Oracle, IBM DB2 and Microsoft SQL Server.

Today I took the last development steps in this large project. The project started with a prototype effort by Jonas Oreland as early as 2013, after having been discussed for a long time before that. After he left for Google, the project was taken over by Mauritz Sundell, who implemented the first steps for operational records in the transaction manager.

Last year I added the rest of the operational records in NDB. Today I completed the programming of the final step in RonDB. This last step meant moving around 30 more internal data structures over to the global memory manager. These memory structures are used to represent metadata about tables, fragments, fragment replicas, triggers and global replication objects.

One interesting part of this work is a malloc-like implementation that interacts with the record-level data structures already in RonDB, handling linked lists, hash tables and so forth for internal data structures.

So after more than 5 years it feels like a major step forward in the development of RonDB.

What does this mean for a user of RonDB? It means that the user won't have to bother much with memory management configuration. If RonDB is started in a cloud VM, it will simply use all the memory in the VM and ensure that the memory is handled as a global resource that can be used by all parts of RonDB. This feature already exists in RonDB 21.04. What this new step adds is even more flexible memory management: there is no need to allocate more memory than needed for metadata objects (and vice versa, if more memory is needed, it is likely to be accessible).

Memory can thus be used for other purposes as well. The end result is that more memory is made available to all parts of RonDB, both to store data and to handle more parallel transactions and more query processing.

Another important aspect is that this step opens up many new possibilities for handling larger objects in various parts of RonDB.

In later blogs we will describe how the memory management in RonDB works. This new development will appear either in RonDB 21.10 or in RonDB 22.04.