tag:blogger.com,1999:blog-144551772024-03-07T18:57:26.411+01:00Mikael RonstromMy name is Mikael Ronstrom and I work for Hopsworks AB as
Head of Data. I also assist companies working with NDB Cluster as a self-employed consultant. I am a member of The Church of
Jesus Christ of Latter-day Saints. I can be contacted at mikael dot ronstrom at gmail dot com for NDB consultancy services.
The statements and opinions expressed on this blog are my own and do not necessarily represent those of Hopsworks AB.Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.comBlogger265125tag:blogger.com,1999:blog-14455177.post-87721023242050479982024-03-05T17:40:00.000+01:002024-03-05T17:40:27.135+01:00Testing of RonDB releases<p> Since RonDB is a fork of MySQL NDB Cluster it contains a lot of tests that are part of the RonDB development tree. This includes unit tests for various functionalities. It also includes many hundreds of MTR test cases that take between a few seconds and a few minutes to run. These tests mostly use SQL commands to test the functionality of RonDB; in addition they test backup and restore and a few other tools in RonDB. These tests are executed with debug-compiled binaries, binaries compiled with error injection, binaries compiled for production and finally the binaries we use in the releases.</p><p>Another very important part of RonDB testing is the autotests. These tests use the NDB API to test its functionality, with a lot of focus on testing recovery. This test suite contains thousands of tests that take 36 hours for one test run when executed serially. It can be parallelised by running it on multiple clusters. This test suite can also be executed on different configurations with different numbers of replicas, different numbers of node groups, different numbers of CPUs per node and different memory sizes in the nodes.</p><p>RonDB is heavily used in Hopsworks. One part of Hopsworks is HopsFS. This is a distributed file system built on top of RonDB. It is written in Java and thus interfaces with ClusterJ, the Java API to RonDB that offers an easy-to-program model of the NDB API. 
HopsFS has a whole range of test cases that are also executed on a daily basis; this includes both functional tests and load tests.</p><p>RonDB is also used to handle metadata in Hopsworks and serves as the Online Feature Store in Hopsworks. This means that Hopsworks users define new tables and new table structures on the fly. These parts of Hopsworks again have a set of functional tests and load tests.</p><p>Next there are upgrade tests verifying that we can perform an online upgrade of RonDB; these tests also verify that we can downgrade back to the old version if the upgrade didn't work as it should.</p><p>There are also test cases for replication to other clusters. It is a very important part of the Hopsworks framework that we support setups with multiple regions.</p><p>There are also benchmark suites, mostly Sysbench, DBT2 (~ TPC-C), DBT3 (~ TPC-H) and YCSB, that we execute regularly.</p><p>Hopsworks supports managed RonDB in the cloud. This offering includes support for reconfiguration of the RonDB cluster as an online operation where we can scale resources such as MySQL Servers, REST API servers and RonDB data nodes. This management framework also has its own set of test suites that are regularly executed.</p><p>We are developing a REST API server; a Go version is already completed and a new C++ version is in development. This adds yet more tests of the RonDB functionality.</p><p>The latest addition is that we are now also developing a Kubernetes operator for RonDB. This operator contains CI/CD that ensures that every RonDB release can be handled in this Kubernetes framework.</p><p>When a RonDB release is finished it has gone through all of those stages.</p><p>After release the RonDB software is used by community users and Hopsworks customers. Any bugs found by them are immediately fed into the development process. 
Among other things a community user has added a Common Lisp NDB API to what is supported by RonDB.</p><p>As is hopefully clear from this picture, a RonDB release is heavily tested before it ships. The next LTS version of RonDB will be RonDB 22.10.1. This software has been moving through all these test frameworks and will be made GA very soon. Since this is a new LTS version we have been especially careful in our testing of it. At the moment the RonDB 22.10.2 version is going through heavy MTR testing.</p><p>This hopefully makes it clear that building both a DBMS and a data platform that uses it is very beneficial for the quality of the DBMS product. Thus RonDB has been through a much more varied set of tests than most products that work strictly as a DBMS.</p><p>We often find bugs that originate in MySQL NDB Cluster. We try our best to be good open source citizens by feeding those back as contributions to Oracle so that they can be included in future releases of MySQL NDB Cluster. In our view a bigger community for MySQL NDB Cluster is also good for RonDB.</p><p>Similarly, of course, we benefit from bug fixes that originate from Oracle. We are currently integrating MySQL 8.0.35 and 8.0.36 into the RonDB 22.10 series. RonDB 22.10.1 is based on MySQL 8.0.34.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-78263516448330980572024-01-31T00:38:00.000+01:002024-01-31T00:38:08.821+01:00The completion of a 12 year long project in RonDB<p> In 2012 a project was started to change the memory model of MySQL NDB Cluster. The first step was some early prototypes developed in 2012 and 2013 by Jonas Oreland. When Jonas left Oracle for Google it took a while before the project got up and running again. The first subproject was to change the memory model for the operation records used by transactions. 
This project started in 2015.</p><p>It took quite some time to complete. The requirement on maintained performance was high; this meant going through the changes ensuring that we either gained or at most lost 1-2% performance. The traditional model used in NDB was very simple and had extremely good performance. To maintain that performance the developer Mauritz experimented with eight different new memory models before settling on a model we call TransientPool. This pool relies on memory objects being allocated only for a short time (typical for short transactions). I assisted Mauritz in ensuring that we maintained performance.</p><p>Finishing this step completed most of the framework for the new memory management model. However it only took care of a fairly small part of all the memory areas in NDB. Another step, completed around 2018-2019, finalised all work on operation records. This was the most significant and most important part of the change.</p><p>When I joined Hopsworks we wanted to avoid having loads of configuration parameters affecting the setup of RonDB 21.04. To handle this we simply configured RonDB to support 20000 table objects (tables, ordered indexes and unique indexes). This still used the old memory management model. In RonDB 22.10 the work was finalised; the final part was to move all memory related to metadata (called SchemaMemory) and the memory used by replication to other RonDB clusters (called ReplicationMemory) to the new memory management model.</p><p>Thus with the release of RonDB 22.10.1 we have finished this very long project transitioning MySQL NDB Cluster to a new memory management model in RonDB. This means that all memory parts share a common memory pool that is allocated at startup. 
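</p><p>The shared-pool idea can be sketched as follows. This is a toy model, not RonDB's actual code: the class name, the page counts and the priority scheme are invented for illustration. Each subpool has a reserved quota; anything beyond that is served from shared memory, which only higher-priority requesters may take when free memory runs low.</p>

```python
# Toy sketch of a shared global memory pool with per-subpool reservations
# and a priority rule for low-memory situations (invented, for illustration).

class SharedGlobalPool:
    def __init__(self, total_pages, low_mem_threshold=10):
        self.free = total_pages
        self.low_mem_threshold = low_mem_threshold
        self.reserved = {}            # subpool name -> reserved pages left

    def register(self, name, reserved_pages):
        self.reserved[name] = reserved_pages
        self.free -= reserved_pages   # reservation is carved out up front

    def alloc(self, name, pages, priority):
        taken = min(pages, self.reserved[name])   # use own reservation first
        remainder = pages - taken
        if remainder > 0:
            low = self.free - remainder < self.low_mem_threshold
            if remainder > self.free or (low and priority < 2):
                return False          # reject; reservation untouched
            self.free -= remainder    # rest comes from shared memory
        self.reserved[name] -= taken
        return True

pool = SharedGlobalPool(total_pages=100)
pool.register("TransactionMemory", 20)
pool.register("SchemaMemory", 20)
print(pool.alloc("SchemaMemory", 30, priority=1))  # True: 20 reserved + 10 shared
```

<p>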
This pool has around 11 different parts; when one part requires more memory it can take memory from the shared global pool, and there is a priority order deciding who gets memory when free memory is low.</p><p>The new memory management model in RonDB 22.10.1 also makes it possible to use a malloc/free-like model to get memory from the different pools. This will be useful for all sorts of new developments in RonDB. </p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-61608811273676253992024-01-02T16:17:00.005+01:002024-01-02T16:17:46.854+01:00Major update to the RonDB documentation<p>My colleague Vincent has spent some time improving the RonDB documentation. </p><p>New/rewritten chapters/sections are:</p><ul><li>Main page: <a href="https://docs.rondb.com/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com</a></li><li>Installing: <a href="https://docs.rondb.com/rondb_installation/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/rondb_installation/</a></li><li>Local Quickstart: <a href="https://docs.rondb.com/rondb_quickstart_local/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/rondb_quickstart_local/</a></li><li>Start Distributed: <a href="https://docs.rondb.com/rondb_programs/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/rondb_programs/</a></li><li>Recovery (entire chapter): <a href="https://docs.rondb.com/rondb_high_availability/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/rondb_high_availability/</a></li><li>Two-Phase Commit Protocol: <a href="https://docs.rondb.com/rondb_nonblocking_2pc/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/rondb_nonblocking_2pc/</a></li><li>Transaction Model (only ACID section, further PR incoming): <a href="https://docs.rondb.com/intro_transactions/" rel="noopener noreferrer" target="_blank">https://docs.rondb.com/intro_transactions/</a></li></ul><p>Further UI changes/fixes:</p><ul><li>Added dark mode</li><li>HTTP links are visible again</li><li>Recognition of programming language in code snippets (using Lua filter)</li><li>Order & naming of chapters</li><li>A number of new images based on our Cheetah logo</li></ul>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-51570461547277056362023-11-03T08:12:00.001+01:002023-11-03T08:12:15.688+01:00Presentation of RonDB at Meetup<p>For those who didn't have a chance to come to Stockholm and listen to the presentation of RonDB, <a href="https://www.slideshare.net/mikael329498/rondb-a-newsql-feature-store-for-ai-applicationspdf" target="_blank">here are the slides</a> from the presentation.</p><p>The slides cover the requirements, architecture and status of RonDB and its use in Hopsworks and other applications.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-12883566331120240802023-10-26T10:42:00.006+02:002023-10-26T10:42:41.849+02:00Results on comparing new Intel/AMD 
VMs with older VM types using RonDB<p> In the <a href="https://hopsworks.ai" target="_blank">Hopsworks</a> cloud offering for GCP one can select a fairly large variety of VM types. I am currently working on extending this list to also include the latest generation of VM types. This blog will focus on the impact of those new VM types on benchmarks using <a href="https://www.rondb.com" target="_blank">RonDB</a>.</p><p>The newer VM types are the c3d series, which uses AMD EPYC CPUs of the 4th generation, and the c3 series, which contains VMs using Intel Sapphire Rapids CPUs. AWS has also introduced similar new VM types, but this blog discusses tests performed on VMs in GCP.</p><p>The older VM type we compared with for the MySQL Servers was the n2-standard-16 VM type. This VM uses an Intel Cascade Lake Xeon processor. This represents the second generation of Intel Xeon chips whereas Intel Sapphire Rapids represents the 4th generation.</p><p>The RonDB data nodes used e2-highmem-16 as the baseline for comparison. This VM type uses either an Intel Xeon of the second generation or an AMD EPYC of the second generation.</p><p>The benchmark used was Sysbench OLTP RW based on version 0.4.12.19, which is included in the RonDB tarball and is set up in the API nodes automatically by our cloud offering. This makes it extremely easy to replicate the benchmarks. We use Consul as a load balancer, so the benchmark process is set up to use a single host, <b>onlinefs.mysql.service.consul</b>. In reality this address maps to the set of MySQL Servers in the RonDB cluster. We used 3 MySQL Servers in the tests. The setup used 2 RonDB data nodes in one node group.</p><p>Thus in the Hopsworks cloud we get a load balanced RonDB Data Service as part of the infrastructure of the Hopsworks Feature Store.</p><p>We first executed the benchmark using the old VM types to get a baseline. The next step was to upgrade the RonDB MySQL Servers to use c3d-highmem-16. 
Thus the same amount of memory and number of CPUs as in n2-standard-16, but upgraded from Intel 2nd generation to AMD 4th generation.</p><p>This mainly impacted throughput. The baseline experiment executed 9000 TPS and was limited by the CPUs in the MySQL Servers (they used 1550% of the 1600% available). The c3d-highmem-16 delivered 11400 TPS while only using 1000% of the available 1600%. Thus the throughput per CPU increased by around 100%. In this execution the bottleneck of the benchmark was the RonDB data nodes.</p><p>The benchmark API node was consistently an n2-standard-48 VM. This meant that most communication went from an API VM of the old type, to a MySQL Server of the new type, to a RonDB data node VM of the old type. Thus an old VM type was involved in all communication. The network latency was the same in this experiment as in the baseline experiment.</p><p>The change from one VM type to another used the reconfiguration support RonDB has in its cloud offering. This change is an online operation where the cluster remains operational and the new MySQL Servers are included in the Consul setup as soon as they have started up. Only when nodes are stopped could temporary errors happen, and these can be handled with simple retry logic.</p><p>Next we also changed the VM type of the RonDB data nodes to c3d-highmem-16 using the same online reconfiguration as for the MySQL Servers.</p><p>What we quickly noted in this setup was that the latency per transaction was cut in half; the latency using a single thread decreased to less than half of the baseline. Thus it is clear that communication between 2 VMs of the new type cuts network latency by more than half. The throughput now increased to 17800 TPS and the bottleneck was now in the MySQL Servers. 
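</p><p>As an aside on the reconfiguration mentioned above, the simple retry logic can be sketched like this. This is hypothetical code, not part of any real RonDB client API: TemporaryError and flaky_query are stand-ins for a transient "node is stopping" error and an application query.</p>

```python
# Hypothetical sketch of retrying a query that may hit a transient error
# while a node is being stopped during an online reconfiguration.
import time

class TemporaryError(Exception):
    """Stands in for a transient 'node is stopping' error."""

def run_with_retry(operation, max_retries=5, backoff_seconds=0.1):
    for attempt in range(max_retries):
        try:
            return operation()
        except TemporaryError:
            if attempt == max_retries - 1:
                raise                      # give up after the last attempt
            time.sleep(backoff_seconds * (attempt + 1))

attempts = []
def flaky_query():
    attempts.append(1)
    if len(attempts) < 3:
        raise TemporaryError()             # first two attempts hit a stopping node
    return "result"

print(run_with_retry(flaky_query))         # -> result
```

<p>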
Thus the throughput improvement is almost 98% and network latency was cut by more than half.</p><p>When reading the announcement of the C3 machine series and the description of the C3D machine series, it is clear that the new IPU (Infrastructure Processing Unit) that takes care of offloading networking is a major reason for this improved network latency.</p><p>Analysing the Sysbench transaction in this setup, there will be around 100 network messages, most of them in serial order. Still, the latency of a transaction execution is no more than 6 milliseconds to execute the 20 SQL queries involved in the OLTP RW transaction. Thus a mean of 60 microseconds per message, and this includes the time to also execute the RonDB data node code and the RonDB MySQL Server code.</p><p>The next step was to again change the MySQL Server VMs. This time we changed to c3-highmem-22. Unfortunately the VM type c3-highmem-16 didn't exist, so the comparison isn't perfect, but at least it gives a good estimate of the improvements in Intel's 4th generation CPUs.</p><p>The network latency was the same for Intel and AMD 4th generation VM types. The throughput increased by around 40% up to around 24000 TPS. Since the number of CPUs increased by around 40% as well, it seems that the c3 series and c3d series are very similar in throughput when used for RonDB MySQL Servers.</p><p>To test the throughput of those new VMs we ran the test using c3-highmem-8 and c3d-highmem-8 VM types as RonDB data node VMs. The performance of those two VM types was almost indistinguishable, to the point where I started wondering if they were the same CPUs. Throughput was half the throughput of the 16 vCPU VMs.</p><p>The main conclusion of these tests is that upgrading from 2nd generation x86 CPUs to 4th generation x86 CPUs in the GCP cloud provides a 100% improvement in throughput and a similar improvement in network latency.</p><p>The price of those VMs is higher, but the increase is substantially less than 100%. 
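</p><p>To recap the arithmetic from the latency analysis above, using only the numbers quoted in this post:</p>

```python
# Re-deriving the per-message figure and the throughput improvement.
transaction_latency_ms = 6.0     # one OLTP RW transaction, 20 SQL queries
messages_per_transaction = 100   # mostly serial network messages

per_message_us = transaction_latency_ms * 1000 / messages_per_transaction
print(per_message_us)            # -> 60.0 microseconds per message

baseline_tps = 9000              # old VM types throughout
new_tps = 17800                  # both tiers on c3d-highmem-16
print(round((new_tps - baseline_tps) / baseline_tps * 100))  # -> 98 (percent)
```

<p>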
So it makes a lot of sense to start using those new VM types for new applications.</p><p>The tests were performed using RonDB version 21.04.15. We are about to release a new LTS version of RonDB, version 22.10.1. There will be a more thorough benchmark report when this is released.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-46592793864606482062023-09-29T18:22:00.000+02:002023-09-29T18:22:31.131+02:00Release of RonDB 21.04.15<p> We have worked hard lately on ensuring stability and adding the features required by our customers. Thus the RonDB 21.04.15 release has reached a very high quality level and will be able to sustain its users until they desire to upgrade to a newer release of RonDB.</p><p>Most of the changes in this release are related to the new REST API server that makes it possible to read using single reads or batch reads with primary key lookups through a REST protocol or through a gRPC protocol. The REST API server also supports reading directly from the Hopsworks Feature Store, taking into account the metadata model of the Hopsworks Feature Store.</p><p>Much of the work around RonDB is centered on automated management of RonDB. To this end we have developed the ndb-agent that makes it possible to create a cluster, stop the cluster, start the cluster again, take a backup, delete a backup, restore from backup and finally reconfigure the cluster as an online operation.</p><p>Reconfiguring the cluster means adding or removing replicas and increasing the size of data node VMs. It also means that MySQL Server VMs can be added, changed and dropped as needed by the application.</p><p>All of those operations are already operational and working. We are now working on an improvement that speeds up the change process significantly. 
Adding a new MySQL Server can now be done in 2-3 minutes, and most of this time is spent on creating the new VM in the chosen cloud (Hopsworks supports AWS, GCP and Azure).</p><p>The new ndb-agent works in the same fashion as Kubernetes, by maintaining a desired state. This means that it is fairly straightforward for the ndb-agent to support both our cloud offering and a Kubernetes setup.</p><p>RonDB development is now focused on the new RonDB release 22.10.1. This will introduce 8 new features, the most important being support for variable sized disk columns. RonDB 22.10 has been in development and testing for almost 3 years, so it is already a very stable release. In addition it brings a number of performance improvements.</p><p>The release notes for <a href="https://docs.rondb.com/release_notes_210415/" target="_blank">RonDB 21.04.15</a>.</p><p>The full set of new features in <a href="https://docs.rondb.com/new_features_2104/" target="_blank">RonDB 21.04</a>.</p><p>The full set of new features in <a href="https://docs.rondb.com/new_features_2210/" target="_blank">RonDB 22.10</a>.</p><p>The new Hopsworks release also makes use of replication between RonDB clusters. A Hopsworks cluster can use a single small RonDB cluster and can grow into an Enterprise setup with several large RonDB clusters replicated between regions far away from each other.</p><p>RonDB is used to handle the Online Feature Store, the metadata of the Hopsworks Feature Store and the metadata of HopsFS. HopsFS, the storage layer of the Offline Feature Store, is a distributed file system that can store many petabytes of data in an efficient manner. The Hopsworks Offline Feature Store makes use of DuckDB to perform complex analysis of the data to train AI models and perform batch inferencing.</p><p>Thus RonDB is a critically important component in the next generation AI system developed at Hopsworks. 
Large companies all around the world are considering how they can build their AI models and supporting systems. Hopsworks provides a platform for those companies, both small ones and very large ones.</p><p>Hopsworks provides a free service where anyone can get a free Hopsworks account at <a href="https://app.hopsworks.ai">https://app.hopsworks.ai</a> and try out the service themselves.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-25420062916188079332023-08-01T17:28:00.001+02:002023-08-02T12:24:25.129+02:00Modernising a Distributed Hash implementation<p> As part of writing my Ph.D thesis about 30 years ago I invented a new distributed hash algorithm called LH^3. The idea is that I apply the hashing at 3 levels. The first level uses the hash to find the table partition where the key is stored, the second level uses the hash to find the page where the key is stored, and the final step uses the hash to find the hash bucket where the key is stored.</p><p>The algorithm is based on linear hashing and on distributed linear hashing developed by Witold Litwin, with whom I had the privilege at the time to have many interesting discussions. My professor Tore Risch had collaborated a lot with Witold Litwin. I also took the idea of storing the hash value in the hash bucket, to avoid having to compare every key, from Mikael Pettersson, another researcher at Linköping University.</p><p>The basic idea is described in my <a href="http://www.it.uu.se/research/group/udbl/Theses/MikaelRonstromPhD.pdf" target="_blank">Ph.D thesis</a>. The implementation in MySQL Cluster and in RonDB (a fork of MySQL Cluster) is still very similar to this. 
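</p><p>A highly simplified sketch of this three-level idea: the same hash value picks the partition, then the page within the partition, then the bucket within the page. The constants and the in-memory layout here are invented for illustration; the real implementation works on native memory pages, not Python lists.</p>

```python
# Three-level lookup: partition -> page -> bucket, all derived from one hash.

NUM_PARTITIONS = 4
PAGES_PER_PARTITION = 8
BUCKETS_PER_PAGE = 16

def locate(hash_value):
    partition = hash_value % NUM_PARTITIONS
    page = (hash_value // NUM_PARTITIONS) % PAGES_PER_PARTITION
    bucket = (hash_value // (NUM_PARTITIONS * PAGES_PER_PARTITION)) % BUCKETS_PER_PAGE
    return partition, page, bucket

def find_in_bucket(bucket_entries, hash_value, key):
    # Storing the hash value next to each key means most non-matching
    # entries are rejected on a cheap integer compare, without touching
    # the (possibly long) key itself.
    for stored_hash, stored_key, row_ref in bucket_entries:
        if stored_hash == hash_value and stored_key == key:
            return row_ref
    return None

print(locate(123456))  # -> (0, 0, 2)
```

<p>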
This hash table is one of the reasons for the good performance of RonDB; it makes sure that a hash lookup normally incurs only one CPU cache miss during the hash search.</p><p>At Hopsworks we are moving the implementation of RonDB forward with a new generation of developers; in this particular work I am collaborating with Axel Svensson. The best method to learn the code is, as usual, to rewrite the code. RonDB has access to more memory allocation interfaces compared to MySQL Cluster, so I thought this could be useful.</p><p>Interestingly, going through the requirements on memory allocations with a fresh mind leads to more or less the same conclusions as 30 years ago. So after 30 years of developing the product one can rediscover the basic ideas underlying it.</p><p>The original implementation made it possible to perform scan operations using the hash index. However this led to a 3x increase in the complexity of the implementation. Luckily, nowadays one can also scan using the row storage. Thus in RonDB we have removed the possibility to scan using the hash index. This opens up rewriting the hash index with much less complexity.</p><p>A hash implementation thus consists of the following parts: a dynamic array to find the page, a method to handle the memory layout of the page, a method to handle the individual hash buckets and finally a method to handle overflow buckets.</p><p>What we found is that the dynamic array can be implemented much more efficiently using the new memory allocation interfaces. 
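</p><p>For readers unfamiliar with linear hashing, here is a minimal, self-contained sketch of the core growth mechanism (this is not the RonDB code, just the textbook idea): the table grows one bucket at a time, splitting only the bucket the split pointer points at and rehashing only that bucket's entries.</p>

```python
# Minimal linear hashing sketch: incremental growth, one bucket split at a time.

class LinearHashTable:
    def __init__(self, max_fill=4):
        self.level = 0          # current round: 2**level base buckets
        self.split = 0          # next bucket to split in this round
        self.buckets = [[]]
        self.max_fill = max_fill

    def _addr(self, h):
        addr = h % (2 ** self.level)
        if addr < self.split:   # bucket already split this round
            addr = h % (2 ** (self.level + 1))
        return addr

    def insert(self, key, value):
        self.buckets[self._addr(hash(key))].append((key, value))
        if len(self.buckets[self._addr(hash(key))]) > self.max_fill:
            self._split_one()

    def _split_one(self):
        victim = self.buckets[self.split]
        self.buckets.append([])
        self.split += 1
        if self.split == 2 ** self.level:   # round finished, start the next
            self.level += 1
            self.split = 0
        entries, victim[:] = victim[:], []
        for k, v in entries:                # rehash only the split bucket
            self.buckets[self._addr(hash(k))].append((k, v))

    def get(self, key):
        for k, v in self.buckets[self._addr(hash(key))]:
            if k == key:
                return v
        return None

table = LinearHashTable()
for i in range(100):
    table.insert(i, i * 2)
print(table.get(42))   # -> 84
```

<p>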
Overflow can potentially be handled with techniques other than plain overflow buckets; one could also handle it using recursive hashing.</p><p>What we have found is that using pages with hash buckets inside those pages is still a very good idea for a hash table that must adapt well to both increasing and decreasing sizes.</p><p>Modern CPUs have new instructions for parallel execution of searches; these can be used to speed up the lookup in the hash buckets.</p><p>On top of this, the hash function used in RonDB has been MD5; this is replaced with a new hash function, XX3_HASH64, that is about 30x faster.</p><p>A new requirement in RonDB compared to MySQL Cluster is that we work with applications that constantly create and drop tables; the number of tables can also be substantial, and thus there could be many very small tables. This means that a small table could make use of an alternative, much simpler implementation to save memory.</p><p>This is work in progress and it serves a number of purposes: it is a nice way for new developers to learn the RonDB code base, it means that we can save memory for hash indexes, it means that we can make the implementation even more optimised, it simplifies the code thus making it easier to support, and it makes use of modern CPU instructions to substantially speed up hash index lookups.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-55675447952734874542023-07-08T16:02:00.002+02:002023-07-08T16:02:51.272+02:00Number theory for birthdays and IQ tests<p> I have been interested in numbers and playing with them since I was a small kid. Every time someone has a birthday I am always ready to provide an alternative to the normal decimal birthday. So e.g.
having your 100th birthday when you really have your 49th birthday in decimal numbers.</p><p>So here is some number theory for birthdays and IQ tests that you can play around with on your vacation days to prepare for future birthdays and IQ tests. Have fun.</p><p>First a short introduction to numbers and number bases. When we use numbers we normally assume we're counting with decimal numbers. Decimal numbers means that we are using 10 as the base. Thus when we say that someone is 25 years old we really mean that he is 2 * 10^1 + 5 * 10^0 = 2 * 10 + 5 years old. If instead someone has his 25th birthday in octal numbers, what we are saying is that he is 2 * 8^1 + 5 * 8^0 = 2 * 8 + 5 = 21 in decimal numbers.</p><p>So by varying the number base we can turn almost any birthday into a round birthday. For example, when can we say that we have our 100th birthday? The smallest base is 2; this means that our first 100th birthday happens already at our 4th birthday, using base 2. Later in life we can have a 100th birthday at our 9th, 16th, 25th, 36th, 49th, 64th, 81st and 100th birthdays. It is very unlikely that someone will celebrate their 100th birthday in base 11, which would happen at the age of 121.</p><p>Thus celebrating 100 years happens quite a few times, but still not very often.</p><p>Other round numbers are more common. We can have our 20th birthday every second year from our 6th birthday. To be 20, the minimum base is 3, since the digit 2 cannot be used in base 2, which only has the digits 0 and 1. Thus 2 * 3 + 0 = 6 is the minimum age to become 20.</p><p>However, after 6 years old you can have your 20th birthday at any birthday which is an even number. Thus e.g. at age 38 you will be 20 using base 19, since 2 * 19 + 0 = 38.</p><p>If you want to search for an appropriate age to celebrate on your next birthday, start by dividing your age into a product of prime numbers. So e.g. 38 is the product of 2 and 19, which are both prime numbers. 
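</p><p>Here is a small helper to play this game of rewriting an age in another base (letters cover digit values 10-35, so bases up to 36 work):</p>

```python
# Write a (non-negative) age in the given base, most significant digit first.

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base(n, base):
    digits = ""
    while n > 0:
        digits = DIGITS[n % base] + digits
        n //= base
    return digits or "0"

print(to_base(49, 7))    # -> 100: the 49th birthday is a 100th birthday in base 7
print(to_base(38, 19))   # -> 20
print(to_base(38, 2))    # -> 100110
```

<p>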
Thus the roundest numbers you can get here are 20 in base 19 and 100110 in base 2. If your age is 18 you have more options, since 18 is divided into the prime factors 2, 3 and 3 (18 = 2 * 3 * 3). So here you can have your 200th birthday in base 3 and your 10010th in base 2.</p><p>However, you stumble into issues with the above approach when the age you have reached is itself a prime number. For example, when your 37th birthday approaches, how will you turn this into a round number to celebrate? The only obvious round number to reach here is 10, which can be achieved for any prime age by using the prime number itself as the base.</p><p>Here the age 25 comes to the rescue, which is seen as a round birthday by most people. Actually we can prove that every birthday with an odd number of years can become 25 in some base if the odd number is at least 17.</p><p>Proof: The proof is fairly simple. First of all, an odd number can always be written as 2 * k + 1 where k is some number. Second, the minimum base to use for an age of 25 is 6 since the digit 5 doesn't exist in bases 2, 3, 4 and 5. Thus the first time to have your 25th birthday is on your 2 * 6 + 5 = 17th birthday. </p><p>So choose any odd number larger than or equal to 17. This number can always be written as 2 * k + 1 where k is at least 8. But it can also be written as 2 * (k - 2) + (1 + 2 * 2) = 2 * (k - 2) + 5, which is 25 in base k - 2. Thus to calculate the number base to use one calculates:</p><p>(Odd - 5) / 2. Thus with 37 you get (37 - 5) / 2 = 16. Thus on your 37th birthday you have your 25th birthday in base 16.</p><p>Isn't it nice to know that you can always claim to be 20 or 25 years old for the rest of your life after reaching 17 years of age :)</p><p>Have fun on future birthdays figuring out what age you want to have this time.</p><p>Our base 10 came to us via Arabia; in many older cultures the base 12 was used, and some money systems still have the number 12 in them.
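The 25th-birthday rule above can be checked mechanically (a small sketch; the function name is my own):

```python
def base_for_25(age):
    """Return the base in which an odd age >= 17 reads as the digits 25."""
    assert age >= 17 and age % 2 == 1
    return (age - 5) // 2

# Every odd age from 17 upwards is "25" in base (age - 5) / 2:
for age in range(17, 101, 2):
    b = base_for_25(age)
    assert b >= 6            # the digit 5 requires at least base 6
    assert 2 * b + 5 == age  # the digits "25" in base b equal the decimal age

print(base_for_25(37))  # 16
```

The loop is exactly the proof restated: 2 * (k - 2) + 5 recovers the odd age for every base k - 2 from 6 upwards.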
If you are working with computer programs it is very popular to use hexadecimal numbers, using base 16 with the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F.</p><p>So on to IQ tests. Most of you have seen tests like the one below:</p><p>1, 4, 9, 16, ?</p><p>This one is fairly easy: it is the square of the index, thus x^2 is the function in this number series. The next number in the series is therefore 5 * 5 = 25.</p><p>Let's take a slightly more complex number series now.</p><p>2, 9, 28, 65, ?</p><p>This one is a bit more difficult to see directly, so I will give a hint: it is based on the function x^3 + 1. Thus the next number is 5 * 5 * 5 + 1 = 126.</p><p>Now let's take another one, using the function x^2 - 2 * x - 2.</p><p>-3, -2, 1, 6, ?</p><p>This looks difficult at the outset, but since we know the function we can simply compute 5 * 5 - 2 * 5 - 2 = 13.</p><p>So how does one solve this type of IQ test in a quick manner? It is fairly simple using difference techniques, building a triangle of numbers a bit like Pascal's triangle.</p><p>Write down the differences between the numbers and then the differences of the differences.</p><p>In the above calculation we write it up as follows.</p><p>-3, -2, 1, 6, 13</p><p> 1, 3, 5, 7</p><p> 2, 2, 2</p><p>Interestingly, the first differences form a linearly increasing function, which is very easy to see, and the second differences are simply constant, which is even easier.</p><p>We can see that the difference function is 2 * x - 1 and the second difference is the constant 2.</p><p>For those familiar with derivatives, the difference function 2 * x - 1 closely tracks the derivative 2 * x - 2 of x^2 - 2 * x - 2, and 2 is the second derivative of this function.</p><p>So now let's see if this works in practice, here is a number series again:</p><p>0, 1, 8, 27, ?</p><p>We use the difference technique:</p><p>0, 1, 8, 27, => 64</p><p> 1, 7, 19, => 37</p><p> 6, 12, => 18</p><p>So we write the answer to be 64.
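The difference technique can be automated; here is a small Python sketch (assuming, as in all the examples here, that some level of differences eventually becomes constant, which holds for any polynomial series):

```python
def next_term(seq):
    """Predict the next term by building the difference triangle
    down to a constant row, then summing back up."""
    rows = [list(seq)]
    while len(set(rows[-1])) > 1:  # stop when a row is constant
        prev = rows[-1]
        rows.append([b - a for a, b in zip(prev, prev[1:])])
    nxt = rows[-1][-1]
    # extend every row by one element, bottom-up
    for row in reversed(rows[:-1]):
        nxt = row[-1] + nxt
    return nxt

print(next_term([-3, -2, 1, 6]))  # 13
print(next_term([0, 1, 8, 27]))   # 64
```

It reproduces all the answers above, including 25 for the squares and 126 for x^3 + 1.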
Now let's check the answer. The function I used in this case was:</p><p>x^3 - 3 * x^2 + 3 * x - 1</p><p>Thus using x = 5 we get 5^3 - 3 * 5^2 + 3 * 5 - 1 = 125 - 75 + 15 - 1 = 64</p><p>Thus we found the correct answer to a fairly complex IQ test, and we can claim to be more intelligent than we really are :)</p><p>Have fun showing off your capabilities in IQ tests.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-24644584747997605402023-04-29T02:15:00.005+02:002023-04-29T02:17:33.563+02:00Status report RonDB development<p> What is going on with <a href="http://www.rondb.com" target="_blank">RonDB</a> development? Actually a lot, but most of it happens under the radar at the moment. So this blog will give anyone interested some idea of what is going on.</p><p>RonDB core development is the further development of the fork of MySQL NDB Cluster. For the most part this development is focused on our production version RonDB 21.04, which is used at numerous companies in production. Development is very much centered around supporting the <a href="http://www.hopsworks.ai" target="_blank">Hopsworks</a> platform. We have now added 27 new features on top of MySQL NDB Cluster and 127 bug fixes. The latest feature is an improvement of node recovery that can bring 4-8x shorter restart times. This was seen as important to ensure that Online Reconfiguration of RonDB in our cloud setting is speedy.</p><p>We now have 3 main versions of RonDB core: RonDB 21.04, which we use in production; RonDB 22.10, which is prepared for production use and brings the possibility to store 10x more data than RonDB 21.04, important for large customers and large applications; and the next RonDB generation, RonDB 23.04, which is already integrated with MySQL 8.0.33.</p><p>Managed RonDB has been delivered in two steps.
The first step integrated the possibility to start up, back up, stop and restore a RonDB database. The configuration is specified as the number of replicas, the number of MySQL Servers and the type of VMs for the various node types. One can start the cluster either through a UI or through Terraform.</p><p>Now the second step is working as well; it introduces Online RonDB Reconfiguration. One can change the number of replicas, change the VM types of the nodes and increase/decrease the number of MySQL Servers. This is currently an experimental feature available to our customers on request. The change is fully online and has been verified in internal hackathons where our developers test various <a href="http://www.hopsworks.ai" target="_blank">Hopsworks</a> features while the RonDB cluster is reconfiguring.</p><p>We are now working on a third step that makes changes more efficient and uses the Kubernetes model with desired state. The cloud specifies the new desired state, and the agent software ensures that the RonDB cluster moves to this new desired state. Anyone can run RonDB in Docker and try out those new changes on their own laptop.</p><p>Those steps are also available using Docker with the <a href="https://github.com/logicalclocks/rondb-docker" target="_blank">rondb-docker GitHub tree</a>. We use Docker as a development platform, making it easy to test thousands of state transformations at various levels. Soon there will be videos and blogs, accessible from the GitHub tree, describing how to use Docker to test RonDB Reconciliation.</p><p>It doesn't stop there; a major focus is currently on developing the first version of the RonDB REST API server. This makes it easy to access RonDB through a REST service in parallel with the MySQL Server and more efficient NDB API applications.
We have already seen great interest in this API even before it is completed.</p><p>We are also working on automating replication between clusters in different regions.</p><p>As usual there is also a set of interesting product ideas on how to improve the RonDB core: even more flexibility in growing and shrinking, making use of SIMD operations to speed up various parts of RonDB, and some thoughts on long-term development projects as well.</p><p>A benchmark or two is in the works as well. These are further developments of the benchmark described on <a href="https://www.rondb.com/benchmark-rondb-the-fastest-key-value-store-on-the-cloud" target="_blank">www.rondb.com</a> where we show throughput and latency of YCSB both in normal operation and during recovery.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-46183075626755540722023-03-23T12:42:00.003+01:002023-03-23T12:42:30.210+01:00Laptop vs Desktop for RonDB development<p> Most developers today use laptops for all development work. For the last 15 years I have considered desktops and laptops to be very similar in performance and use cases. This is no longer the case, as I will discuss in this blog.</p><p>Personally I use a mix of laptops and desktops. For me the most important things as a developer are the screen resolution and the speed of compilation. But I have now found that desktops can be very useful for the test phase of a development project, in particular the later testing phases.</p><p>Many years ago I read that one can increase productivity by 30% by using a screen with higher resolution, thus fitting more things on the screen at the same time. Personally I have so far found 27" screens to be the best size; larger sizes mean neck pain and smaller ones mean that productivity suffers.
The screen resolution should be as high as your budget allows.</p><p>My experience is that modern laptops can be quite efficient at compilation. There is very little difference in compilation time compared to desktops.</p><p>However, recently I tested running our new <a href="http://www.rondb.com" target="_blank">RonDB</a> <a href="https://github.com/logicalclocks/rondb-docker" target="_blank">Docker</a> clusters on laptops and desktops. What I have seen is that the performance of these tests can differ by up to 4x.</p><p>I think the reason for this large difference is that desktops can sustain high performance for a long time. Some modern desktops can handle CPUs that use more than 200W, whereas most laptops will be limited to about 45W. For a compilation that only runs for about 5 minutes and has some serialisation, the difference becomes very small. What matters most for compilation is the CPU's single-threaded performance and its ability to scale the compilation across a decent number of CPUs.</p><p>However, running a development environment for RonDB means running a cluster on a single machine with two data node processes, two MySQL server processes, a management process and of course any number of application processes. A laptop can handle this nicely, and the performance of a single-threaded application is the same on laptop and desktop. But when scaling the test to many threads the laptop hits a wall, whereas the desktop simply continues to scale.</p><p>The reason is twofold. First, desktop CPUs can have more CPU cores: most high-end laptops today have around 8-10 CPU cores, while high-end desktops today go to around 16-24 CPU cores. In addition, the desktop can usually handle more than 4x as much power. The power difference and the core difference together deliver a 4x higher throughput in heavy testing.</p><p>Thus my conclusion is that laptops are good enough for the development phase together with an external screen.
However, when you enter the testing phase, when you need to run heavy functional tests and load tests on your software, a desktop or a workstation will be very useful.</p><p>In my tests on a high-end desktop I ran a Sysbench OLTP RW benchmark using the RonDB Docker environment and managed to run up to 15,000 TPS. This means running 300,000 SQL queries per second against the MySQL servers and the data nodes. The laptop could handle roughly 25% of this throughput.</p><p>Obviously the desktop could be a virtual desktop in a modern development environment. But a physical machine is still a lot more fun.</p><p>RonDB is part of the <a href="http://www.hopsworks.ai" target="_blank">Hopsworks</a> Feature Store platform.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-84638816341708182552023-03-02T18:09:00.001+01:002023-03-02T18:09:48.770+01:003 commands to start a RonDB cluster<p> RonDB is a key-value store with SQL capabilities. We are working on making it really easy to develop applications against RonDB. You can now get a RonDB cluster up and running using 3 commands on your development machine, assuming you have Docker installed there.</p><p>Here are the commands:</p><p>1. git clone https://github.com/logicalclocks/rondb-docker rondb-docker</p><p>2. cd rondb-docker</p><p>3. ./run.sh</p><p>The prerequisites are that you have git installed and Docker or Docker Desktop. With Docker Desktop and a new resource extension one can see the memory and CPU usage of the various containers. Using it on Windows also requires WSL 2 to be installed.</p><p>If you are using Windows it is important that you have set Docker Desktop to use WSL 2 as the engine. One might also have to activate WSL 2 integration with the Linux distribution you are using in WSL 2. Both of those can be set from the Docker Desktop settings pages.
One needs to start a new Linux terminal after changing those settings before they actually take effect.</p><p>When trying it on Windows 11 it has worked like a charm for me. But trying it on Windows 10 I had issues with firewalls preventing the MySQL Server from starting. Feel free to post comments to this blog if you find issues and workarounds for those.</p><p>The run.sh command will fetch the Docker image by pulling it from Docker Hub. It is a download of several hundred MBytes, so the time it takes depends on the speed of your internet connection. Next it starts a RonDB cluster with 1 MGM server, 2 MySQL Servers and 2 data nodes.</p><p>Once it has started you can access the MySQL Servers on ports 15000 and 15001 using a normal MySQL client or the application you are developing.</p><p>To access the MySQL Servers you can run the command below using a MySQL client.</p><p>mysql --protocol=tcp --user=mysql --host=localhost --port=15000 -p</p><p>Enter the password Abc123?e and you are connected to the MySQL Server and can use it as a normal MySQL client connected to a MySQL Cluster. The mysql user has full access to the ycsb% databases, the sbtest% databases, the sysbench% databases and the dbt% databases.</p><p>You can enter the Docker containers in the normal manner using</p><p>docker exec -it docker_id /bin/bash</p><p>You find the docker_id using the docker ps command.</p><p>You can use the run.sh script to create the RonDB cluster of your choice. It has 5 predefined profiles (mini, small, medium, large, xlarge). All profiles have the same nodes except mini, which only creates 1 MySQL Server and 1 data node.</p><p>We have tested this using Docker Desktop on Mac OS X, Docker Desktop on Windows using WSL 2 and Docker on Linux. So most developers should be able to try it out in their environment of choice.
</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-76649935774471621502023-01-10T19:12:00.000+01:002023-01-10T19:12:26.025+01:00The flagship feature in the new LTS version RonDB 22.10.0<p> In RonDB 22.10.0 we added a new major feature to RonDB. This feature means that variable sized disk columns in RonDB are stored in variable sized rows instead of fixed size rows.</p><p>The history of disk data in RonDB starts already in 2004, when the NDB team at Ericsson had been acquired by MySQL AB. NDB Cluster was originally designed as an in-memory DBMS. The reason for this was that a disk-based DBMS couldn't handle the latency requirements of telecom applications.</p><p>Thus NDB was developed as a distributed architecture using Network Durability (meaning that a transaction is made durable by writing the transaction into memory in several computers in a network). Long-term durability of data is achieved by a background process ensuring that data is written to disk.</p><p>When the NDB team joined MySQL we looked at many other application categories as well, and thus increasing the database sizes NDB could handle was seen as important. So we started developing support for disk-based columns. The design was accepted as a <a href="https://www.semanticscholar.org/paper/Recovery-Principles-in-MySQL-Cluster-5.1-Ronstr%C3%B6m-Oreland/bb294f18c25c877a453b14e80b40b56707753592" target="_blank">paper at VLDB in Trondheim in 2005</a>.</p><p>The use of this feature didn't really take off in any significant manner for a few years, since the latency and performance of hard drives made it too different from the performance of in-memory data.</p><p>That problem has been solved by the technology development of SSDs, with the introduction of NVMe drives and newer versions of PCI Express 3, 4 and now 5.
As an anecdote, I installed a set of NVMe drives on my workstation capable of handling millions of IOPS and of delivering 66 GBytes per second of bandwidth. However, while installing them I discovered that I had only 1 memory card, which meant that I had 3x more bandwidth to my NVMe drives than memory bandwidth. So in order to make use of those NVMe drives I had to install more memory cards to get the memory bandwidth required to keep up with them.</p><p>So with the introduction of NVMe drives the feature became useful. Actually, one of the main users of this feature is HopsFS, a distributed file system in the Hopsworks platform which uses RonDB for metadata management. HopsFS can use disk columns in RonDB for storing small files.</p><p>Performance of disk columns is really good. This <a href="https://mikaelronstrom.blogspot.com/2020/10/ycsb-disk-data-benchmark-with-ndb.html" target="_blank">blog</a> presents a benchmark with YCSB using disk-based columns in NDB Cluster. We get a bandwidth of more than 1 GByte per second of application data read and written.</p><p>The latency of NVMe drives is 100x lower than that of hard drives. Previously, the latency of hard drives was far more than 100x higher than in-memory latency for database operations; with modern NVMe drives the latency difference between in-memory columns and disk columns is down to a factor of 2. We analysed performance and latency using the YCSB benchmark and compared it to in-memory columns in this <a href="https://www.rondb.com/benchmark-rondb-the-fastest-key-value-store-on-the-cloud" target="_blank">blog</a>.</p><p>One problem with the original implementation is that the disk columns were always stored in fixed size rows. In HopsFS we found ways to handle this by using multiple tables for different row sizes.</p><p>In a traditional application and in the Feature Store it is very common to store data in variable sized columns.
To ensure that the data fits, the maximum size of a column can be 10x larger than its average size. Thus we can easily waste 90% of the disk space. This means that to use disk columns in Feature Store applications we had to enable support for variable sized rows on disk.</p><p>Thus with the release of the new LTS RonDB version 22.10.0 the disk columns are now as useful as the in-memory columns. They have excellent performance, the latency is very good (even better than the in-memory latency of some competitors) and the storage efficiency is now high as well.</p><p>This means that with RonDB 22.10.0 we can handle nodes with TBytes of in-memory data and many tens of TBytes of disk columns. Thus RonDB can scale all the way up to database sizes at the petabyte level with read and write latencies of less than a millisecond.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-21057637748178620082023-01-10T17:41:00.001+01:002023-01-10T17:41:20.900+01:00Summary of RonDB 21.04.9 changes<p> RonDB 21.04's main use case is being the base of the data management platform in Hopsworks. As such, every now and then new requirements on RonDB emerge. But obviously the most important focus of RonDB 21.04 development is stability.</p><p>Hopsworks provides a free Serverless use case to try out the Hopsworks platform. Check it out on <a href="https://app.hopsworks.ai">https://app.hopsworks.ai</a>. Each user gets their own database in RonDB and can create a number of tables. Then one can load data from various sources using OnlineFS (a feature using Kafka and ClusterJ to load data from external sources into Feature Groups; a Feature Group is a table in RonDB).</p><p>Previously ClusterJ was limited to using only one database per cluster connection, which led to a lot of unnecessary connects and disconnects to the RonDB cluster.
In RonDB 21.04.9 it is now possible for one cluster connection to use any number of databases.</p><p>In addition we made a few changes to RonDB to make it easier to manage in our managed platform.</p><p>In preparation for releasing Hopsworks 3.1, which includes RonDB 21.04.9, we extended the tests for the Hopsworks platform, among other things for HopsFS, a distributed file system that uses RonDB to store metadata and small files. We fixed all issues found in these extended tests and any other problems found in the last couple of months.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-79353535310834793242023-01-09T18:02:00.006+01:002023-01-09T18:13:56.937+01:00RonDB News<p> The RonDB team has been busy with development in 2022. Now is the time to start releasing things. There are 5 things that we are planning to release in Q1 2023.</p><p>RonDB 21.04.9: A new version of RonDB with a few new features required by the Hopsworks 3.1 release and a number of bug fixes. This is released today and will be described in a separate blog.</p><p>RonDB 22.10.0: This is a new Long-Term Support (LTS) version that will be maintained at least until 2025. It is also released today. It has a number of new features on top of RonDB 21.04, of which the most important is support for variable sized disk columns, which makes it much more interesting to use RonDB with large data sets. More on this feature in a separate blog post.</p><p>In addition, RonDB 22.10.0 is updated to be based on MySQL 8.0.31; RonDB 21.04 is based on MySQL 8.0.23. I will post a separate blog with more about the content of RonDB 22.10.0.</p><p>The release content is shown in detail in the release notes and new features chapters of the <a href="https://docs.rondb.com" target="_blank">RonDB docs</a>.</p><p>We will very soon release a completely revamped version of RonDB Docker using Docker Compose.
This is intended to support developing applications on top of RonDB in your local development environment. RonDB developers use it to develop new features, but it is also very useful for developing any type of application on top of RonDB using any of the numerous APIs by which you can connect.</p><p>We are also close to finishing the first version of our completely new RonDB REST API, which will make it possible to issue REST requests towards RonDB as well as the same queries using gRPC calls. The first version will support primary key lookups and batched key lookups. Batched key lookups are very useful in some Feature Store applications where it is necessary to read hundreds of rows in RonDB to rank query results. Our plan is to further develop this REST API service so that it can also be used efficiently in multi-tenant setups, enabling the use of RonDB in Serverless applications.</p><p>Finally, we have completed the development and test phases of RonDB Reconfiguration in the Hopsworks cloud using AWS. The Hopsworks cloud is implemented using Amplify in AWS, so the Hopsworks cloud service is handled by Amplify even if the actual Hopsworks cluster is running in GCP or Azure. RonDB Reconfiguration means that you can start by creating a Hopsworks cluster with 2 data node VMs with 8 VCPUs and 64 GB of memory each and 2 MySQL Server VMs using 16 VCPUs. When you see that this cluster needs to grow, you can simply tell the Hopsworks UI that you want e.g. 3 data node VMs with 16 VCPUs and 128 GB of memory each and 3 MySQL Server VMs with 32 VCPUs each. The Hopsworks cloud service will then reconfigure the cluster as an online operation. No downtime will happen during the reconfiguration.
There might be some queries that get temporary errors, but those can simply be retried.</p><p>The Hopsworks cloud applications use virtual service names through Consul. This means that the services using the MySQL service will automatically pick up the new MySQL Servers as they come online and will use the MySQL Servers in a round-robin fashion.</p><p>It is possible to scale data node VM sizes upwards; we currently don't support scaling them downwards. It is possible to scale the number of replicas up and down between 1 and 3. The number of MySQL Servers can be increased and decreased, and the size of the MySQL Server VMs can go both upwards and downwards. At the moment we don't allow adding more node groups of data nodes as an online operation; this requires an offline change.</p><p>This reconfiguration feature is going to be integrated into the Hopsworks cloud in the near future.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-44970358062055182302022-07-29T12:25:00.000+02:002022-07-29T12:25:38.052+02:00The world's first LATS benchmark<p> LATS stands for low Latency, high Availability, high Throughput and scalable Storage. When testing an OLTP DBMS it is important to look at all those aspects. This means that the benchmark should test how the DBMS works both in scenarios where data fits in memory and where it doesn't. In addition, tests should measure both throughput and latency. Finally, it isn't enough to run the benchmarks while the DBMS is in normal operation; there should also be tests that verify the performance when nodes fail and when nodes rejoin the cluster.</p><p>We have executed a battery of tests on RonDB, a key-value store with SQL capabilities, that together make up a complete LATS benchmark. We used the Yahoo! Cloud Serving Benchmark (YCSB) for this.
These benchmarks were executed on Amazon EC2 servers with 16 VCPUs, 122 GBytes of memory, 2x1900 GB NVMe drives and up to 10 Gb Ethernet. These virtual machines were used both for in-memory tests and for tests of on-disk performance.</p><p><a href="https://www.rondb.com/benchmark-rondb-the-fastest-key-value-store-on-the-cloud" target="_blank">Link to full benchmark presentation</a>. The full benchmark presentation contains lots of graphs and a lot more detail about the benchmarks. Here I will present a short summary of the results we saw.<br /></p><p>YCSB contains 6 different workloads, and all 6 workloads were tested in different aspects. In most workloads the average latency is around 600 microseconds, and the 99th percentile is usually around 1 millisecond and almost always below 2 milliseconds.</p><p>The availability tests start by shutting down one of the RonDB data nodes. Ongoing transactions affected by this node failure will see a latency of up to a few seconds, since node failure handling requires the transaction state to be rebuilt to decide whether each transaction should be committed or aborted. New transactions can start as soon as the cluster has been reconfigured to remove the failed node. After the node failure is discovered, this reconfiguration only takes about one millisecond. The time to discover the failure depends on how it occurs: if it is a software failure in the data node it will be discovered by the OS and discovery is immediate, since the connection is broken; if there is a HW failure, the heartbeat mechanism may be what discovers the failure, and the time to discover failures using heartbeats depends on the configured heartbeat interval.</p><p>After the node failure has been handled, the throughput decreases by around 10% and latency goes up by about 30%. The main reason is that we now have fewer data nodes to serve the reads. The impact will be higher with 2 replicas than with 3 replicas.
When the recovery reaches the synchronisation phase, where the starting node synchronises its data with the live data nodes, we see a minor decrease in throughput, which actually leads to shorter latency. Finally, when the process is completed and the starting node can serve reads again, throughput and latency return to normal levels.</p><p>Thus it is clear from those numbers that one should design RonDB clusters with a small amount of extra capacity to handle node recovery; the requirement is not very high, and a bit more is needed with 2 replicas than with 3 replicas.</p><p>Performance when data doesn't fit in memory decreases significantly, since it is limited by how many IOPS the NVMe drives can sustain. We did similar experiments a few years ago and saw that RonDB performance can scale to as many as 8 NVMe drives and handle read and write workloads of more than a GByte per second using YCSB. The HW development of NVMe drives is even faster than that of CPUs, so this bottleneck is likely to diminish as HW development proceeds.</p><p>The latency for reads is higher, and the update latency is substantially higher for on-disk storage; the update latency at high throughput reaches up to 10 milliseconds. We expect latency and throughput to improve substantially with the new generation of VMs using substantially improved NVMe drives. It will be even more interesting to see how this performance improves when moving to PCI Express 5.0 NVMe drives.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-47792342134971055912022-07-28T15:49:00.001+02:002022-07-28T15:49:43.110+02:00New stable release of RonDB, RonDB 21.04.8<p>Today we released a new version of RonDB 21.04, the stable release series of RonDB. RonDB 21.04.8 fixes a few critical bugs and adds two new features.
<a href="https://docs.rondb.com/release_notes_21048/">See the docs for more details of this released version</a>.</p><h2 style="text-align: left;">Make it possible to use IPv4 sockets between ndbmtd and API nodes</h2><p>In MySQL NDB Cluster all sockets have been converted to use IPv6 format even when IPv4 sockets are used. This led to MySQL NDB Cluster no longer being able to interact with device drivers that only work using IPv4 sockets. This is the case for <a href="http://www.dolphinics.no" target="_blank">Dolphin SuperSockets</a>.</p><p>Dolphin SuperSockets make it possible to use extremely low latency HW to connect the nodes in a cluster and improve latency significantly. This feature makes it possible for RonDB 21.04.8 to make use of interconnect cards from Dolphin through Dolphin SuperSockets. RonDB has been tested and benchmarked using Dolphin SuperSockets; we will soon release a benchmark report on this.</p><h2 style="text-align: left;">Two new ndbinfo tables to check memory usage</h2><div>RonDB is now used by <a href="http://app.hopsworks.ai" target="_blank">app.hopsworks.ai</a>, a Serverless Feature Store. This means that thousands of users can share RonDB. To ensure this multi-tenant usage of RonDB is working, we have introduced two new ndbinfo tables that make it possible to track exactly how much memory a specific user is using. A user in Hopsworks is mapped to a project, and a project uses its own database in RonDB. Thus those two new tables make it possible to implement quotas both at the user level and at the Feature Group level.</div><p>Two new ndbinfo tables are created, ndb$table_map and ndb$table_memory_usage.
The ndb$table_memory_usage table lists four properties for all table fragment replicas: in_memory_bytes (the number of bytes used by a table fragment replica in DataMemory), free_in_memory_bytes (the number of bytes free of the previous, these bytes are always in the variable sized part), disk_memory_bytes (the number of bytes in the disk columns, essentially the number of extents allocated to the table fragment replica times the size of the extents in the tablespace) and free_disk_memory_bytes (the number of bytes free in the disk memory for disk columns).</p><p>Since each table fragment replica provides one row, we use a GROUP BY on table id and fragment id and the MAX of those columns to ensure we only have one row per table fragment.</p><p>We want to provide the memory usage in memory and in disk memory per table or per database. However, a table in RonDB is spread out over several tables. There are four places a table can use memory. First, the table itself uses memory for rows and for a hash index; when disk columns are used this table also makes use of disk memory. Second, there are ordered indexes that use memory for the index information. Third, there are unique indexes that use memory for the rows in the unique index (a unique index is simply a table with the unique key as primary key and the primary key as columns) and for the hash index of the unique index table. This table is not necessarily colocated with the base table. Finally, there are also BLOB tables that can contain a hash index, row storage and even disk memory usage.</p><p>The user isn't particularly interested in this level of detail, so we want to display information about memory usage for the tables and databases that the user sees. 
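</p><p>The aggregation and mapping described above could be sketched roughly as follows. This is a hypothetical query: the internal column names (table_id, fragment_id) follow the description in this post and may differ in the released views.</p>

```sql
-- Collapse the per-replica rows into one row per table fragment with MAX,
-- then map internal table ids to user-visible names via ndb$table_map
-- and sum the fragments per table.
SELECT m.database_name, m.table_name,
       SUM(f.in_memory_bytes)      AS in_memory_bytes,
       SUM(f.free_in_memory_bytes) AS free_in_memory_bytes,
       SUM(f.disk_memory_bytes)    AS disk_memory_bytes
FROM (SELECT table_id, fragment_id,
             MAX(in_memory_bytes)      AS in_memory_bytes,
             MAX(free_in_memory_bytes) AS free_in_memory_bytes,
             MAX(disk_memory_bytes)    AS disk_memory_bytes
      FROM `ndb$table_memory_usage`
      GROUP BY table_id, fragment_id) AS f
JOIN `ndb$table_map` AS m USING (table_id)
GROUP BY m.database_name, m.table_name;
```

<p>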
Thus we have to gather data for this. The tool to gather the data is the new ndbinfo table ndb$table_map; this table lists the table name and database name given a table id. The table id can be the table id of a table, an ordered index, a unique index or a BLOB table, but the mapping will always present the name of the actual table defined by the user, not the name of the index table or BLOB table.</p><p>Using those two tables we create two ndbinfo views. The first view, table_memory_usage, lists the database name, the table name and the above 4 properties for each table in the cluster. The second view, database_memory_usage, lists the database name and the 4 properties summed over all table fragments in all tables that RonDB creates for the user, including the BLOB tables and indexes.</p><p>To make things a bit more efficient we keep track of all ordered indexes attached to a table internally in RonDB. Thus ndb$table_memory_usage will list the memory usage of tables including the ordered indexes on the table; there will be no separate rows presenting the memory usage of an ordered index.</p><p>These two views make it easy for users to see how much memory they are using in a certain table or database. This is useful in managing a RonDB cluster.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-9289120650085987972022-04-23T12:29:00.000+02:002022-04-23T12:29:53.697+02:00Variable sized disk rows in RonDB<p> RonDB was originally a pure in-memory database engine. The main reason for this was to support low latency applications in the telecom business. However, already in 2005 we presented a design at VLDB in Trondheim for the introduction of columns stored on disk. These columns cannot be indexed, but are very suitable for columns of large size.</p><p>RonDB is currently targeting Feature Store applications. 
These applications often access data through a set of primary key lookups where each row can have hundreds of columns of varying size.</p><p>In RonDB 21.04 the support for disk columns uses a fixed size disk row. This works very well to support the handling of small files in HopsFS. HopsFS is a distributed file system that can handle petabytes of storage in an efficient manner. On top of it Hopsworks builds the offline Feature Store applications.</p><p>The small files are stored in a set of fixed size rows in RonDB with suitable sizes. YCSB benchmarks have shown that RonDB can handle writes of up to several GBytes per second. Thus the disk implementation of RonDB is very efficient.</p><p>Applications using the online Feature Store will however store much of their data in variable sized columns. These work perfectly well as in-memory columns. They also work in the disk columns in RonDB 21.04. However, to make storage more efficient we are designing a new version of RonDB where the row parts on disk are stored on variable sized disk pages.</p><p>These pages use the same data structure as the in-memory variable sized pages. So the new format only affects the handling of free space and recovery. This design has now reached a state where it is passing our functional test suites. We will still add more tests, perform system tests and search for even more problems before we release it for production usage.</p><p>One interesting challenge with variable sized rows is that an update might need more space in a data page. If this space isn't available we have to find a new page where space is available. It becomes an interesting challenge when taking into account that we can abort operations on a row while still committing other operations on the same row. 
The conclusion here is that one can never release any allocated resources until the transaction is fully committed or fully aborted.</p><p>This type of challenge is one reason why it is so interesting to work with the internals of a distributed database engine. After 30 years of education, development and support, there are still new interesting challenges to handle.</p><p>Another challenge we faced was that we need to page in multiple data pages to handle an operation on the row. This means that we have to ensure that, while paging in one data page, the other pages that we already paged in won't be paged out before we have completed our work on the row. This work also prepares the stage for handling rows that span multiple disk pages. RonDB already supports rows that span multiple in-memory pages and one disk page.</p><p>If you want to learn more about RonDB requirements, LATS properties, use cases and internal algorithms, join us on Monday for the <a href="https://db.cs.cmu.edu/events/vaccination-2022-rondb-a-key-value-store-with-sql-capabilities-and-lats-properties-mikael-ronstrom/" target="_blank">CMU Vaccination database presentation</a>. Managed RonDB is supported on AWS, Azure and GCP and on-prem.</p><p>If you would like to join the effort to develop RonDB and a managed RonDB version, we have open positions at Hopsworks AB. Contact me on LinkedIn if you are interested.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-47023368270902081772022-01-31T16:47:00.003+01:002022-01-31T16:47:52.256+01:00RonDB receives ARM64 support and large transaction support<p> RonDB is the base platform for all applications in Hopsworks. 
Hopsworks is a machine learning platform featuring a Feature Store that can be used in online applications as well as offline applications.</p><p>This means that RonDB development is driven towards ensuring that operating RonDB in this environment is as good as possible.</p><p>RonDB is designed for millions of small transactions reading and writing data. However, occasionally applications perform rather large transactions. Previous versions of RonDB had some weaknesses in this area. The new versions of RonDB now also support large transactions, although the focus is still on many smaller transactions.</p><p>Designing this new support for large transactions required a fairly large development effort. Doing this in a stable release is a challenge; therefore it was decided to combine this effort with a heavy testing period focused on fixing bugs.</p><p>This effort has been focused on achieving three objectives. First, to stabilise the new RonDB 21.04 releases, the stable release series of RonDB. Second, to stabilise the next RonDB release at the same level as RonDB 21.04. Third, we also wanted the same level of support for ARM64 machines.</p><p>We are now proud to release RonDB 21.04.3, a new stable release of RonDB that supports much larger transactions. Since the release of RonDB 21.04.1 in July 2021 we have fixed more than 50 bugs in RonDB and we are very satisfied with the stability also on ARM64 machines.</p><p>The original plan was to release the next version of RonDB in October 2021; however, we didn't want to release a new version with any less stability than the RonDB 21.04 release. Thus instead we release this new version of RonDB now, RonDB 22.01.0.</p><p>ARM64 support covers both RonDB 21.04.3 and RonDB 22.01.0. RonDB is now also supported on both Linux and Mac OS X, and on Windows it is supported using WSL 2 (Linux on Windows) on Windows 11. 
We have extensively tested RonDB on the following platforms:</p><p></p><ol style="text-align: left;"><li>Mac OS X 11.6 x86_64</li><li>Mac OS X 12.2 ARM64</li><li>Windows WSL 2 Ubuntu x86_64</li><li>Ubuntu 21.04 x86_64</li><li>Oracle Linux 8 Cloud Developer version ARM64</li></ol><p></p><p>It is used in production on AWS and Azure and has been extensively tested also on GCP and Oracle Cloud.</p><p>As part of the new RonDB release we have also updated the documentation of RonDB at <a href="http://docs.rondb.com">docs.rondb.com</a>. Among other things it contains a new section on Contributing to RonDB that shows how you can build, test and develop extensions to RonDB. In the documentation you will also find an extensive list of the improvements made in the two new RonDB releases.</p><p>ARM64 support is still in beta phase; our plan is to make it available for production use in Q2 2022. There are no known bugs, but we want to give it a bit more time before we assign it to production workloads. This includes adding more test machines and also performing benchmarks on ARM64 VMs.</p><p>Our experience with ARM64 machines so far is that they are fairly stable, but not yet at the same level as x86; it is still possible to find bugs in the compilers. The support around ARM64 is however maturing very quickly, and not surprisingly Mac OS X is leading the way here, since Apple has fully committed its future to ARM. 
We also have great help from participating in the OCI ARM Accelerator program, which provides access to ARM VMs in the Oracle Cloud, making it possible to test on Oracle Linux using ARM with both small and large VMs.</p><p>RonDB 22.01.0 comes with a set of new features:</p><p style="text-align: left;"></p><ol style="text-align: left;"><li>Now possible to scale reads using locks onto more threads</li><li>Improved placement of primary replicas to enable</li><li>All major memory areas now managed by global memory manager</li><li>Even more flexibility in thread configurations</li><li>Removing a scalability hog in index statistics handling</li><li>Merged with MySQL Cluster 8.0.28</li></ol><p></p><p>You can download RonDB tarballs either from <a href="https://github.com/logicalclocks/rondb">https://github.com/logicalclocks/rondb</a> or from <a href="https://repo.hops.works/master">https://repo.hops.works/master</a>; for exact links to the various versions of the binary tarballs, see the <a href="https://docs.rondb.com" target="_blank">Release Notes</a> of each version.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-59024077286916553952021-12-22T23:37:00.001+01:002021-12-22T23:38:03.694+01:00Merry Christmas from the RonDB team<p> This year we bring a new Christmas present in the form of a new release of RonDB 21.04.</p><p>It is packed with improvements; our focus has been on extending support for more platforms while at the same time increasing the quality of RonDB.</p><p>Normally RonDB 21.04.2 would have been released in October 2021. However, we had a number of support issues where we had crashes due to running very large transactions. RonDB is designed for OLTP with small to moderate sizes of transactions. 
However, some applications make use of foreign keys that use ON DELETE CASCADE or ON UPDATE CASCADE, and these transactions can easily become hundreds of thousands of operations.</p><p>This meant changing the handling of transactions. Since this was a rather large change in a stable release, we wanted to ensure that we didn't introduce any quality issues. We used this opportunity to make an extensive effort in fixing all sorts of other bugs at the same time.</p><p>The new RonDB release has been tested with transactions of up to several million row operations in one transaction. We still recommend keeping transaction sizes at moderate levels, since very large transactions will make heavy use of CPU and memory resources during commit and abort processing. In addition, very large transactions will lock large parts of the database, thus making it more difficult for other transactions. Generally, an OLTP database behaves much better if transaction sizes are kept small.</p><p>RonDB development is very much focused on supporting cloud operations. This means that our focus is on supporting Linux for production installations. Quite a few cloud vendors are now supporting ARM64 VMs in addition to the traditional Intel and AMD x86 VMs. Apple also released a set of new ARM64 laptops lately.</p><p>Our development platform is both Mac OS X and Linux; thus it makes sense to also release RonDB on Mac OS X.</p><p>Thus we took the opportunity in RonDB 21.04.2 to provide support for ARM64 as a new platform for RonDB. This support covers both Linux and Mac OS X. The ARM64 support is still in beta state.</p><p>In addition we test RonDB extensively on Windows using WSL 2, the Windows Subsystem for Linux that runs Linux on top of Windows. 
Thus our Linux tarballs should work just fine for testing also on Windows platforms through WSL 2.</p><p>RonDB 21.04.2 contains a large set of bug fixes that are described in detail in the RonDB documentation at <a href="https://docs.rondb.com">https://docs.rondb.com</a>. With these changes RonDB 21.04 contains around 100 bug fixes on top of the stable release of MySQL NDB Cluster 8.0.23 and around 15 new features.</p><p>Further development happens in RonDB 21.10 and upcoming new versions of RonDB. These versions will be released when they are ready for more general consumption, but the development can be tracked on <a href="https://github.com/logicalclocks/rondb">RonDB's git</a>. If you want early access to the binary tarballs of RonDB 21.04.2 you can visit the <a href="https://github.com/logicalclocks/rondb" target="_blank">git homepage of RonDB</a>.</p><p>Early next year we will return with benchmarks of RonDB that show all four qualities of a LATS database: low L(atency), high A(vailability), high T(hroughput) and S(calable storage).</p><p>So finally a Merry Christmas and a Happy New Year from the RonDB team.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-76226404769917117152021-10-20T19:10:00.000+02:002021-10-20T19:10:37.174+02:00Running Linux in Windows 11<p> Most people that know me also know how much I dislike working with Windows. I actually worked with Microsoft products already in the 1980s, so I haven't been able to stay away from them completely.</p><p>Developing MySQL on a Windows platform hasn't been so easy, since building MySQL on Windows is difficult and there are also a number of issues in running even the simple test suites of MySQL.</p><p>At the same time the world is full of developers with only Windows as their development platform. 
So when I read about the possibility of running Linux inside Windows 11, I thought it was time to test whether development of RonDB could now be done on Windows.</p><p>My 9-year old laptop needed replacement, so I went ahead and found a nice Windows laptop at a fair price. I wanted to test the very latest developments that I could find in Windows 11. Since the laptop was delivered with Windows 10, I did an unusual thing: I wanted to upgrade the OS :)</p><p>This wasn't exactly easy and took a few hours. After upgrading Windows 10 a few times and searching for a Windows 11 download, I eventually found one via Google on the Microsoft website. After a couple of upgrades I was at the latest Windows 11 release.</p><p>Installing Linux was quite easy; it was one simple command. I installed an Ubuntu variant.</p><p>Most of the installation went ok. The graphics installation didn't work, but the installation of terminal software was good enough for at least the testing part. For development I use graphical Vim, so this needs to wait for a working version of the graphical parts of Linux (or a Windows editor, but likely not, since they tend to add extra line feeds at the end of lines).</p><p>Downloading the RonDB git tree went fine. Compiling RonDB required installing a number of extra packages, but that is normal and will happen also in standard Linux (build essentials, cmake, openssl dev, ncurses, bison, ..).</p><p>Running the MTR test suite also went almost fine. I had to install zip and unzip as well for this to work.</p><p>Running the MTR test suite takes about an hour and here I found a few challenges. First I had to find the parallelism it could survive. I was hoping for a parallelism of 12 (which in reality for RonDB means 6 parallel tests running). 
But in reality it was only stable with a parallelism of 6.</p><p>However, since I wasn't sitting by the Windows laptop while the test was running, the screen saver decided to interfere (although I had configured it to go into screen saver mode after 4 hours). Unfortunately the screen saver decided that the Linux VM should be put to sleep, which meant that all test cases running when the screen saver kicked in failed. This seems to be a new issue in WSL 2 that didn't exist in WSL 1.</p><p>However, I am still happy with what I saw. Running an operating system inside another and making it feel like Linux is a part of Windows isn't an easy task. So here I must give some kudos to the development team. If they continue working on this integration, I think I am going to get good use of my new laptop.</p><p>I must admit that I don't fully understand how they have solved the issue of running Linux inside Windows. But it definitely looks like the Linux kernel makes use of Windows services to implement the Linux services. Running top in an idle system is interesting: there are only a few init processes and a bash process. So obviously all the Linux kernel processes are missing and presumably implemented inside Windows in some fashion.</p><p>The best part is that the Linux VM configuration isn't static. The Linux VM could make use of all 16 CPUs in the laptop, but it could also allow Windows to grab most of them. So obviously the scheduler can handle both Linux and Windows programs.</p><p>Memory-wise the Linux VM defaults to being able to grow to a maximum of 80% of the memory in the laptop. However, in my case top in Linux constantly stated that it saw 13.5 GB of memory in a machine with 32 GB of memory. I saw some post on the internet stating that Linux can return memory to Windows if it is no longer needed. 
I am not sure I saw this, but it is a new feature, so I feel confident it will be there eventually.</p><p>So at least working with RonDB on Windows 11 is going to be possible. How exactly this will pan out I will write about in future worklogs. At least it is now possible for me to do some development in Windows. It was more than 30 years ago that I last had a development machine with a Microsoft OS, so to me, Linux on Windows definitely makes Windows as a platform a lot more interesting.</p><p>My development environments have shifted a bit over the years. It started with a mix of Microsoft OSs, Unix and some proprietary OSs in the 80s. In the 90s I was mostly working in Solaris on Sun workstations. In the early 2000s I was working with Linux as my development machine. But since 2003 I have been working mostly on Mac OS X (and of course lots of test machines on all sorts of platforms).</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-42243542089453831122021-09-28T13:05:00.005+02:002021-09-28T13:39:26.281+02:00Memory Management in RonDB<p> Most of the memory allocated in <a href="https://www.rondb.com" target="_blank">RonDB</a> is handled by the global memory manager. Exceptions are architecture objects and some fixed size data structures. 
In this presentation we will focus on the parts handled by the global memory manager.</p><p>In the global memory manager we have 13 different memory regions as shown in the figure below:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtHSDqYSLgLvJILxkQQ5BXKQwKfDj03XDDeI4fya_laqUszy1Virg5YHPDLbhzgP-ihwNtFwaTBPFaOl7dQpoDI9NztvPnvfSZP0USf3X0o-UdsCNmbmlXiKupWJr_WFcXyomt/s1920/memory_arch_rondb.001.jpeg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1080" data-original-width="1920" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtHSDqYSLgLvJILxkQQ5BXKQwKfDj03XDDeI4fya_laqUszy1Virg5YHPDLbhzgP-ihwNtFwaTBPFaOl7dQpoDI9NztvPnvfSZP0USf3X0o-UdsCNmbmlXiKupWJr_WFcXyomt/w561-h360/memory_arch_rondb.001.jpeg" width="561" /></a></div><br /><p><br /></p><p>- DataMemory</p><p>- DiskPageBufferCache</p><p>- RedoBuffer</p><p>- UndoBuffer</p><p>- JobBuffer</p><p>- SendBuffers</p><p>- BackupSchemaMemory</p><p>- TransactionMemory</p><p>- ReplicationMemory</p><p>- SchemaMemory</p><p>- SchemaTransactionMemory</p><p>- QueryMemory</p><p>- DiskOperationRecords</p><p>One could divide those regions into a set of qualities. We have a set of regions that are fixed in size; another set of regions that are critical and cannot handle failure to allocate memory; a set of regions that have no natural upper limit and are unlimited in size; and a set of regions that are flexible in size and can work together to achieve the best use of memory. We can also divide regions based on whether the memory is short term or long term. 
Each region can belong to multiple categories.</p><p>To handle these qualities of the regions we have priorities on each memory region; this priority can be affected by the amount of memory that the region has allocated.</p><p>Fixed regions have a fixed size; this is used for database objects, the Redo log buffer, the Undo log buffer, the DataMemory and the DiskPageBufferCache (the page cache for disk pages). There is code to ensure that we queue up when those resources are no longer available. DataMemory is a bit special and we will describe it separately below.</p><p>Critical regions are regions where a failed memory allocation would cause a crash. This relates to the job buffer, which is used for internal messages inside a node; it also relates to send buffers, which are used for messages to other nodes. DataMemory is a critical region during recovery: if we fail to allocate memory for database objects during recovery, we would not be able to recover the database. Thus DataMemory is a critical region in the startup phase, but not during normal operation. DiskOperationRecords are also a critical resource, since otherwise we cannot maintain the disk data columns. Finally, we also treat BackupSchemaMemory as critical, since not being able to perform a backup would make it very hard to manage RonDB.</p><p>Unlimited regions have no natural upper limit; thus as long as memory is available at the right priority level, the memory region can continue to grow. The regions in this category are BackupSchemaMemory, QueryMemory and SchemaTransactionMemory. QueryMemory is memory used to handle complex SQL queries such as large join queries. SchemaTransactionMemory can grow indefinitely, but the meta data operations try to avoid growing too big.</p><p>Flexible regions are regions that can grow indefinitely but that have to set limits on their own growth to ensure that other flexible regions are also allowed to grow. 
Thus one flexible resource isn't allowed to grab all the shared memory resources. There are limits to how much memory a resource can grab before its priority is significantly lowered.</p><p>Flexible regions are TransactionMemory, ReplicationMemory, SchemaMemory, QueryMemory, SchemaTransactionMemory, SendBuffers, BackupSchemaMemory and DiskOperationRecords.</p><p>Finally we have short term versus long term memory regions. A short term memory region allocation is of smaller significance compared to a long term one. In particular this relates to SchemaMemory. SchemaMemory contains metadata about tables, indexes, columns, triggers, foreign keys and so forth. This memory, once allocated, will stay for a very long time. Thus if we allow it to grow too much into the shared memory we will not have space to handle large transactions that require TransactionMemory.</p><p>Each region has a reserved space, a maximum space and a priority. In some cases a region can also have a limit where its priority is lowered.</p><p>4% of the shared global memory is only accessible to the highest priority regions plus half of the reserved space for job buffers and communication buffers.</p><p>10% of the shared global memory is only available to high priority requesters. The remainder of the shared global memory is accessible to all memory regions that are allowed to allocate from the shared global memory.</p><p>The actual limits might change over time as we learn more about how to adapt the memory allocations.</p><p>Most regions also have access to the shared global memory. A region will first use its reserved memory, and if there is shared global memory available it can allocate from this as well.</p><p>The most important regions are DataMemory and DiskPageBufferMemory. Any row stored in memory and all indexes in RonDB are stored in the DataMemory. The DiskPageBufferMemory contains the page cache for data stored on disk. 
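</p><p>The reserved and maximum sizes of these regions can be observed at runtime through the ndbinfo schema. A sketch (the resources table exists in stock NDB Cluster and reports sizes in 32 KiB pages; the exact resource names in RonDB may differ):</p>

```sql
-- Reserved, currently used and maximum pages per memory region
-- on each data node.
SELECT node_id, resource_name, reserved, used, `max`
FROM ndbinfo.resources
ORDER BY node_id, resource_name;
```

<p>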
To ensure that we can always handle recovery, DataMemory is fixed in size, and since recovery can sometimes grow the data size a bit, we don't allow the DataMemory to be filled beyond 95% in normal operation. In recovery it can use the full DataMemory size. Those extra 5% of memory resources are also reserved for critical operations such as growing the cluster with more nodes and reorganising the data inside RonDB. The DiskPageBufferCache is fixed in size; operations towards the disk are queued using DiskOperationRecords.</p><p>Critical regions have higher priority to get memory compared to the rest of the regions. These are job buffers used for sending messages between modules inside a data node, send buffers used for sending messages between nodes in the cluster, the meta data required for handling backup operations and finally operation records to access disk data.</p><p>These regions will be able to allocate memory even when all other regions fail to allocate memory. Failure to access memory for those regions would lead to failure of the data node or failure to back up the data, which are not acceptable events in a DBMS.</p><p>We have 2 more regions that are fixed in size, the Redo log buffer and the Undo log buffer (the Undo log is only used for operations on disk pages). Those allocate memory at startup and use that memory; there is some functionality to handle overload on those buffers by queueing operations when the buffers are full.</p><p>The remaining 4 regions we will go through in detail.</p><p>The first one is TransactionMemory. This memory region is used for all sorts of operations such as transaction records, scan records, key operation records and many more records used to handle the queries issued towards RonDB.</p><p>The TransactionMemory region has a reserved space, but it can grow up to 50% of the shared global memory beyond that. 
It can even grow beyond that, but in this case it only has access to the lowest priority region of the shared global memory. Failure to allocate memory in this region leads to aborted transactions.</p><p>The second region in this category is SchemaMemory. This region contains a lot of meta data objects representing tables, fragments, fragment replicas, columns, and triggers. These are objects that will stay around long-term. Thus we want this region to be flexible in size, but we don't want it to grow so much that it diminishes the possibility to execute queries. Thus we calculate a reserved part and allow this part to grow into at most 20% of the shared memory region in addition to its reserved region. This region cannot access the higher priority memory regions of the shared global memory.</p><p>Failure to allocate SchemaMemory causes meta data operations to be aborted.</p><p>The next region in this category is ReplicationMemory. These are memory structures used to represent replication towards other clusters supporting Global Replication. It can also be used to replicate changes from RonDB to other systems such as ElasticSearch. The memory in this region is of a temporary nature, with memory buffers used to store the changes that are being replicated. The meta data of the replication is stored in the SchemaMemory region.</p><p>This region has a reserved space, but it can also grow to use up to 30% of the shared global memory. After that it will only have access to the lower priority regions of the shared global memory.</p><p>Failure to allocate memory in this region leads to failed replication. Thus replication has to be set up again. This is a fairly critical error, but it is something that can be handled.</p><p>The final region in this category is QueryMemory. This memory has no reserved space; it can use the lower priority regions of the shared global memory. This memory is used to handle complex SQL queries. 
Failure to allocate memory in this region will lead to complex queries being aborted.</p><p>This blog presents the memory management architecture in RonDB that is currently in a branch called schema_mem_21102. This branch is intended for RonDB 21.10.2, but could also be postponed to RonDB 22.04. The main difference in RonDB 21.04 is that the SchemaMemory and ReplicationMemory are fixed in size and cannot use the shared global memory. The BackupSchemaMemory is also introduced in this branch; it was previously part of the TransactionMemory.</p><p>In the next blog on this topic I will discuss how one configures the automatic memory in RonDB.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-41745331785786212322021-09-24T19:37:00.000+02:002021-09-24T19:37:14.820+02:00Automatic Memory Management in RonDB<p><a href="https://www.rondb.com" target="_blank">RonDB</a> has now grown up to the same level of memory management as you find in expensive commercial DBMSs like Oracle, IBM DB2 and Microsoft SQL Server.</p><p>Today I made the last development steps in this large project. This project started with a prototype effort by Jonas Oreland already in 2013, after being discussed for a long time before that. After he left for Google the project was taken over by Mauritz Sundell, who implemented the first steps for operational records in the transaction manager.</p><p>Last year I added the rest of the operational records in NDB. Today I completed the programming of the final step in RonDB. This last step meant moving around 30 more internal data structures towards using the global memory manager. 
These memory structures are used to represent meta data about tables, fragments, fragment replicas, triggers and global replication objects.</p><p>One interesting part of this work is a malloc-like implementation that interacts with the record-level data structures that already exist in RonDB to handle linked lists, hash tables and so forth for internal data structures.</p><p>So after more than 5 years it feels like a major step forward in the development of RonDB.</p><p>What does this mean for a user of RonDB? It means that the user won't have to bother much with memory management configuration. If RonDB is started in a cloud VM, it will simply use all memory in the VM and ensure that the memory is handled as a global resource that can be used by all parts of RonDB. This feature already exists in RonDB 21.04. What this new step means is that the memory management is even more flexible: there is no need to allocate more memory than needed for meta data objects (and vice versa, if more memory is needed, it is likely to be accessible).</p><p>Thus memory can be used for other purposes as well. The end result is that more memory is made available to all parts of RonDB, both to store data and to perform more parallel transactions and more query handling.</p><p>Another important aspect is that this step opens up many new developments to handle larger objects in various parts of RonDB.</p><p>In later blogs we will describe how the memory management in RonDB works. This new development will appear either in RonDB 21.10 or in RonDB 22.04.</p><div><br /></div>
Most attention has thus gone to various replication algorithms.</p><p>However, truly achieving AlwaysOn availability requires more than just a clever replication algorithm.</p><p><a href="http://rondb.com" target="_blank">RonDB</a> is based on NDB Cluster, and NDB has proven in practice that it can deliver capabilities that make it possible to build systems with less than 30 seconds of downtime per year.</p><p>So what is required to achieve this type of availability?</p><p></p><ol style="text-align: left;"><li>Replication</li><li>Instant Failover</li><li>Global Replication</li><li>Failfast Software Architecture</li><li>Modular Software Architecture</li><li>Advanced Crash Analysis</li><li>Managed software</li></ol><p></p><p>Thus a clever replication algorithm is only 1 of 7 very important parts in achieving the highest possible level of availability. Managed software is one of the additions RonDB makes to NDB Cluster; it won't be discussed in this blog.</p><p>Instant Failover means that the cluster must handle failover immediately. This is the reason why RonDB implements a Shared Nothing DBMS architecture. Other HA DBMSs such as Oracle, MySQL InnoDB Cluster and Galera Cluster rely on replaying the logs at failover to catch up. Before this catch-up has happened, the failover hasn't completed. In RonDB every updating transaction updates both data and logs as part of the transaction itself, so at failover we only need to update the distribution information.</p><p>In a DBMS, updating information about node state must itself be a transaction. This transaction takes less than one millisecond to perform in a cluster. Thus in RonDB the time it takes to fail over depends on the time it takes to discover that the node has failed. In most cases the failure is a software failure, which usually leads to dropped network connections that are discovered within microseconds.
Thus most failovers are handled within milliseconds, and the cluster is repaired and ready to handle all transactions again.</p><p>The hardest failures to discover are silent failures, which can happen e.g. when the power to a server is cut. In this case the detection time depends on the time configured for heartbeat messages. How low this time can be set depends on the operating system and how reliably it can be trusted to send a message under high load. Usually this time is a few seconds.</p><p>But even with replication and instant failover we still have to handle failures caused by power outages, thunderstorms and many other problems that can take down an entire cluster. A DBMS cluster is usually located within a confined space to achieve low latency on database transactions.</p><p>To handle this we need failover from one RonDB cluster to another RonDB cluster. This is achieved in RonDB by using asynchronous replication from one cluster to another. This second RonDB cluster needs to be physically separated from the first to ensure greater independence of failures.</p><p>Having global replication implemented also means that one can handle complex software changes, such as a massive rewrite of the data model in your application.</p><p>Ok, are we done now? Is this sufficient to get a DBMS cluster which is AlwaysOn?</p><p>Nope, more is needed.
After implementing these features it is also necessary to be able to quickly find bugs and to support your customers when they hit issues.</p><p>The nice thing about this architecture is that a software failure will most of the time cause nothing more than a few aborted transactions, which the application layer should be able to handle.</p><p>However, in order to build an AlwaysOn architecture one has to be able to quickly get rid of bugs as well.</p><p>When NDB Cluster joined MySQL, two different software architectures met. MySQL was a standalone DBMS, which meant that when it failed the database was no longer available. Thus MySQL strived to avoid crashes, since a crash meant that the customer could no longer access their data.</p><p>With NDB Cluster the idea was that there would always be another node available to take over if we fail. Thus NDB, and thus also RonDB, implements a Failfast Software Architecture. In RonDB this is implemented using a macro called ndbrequire, similar to how most software uses assert. However, ndbrequire stays in the code also when we run production binaries.</p><p>Thus every transaction performed in RonDB causes thousands of error checks to be evaluated. If one of those ndbrequires returns false, we immediately fail the node. Thus RonDB will never proceed when we have an indication that we have reached a disallowed state. This ensures that the likelihood of a software failure leading to incorrect data is minimised.</p><p>However, crashing is only a short-term solution. To solve the problem for real we also have to fix the bug, and fixing bugs in a complex DBMS requires a modular software architecture. The RonDB software architecture is based on experiences from AXE, a telephone switch developed in the 1970s at Ericsson.</p><p>The predecessor of AXE at Ericsson was AKE, the first electronic switch developed at Ericsson.
It was built as one big piece of code without clear boundaries between the code parts. When this software reached sizes of millions of lines of code it became very hard to maintain.</p><p>Thus when AXE was developed in a joint project between Ericsson and Telia (a Swedish telco operator), the engineers needed to find a new software architecture that was more modular.</p><p>The engineers had lots of experience designing hardware as well. In hardware the only way to communicate between two integrated circuits is by sending signals on an electrical wire. Since this made it possible to design complex hardware with a small number of failures, the engineers reasoned that the same architecture should work for software as well.</p><p>Thus the AXE software architecture used blocks instead of integrated circuits and signals instead of electrical signals. In modern software terminology these would most likely have been called modules and messages.</p><p>A block owns its own data and cannot peek at other blocks' data; the only way to communicate between blocks is by using signals that send messages from one block to another.</p><p>RonDB is designed like this, with 23 blocks that implement different parts of the RonDB software architecture. The method of communication between blocks is mainly through signals. These blocks are implemented as large C++ classes.</p><p>This leads to a modular architecture that makes it easy to find bugs. If a state is wrong in a block it can either be caused by code in the block or by a signal sent to the block.</p><p>In RonDB signals can be sent between blocks in the same thread, to blocks in another thread in the same node, and to a thread in another node in the cluster.</p><p>In order to find a problem in the software we want access to a number of things.
The most important is the code path that led to the crash.</p><p>To find this, the RonDB software contains a macro called jam (Jump Address Memory). This means that we can track a few thousand of the last jumps before the crash. The code is filled with these jam macros. This is obviously extra overhead that makes RonDB a bit slower, but delivering the best availability is even more important than being fast.</p><p>Just watch Formula 1: the winner of a season will never be a car that fails every now and then; the car must be both fast and reliable. Thus in RonDB reliability has priority over speed, even though we mainly talk about the performance of RonDB.</p><p>Now this isn't enough. The jam only tracks jumps in the software; it doesn't provide any information about which signals led to the crash. This is also important. In RonDB each thread tracks a few thousand of the last signals it executed before the crash. Each signal carries a signal id that makes it possible to follow signals also as they are sent between threads within RonDB.</p><p>Let's take an example of how useful this information is. Recently we had an issue in the NDB forum where a user complained that he hadn't been able to produce any backups for the last couple of months, since one of the nodes in the cluster failed each time a backup was taken.</p><p>In the forum the point in the code was described in the error log, together with a stack trace of the code executed while crashing. However, this information wasn't sufficient to find the software bug.</p><p>I asked for the trace information, which includes both the jam entries and the signal logs of all the threads in the crashed node.</p><p>Using this information one could quickly discover how the fault occurred. It would only happen in high-load situations and required very tricky races to occur; thus the failure wasn't seen by most users.
However, with the trace information it was fairly straightforward to find what caused the issue, and based on this information a workaround was found as well as a fix for the software bug. The user could once again produce backups with confidence.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com0tag:blogger.com,1999:blog-14455177.post-30871405929222079702021-08-12T15:24:00.002+02:002021-08-27T09:35:50.276+02:00RonDB and Docker Compose<p>After publishing the <a href="http://mikaelronstrom.blogspot.com/2021/08/rondb-and-docker.html" target="_blank">Docker container for RonDB</a> I got a suggestion to simplify it further by using Docker Compose. After some quick learning using Google I came up with a Docker Compose configuration file that will start the entire RonDB cluster, and stop it, using a single command.</p><p>First of all I had to consider networking. I decided that using an external network was the best solution. This makes it easy to launch an application that uses RonDB as a back-end database. Thus I presume that an external network has been created with the following command before using Docker Compose to start RonDB:</p><p>docker network create mynet --subnet=192.168.0.0/16</p><p>The docker-compose.yml is available on GitHub at</p><p><a href="https://github.com/logicalclocks/rondb-docker/rondb/21.04/docker-compose.yml" target="_blank">https://github.com/logicalclocks/rondb-docker</a><br /></p><p>The file is rondb/21.04/docker-compose.yml for RonDB 21.04 and rondb/21.10/docker-compose.yml for RonDB 21.10.
<a href="https://github.com/logicalclocks/rondb-docker/blob/main/rondb/21.04/docker-compose.yml" target="_blank">Link to docker-compose.yml</a></p><p>To start a RonDB cluster, run this command from the directory where you have placed docker-compose.yml:</p><p>docker-compose up -d</p><p>After about 1 minute the cluster should be up and running and you can access it using:</p><p>docker exec -it compose_test_my1_1 mysql -uroot -p</p><p>password: password</p><p>The MySQL Server is available at port 3306 on IP 192.168.0.10 on the mynet subnet.</p><p>When you want to stop the RonDB cluster, use the command:</p><p>docker-compose stop</p><p>Docker Compose creates normal Docker containers that can be viewed using the docker ps and docker logs commands as usual.</p>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com2tag:blogger.com,1999:blog-14455177.post-69687342369371514542021-08-12T12:43:00.001+02:002021-08-12T14:01:07.131+02:00RonDB and Docker<div style="text-align: left;"><div>There was a request to be able to test RonDB using Docker.
This is now working.</div><div>These commands will set up a RonDB cluster on your local machine that can be used to test RonDB:</div><div><br /></div><div>Step 1: Download the Docker containers for RonDB</div><div><br /></div><div>docker pull mronstro/rondb</div><div><br /></div><div>Step 2: Create a Docker subnet</div><div><br /></div><div>docker network create mynet --subnet=192.168.0.0/16</div><div><br /></div><div>Step 3: Start the RonDB management server</div><div><br /></div><div>docker run -d \</div><div> --net=mynet \</div><div> -v /path/datadir:/var/lib/rondb \</div><div> --ip 192.168.0.2 \</div><div> --name mgmt1 \</div><div> mronstro/rondb ndb_mgmd --ndb-nodeid=65</div><div><br /></div><div>Step 4: Start the first RonDB data node</div><div><br /></div><div>docker run -d \</div><div> --net=mynet \</div><div> -v /path/datadir:/var/lib/rondb \</div><div> --ip 192.168.0.4 \</div><div> --name ndbd1 \</div><div> mronstro/rondb ndbmtd --ndb-nodeid=1</div><div><br /></div><div>Step 5: Start the second RonDB data node</div><div><br /></div><div>docker run -d \</div><div> --net=mynet \</div><div> -v /path/datadir:/var/lib/rondb \</div><div> --ip 192.168.0.5 \</div><div> --name ndbd2 \</div><div> mronstro/rondb ndbmtd --ndb-nodeid=2</div><div><br /></div><div>Step 6: Check that the cluster has started and is working</div><div><br /></div><div>This step isn't required, but just to show that the cluster is</div><div>up and running, start the RonDB management client and issue the</div><div>show command.</div><div><br /></div><div>docker exec -it mgmt1 ndb_mgm</div><div>ndb_mgm> show</div><div><br /></div><div>This should show a starting cluster, and after about</div><div>half a minute the cluster should be started.</div><div><br /></div><div>Step 7: Start a MySQL Server</div><div><br /></div><div>Note that the MySQL Server uses /var/lib/mysql as datadir internally</div><div>whereas the RonDB management server and data node
use</div><div>/var/lib/rondb.</div><div><br /></div><div>docker run -d \</div><div> --net=mynet \</div><div> -v /path/datadir:/var/lib/mysql \</div><div> -e MYSQL_ROOT_PASSWORD=your_password \</div><div> --ip 192.168.0.10 \</div><div> --name mysqld1 \</div><div> mronstro/rondb mysqld --ndb-cluster-connection-pool-nodeids=67</div><div><br /></div><div>Step 8: Start a MySQL client</div><div><br /></div><div>docker exec -it mysqld1 mysql -uroot -p</div><div>Password: your_password</div><div><br /></div><div>Now you are connected to a MySQL client that can issue SQL commands</div><div>against the RonDB cluster. Below is a very simple example of such</div><div>commands:</div><div><br /></div><div>mysql> CREATE DATABASE TEST;</div><div>mysql> USE TEST;</div><div>mysql> CREATE TABLE t1 (a int primary key) engine=ndb;</div><div>mysql> INSERT INTO t1 VALUES (1),(2);</div><div>mysql> SELECT * FROM t1;</div><div><br /></div><div>I tested this on my development machine using Mac OS X. To succeed with the setup,</div><div>my Docker installation required at least 8 GBytes of memory. RonDB is optimised for use</div><div>in VMs in the cloud where a minimum of 8 GBytes of memory is available for the</div><div>data node VMs. Since the default configuration of Docker will presumably mainly</div><div>be used for simple tests I decided to decrease the size of the RonDB data nodes</div><div>such that they fit in 3 GBytes of memory.
It is definitely possible to run</div><div>RonDB in an even smaller environment, but I think that the default should</div><div>be able to load at least 1 GByte of data and a fair number of tables into RonDB.</div><div><br /></div><div>RonDB on Docker is documented at <a href="https://docs.rondb.com/rondb_docker/">https://docs.rondb.com/rondb_docker/</a></div><div><br /></div><div>The RonDB documentation has also been improved at the same time.</div><div><br /></div><div>The GitHub tree for the Docker containers can be found at:</div><div><a href="https://github.com/logicalclocks/rondb-docker" target="_blank">https://github.com/logicalclocks/rondb-docker</a><br /></div><div><br /></div><div>The GitHub tree is based on the MySQL Docker tree at GitHub.</div><div><br /></div><div>The Docker Hub repository is found at:</div><div><a href="https://hub.docker.com/r/mronstro/rondb/" target="_blank">https://hub.docker.com/r/mronstro/rondb/</a><br /></div></div>Mikael Ronstromhttp://www.blogger.com/profile/07134215866292829917noreply@blogger.com1