For MySQL Cluster 7.2 we have worked on increasing the number of CPUs that can be used to execute the workload on the data nodes.
A MySQL Cluster 7.2 data node is functionally divided into 7 thread types.
1) The Local Data Manager threads (ldm)
2) The Transaction Coordinator threads (tc)
3) The Asynchronous Replication threads (rep)
4) The Schema Management threads (main)
5) Network receiver threads (recv)
6) Network send threads (send)
7) IO threads
In the LDM domain we have increased the maximum number of threads from 4 to 16. The LDM contains the actual data, this means that when using 16 threads the data becomes more partitioned (as usual this is all automatic in MySQL Cluster). Each LDM thread maintains its own set of data partitions, index partitions and its own REDO log. The number of LDM parts per data node isn't dynamically changeable. It's however possible to map more than one part onto each LDM thread which provides some flexibility in changing the number of LDM threads.
The TC domain contains the state of ongoing transactions. This means that every new transaction can easily be assigned to a new TC thread. This provides for easy flexibility of the number of TC threads to grow to a large number of threads. We've found that in most cases 1 TC thread per 2 LQH thread is sufficient and in many cases even 1 TC thread per 4 LQH threads. In rare cases with very high update loads it can be required to have 3-4 TC threads per 4 LQH threads. In MySQL Cluster 7.1 it was only possible to have 1 TC thread, now with MySQL Cluster 7.2 it's possible to have up to 16 TC threads. The TC domain also contains the control parts that handle the Adaptive Query Localization introduced in MySQL Cluster 7.2.
The Asynchronous Replication functionality is already in MySQL Cluster 7.1 separated into its own thread and remains so.
The Schema Management domain has been separated from the TC domain to make it necessary to scale the number of TC threads. This thread normally have very little load and we've seen no requirements on scaling this domain to more than one thread.
The Network receiver domain was bound to 1 thread on MySQL Cluster 7.1. With the increase of threads in MySQL Cluster 7.2 it is also necessary to scale up the number of receive threads. We have partitioned such that each receive thread takes care of one or more sockets to communicate with other nodes in MySQL Cluster. There is one socket per node to communicate with other nodes. In MySQL Cluster 7.2 we can now scale the number of receive threads up to 8 threads.
The Network send domain is a new thread type. In MySQL Cluster 7.1 sending was done from the other threads directly. This can provide slightly lower latency, but for highest throughput we have provided the possibility to perform send from separate send threads. There can be up to 8 send threads. It is also possible to continue without using send threads. We have also been able to improve the overload handling when using send threads.
The final thread type, the IO threads remains the same in 7.2 as in 7.1. This means that we can have multiple IO threads, either one thread per open file, or a fixed number of IO threads that handles the IO traffic. Except when using compression on disk, the IO threads have very light load.
The configuration of threads can be handled in two ways. The first is to simply set MaxNoOfExecutionThreads to an appropriate number that specifies how many threads we want to run in the data node. The second manner is to use the new ThreadConfig variable where it is possible to set both how many threads of each type to use and also to set the CPUs to bind those threads to.
We're very proud of those scalability enhancements that have made it possible to scale CPU usage per data node to more than 5x of what is possible in MySQL Cluster 7.1. In addition we've removed a number of bottlenecks making it possible to scale per data node performance by even more than 5x. Comparing our benchmark numbers for MySQL Cluster 7.2 we can see that we achieved 2.1M reads per second on 7.1 and we've achieved 17.6M reads per second, both on 8 data node set-ups using the same HW. Thus more than 8X better performance per node for MySQL Cluster 7.2 compared to 7.1.
The flexible configuration means that it is possible to optimise data node set-up to use anything from a single CPU up to a 48 CPU box. Colocating the MySQL Server and a single data node we can even make nice use of a 64 CPU box or even an 80 CPU box. As usual it is also still possible to have multiple data nodes per machine, but this is now required only for very large machines with 4 CPU sockets and more.
As an example of how we can make best use of a 24 CPU box is to use 8 ldm threads, 4 tc threads, 3 recv threads, 3 send threads and 1 rep thread for asynchronous replication. Each of those threads should be bound to a CPU. The main thread (schema management domain) and the IO threads should be possible to bind to the same CPU in most installations. Thus we have bound threads to 20 different CPUs. We should also protect these 20 CPUs from interrupts by using the IRQBALANCE_BANNED_CPUS config variable in /etc/sysconfig/irqbalance and setting it to 0xFFFFF0. Running MySQL Cluster generates a lot of interrupt processing and also OS kernel processing and it's a good idea to stay away from a number of CPUs to ensure that this activity won't conflict with the MySQL Cluster threads. When booting a Linux kernel one can even provide an option isolcpus=0-19 in grub.conf. This means that the Linux scheduler won't use these CPUs for any task. Only by using CPU affinity syscalls can a process be made to run on those CPUs. Using this together with binding MySQL Cluster threads and banning CPUs IRQ processing on these tasks should provide for a very stable performance environment of a MySQL Cluster data node.
On a 32 CPU box one upgrades the number of ldm threads to 12, increase tc threads to 6 and provide 2 more CPUs for the OS and interrupts. The number of send and receive threads should in most cases still be sufficient.
On a 40 CPU box one can go 16 ldm threads, 8 tc threads and increment send and receive threads to 4.
On a 48 CPU box it is possible to use more safety margins and use 12 tc threads instead, 2 more CPUs for the OS and interrupts and avoid using IO threads and main thread on same CPU and finally add 1 more receive thread.
You can learn more about all of the enhancements in MySQL Cluster 7.2 from our Developer Zone article.
A MySQL Cluster 7.2 data node is functionally divided into 7 thread types.
1) The Local Data Manager threads (ldm)
2) The Transaction Coordinator threads (tc)
3) The Asynchronous Replication threads (rep)
4) The Schema Management threads (main)
5) Network receiver threads (recv)
6) Network send threads (send)
7) IO threads
In the LDM domain we have increased the maximum number of threads from 4 to 16. The LDM contains the actual data, this means that when using 16 threads the data becomes more partitioned (as usual this is all automatic in MySQL Cluster). Each LDM thread maintains its own set of data partitions, index partitions and its own REDO log. The number of LDM parts per data node isn't dynamically changeable. It's however possible to map more than one part onto each LDM thread which provides some flexibility in changing the number of LDM threads.
The TC domain contains the state of ongoing transactions. This means that every new transaction can easily be assigned to a new TC thread. This provides for easy flexibility of the number of TC threads to grow to a large number of threads. We've found that in most cases 1 TC thread per 2 LQH thread is sufficient and in many cases even 1 TC thread per 4 LQH threads. In rare cases with very high update loads it can be required to have 3-4 TC threads per 4 LQH threads. In MySQL Cluster 7.1 it was only possible to have 1 TC thread, now with MySQL Cluster 7.2 it's possible to have up to 16 TC threads. The TC domain also contains the control parts that handle the Adaptive Query Localization introduced in MySQL Cluster 7.2.
The Asynchronous Replication functionality is already in MySQL Cluster 7.1 separated into its own thread and remains so.
The Schema Management domain has been separated from the TC domain to make it necessary to scale the number of TC threads. This thread normally have very little load and we've seen no requirements on scaling this domain to more than one thread.
The Network receiver domain was bound to 1 thread on MySQL Cluster 7.1. With the increase of threads in MySQL Cluster 7.2 it is also necessary to scale up the number of receive threads. We have partitioned such that each receive thread takes care of one or more sockets to communicate with other nodes in MySQL Cluster. There is one socket per node to communicate with other nodes. In MySQL Cluster 7.2 we can now scale the number of receive threads up to 8 threads.
The Network send domain is a new thread type. In MySQL Cluster 7.1 sending was done from the other threads directly. This can provide slightly lower latency, but for highest throughput we have provided the possibility to perform send from separate send threads. There can be up to 8 send threads. It is also possible to continue without using send threads. We have also been able to improve the overload handling when using send threads.
The final thread type, the IO threads remains the same in 7.2 as in 7.1. This means that we can have multiple IO threads, either one thread per open file, or a fixed number of IO threads that handles the IO traffic. Except when using compression on disk, the IO threads have very light load.
The configuration of threads can be handled in two ways. The first is to simply set MaxNoOfExecutionThreads to an appropriate number that specifies how many threads we want to run in the data node. The second manner is to use the new ThreadConfig variable where it is possible to set both how many threads of each type to use and also to set the CPUs to bind those threads to.
We're very proud of those scalability enhancements that have made it possible to scale CPU usage per data node to more than 5x of what is possible in MySQL Cluster 7.1. In addition we've removed a number of bottlenecks making it possible to scale per data node performance by even more than 5x. Comparing our benchmark numbers for MySQL Cluster 7.2 we can see that we achieved 2.1M reads per second on 7.1 and we've achieved 17.6M reads per second, both on 8 data node set-ups using the same HW. Thus more than 8X better performance per node for MySQL Cluster 7.2 compared to 7.1.
The flexible configuration means that it is possible to optimise data node set-up to use anything from a single CPU up to a 48 CPU box. Colocating the MySQL Server and a single data node we can even make nice use of a 64 CPU box or even an 80 CPU box. As usual it is also still possible to have multiple data nodes per machine, but this is now required only for very large machines with 4 CPU sockets and more.
As an example of how we can make best use of a 24 CPU box is to use 8 ldm threads, 4 tc threads, 3 recv threads, 3 send threads and 1 rep thread for asynchronous replication. Each of those threads should be bound to a CPU. The main thread (schema management domain) and the IO threads should be possible to bind to the same CPU in most installations. Thus we have bound threads to 20 different CPUs. We should also protect these 20 CPUs from interrupts by using the IRQBALANCE_BANNED_CPUS config variable in /etc/sysconfig/irqbalance and setting it to 0xFFFFF0. Running MySQL Cluster generates a lot of interrupt processing and also OS kernel processing and it's a good idea to stay away from a number of CPUs to ensure that this activity won't conflict with the MySQL Cluster threads. When booting a Linux kernel one can even provide an option isolcpus=0-19 in grub.conf. This means that the Linux scheduler won't use these CPUs for any task. Only by using CPU affinity syscalls can a process be made to run on those CPUs. Using this together with binding MySQL Cluster threads and banning CPUs IRQ processing on these tasks should provide for a very stable performance environment of a MySQL Cluster data node.
On a 32 CPU box one upgrades the number of ldm threads to 12, increase tc threads to 6 and provide 2 more CPUs for the OS and interrupts. The number of send and receive threads should in most cases still be sufficient.
On a 40 CPU box one can go 16 ldm threads, 8 tc threads and increment send and receive threads to 4.
On a 48 CPU box it is possible to use more safety margins and use 12 tc threads instead, 2 more CPUs for the OS and interrupts and avoid using IO threads and main thread on same CPU and finally add 1 more receive thread.
You can learn more about all of the enhancements in MySQL Cluster 7.2 from our Developer Zone article.
Hi, Mikael,
ReplyDeleteThe performance of Cluster 7.2.5 is fantastic especially to use more than 8 core on Data Node. But where we could download 7.2.5 Dev and if unavailable today, when it will be released?
Best regards,
Li
Please tell me whether Dictionary Object obtained from Ndb object is thread safe or not in Ndb API.
ReplyDelete