In MySQL Cluster 8.0.20 we added one more query type that can be pushed
down to NDB data nodes. These queries are outer join queries and semi-join
queries using the FirstMatch strategy.
These queries can be pushed if conditions can be pushed on those parts of
the query that involve the outer join using a concept in the MySQL Server
called join nests.
Pushed queries means that the join execution is pushed down to the NDB
data nodes. This means that the rows to process for the second table in
the join is sent directly from the LDM threads to the join processor
that will start the select on the second table. The join can contain
up to 32 tables. Many filters can be pushed as well. It is also possible
to push parts of a join query. We are actively working on supporting
more and more query variants for push down to NDB data nodes.
The main benefits of pushing the queries down to data nodes is that it
enables the queries to be parallelised since both accesses to the data
owning parts (LDM threads) and the join processors (tc threads) can be
executed in parallel on multiple nodes and multiple threads in each
node. This means that queries can return the results much quicker.
Some queries can become more than 50 times faster using this approach
compared to when queries are executed entirely in the MySQL Server.
In our development of these new pushdown join features we measure the
result of executing queries in TPC-H. In TPC-H there is one outer join
query that looks like this:
count(*) as custdist
count(o_orderkey) as c_count
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%special%requests%'
) as c_orders
This query benefits greatly from both the push of outer joins, it also
benefits greatly from the new CPU spinning and it benefits from using the
shared memory transporter between the data node and the MySQL Server.
Combined together these 3 things together improve latency of this query
(Q13) in TPC-H by a factor of 10x when compared to executing it in
MySQL Cluster 7.6.
Another query that benefits greatly from these changes is Q4 in TPC-H.
count(*) as order_count
o_orderdate >= '1993-07-01'
and o_orderdate < date_add( '1993-07-01', interval '3' month)
and exists (
l_orderkey = o_orderkey
and l_commitdate < l_receiptdate
This query executes more than 50 times faster in MySQL Cluster 8.0.20
compared to in MySQL Cluster 7.6 and this benefit comes strictly from
changing the execution algorithm.