In MySQL Cluster 8.0.20 we added one more query type that can be pushed
down to NDB data nodes. These queries are outer join queries and semi-join
queries using the FirstMatch strategy.
These queries can be pushed if conditions can be pushed on those parts of
the query that involve the outer join using a concept in the MySQL Server
called join nests.
Pushed queries means that the join execution is pushed down to the NDB
data nodes. This means that the rows to process for the second table in
the join is sent directly from the LDM threads to the join processor
that will start the select on the second table. The join can contain
up to 32 tables. Many filters can be pushed as well. It is also possible
to push parts of a join query. We are actively working on supporting
more and more query variants for push down to NDB data nodes.
The main benefits of pushing the queries down to data nodes is that it
enables the queries to be parallelised since both accesses to the data
owning parts (LDM threads) and the join processors (tc threads) can be
executed in parallel on multiple nodes and multiple threads in each
node. This means that queries can return the results much quicker.
Some queries can become more than 50 times faster using this approach
compared to when queries are executed entirely in the MySQL Server.
In our development of these new pushdown join features we measure the
result of executing queries in TPC-H. In TPC-H there is one outer join
query that looks like this:
select
c_count,
count(*) as custdist
from
(
select
c_custkey,
count(o_orderkey) as c_count
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%special%requests%'
group by
c_custkey
) as c_orders
group by
c_count
order by
custdist desc,
c_count desc;
This query benefits greatly from both the push of outer joins, it also
benefits greatly from the new CPU spinning and it benefits from using the
shared memory transporter between the data node and the MySQL Server.
Combined together these 3 things together improve latency of this query
(Q13) in TPC-H by a factor of 10x when compared to executing it in
MySQL Cluster 7.6.
Another query that benefits greatly from these changes is Q4 in TPC-H.
select
o_orderpriority,
count(*) as order_count
from
orders
where
o_orderdate >= '1993-07-01'
and o_orderdate < date_add( '1993-07-01', interval '3' month)
and exists (
select
*
from
lineitem
where
l_orderkey = o_orderkey
and l_commitdate < l_receiptdate
)
group by
o_orderpriority
order by
o_orderpriority;
This query executes more than 50 times faster in MySQL Cluster 8.0.20
compared to in MySQL Cluster 7.6 and this benefit comes strictly from
changing the execution algorithm.
No comments:
Post a Comment