Friday, December 06, 2019

NDB Parallel Query, part 2

In part 1 we showed how NDB can parallelise a simple query involving only a single
table. In this blog we will build on this and show how NDB can parallelise some
parts of a two-way join query while other parts are executed in the MySQL Server.
As an example we will use Q12 in DBT3:

SELECT
        l_shipmode,
        SUM(CASE
                WHEN o_orderpriority = '1-URGENT'
                        OR o_orderpriority = '2-HIGH'
                        THEN 1
                ELSE 0
        END) AS high_line_count,
        SUM(CASE
                WHEN o_orderpriority <> '1-URGENT'
                        AND o_orderpriority <> '2-HIGH'
                        THEN 1
                ELSE 0
        END) AS low_line_count
FROM
        orders,
        lineitem
WHERE
        o_orderkey = l_orderkey
        AND l_shipmode IN ('MAIL', 'SHIP')
        AND l_commitdate < l_receiptdate
        AND l_shipdate < l_commitdate
        AND l_receiptdate >= '1994-01-01'
        AND l_receiptdate < DATE_ADD( '1994-01-01', INTERVAL '1' year)
GROUP BY
        l_shipmode
ORDER BY
        l_shipmode;

This query, seen through the relational operators, will first pass through
a SELECT operator and a PROJECT operator in the data nodes. The JOIN operator
will be executed on the lineitem and orders tables and the result of the JOIN operator
will be sent to the MySQL Server. The MySQL Server will thereafter handle the
GROUP BY operator with its aggregation functions and also the final SORT operator.
Thus we can parallelise the filtering, projection and join, but the GROUP BY
aggregation and sorting are handled by the normal MySQL execution of
GROUP BY, SUM and ORDER BY.

This query will be executed by first performing a range scan on the lineitem
table and evaluating the condition that limits the number of rows to send to
the join with the orders table. The join is performed on the primary key of
the orders table. So the access in the orders table is a primary key lookup
for each row that comes from the range scan on the lineitem table.
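
A regular EXPLAIN makes this access pattern visible. The sketch below shows roughly
what the plan looks like; the exact index chosen for the range scan, the row estimates
and the Extra text depend on the schema definition and the MySQL version, so take it
as an illustration rather than exact output:

EXPLAIN
SELECT ...;   -- the Q12 query text from above
-- Sketch of the relevant parts of the output:
--   table: lineitem   type: range    (index range scan, driven by the date condition)
--   table: orders     type: eq_ref   (primary key lookup using lineitem.l_orderkey)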

In the MySQL implementation of this join one will fetch one row from the
lineitem table and for each such row perform a primary key lookup
in the orders table. This means that we can only handle one
primary key lookup at a time unless we do something in the NDB storage
engine. Executing this query without pushdown join would still make
it possible to run the scans on the lineitem table in parallel. The
primary key lookups on the orders table would however execute serially,
fetching only one row at a time. In this case this increases the query time
by a factor of around 5x. So by pushing the join down into
the NDB data nodes we can make sure that the primary key lookups on the
orders table are parallelised as well.

To handle this the MySQL Server has the ability to push an entire join
execution down to the storage engine. We will describe this interface in more
detail in a later blog part.
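
For readers who want to verify that a join is actually pushed, MySQL Cluster exposes
this through the ndb_join_pushdown system variable and a set of Ndb_pushed_* status
counters, and EXPLAIN also marks the tables that participate in a pushed join. The
snippet below is a sketch based on the documented names; the exact Extra text can
differ between versions:

SET ndb_join_pushdown = off;   -- run the query without a pushed join, for comparison
SET ndb_join_pushdown = on;    -- the default

SHOW STATUS LIKE 'Ndb_pushed%';
-- Ndb_pushed_queries_defined, Ndb_pushed_queries_executed and Ndb_pushed_reads
-- increase when the join is executed as a pushed join in the data nodes.

-- EXPLAIN reports the pushed join in the Extra column, along the lines of:
--   lineitem: Parent of 2 pushed join@1
--   orders:   Child of 'lineitem' in pushed join@1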

To handle this query in NDB we have implemented a special scan protocol that
enables performing complex join operations. The scan will be presented with
a parameter part for each table in the join operation that describes the
dependencies between the tables and the conditions to be pushed to each table.

This is implemented in the TC threads in the NDB data node. The TC threads in
this case act as parallel JOIN operators. The join is parallelised on the
first table in the join, in this case the lineitem table. For each node in
the cluster a JOIN operator will be created that takes care of scanning all
partitions that have their primary partition in that node. This means that the
scan of the first table and the join operator are always located on the same node.

The primary key lookup is sent to the node where the data resides. In a cluster
with 2 replicas and 2 nodes where the table uses READ BACKUP, we will always find
the row locally. With larger clusters the likelihood that this lookup is sent
over the network increases.
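
READ BACKUP is a table property in NDB that allows committed reads to be served from
a backup replica on the local node instead of always going to the primary replica.
As a sketch, the property can be set in the table comment when creating the table;
the column list below is abbreviated from the DBT3 schema:

CREATE TABLE orders (
    o_orderkey      BIGINT NOT NULL,
    o_orderpriority CHAR(15) NOT NULL,
    -- remaining DBT3 columns omitted for brevity
    PRIMARY KEY (o_orderkey)
) ENGINE=NDBCLUSTER
  COMMENT="NDB_TABLE=READ_BACKUP=1";

In recent MySQL Cluster versions the ndb_read_backup server variable can also make
this the default for newly created tables.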

Compared to a single threaded storage engine this query scales almost 30x
using 2 nodes with 8 LDM threads each. As mentioned in the previous blog,
NDB's implementation is also very efficient, so the speedup benefits from
this as well.

This query is more efficiently executed in MySQL Cluster 8.0.18 since we
implemented support for comparing two columns, both from the same table and
from different tables, provided they have the same data type. This improved
the performance of this query by 2x. Prior to this the NDB interpreter could
only handle comparisons of the type col_name COMPARATOR constant, e.g.
l_receiptdate >= '1994-01-01'.
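
The conditions that benefit here are l_commitdate < l_receiptdate and
l_shipdate < l_commitdate. One way to see what is pushed is the Extra column of
EXPLAIN, which for NDB lists the condition pushed to each table; the sketch below
shows roughly what that looks like, and the exact formatting of the condition text
varies between versions:

EXPLAIN
SELECT ...;   -- Q12 from above
-- Extra column for the lineitem table, sketched:
--   Using pushed condition ((l_shipmode IN ('MAIL','SHIP'))
--       AND (l_commitdate < l_receiptdate)
--       AND (l_shipdate < l_commitdate)
--       AND (l_receiptdate >= '1994-01-01')
--       AND (l_receiptdate < DATE_ADD('1994-01-01', INTERVAL '1' year)))
-- Prior to 8.0.18 the two column-to-column comparisons could not be pushed
-- to the data nodes and had to be evaluated in the MySQL Server.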

Query Execution


In the figure below we show the execution flow for this query in NDB. As described
above we have a module called DBSPJ in the TC threads that handles the JOIN
processing. The flow for the scan of the lineitem table is shown with blue arrows
and the primary key lookups with red arrows. In the figure we have assumed that
we're not using READ BACKUP. We will describe the impact of READ BACKUP in more
detail in a later part of this blog series.


Query Analysis

The query will read the lineitem table in parallel using a range scan. This scan will
evaluate 909,844 rows when using scale factor 1 in TPC-H. Of those rows, 30,988 will
evaluate to true, so only around 3.4% of the scanned rows survive the pushed
condition. Each of those 30,988 rows will be sent to the
NDB API but will also be reported to the DBSPJ module to issue parallel key lookups
towards the orders table.

As a matter of fact this query will actually execute faster than Q6, the previous
query we analysed, although it does more work. Most of the
work is done in the lineitem table; both Q6 and Q12 do almost the same amount of
work in the range scan on lineitem. However, since there are fewer records to report
back to the MySQL Server, parallelism is improved due to the batch
handling in NDB.

Scalability impact

This query will scale very well with more partitions of the lineitem table
and the orders table. As the cluster grows, some scalability impact will
come from a higher cost of the primary key lookups that have to be sent over
the network to other nodes.

Next Part

In part 3 we will discuss how the MySQL Server and the NDB storage engine works
together to define the query parts pushed down to NDB.
