Mikael Ronstrom: NDB Parallel Query, part 5

In this part we are going to analyze a bit more complex query than before.
This query is a 6-way join.

The query is:
SELECT
supp_nation,
cust_nation,
l_year,
SUM(volume) AS revenue
FROM
(
SELECT
n1.n_name AS supp_nation,
n2.n_name AS cust_nation,
extract(year FROM l_shipdate) as l_year,
l_extendedprice * (1 - l_discount) AS volume
FROM
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2
WHERE
s_suppkey = l_suppkey
AND o_orderkey = l_orderkey
AND c_custkey = o_custkey
AND s_nationkey = n1.n_nationkey
AND c_nationkey = n2.n_nationkey
AND (
(n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE')
OR (n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY')
)
AND l_shipdate BETWEEN '1995-01-01' AND '1996-12-31'
) AS shipping
GROUP BY
supp_nation,
cust_nation,
l_year
ORDER BY
supp_nation,
cust_nation,
l_year;

It is the inner SELECT that is the 6-way join. The outer part only deals with the
GROUP BY aggregation and ORDER BY of the result set from the inner
SELECT. As mentioned before the GROUP BY aggregation and ORDER BY
parts are handled by the MySQL Server. So the NDB join pushdown only deals
with the inner select.

In the previous queries we analysed the join order was pretty obvious. In this
case it isn't that obvious. But the selection of join order is still fairly
straightforward. The selected join order is
n1 -> supplier -> lineitem -> orders -> customer -> n2.

Query analysis

The query starts by reading 2 rows from the nation table. This creates a new scan
on the supplier table, these 2 rows are either coming from the same TC thread or
from separate TC threads. This scan generates data for the next scan in the supplier
table. The supplier table will return 798 rows that is used in the scan against the
lineitem table. This assumes scale factor 1.

This represents a new thing to discuss. If this query would have been executed in the
MySQL Server we would only be able to handle one row from the supplier table at a
time. There have been some improvement in the storage engine API to handle this
using read multi range API in the storage engine API. This means a lot of
communication back and forth and starting up new scans. With the NDB join
processing we will send a multi-range scan to the lineitem table. This means that we
will send one scan message that contains many different ranges. There will still be a
new walking through the index tree for each range, but there is no need to send the
scan messages again and again.

Creation of these multi-ranges is handled as part of the join processing in the
DBSPJ module.

The join between supplier table and the lineitem contains one more interesting
aspect. Here we join towards the column l_orderkey in the lineitem table. In many
queries in TPC-H the join against the lineitem table uses the order key as the join
column. The order key is the first part of the primary key and is thus a candidate to
use as partition key. The TPC-H queries definitely improves by using the order key as
partition key instead of the primary key. This means that the orders and all lineitems
for the order are stored in the same LDM thread.

The scan on the lineitem will produce 145.703 to join with the orders table. The rest of
the joins are joined through the primary key. Thus we will perform 145.703 key lookups
in the orders table, there will be 145.703 key lookups in the customer table and finally
there will be 145.703 lookups against the nations table. The only filtering here will be
on the last table that will decrease the amount of result rows to the MySQL Server,
the end result will be 5.924 rows.

This gives another new point that it would be possible to increase parallelism in this
query by storing the result rows in the DBSPJ. However this would increase the
overhead, so it would improve parallelism at the cost of efficiency.

Scalability impact

If we make sure that the lineitem table is partitioned on the order key this query will
scale nicely. There will be fairly small impact with more partitions since only the scan
against the supplier table will be more costly in a larger cluster.

One thing that will make the query cost more is when the primary key lookups are
distributed instead of local. One table that definitely will be a good idea to use
FULLY REPLICATED for is the nations table. This means that all those 145.703 key
lookups will be handled inside a data node instead of over the network.

The supplier table has only 10.000 rows compared to the lineitem table that has
6M rows. Thus it should definitely be possible to use FULLY REPLICATED also
for this table. The customer table has 150.000 rows and is another candidate to use
for FULLY REPLICATED.

Since the MySQL Server will have to handle more than 300.000 rows in this query,
this will be the main bottleneck for parallelism. This means that the query will have a
parallelism of about 5. This is also the speed up we see compared to single threaded
storage engine for this query. This bottleneck will be about the same even with
larger clusters.

Next Part

I will take a break in this sequence of blogs for now and come back later with a

description of a bit more involved queries and how NDB handles pushing down

subqueries and parts of join query.

Mikael Ronstrom

Wednesday, December 11, 2019

NDB Parallel Query, part 5

Query analysis

Scalability impact

Next Part

No comments:

About Me

Links