Thursday, November 26, 2020

New adventures

After more than 30 years of hard work at Ericsson, MySQL, Sun and Oracle, I have found the time to start some new adventures in my life.

I have spent most of that time developing NDB Cluster and the MySQL Server, and most of it on developing new features that improve the functionality and performance of these products.


The NDB Cluster product has been developed to a stage where it can be used for a multitude of applications. NDB Cluster is already used for things as diverse as DNS servers, DHCP servers, AAA servers, LDAP servers, SQL servers, financial trading, gaming servers and even file servers.


Since I have become a bit older, I wanted to ensure that I only focus on the projects I feel inspired by, whether work related or not. I feel much more inspired to assist people in using NDB Cluster, and I have always found it inspiring to work with companies using the product. So what could be more natural than to join a company, Logical Clocks AB, that has built its future on top of NDB Cluster.
Link to announcement


Logical Clocks AB already uses NDB Cluster to build a highly scalable file system based on the Hadoop file system, HDFS, with very low latency. They have gone on to develop an online Feature Store that companies can use in their data science projects. The Feature Store can also be integrated into online systems to analyse, e.g., financial transactions as they happen.


I will also be available for short projects assisting other companies working with NDB Cluster, through my consulting company, iClaustron AB.


Going to work last Monday was interesting: mostly I have known very well what to do when I start working in the morning, but this day I went to work with no idea of what would happen. It was exciting, and I have already started learning lots of things about machine learning, Go programming, Python programming, PCI Express networking, Prometheus, Grafana and many other things found on GitHub.


Even though my new assignment is about using NDB rather than developing it, I have also found some time to improve NDB Cluster around monitoring. More on that later.


I also hope to find time to pursue some personal interests. I am very interested in genealogy; currently this means I am transcribing court records from the 1680s in the county of Luleå in the north of Sweden.

I also have some ideas about how to develop global genealogical databases that I might play around with.


So life in Corona pandemic times can be socially tough, but I still find lots of time to develop myself in new ways, and with the new job at Logical Clocks AB I have had many chances to meet and interact with new people. It's also nice to be tutored and not only to tutor. The world of machine learning is a world I first learnt about in 1986; at that time the world wasn't ready for AI, but now there are many interesting challenges that can be handled with the immensely more effective computing at hand today.

I feel really inspired to dive into all these interesting things while still continuing my focus on making NDB Cluster a product that can be used to handle the most challenging problems mankind encounters.

Tuesday, October 27, 2020

Using CTEs in TPC-H in NDB Cluster

In the previous post I showed some improvements based on using window functions in rewritten TPC-H queries. In this blog post I will discuss the improvements made in Q15 using a CTE (Common Table Expression). This is again based on inspiration from new and old blogs written by Øystein Grøvlen.


Here is Q15 in TPC-H:

create view revenue (supplier_no, total_revenue) as
        select
                l_suppkey,
                sum(l_extendedprice * (1 - l_discount))
        from
                lineitem
        where
                l_shipdate >= '1996-01-01'
                and l_shipdate < date_add('1996-01-01', interval '90' day)
        group by
                l_suppkey;


select
        s_suppkey,
        s_name,
        s_address,
        s_phone,
        total_revenue
from
        supplier,
        revenue
where
        s_suppkey = supplier_no
        and total_revenue = (
                select
                        max(total_revenue)
                from
                        revenue
        )
order by
        s_suppkey;

drop view revenue;


It creates a view and uses this view in 2 places in the query.
The same query using a CTE:

WITH revenue0(supplier_no, total_revenue) AS (
    SELECT l_suppkey, SUM(l_extendedprice * (1 - l_discount))
    FROM lineitem
    WHERE l_shipdate >= '1996-01-01'
      AND l_shipdate < DATE_ADD('1996-01-01', INTERVAL '90' DAY)
    GROUP BY l_suppkey )
SELECT s_suppkey, s_name, s_address, s_phone, total_revenue
FROM supplier, revenue0
WHERE s_suppkey = supplier_no
  AND total_revenue = (SELECT MAX(total_revenue) FROM revenue0)
ORDER BY s_suppkey;

Here is the output from EXPLAIN for the original query:

+----+-------------+------------+-------------------------+--------+--------------------------------+------------+---------+--------------------------+--------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table      | partitions              | type   | possible_keys                  | key        | key_len | ref                      | rows   | filtered | Extra                                                                                                                                                                                        |
+----+-------------+------------+-------------------------+--------+--------------------------------+------------+---------+--------------------------+--------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | PRIMARY     | <derived3> | NULL                    | ALL    | NULL                           | NULL       | NULL    | NULL                     | 223448 |    10.00 | Using where; Using temporary; Using filesort                                                                                                                                                 |
|  1 | PRIMARY     | supplier   | p0,p1,p2,p3,p4,p5,p6,p7 | eq_ref | PRIMARY                        | PRIMARY    | 4       | dbt3.revenue.supplier_no |      1 |   100.00 | NULL                                                                                                                                                                                         |
|  3 | DERIVED     | lineitem   | p0,p1,p2,p3,p4,p5,p6,p7 | range  | l_shipDATE,l_partkey,l_suppkey | l_shipDATE | 4       | NULL                     | 223448 |   100.00 | Using pushed condition ((`dbt3`.`lineitem`.`l_shipDATE` >= DATE'1996-01-01') and (`dbt3`.`lineitem`.`l_shipDATE` < <cache>(('1996-01-01' + interval '90' day)))); Using MRR; Using temporary |
|  2 | SUBQUERY    | <derived4> | NULL                    | ALL    | NULL                           | NULL       | NULL    | NULL                     | 223448 |   100.00 | NULL                                                                                                                                                                                         |
|  4 | DERIVED     | lineitem   | p0,p1,p2,p3,p4,p5,p6,p7 | range  | l_shipDATE,l_partkey,l_suppkey | l_shipDATE | 4       | NULL                     | 223448 |   100.00 | Using pushed condition ((`dbt3`.`lineitem`.`l_shipDATE` >= DATE'1996-01-01') and (`dbt3`.`lineitem`.`l_shipDATE` < <cache>(('1996-01-01' + interval '90' day)))); Using MRR; Using temporary |
+----+-------------+------------+-------------------------+--------+--------------------------------+------------+---------+--------------------------+--------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
5 rows in set, 1 warning (0.00 sec)


What is clear in the original query is that we execute the view in both places where it is used.
Here is the EXPLAIN from the CTE query:

+----+-------------+------------+-------------------------+--------+--------------------------------+------------+---------+----------------------+--------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table      | partitions              | type   | possible_keys                  | key        | key_len | ref                  | rows   | filtered | Extra                                                                                                                                                                                        |
+----+-------------+------------+-------------------------+--------+--------------------------------+------------+---------+----------------------+--------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | PRIMARY     | <derived2> | NULL                    | ALL    | NULL                           | NULL       | NULL    | NULL                 | 223448 |    10.00 | Using where; Using temporary; Using filesort                                                                                                                                                 |
|  1 | PRIMARY     | supplier   | p0,p1,p2,p3,p4,p5,p6,p7 | eq_ref | PRIMARY                        | PRIMARY    | 4       | revenue0.supplier_no |      1 |   100.00 | NULL                                                                                                                                                                                         |
|  3 | SUBQUERY    | <derived2> | NULL                    | ALL    | NULL                           | NULL       | NULL    | NULL                 | 223448 |   100.00 | NULL                                                                                                                                                                                         |
|  2 | DERIVED     | lineitem   | p0,p1,p2,p3,p4,p5,p6,p7 | range  | l_shipDATE,l_partkey,l_suppkey | l_shipDATE | 4       | NULL                 | 223448 |   100.00 | Using pushed condition ((`dbt3`.`lineitem`.`l_shipDATE` >= DATE'1996-01-01') and (`dbt3`.`lineitem`.`l_shipDATE` < <cache>(('1996-01-01' + interval '90' day)))); Using MRR; Using temporary |
+----+-------------+------------+-------------------------+--------+--------------------------------+------------+---------+----------------------+--------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


Here we only execute the scan of the lineitem table once and reuse the result in both places where it is used. Thus what we see here is that CTEs, like window functions, make it easier for MySQL to discover repeated patterns. Since we only need to scan once instead of twice, the query execution time is cut in half.
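The view-to-CTE rewrite is easy to sanity-check on any SQL engine that supports CTEs. Here is a minimal sketch using SQLite via Python's standard library; the table and column names mirror the Q15 query above, but the data is made up purely for illustration:

```python
import sqlite3

# Toy stand-ins for TPC-H supplier/lineitem with a few made-up rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE supplier (s_suppkey INTEGER PRIMARY KEY, s_name TEXT);
CREATE TABLE lineitem (l_suppkey INTEGER, l_extendedprice REAL, l_discount REAL);
INSERT INTO supplier VALUES (1, 'S1'), (2, 'S2');
INSERT INTO lineitem VALUES (1, 100.0, 0.1), (1, 50.0, 0.0), (2, 80.0, 0.5);
CREATE VIEW revenue (supplier_no, total_revenue) AS
  SELECT l_suppkey, SUM(l_extendedprice * (1 - l_discount))
  FROM lineitem GROUP BY l_suppkey;
""")

# View-based form: the view is referenced in two places.
q_view = """
SELECT s_suppkey, s_name, total_revenue
FROM supplier, revenue
WHERE s_suppkey = supplier_no
  AND total_revenue = (SELECT MAX(total_revenue) FROM revenue)
ORDER BY s_suppkey;
"""

# CTE form: the same derived table, defined once and reused.
q_cte = """
WITH revenue0 (supplier_no, total_revenue) AS (
  SELECT l_suppkey, SUM(l_extendedprice * (1 - l_discount))
  FROM lineitem GROUP BY l_suppkey)
SELECT s_suppkey, s_name, total_revenue
FROM supplier, revenue0
WHERE s_suppkey = supplier_no
  AND total_revenue = (SELECT MAX(total_revenue) FROM revenue0)
ORDER BY s_suppkey;
"""
rows_view = con.execute(q_view).fetchall()
rows_cte = con.execute(q_cte).fetchall()
assert rows_view == rows_cte  # the rewrite is result-equivalent
print(rows_cte)
```

Whether the engine actually materialises the CTE once (as MySQL does here) is an optimizer decision; the point of the sketch is only that the two formulations are interchangeable in results, so the rewrite is safe.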

Using window functions in TPC-H with NDB Cluster

This is probably the first time I post a blog that handles variants of how to use SQL :)

I got the inspiration from reading some new and old blogs by Øystein Grøvlen where he mentioned various ways to rewrite queries as Common Table Expressions (CTEs) and to use window functions instead of subqueries. I tried this and found that the rewritten queries were faster in some cases. Obviously the CTE variant and the subquery variant of a query could be executed in the same way, but SQL implementations are not perfect and handle some constructs better than others.

This got me a bit puzzled, so I wanted to understand what specifically makes the CTE and window function variants run faster.

Let's take Q17 in TPC-H as an example. Here is the original query:

select
        sum(l_extendedprice) / 7.0 as avg_yearly
from
        lineitem,
        part
where
        p_partkey = l_partkey
        and p_brand = 'Brand#23'
        and p_container = 'MED BOX'
        and l_quantity < (
                select
                        0.2 * avg(l_quantity)
                from
                        lineitem
                where
                        l_partkey = p_partkey
        );


Here is the output from EXPLAIN:

+----+--------------------+----------+-------------------------+------+-----------------------+-----------+---------+---------------------+--------+----------+------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type        | table    | partitions              | type | possible_keys         | key       | key_len | ref                 | rows   | filtered | Extra                                                                                                                                    |
+----+--------------------+----------+-------------------------+------+-----------------------+-----------+---------+---------------------+--------+----------+------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | PRIMARY            | part     | p0,p1,p2,p3,p4,p5,p6,p7 | ALL  | PRIMARY               | NULL      | NULL    | NULL                | 200000 |     1.00 | Parent of 2 pushed join@1; Using pushed condition ((`dbt3`.`part`.`p_container` = 'MED BOX') and (`dbt3`.`part`.`p_brand` = 'Brand#23')) |
|  1 | PRIMARY            | lineitem | p0,p1,p2,p3,p4,p5,p6,p7 | ref  | l_partkey,l_partkey_2 | l_partkey | 5       | dbt3.part.p_partkey |     30 |   100.00 | Child of 'part' in pushed join@1; Using where                                                                                            |
|  2 | DEPENDENT SUBQUERY | lineitem | p0,p1,p2,p3,p4,p5,p6,p7 | ref  | l_partkey,l_partkey_2 | l_partkey | 5       | dbt3.part.p_partkey |     30 |   100.00 | NULL                                                                                                                                     |
+----+--------------------+----------+-------------------------+------+-----------------------+-----------+---------+---------------------+--------+----------+------------------------------------------------------------------------------------------------------------------------------------------+



Here is the output from EXPLAIN ANALYZE:

| -> Aggregate: sum(lineitem.l_extendedprice)  (actual time=484.773..484.773 rows=1 loops=1)
    -> Nested loop inner join  (cost=760100.20 rows=6017424) (actual time=14.792..484.552 rows=587 loops=1)
        -> Table scan on part, activating pushed join of 2 tables, with pushed condition: ((part.p_container = 'MED BOX') and (part.p_brand = 'Brand#23'))  (cost=98183.59 rows=200000) (actual time=10.267..10.664 rows=313 loops=1)
        -> Filter: (lineitem.l_quantity < (select #2))  (cost=30.09 rows=30) (actual time=0.625..1.513 rows=2 loops=313)
            -> Index lookup on lineitem using l_partkey (l_partkey=part.p_partkey), child of part in pushed join  (cost=30.09 rows=30) (actual time=0.001..0.018 rows=19 loops=313)
            -> Select #2 (subquery in condition; dependent)
                -> Aggregate: avg(lineitem.l_quantity)  (actual time=0.073..0.073 rows=1 loops=6088)
                    -> Index lookup on lineitem using l_partkey (l_partkey=part.p_partkey)  (cost=33.10 rows=30) (actual time=0.062..0.068 rows=31 loops=6088)



What seems to happen here is that we re-execute the subquery again and again. When this query is written using a window function, it looks like this:

WITH win AS (
 SELECT l_extendedprice, l_quantity, AVG(l_quantity)
 OVER (PARTITION BY p_partkey) avg_l_quantity
 FROM lineitem, part
 WHERE p_partkey = l_partkey AND
 p_brand = 'Brand#23' AND
 p_container = 'MED BOX')
SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM win
WHERE l_quantity < 0.2 * avg_l_quantity;


What we have done here is push the calculation of the average into the join processing, and thus made it possible to run the subquery on a materialised table.

Here is the output of the new query from EXPLAIN:

+----+-------------+------------+-------------------------+------+-----------------------+-----------+---------+---------------------+--------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table      | partitions              | type | possible_keys         | key       | key_len | ref                 | rows   | filtered | Extra                                                                                                                                                                     |
+----+-------------+------------+-------------------------+------+-----------------------+-----------+---------+---------------------+--------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | PRIMARY     | <derived2> | NULL                    | ALL  | NULL                  | NULL      | NULL    | NULL                |  60174 |    33.33 | Using where                                                                                                                                                               |
|  2 | DERIVED     | part       | p0,p1,p2,p3,p4,p5,p6,p7 | ALL  | PRIMARY               | NULL      | NULL    | NULL                | 200000 |     1.00 | Parent of 2 pushed join@1; Using pushed condition ((`dbt3`.`part`.`p_container` = 'MED BOX') and (`dbt3`.`part`.`p_brand` = 'Brand#23')); Using temporary; Using filesort |
|  2 | DERIVED     | lineitem   | p0,p1,p2,p3,p4,p5,p6,p7 | ref  | l_partkey,l_partkey_2 | l_partkey | 5       | dbt3.part.p_partkey |     30 |   100.00 | Child of 'part' in pushed join@1                                                                                                                                          |
+----+-------------+------------+-------------------------+------+-----------------------+-----------+---------+---------------------+--------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Here is the output of the new query from EXPLAIN ANALYZE:

| -> Aggregate: sum(win.l_extendedprice)  (actual time=33.745..33.746 rows=1 loops=1)
    -> Filter: (win.l_quantity < (0.2 * win.avg_l_quantity))  (actual time=32.352..33.695 rows=587 loops=1)
        -> Table scan on win  (cost=6772.08 rows=60174) (actual time=0.000..0.218 rows=6088 loops=1)
            -> Materialize CTE win  (actual time=32.343..32.802 rows=6088 loops=1)
                -> Window aggregate with buffering: avg(lineitem.l_quantity) OVER (PARTITION BY part.p_partkey )   (actual time=16.773..30.318 rows=6088 loops=1)
                    -> Sort: part.p_partkey  (actual time=16.700..17.123 rows=6088 loops=1)
                        -> Stream results  (cost=760100.20 rows=6017424) (actual time=10.584..15.531 rows=6088 loops=1)
                            -> Nested loop inner join  (cost=760100.20 rows=6017424) (actual time=10.578..13.499 rows=6088 loops=1)
                                -> Table scan on part, activating pushed join of 2 tables, with pushed condition: ((part.p_container = 'MED BOX') and (part.p_brand = 'Brand#23'))  (cost=98183.59 rows=200000) (actual time=10.551..11.403 rows=313 loops=1)
                                -> Index lookup on lineitem using l_partkey (l_partkey=part.p_partkey), child of part in pushed join  (cost=30.09 rows=30) (actual time=0.000..0.005 rows=19 loops=313)


What we see here is that the window function result is materialised, and executing the WHERE clause on the average quantity is very quick.

The result is impressive: the speedup of running this query in MySQL NDB Cluster is 16x! The long project of integrating NDB Cluster with MySQL 8.0 bears fruit here.
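The equivalence of the correlated-subquery form of Q17 and the window function rewrite can be checked on a small scale with SQLite (which supports window functions since 3.25) from Python's standard library. The schema mirrors the queries above, but the rows are invented for illustration only:

```python
import sqlite3

# Toy stand-ins for TPC-H part/lineitem with made-up rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE part (p_partkey INTEGER PRIMARY KEY, p_brand TEXT, p_container TEXT);
CREATE TABLE lineitem (l_partkey INTEGER, l_quantity REAL, l_extendedprice REAL);
INSERT INTO part VALUES (1, 'Brand#23', 'MED BOX'), (2, 'Brand#23', 'MED BOX'),
                        (3, 'Other', 'BOX');
INSERT INTO lineitem VALUES (1, 1, 10.0), (1, 10, 100.0), (1, 10, 100.0),
                            (2, 2, 20.0), (2, 40, 400.0), (3, 1, 7.0);
""")

# Original shape: the correlated subquery is re-evaluated per outer row.
q_subquery = """
SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM lineitem, part
WHERE p_partkey = l_partkey
  AND p_brand = 'Brand#23' AND p_container = 'MED BOX'
  AND l_quantity < (SELECT 0.2 * AVG(l_quantity)
                    FROM lineitem WHERE l_partkey = p_partkey);
"""

# Rewrite: the per-part average becomes a window aggregate computed once.
q_window = """
WITH win AS (
  SELECT l_extendedprice, l_quantity,
         AVG(l_quantity) OVER (PARTITION BY p_partkey) AS avg_l_quantity
  FROM lineitem, part
  WHERE p_partkey = l_partkey
    AND p_brand = 'Brand#23' AND p_container = 'MED BOX')
SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM win
WHERE l_quantity < 0.2 * avg_l_quantity;
"""
(sub,) = con.execute(q_subquery).fetchone()
(win,) = con.execute(q_window).fetchone()
assert sub == win  # both formulations compute the same aggregate
print(sub)
```

Note that the rewrite is only valid because every lineitem row of a qualifying part appears in the join, so the window average over the partition equals the correlated average; that is exactly the repeated pattern the optimizer cannot spot in the subquery form.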

Monday, October 19, 2020

New things coming in MySQL Cluster 8.0.22

In version 8.0.22 we have added a number of important new features. The first is the capability to create encrypted backups. This includes the abilities to create encrypted backups, to restore encrypted backups, and also to encrypt backups that were created without encryption.


The second is that we have added support for IPv6 addresses. If it is necessary to mix IPv4 and IPv6 addresses in a cluster, it is important to be careful and follow a number of rules as documented.


The performance of replication of BLOB columns has been significantly improved, and similarly for SELECT statements using BLOB columns. This has been achieved by decreasing the number of round trips used to perform operations involving BLOB tables, which significantly decreases the latency of operations involving BLOBs.


Also, in 8.0.21 an important new feature was added that makes some complex queries significantly faster: the ability to push antijoin and semijoin execution down to the NDB data nodes. One example is Q22 in TPC-H. This query executes 3.5x faster due to this change, and there are likely queries that will benefit even more from it.

select
        cntrycode,
        count(*) as numcust,
        sum(c_acctbal) as totacctbal
from
        (
                select
                        substr(c_phone from 1 for 2) as cntrycode,
                        c_acctbal
                from
                        customer
                where
                        substr(c_phone from 1 for 2) in
                                ('13', '31', '23', '29', '30', '18', '17')
                        and c_acctbal > (
                                select
                                        avg(c_acctbal)
                                from
                                        customer
                                where
                                        c_acctbal > 0.00
                                        and substr(c_phone from 1 for 2) in
                                                ('13', '31', '23', '29', '30', '18', '17')
                        )
                        and not exists (
                                select
                                        *
                                from
                                        orders
                                where
                                        o_custkey = c_custkey
                        )
        ) as custsale
group by
        cntrycode
order by
        cntrycode;
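The NOT EXISTS clause in Q22 is what the optimizer now executes as an antijoin: keep the customers that have no matching order. The transformation can be illustrated with SQLite via Python's standard library, on toy customer/orders data invented for this sketch:

```python
import sqlite3

# Made-up customer/orders rows; customer 2 is the only one without orders.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_acctbal REAL);
CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER);
INSERT INTO customer VALUES (1, 100.0), (2, 200.0), (3, 300.0);
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# NOT EXISTS form, as in Q22.
q_not_exists = """
SELECT c_custkey FROM customer c
WHERE NOT EXISTS (SELECT * FROM orders o WHERE o.o_custkey = c.c_custkey)
ORDER BY c_custkey;
"""

# Equivalent explicit antijoin: left join, keep only the unmatched rows.
q_antijoin = """
SELECT c.c_custkey FROM customer c
LEFT JOIN orders o ON o.o_custkey = c.c_custkey
WHERE o.o_orderkey IS NULL
ORDER BY c.c_custkey;
"""
no_orders_exists = [r[0] for r in con.execute(q_not_exists)]
no_orders_join = [r[0] for r in con.execute(q_antijoin)]
assert no_orders_exists == no_orders_join  # both are the same antijoin
print(no_orders_exists)
```

In NDB Cluster 8.0.21 this transformation matters because the antijoin, once recognised, can be pushed down and evaluated in the data nodes instead of shipping every candidate row to the MySQL server.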