One of the important optimisations we did in MySQL Cluster 7.4 was to
use more of direct function calls as part of SCAN processing instead of
relying on asynchronous messages.
We want each message execution to be short, up to around 10
microseconds of execution per message we should strive to
keep it below. However if we decrease the time spent executing
a message to down to a microsecond or smaller than we spend too
much time in the message scheduler. So it is important to strike a
balance here.
For primary key lookups we do most of the work in one message
execution, so here we already were doing the optimal behaviour
in our model of execution.
For scans however we often scan hundreds or even thousands and
in some cases even millions of rows, so here there is a lot of
opportunity to decide how to execute the scan.
Previously the scan model used a number of messages per row
scanned. This was quite obviously a bit too conservative. We have
now made optimisations such that we can scan up to 5 rows within
the execution of one message execution.
When converting an asynchronous message into a function call
there are more than one way to achieve this. There are ways that
are extremely fast, but these leave no extra opportunity of
tracing. So for each such call we have to make a decision.
The excerpt below is taken from the code in MySQL Cluster 7.4
and explains the different ways to convert asynchronous messages
into function calls.
When calling another block immediately there is a set of ways to
do this.
Standard, non-optimised manner:
Use EXECUTE_DIRECT with four parameters that specify block reference
of receiver, the signal object, the global signal number and the
length of the signal (There is also a variant used when sending such
a signal to a different instance). This method is fairly optimised
but has quite a lot of potential for performance improvement.
Standard, optimised manner:
In this case we optimise things by translating to block object and
retrieving the function pointer in the block call. This gives
the compiler assistance to separate the loads and stores more from
each other.
The call will be seen as:
block->EXECUTE_DIRECT(signal, f);
This manner optimises the code but retains the flexibility and also
the possibility to trace signal execution between blocks.
Non-standard, optimised manner:
In this case we remove some of the flexibility of the call to enhance
the performance yet a bit more. In this case we remove the possibility
for a flexible receiver of the signal, it is directed to a certain
block and we also call the method directly without any indirection.
The call will be seen as e.g.:
c_tup->execTUPKEYREQ(signal);
The standard manner of calling EXECUTE_DIRECT are both always calling
functions with one parameter being the signal object and no return
value. There is however two ways of sending signals back. In some
cases a signal is always sent back when returning from the
EXECUTE_DIRECT call, in this case the signal object returned contains
a signal object with data for the return signal. In some cases
one gets a return signal in this manner by a combination of the signal
number and the parameters. In the example below with NEXT_SCANREQ
we always gets a return signal when specifying ZSCAN_COMMIT
but not in other cases.
The other manner of sending a return signal is to perform a new
EXECUTE_DIRECT signal. In those cases one needs to ensure that the
call chain is bounded to not run out of stack. It is also a good
idea to try and ensure that the EXECUTE_DIRECT can use the
tail-call optimisation to avoid using too much stack which is bad
for CPU caching. This means returning immediately after
EXECUTE_DIRECT but also avoiding having objects that needs
destruction after return and also avoiding taking the reference of
stack variables.
Using the non-standard manner one can obviously also change more
ways, one can return a bool for example as in the example with
execTUPKEYREQ, one can add parameters and one can even change the
name to a different name not according to the standard naming
conventions. Obviously doing this removes flexibility of using
blocks in a flexible manner.
use more of direct function calls as part of SCAN processing instead of
relying on asynchronous messages.
We want each message execution to be short, up to around 10
microseconds of execution per message we should strive to
keep it below. However if we decrease the time spent executing
a message to down to a microsecond or smaller than we spend too
much time in the message scheduler. So it is important to strike a
balance here.
For primary key lookups we do most of the work in one message
execution, so here we already were doing the optimal behaviour
in our model of execution.
For scans however we often scan hundreds or even thousands and
in some cases even millions of rows, so here there is a lot of
opportunity to decide how to execute the scan.
Previously the scan model used a number of messages per row
scanned. This was quite obviously a bit too conservative. We have
now made optimisations such that we can scan up to 5 rows within
the execution of one message execution.
When converting an asynchronous message into a function call
there are more than one way to achieve this. There are ways that
are extremely fast, but these leave no extra opportunity of
tracing. So for each such call we have to make a decision.
The excerpt below is taken from the code in MySQL Cluster 7.4
and explains the different ways to convert asynchronous messages
into function calls.
When calling another block immediately there is a set of ways to
do this.
Standard, non-optimised manner:
Use EXECUTE_DIRECT with four parameters that specify block reference
of receiver, the signal object, the global signal number and the
length of the signal (There is also a variant used when sending such
a signal to a different instance). This method is fairly optimised
but has quite a lot of potential for performance improvement.
Standard, optimised manner:
In this case we optimise things by translating to block object and
retrieving the function pointer in the block call. This gives
the compiler assistance to separate the loads and stores more from
each other.
The call will be seen as:
block->EXECUTE_DIRECT(signal, f);
This manner optimises the code but retains the flexibility and also
the possibility to trace signal execution between blocks.
Non-standard, optimised manner:
In this case we remove some of the flexibility of the call to enhance
the performance yet a bit more. In this case we remove the possibility
for a flexible receiver of the signal, it is directed to a certain
block and we also call the method directly without any indirection.
The call will be seen as e.g.:
c_tup->execTUPKEYREQ(signal);
The standard manner of calling EXECUTE_DIRECT are both always calling
functions with one parameter being the signal object and no return
value. There is however two ways of sending signals back. In some
cases a signal is always sent back when returning from the
EXECUTE_DIRECT call, in this case the signal object returned contains
a signal object with data for the return signal. In some cases
one gets a return signal in this manner by a combination of the signal
number and the parameters. In the example below with NEXT_SCANREQ
we always gets a return signal when specifying ZSCAN_COMMIT
but not in other cases.
The other manner of sending a return signal is to perform a new
EXECUTE_DIRECT signal. In those cases one needs to ensure that the
call chain is bounded to not run out of stack. It is also a good
idea to try and ensure that the EXECUTE_DIRECT can use the
tail-call optimisation to avoid using too much stack which is bad
for CPU caching. This means returning immediately after
EXECUTE_DIRECT but also avoiding having objects that needs
destruction after return and also avoiding taking the reference of
stack variables.
Using the non-standard manner one can obviously also change more
ways, one can return a bool for example as in the example with
execTUPKEYREQ, one can add parameters and one can even change the
name to a different name not according to the standard naming
conventions. Obviously doing this removes flexibility of using
blocks in a flexible manner.
No comments:
Post a Comment