A while ago the C++ standard added a new feature in C++20 called coroutines. I thought it was an interesting thing to try out for RonDB and spent some time this summer reading more about it. My finding was that C++ coroutines can only be used for special tasks that require no stack of their own. The problem is that a coroutine cannot save the stack: it is stackless, so it can only suspend in its own body and not from inside the functions it calls.
My hope was to find that I could have a single thread work with fibers, where each fiber belongs to one thread and the thread can switch between its fibers. The main goal of fibers would be to improve throughput by doing useful work instead of stalling the CPU on cache misses. However, fibers can also be a tool for building a scalable server where the process runs in a single thread when the load is low and scales up to hundreds of threads (if the process has access to that many CPUs) when required. This would provide better power efficiency and better latency at low load. Also, if we ever get VMs that can dynamically scale the number of CPUs up and down, we can use fibers to scale the number of threads up and down accordingly.
My conclusion from reading up on C++20 coroutines was that they could not deliver on this.
However, my reading turned up an intriguingly simple and elegant solution to the problem. See this blog for a description and here is the GitHub tree with the code. A small header file of around 300 lines solves the problem elegantly for x86_64 on both macOS and Linux, and similarly for ARM64. This covers all the platforms RonDB supports. The header file can also be used on Windows (Windows supports fibers).
I developed a small test program to see the code in action:
#include <iostream>
#include "tiny_fiber.h"

/**
 * A very simple test program checking how fibers and threads interact.
 * The program will print the following:
 *   hello from fibermain
 *   hello from main
 *   hello from fibermain 2
 *   hello from main 2
 */

// Handle for the main thread once it has been turned into a fiber.
tiny_fiber::FiberHandle thread_fiber;
// Handle for the fiber created in main().
tiny_fiber::FiberHandle fiber;

void fibermain(void* arg) {
  // arg points at the global fiber handle, which has been assigned
  // by the time this function first runs.
  tiny_fiber::FiberHandle fiber =
    *reinterpret_cast<tiny_fiber::FiberHandle*>(arg);
  std::cout << "hello from fibermain" << std::endl;
  // Switch from this fiber back to the main thread.
  tiny_fiber::SwitchFiber(fiber, thread_fiber);
  std::cout << "hello from fibermain 2" << std::endl;
  // Switch back once more; main() then exits, so we never resume here.
  tiny_fiber::SwitchFiber(fiber, thread_fiber);
}

int main(int argc, char** argv) {
  const int stack_size = 1024 * 16;
  thread_fiber = tiny_fiber::CreateFiberFromThread();
  fiber = tiny_fiber::CreateFiber(stack_size, fibermain, &fiber);
  // Switch from the main thread to the fiber.
  tiny_fiber::SwitchFiber(thread_fiber, fiber);
  std::cout << "hello from main" << std::endl;
  tiny_fiber::SwitchFiber(thread_fiber, fiber);
  std::cout << "hello from main 2" << std::endl;
  return 0;
}
Equipped with this, I have the tools I need to develop an experiment and see how fibers work with RonDB. The good news is that I do not need to learn any complex C++ syntax to do this. It is all low-level system programming. I have learnt through long experience that having a theory is no guarantee of success. A computer is complex enough that one cannot fully predict the impact of the changes one makes. So I am excited to see how this particular new idea works out.
The concept of fibers fits very nicely into the RonDB runtime scheduler and the division of work between threads. It even makes it possible to turn a thread into a fiber, move it to another OS thread, and later return it to its original thread.
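The idea of one thread switching between several fibers can be sketched without tiny_fiber at all. The example below is only an illustration, not RonDB code: it swaps in the POSIX `ucontext` calls (`getcontext`, `makecontext`, `swapcontext`) as found on Linux with glibc, and the names `Fiber`, `yield`, `fiber_main` and `run_round_robin` are hypothetical. It uses global state and is meant for a single thread only.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <ucontext.h>

namespace sketch {

// Hypothetical fiber record: a saved context plus its own private stack.
struct Fiber {
  ucontext_t ctx;
  std::vector<char> stack;
  bool done = false;
  int id = 0;
};

ucontext_t scheduler_ctx;   // context of the scheduling loop
Fiber* current = nullptr;   // fiber currently running on this thread
std::string trace;          // records the order of execution

// Give the CPU back to the scheduler; execution resumes here later.
void yield() { swapcontext(&current->ctx, &scheduler_ctx); }

// Body run by every fiber: do two steps of "work", yielding in between.
void fiber_main() {
  Fiber* self = current;
  for (int step = 0; step < 2; ++step) {
    trace += "f" + std::to_string(self->id) + "s" + std::to_string(step) + ";";
    yield();
  }
  self->done = true;
  // Returning switches to uc_link, which is the scheduler context.
}

std::string run_round_robin(int num_fibers) {
  trace.clear();
  // Sized once up front: a ucontext_t must not be moved after getcontext().
  std::vector<Fiber> fibers(num_fibers);
  for (int i = 0; i < num_fibers; ++i) {
    Fiber& f = fibers[i];
    f.id = i;
    f.stack.resize(64 * 1024);
    getcontext(&f.ctx);
    f.ctx.uc_stack.ss_sp = f.stack.data();
    f.ctx.uc_stack.ss_size = f.stack.size();
    f.ctx.uc_link = &scheduler_ctx;
    makecontext(&f.ctx, fiber_main, 0);
  }
  // Round-robin over the fibers until every one of them has finished.
  bool ran_any = true;
  while (ran_any) {
    ran_any = false;
    for (Fiber& f : fibers) {
      if (f.done) continue;
      current = &f;
      swapcontext(&scheduler_ctx, &f.ctx);
      ran_any = true;
    }
  }
  return trace;
}

}  // namespace sketch
```

With two fibers, `sketch::run_round_robin(2)` interleaves their steps on a single OS thread. A real scheduler would yield at points where the thread would otherwise wait, rather than after every step as this sketch does.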
2 comments:
Yes, stackful coroutines can be enormously useful in certain applications. But they are generally not as easily available as they should be, especially considering how simple the implementation can be, as shown by the tiny_fiber implementation.
boost::context, which I recently learned about, is also a good option for stackful coroutines. It may be useful where a more "mainstream" library is desired that supports more platforms, but the price is that it is nowhere near as small and simple as tiny_fiber (the generated code should be similar).
The MariaDB non-blocking client library in libmariadb uses its own assembler implementation of stackful coroutines similar to tiny_fiber, and uses boost::context as a fallback on the remaining platforms.
My personal goal with coroutines was to optimise performance by avoiding waits on CPU cache misses, but a colleague of mine also found a good use case, where he used them to write simple code for handling web streaming.