This blog describes background and implementation of a poll
set to monitor many sockets in one or several receive
threads. The blog is intended as a description but also to
enable some feedback on the design.
I've been spending some time working out all the gory details
of starting and stopping lots of send threads, connection
threads and receive threads over the last few months.
The receive threads will be monitoring a number of socket
connections and as soon as data is available ensure that
the data is received and forwarded to the proper user
thread for execution.
However listening on many sockets is a problem which needs
a scalable solution. Almost every operating system on the
planet has some solution to this problem. The problem is
that they all have different solutions. So I decided to
make a simple interface to those socket monitoring solutions.
First my requirements. I will handle a great number of socket
connections from each thread, I will have the ability to
move socket connections from receive thread to another thread
to dynamically adapt to the usage scenarios. However mostly
the receive thread will wait for events on a set of socket
connections, handle them as they arrive and go back waiting
for more events. On the operating system side I aim at
supporting Linux, OpenSolaris, Mac OS X, FreeBSD and Windows.
One receive thread might be required to listen to socket
connections from many clusters. So there is no real limit
to the number of sockets a receive thread can handle. I
decided however to put a compile time limit in there since
e.g. epoll requires this at create time. This is currently
set to 1024. So if more sockets are needed another receive
thread is needed even if not needed from a performance
point of view.
The implementation aims to cover 5 different implementations.
epoll interface is a Linux-only interface which uses
epoll_create to create an epoll file descriptor, then
epoll_ctl is used to add/drop file descriptors to the epoll
set. Finally epoll_wait is used to wait on the events to
arrive. Socket connections remain in the epoll set as
long as they are not explicitly removed or closed.
Poll is the standard which is there simply to make sure it
works also on older platforms that have none of the other
mechanisms supported. Here there is only one system call,
the poll-call and all the state of the poll set needs to
be taken care of by this implementation.
kqueue achieves more or less the same thing as epoll, it does
so however with a more complex interface that can support a
lot more things such as polling for completed processes and
i-nodes and so forth. It has a kqueue-method to create the
kqueue file descriptor and then a kevent call which is used
both to add, drop and listen to events on the kqueue socket.
kqueue exists in BSD OS:s such as FreeBSD and Mac OS X.
eventports is again a very similar implementation to epoll which
has the calls port_create to create the eventport file descriptor.
It has a port_associate call to add a socket to the eventport set.
It has a port_dissociate call to drop a socket from the set.
It has a port_getn call to wait on the events arriving. There is
however a major difference in that after an event arriving in a
port_getn call the socket is removed from the set and has to be
added back. From an implementation point of view this mainly
complicated my design of error handling.
Windows IO Completion
I have only skimmed this yet and it differs mainly in being a tad
complex and also in that events from the set can be distributed to
more than one thread. However this feature will not be used in this
design, also I have currently not implemented this yet, I need to
get all the bits together on building on Windows done first.
The implementation is done in C but I still wanted to have
a clear object-oriented interface. To achieve this I
created two header files ic_poll_set.h which declares all
the public parts and ic_poll_set_int.h which defines the
private and the public data structures used. This means that
the internals of the IC_POLL_SET-object is hidden from the
user of this interface.
Here is the public part of the interface (the code is GPL:ed
but it isn't released yet):
Copyright (C) 2008 iClaustron AB, All rights reserved
typedef struct ic_poll_connection IC_POLL_CONNECTION;
typedef struct ic_poll_set IC_POLL_SET;
The poll set implementation isn't multi-thread safe. It's intended to be
used within one thread, the intention is that one can have several
poll sets, but only one per thread. Thus no mutexes are needed to
protect the poll set.
ic_poll_set_add_connection is used to add a socket connection to the
poll set, it requires only the file descriptor and a user object of
any kind. The poll set implementation will ensure that this file
descriptor is checked together with the other file descriptors in
the poll set independent of the implementation in the underlying OS.
ic_poll_set_remove_connection is used to remove the file descriptor
from the poll set.
ic_check_poll_set is the method that goes to check which socket
connections are ready to receive.
ic_get_next_connection is used in a loop where it is called until it
returns NULL after a ic_check_poll_set call, the output from
ic_get_next_connection is prepared already at the time of the
ic_check_poll_set call. ic_get_next_connection will return a
IC_POLL_CONNECTION object. It is possible that ic_check_poll_set
can return without error whereas the IC_POLL_CONNECTION can still
have an error in the ret_code in the object. So it is important to
both check this return code as well as the return code from the
call to ic_check_poll_set (this is due to the implementation using
eventports on Solaris).
ic_free_poll_set is used to free the poll set, it will also if the
implementation so requires close any file descriptor of the poll
ic_is_poll_set_full can be used to check if there is room for more
socket connections in the poll set. The poll set has a limited size
(currently set to 1024) set by a compile time parameter.
int (*ic_poll_set_add_connection) (IC_POLL_SET *poll_set,
int (*ic_poll_set_remove_connection) (IC_POLL_SET *poll_set,
int (*ic_check_poll_set) (IC_POLL_SET *poll_set,
(*ic_get_next_connection) (IC_POLL_SET *poll_set);
void (*ic_free_poll_set) (IC_POLL_SET *poll_set);
gboolean (*ic_is_poll_set_full) (IC_POLL_SET *poll_set);
typedef struct ic_poll_operations IC_POLL_OPERATIONS;
/* Creates a new poll set */