We've recently developed a system that has very high throughput and low latency requirements: tens of thousands of transactions per second per commodity server instance each with a latency in the order of low tens of milliseconds.
In order to meet these tough requirements we built our system based on a staged event-driven architecture (SEDA). I was primarily responsible for taking the architecture concepts and implementing the design.
We found that SEDA was ideal for this approach.
To achieve the requirements we had to consider all places that could affect our performance and scalability. Essentially we needed to identify and isolate any operation that would block so that the rest of the system could continue to operate at full speed, and to make the blocking operations as efficient as possible. Blocking typically happens in the following scenarios:
- Network IO
- Disk IO
- Resource contention and consistency: for example, internal locking on data structures
- Synchronous calls
We found a SEDA approach perfect for isolating network and disk IO. We decomposed our business process into stages which needed network IO, disk IO or were purely computational or in-memory data manipulation.
This decomposition of the business process into IO and computational stages allowed the computational ones to run at full speed without blocking as they did not need to perform IO.
IO operations were placed in batch stages and therefore more efficient batched IO operations could be used to improve throughput. For example sending one large message across the network is much more efficient than many small ones, and writing a large number of bytes to a disk in one go is more efficient than lots of small writes.
Naturally, batching affects latency. Batching waits and collects operations until there are enough to efficiently process. This is where tuning come into play, trading off batch size against batch latency to achieve the goals.
The performance feedback and tuning of the SEDA stages means that the system should be able to reach an equilibrium. In our implementation we were less concerned with that aspect of SEDA and more concerned with not damaging latency. From our load and performance testing we knew that we could handle the throughput requirements and scale horizontally where necessary and therefore we didn't implement a complex dynamic tuning scheme.
Our approach was such that we sized our batches so that they were the most efficient size for highest throughput to the IO device and set our batch latency to be the lowest such that we could still achieve our latency requirements even when the load was reduced and we were not processing full batches each time.
Since we did not implement a dynamic tuning scheme, the primary issue that we needed to cope with was ensuring that we didn't accepted more requests per second than our lowest throughput stage could handle as this obviously leads to the intra-stage queues building up and latency taking a hit. We tuned, through performance testing, the number of threads per stage so that throughput was as even as possible across all stages. Ideally, this would have been handle by SEDA dynamic tuning.
The second main point that helped us achieve our goals was the asynchronous nature of SEDA. This architecture meant that we built in asynchrony from the ground up and in the main engine of our system nothing would be waiting on synchronous calls. At the periphery of the system in the web tier we found that we could handle asynchrony either through a Comet ‘push' approach or through the use of continuations to allow the web server to ‘park' requests while they are blocked waiting for a synchronous response.
The one area that pure SEDA didn't help in was in resource contention. In order to achieve low latency we actually kept the master of all of our working data in memory instead of in a database. Our persistent disk based storage was only used as a backup, so that in the event of failure we could restore the system. Hence we had no read waits while we loaded data and by designing our persistence mechanism appropriately we didn't have any write contention and could make it as quick as possible.
Alongside this we did have to contend with internal data locking for access to shared data since every stage in SEDA runs on different threads. In order to eliminate this problem we developed a thread partitioning strategy, but that's a topic for another time. This thread partitioning strategy though is the main reason we didn't implement SEDA dynamic tuning since we could not easily adjust the number of threads per stage without impacting how we handled uncontended access to in-memory data.
The downside of using SEDA is that a process implemented in SEDA is more complex than one implemented in a more traditional approach where all processing for a request is performed sequentially on a dedicated thread. The added complexity is in terms of decomposing the process and implementing it in stages that are called asynchronously, which is inherently hard to test. Additionally, there are many excellent web frameworks that provide support for a web applications that give a great head start for developers, there are not similar frameworks for SEDA which of course means that a team adopting SEDA has more infrastructure work required to get up and running. Once (if) there are more mature SEDA frameworks then this should alleviate this issue.
We found though that the performance and scalability gains by far outweighed the disadvantages and the time and effort to develop our own framework - since we could actually meet our requirements whereas a more traditional approach would never have been capable of our required throughput.
Would I use a SEDA approach where the requirements aren't as severe as on this project? Well, as ever, it depends on the non-functional requirements. The qualified answer though is yes.
In a traditional application without any strict throughput or latency requirements or where horizontal scalability can handle the load then I would use a traditional approach because of the advantages provided by the web frameworks available in terms of simplicity and time to market. Where efficiency and throughput are issues, then I wouldn't hesitate to use SEDA again.