SIGMOD 2025
Accelerating Stream Processing Engines via Hardware Offloading
Zhengyan Guo, Mingxing Zhang, Yingdi Shan, Kang Chen, Jinlei Jiang, Yongwei Wu
Abstract
Modern stream processing engines (SPEs) must handle massive real-time data streams under strict latency and throughput requirements. However, conventional SPEs are constrained by their software parallelization strategies (e.g., queue-based data re-partitioning and high synchronization overheads), which prevent efficient use of modern hardware capabilities and ultimately limit performance scalability. In this paper, we present FlexStream, a novel SPE that leverages hardware offloading to redesign these parallelization strategies and overcome their limitations. By offloading data re-partitioning to hardware and integrating a coupled network-executor model, FlexStream maximizes resource utilization, saturating up to 95% of network bandwidth. To address the load imbalance introduced by this design, we implement a lock-free state backend with efficient state migration mechanisms. Overall, FlexStream improves throughput by 1.95× to 3.35× over state-of-the-art SPEs (e.g., LightSaber) across six real-world streaming analytics applications. During state migration, it cuts latency spikes by 71.9% and migration time by 66.8%. Our work underscores the potential of hardware-software co-design in SPEs, offering a scalable, elastic solution for real-time analytics.