SIGMOD2025
DRPQ: Distributed Evaluation of Regular Path Queries On Streaming Graphs
Siyuan Zhang, Kai Zhang, Zhenying He, Yinan Jing, Zhigang Zhao, X. Sean Wang
Abstract
Persistent Regular Path Query (RPQ) on streaming graphs is widely applicable to many online analysis applications. Existing research primarily focuses on the single-worker scenario, while scaling out to distributed RPQ processing on multiple workers is desirable when facing a high workload. Existing distributed solutions are designed for general streaming queries, and various bottlenecks exist that significantly limit the performance when performing streaming RPQ evaluation. The challenge is how to execute queries with multiple workers while introducing limited overhead and ensuring sufficient speedup as the number of workers increases. This paper introduces a distributed processing strategy called DRPQ by carefully dividing a query into multiple partially matched query tasks. The idea is to form query tasks based on initial matches of the graph against the given regular expression, and to dynamically distribute these tasks to workers to balance their workloads. To reduce redundant evaluation across different workers, a grouping method is proposed to find query tasks that are likely to share evaluation processes, and send them to the same workers. Extensive experiments on two real-world graph datasets demonstrate that DRPQ is significantly more efficient and scalable than existing distributed solutions. Furthermore, the proposed grouping method proves to be particularly effective, nearly doubling the throughput in most cases.