VLDB2026
SHARP: Shared State Reduction for Efficient Matching of Sequential Patterns
Cong Yu, Tuo Shi, Matthias Weidlich, Bo Zhao
Abstract
The detection of sequential patterns in data is a basic functionality of modern data processing systems for complex event processing (CEP), OLAP, and retrieval-augmented generation (RAG). To improve the results quality for downstream applications, pattern matching engines employ multiple shared patterns that collaboratively deliver more insights. In practice, pattern matching is challenging, since common applications rely on a large set of patterns that shall be evaluated with tight latency bounds. At the same time, matching needs to maintain state, i.e., intermediate results, that grow exponentially in the input size. Hence, systems turn to best-effort processing, striving for maximal recall under a latency bound. Existing techniques, however, consider patterns in isolation, neglecting the optimization potential induced by state sharing and corresponding interactions and interference across shared patterns. We describe Sharp, a state management library that employs state reduction for efficient best-effort pattern matching in shared patterns. To this end, Sharp incorporates state sharing between patterns through a new abstraction, coined pattern-sharing degree (PSD). At runtime, PSD facilitates the categorization and indexing of partial pattern matches. Once a latency bound is exceeded, Sharp realizes best-effort processing by using a cost model to select a subset of partial matches for further processing in constant time. In experiments with real-world data, Sharp achieves a recall of 95%, 93% and 72% for pattern matching in CEP, OLAP, and RAG applications, under a bound of 50% of the average processing latency.