SIGMOD2025

FAAQP: Fast and Accurate Approximate Query Processing based on Bitmap-augmented Sum-Product Network

Hanbing Zhang, Yinan Jing, Zhenying He, Kai Zhang, X. Sean Wang

摘要

For interactive data exploration, approximate query processing (AQP) is a useful approach that provides a timely response by trading query accuracy. To reduce query latency, existing AQP methods use samples or models rather than the underlying data to answer queries. However, it is difficult to achieve satisfactory results in terms of query accuracy and query latency simultaneously with these methods. For the sample-based methods, this is because the more accurate the query results are, the more samples are needed and the more time cost is required for processing. The model-based methods have lower query latency, but they cannot return the approximate results with high accuracy because the existing models cannot capture the complex data distribution accurately. In this paper, we propose a fast and accurate AQP method FAAQP . In FAAQP, we propose a novel unsupervised model bitmap-augmented sum-product network (BSPN) that combines the advantages of the sum-product network with bitmaps to capture the characteristics of data distribution more accurately. Then, we propose a budget-aware BSPN construction method that builds BSPN models with the maximum query accuracy for the given storage budget. Furthermore, to reduce the query latency of FAAQP, we propose a bitmap merging strategy that makes a trade-off between query accuracy and query latency. Experimental results on real-world and synthetic datasets show that FAAQP outperforms the state-of-the-art AQP methods and achieves 1.3×-9.0× improvements in query accuracy with a low query latency.