SIGMOD2025
Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search
Jianyang Gao, Yutong Gou, Yuexuan Xu, Yongyi Yang, Cheng Long, Raymond Chi-Wing Wong
被引用 16 次
摘要
Approximate nearest neighbor (ANN) query in high-dimensional Euclidean space is a key operator in database systems. For this query, quantization is a popular family of methods developed for compressing vectors and reducing memory consumption. Among these methods, a recent algorithm called RaBitQ achieves the state-of-the-art performance and provides an asymptotically optimal theoretical error bound. RaBitQ uses 1 bit per dimension for quantization and compresses vectors with a large compression rate. In this paper, we extend RaBitQ to compress vectors with flexible compression rates - it achieves this by using B bits per dimension for quantization with B = 1, 2, ... It inherits the theoretical guarantees of RaBitQ and achieves the asymptotic optimality in terms of the trade-off between space and error bounds as to be proven in this study. Additionally, we present efficient implementations of the extended RaBitQ, enabling its application to ANN queries to reduce both space and time consumption. Extensive experiments on real-world datasets confirm that our method consistently outperforms the state-of-the-art baselines in both accuracy and efficiency when using the same amount of memory.