SIGMOD2025
Efficient Approximate Nearest Neighbor Search via Hemi-Sphere Centroids Graph
Runwen Qiu, Jing Tang
被引用 5 次
摘要
Nearest-neighbor search is a fundamental task in various applications, including retrieval-augmented generation, recommendation systems, and image classification. To cope with large datasets, Approximate Nearest Neighbor Search (ANNS) is widely used to save computational cost while maintaining high accuracy. Existing ANNS algorithms mainly focus on the Euclidean distance. However, in practice, cosine similarity is commonly adopted in downstream tasks. In this paper, we study the Monotonic Relative Neighbor Graph (MRNG), a state-of-the-art graph-based ANNS structure that shows strong performance under Euclidean distance. We analyze MRNG under cosine similarity and prove two key properties: (1) greedy search on the graph always moves closer to the query until the exact nearest neighbor is found, and (2) the graph's maximum out-degree is bounded by a constant independent of the dataset size. These properties lead to fast search and compact index size. However, constructing an exact MRNG is computationally expensive on large datasets. Moreover, existing approximate construction methods tailored for Euclidean distance, e.g., Euclidean centroid or KD-Trees, are not suitable for cosine similarity. To address these issues, we propose an approximate version of MRNG, named Hemi-Sphere Centroids Graph (HSCG), which uses the hemi-sphere centroids as the entry points and employs locality-sensitive hashing to initialize the graph efficiently. Extensive experiments on eight datasets demonstrate the superiority of HSCG in terms of both search performance and index size compared to existing representative algorithms under cosine similarity.