ICLR2026
Probabilistic Kernel Function for Fast Angle Testing
Kejing Lu, Chuan Xiao, Yoshiharu Ishikawa
被引用 1 次
摘要
In this paper, we study the angle testing problem in the context of similarity search in high-dimensional Euclidean spaces and propose two projection-based probabilistic kernel functions, one designed for angle comparison and the other for angle thresholding. Unlike existing approaches that rely on random projection vectors drawn from Gaussian distributions, our approach leverages reference angles and adopts a deterministic structure for the projection vectors. Notably, our kernel functions do not require asymptotic assumptions, such as the number of projection vectors tending to infinity, and can be theoretically and experimentally shown to outperform Gaussian-distribution-based kernel functions. We apply the proposed kernel function to Approximate Nearest Neighbor Search (ANNS) and demonstrate that our approach achieves a 2.5×-3× higher query-per-second (QPS) throughput compared to the widely-used graph-based search algorithm HNSW. Our code and data are available at https://github.com/KejingLu-810/KS . INTRODUCTION Vector-based similarity search is a core problem with broad applications in machine learning, data mining, and information retrieval. It involves retrieving data points in a high-dimensional space that are most similar to a given query vector based on a specific similarity measure. This task is central to many downstream applications, including nearest neighbor classification, recommendation systems, clustering, anomaly detection, and retrieval-augmented generation (RAG). However, the high dimensionality of modern datasets makes efficient similarity search particularly challenging, highlighting the need for fast and scalable vector computation techniques. Among the various similarity measures for high-dimensional vectors, the ℓ 2 norm, cosine similarity, and inner product are the most commonly used in practice. As discussed in Yan et al. ( 2018 ); Dai et al. (2020); Lu et al. ( 2024 ), it is often possible to pre-compute and store the norms of vectors in advance, allowing these measures to be reduced to the computation of the cosine of the angle between two normalized vectors, thereby highlighting the central role of angle computation. On the other hand, in many real-world scenarios, we are not concerned with the exact values of the angles but rather with the outcome-which one is greater-of an angle comparison, which is referred to as angle testing: Given a query vector q and data vectors v 1 , v 2 , v on the sphere S d-1 , typical operations include comparing ⟨q, v 1 ⟩ and ⟨q, v 2 ⟩, or determining whether ⟨q, v⟩ exceeds a certain threshold. These operations, however, require computing exact cosines of angles, which has a cost of O(d) per comparison and becomes expensive in high dimensions. To address this, we aim to design a computationally efficient probabilistic kernel function K that can approximate these comparisons with reduced cost and high success probability. More precisely, we focus on the following two problems: Problem 1.1. (Probabilistic kernel function for comparison) Design a probabilistic kernel function K: S d-1 × S d-1 → RV with computational cost o(d), where RV denotes the set of random variables, such that, for any data vectors v 1 , v 2 ∈ S d-1 and query q ∈ S d-1 satisfying ⟨q, v 1 ⟩ > ⟨q, v 2 ⟩, we have Pr[K(q, v 1 ) > K(q, v 2 )] > 1 -ϵ, where ϵ ≤ 0.5. Problem 1.2. (Probabilistic kernel function for thresholding) Given a fixed angle threshold θ ∈ (0, π), design a probabilistic kernel function K: S d-1 × S d-1 → RV with computational cost o(d) such that