ICML2025
Sort Before You Prune: Improved Worst-Case Guarantees of the DiskANN Family of Graphs
Siddharth Gollapudi, Ravishankar Krishnaswamy, Kirankumar Shiragur, Harsh Wardhan
Abstract
Graph-based data structures have become powerful and ubiquitous tools for scalable approximate nearest-neighbor (ANN) search over the past decade. In spite of their apparent practical performance, there has only recently been progress on the worst-case performance of these data structures. Indeed, the influential work of Indyk & Xu introduced the key concept of α-reachable graphs, showing that graphs constructed by the DiskANN algorithm (Subramanya et al., 2019) produce an α+1 α-1 -approximate solution with a simple bestfirst search that runs in poly-logarithmic query time. In our work, we improve and generalize this analysis as follows: • We introduce sorted α-reachable graphs, and use this notion to obtain a stronger approximation factor of α α-1 for the DiskANN algorithm on Euclidean metrics. • We present the first worst-case theoretical analysis for the popular beam-search algorithm, which is used in practice to search these graphs for k > 1 candidate nearest neighbors. We also present empirical results validating the significance of sorted α-reachable graphs, which aligns with our theoretical findings.