ICML 2023

A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree

Yury Elkin, Vitaliy Kurlin

18 citations

Abstract

Given a reference set $R$ of $n$ points and a query set $Q$ of $m$ points in a metric space, this paper studies the important problem of finding the $k$-nearest neighbors of every point $q \in Q$ in the set $R$ in near-linear time. In the paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree on $R$ and attempted to prove that this tree can be built in $O(n \log n)$ time while the nearest neighbor search can be done in $O(n \log m)$ time with a hidden dimensionality factor. This paper fills a substantial gap in the past proofs of time complexity by defining a simpler compressed cover tree on the reference set $R$. The first new algorithm constructs a compressed cover tree in $O(n \log n)$ time. The second new algorithm finds all $k$-nearest neighbors of all points from $Q$ using a compressed cover tree in time $O(m(k + \log n)\log k)$ with a hidden dimensionality factor depending on point distributions of the given sets $R, Q$ but not on their sizes.
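To make the problem statement concrete, the following is a minimal brute-force baseline in Python. It is not the compressed cover tree algorithm from the paper; it only illustrates the input and output of the all-$k$-nearest-neighbors problem. The names `knn_brute_force` and `euclidean` are illustrative assumptions, and the Euclidean metric stands in for an arbitrary metric. This baseline takes roughly $O(mn \log k)$ time, which is the kind of cost a near-linear method aims to avoid.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance, used here as one example of a metric."""
    return math.dist(p, q)


def knn_brute_force(
    R: Sequence[Point],
    Q: Sequence[Point],
    k: int,
    d: Callable[[Point, Point], float] = euclidean,
) -> List[List[Point]]:
    """For every query q in Q, return its k nearest points of R, closest first.

    Brute-force baseline (hypothetical helper, not the paper's method):
    each query scans all of R, and heapq.nsmallest selects the k closest.
    """
    result = []
    for q in Q:
        nearest = heapq.nsmallest(k, R, key=lambda r: d(q, r))
        result.append(nearest)
    return result


if __name__ == "__main__":
    R = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
    Q = [(0.9, 0.0), (9.0, 0.0)]
    for q, neighbors in zip(Q, knn_brute_force(R, Q, k=2)):
        print(q, "->", neighbors)
```

Any function satisfying the metric axioms can be passed as `d`, since the problem is stated for a general metric space.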