NeurIPS 2020

On Adaptive Distance Estimation

Yeshwanth Cherapanamjeri, Jelani Nelson

32 citations

Abstract

We provide a static data structure for distance estimation which supports adaptive queries. Concretely, given a dataset $X = \{x_i\}_{i=1}^n$ of $n$ points in $\mathbb{R}^d$ and $0 < p \leq 2$, we construct a randomized data structure with low memory consumption and query time which, when later given any query point $q \in \mathbb{R}^d$, outputs a $(1+\epsilon)$-approximation of $\lVert q - x_i \rVert_p$ with high probability for all $i \in [n]$. The main novelty is our data structure's correctness guarantee holds even when the sequence of queries can be chosen adaptively: an adversary is allowed to choose the $j$th query point $q_j$ in a way that depends on the answers reported by the data structure for $q_1, \ldots, q_{j-1}$. Previous randomized Monte Carlo methods do not provide error guarantees in the setting of adaptively chosen queries. Our memory consumption is $\tilde O((n+d)d/\epsilon^2)$, slightly more than the $O(nd)$ required to store $X$ in memory explicitly, but with the benefit that our time to answer queries is only $\tilde O(\epsilon^{-2}(n+d))$, much faster than the naive $\Theta(nd)$ time obtained from a linear scan in the case of $n$ and $d$ very large. Here $\tilde O$ hides $\log(nd/\epsilon)$ factors. We discuss applications to nearest neighbor search and nonparametric estimation. Our method is simple and likely to be applicable to other domains: we describe a generic approach for transforming randomized Monte Carlo data structures which do not support adaptive queries to ones that do, and show that for the problem at hand, it can be applied to standard nonadaptive solutions to $\ell_p$ norm estimation with negligible overhead in query time and a factor $d$ overhead in memory.
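The abstract does not spell out the construction, so the following is only a minimal sketch of one way such a transformation could look, under stated assumptions. It covers $p = 2$ only and uses a Gaussian random projection as the nonadaptive estimator. The data structure stores several independent sketches, and each query samples a few of them at random and reports the median estimate. The class name `AdaptiveDistanceSketch` and the parameters `m`, `copies`, and `samples` are hypothetical names chosen for illustration, and their values are not the paper's parameter settings.

```python
import math
import random
import statistics


def make_sketch(d, m, rng):
    """One m x d Gaussian matrix, scaled so that E[||Pi v||_2^2] = ||v||_2^2."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]


def apply_sketch(pi, v):
    """Matrix-vector product Pi v."""
    return [sum(a * b for a, b in zip(row, v)) for row in pi]


def l2(v):
    """Euclidean norm of v."""
    return math.sqrt(sum(a * a for a in v))


class AdaptiveDistanceSketch:
    """Illustrative only: many independent sketches, a random few used per query."""

    def __init__(self, points, m, copies, seed=None):
        # Assumption: a seeded generator is used only for reproducibility here.
        self.rng = random.Random(seed)
        d = len(points[0])
        # Independent copies of the nonadaptive sketch.
        self.sketches = [make_sketch(d, m, self.rng) for _ in range(copies)]
        # Pre-sketch every data point under every copy.
        self.sketched = [
            [apply_sketch(pi, x) for x in points] for pi in self.sketches
        ]

    def query(self, q, samples):
        """Estimate ||q - x_i||_2 for every stored point x_i."""
        # Fresh random choice of sketches for each query.
        idx = [self.rng.randrange(len(self.sketches)) for _ in range(samples)]
        # Sketch the query once per distinct sampled copy.
        sq = {j: apply_sketch(self.sketches[j], q) for j in set(idx)}
        estimates = []
        for i in range(len(self.sketched[0])):
            per_copy = [
                l2([a - b for a, b in zip(sq[j], self.sketched[j][i])])
                for j in idx
            ]
            # The median damps the effect of a few bad sketches.
            estimates.append(statistics.median(per_copy))
        return estimates
```

Calling `AdaptiveDistanceSketch(points, m=200, copies=10).query(q, samples=5)` returns one estimate per stored point. The intuition behind the design is that an adversary who observes only the reported medians cannot tell which copies were sampled for a given query, while storing the extra copies accounts for the additional memory.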