NeurIPS2022

Fast Distance Oracles for Any Symmetric Norm

Yichuan Deng, Zhao Song, Omri Weinstein, Ruizhe Zhang

10 citations

Abstract

In the Distance Oracle problem, the goal is to preprocess nn vectors x1,x2,,xnx_1, x_2, \cdots, x_n in a dd-dimensional metric space (Xd,l)(\mathbb{X}^d, \| \cdot \|_l) into a cheap data structure, so that given a query vector qXdq \in \mathbb{X}^d and a subset S[n]S\subseteq [n] of the input data points, all distances qxil\| q - x_i \|_l for xiSx_i\in S can be quickly approximated (faster than the trivial dS\sim d|S| query time). This primitive is a basic subroutine in machine learning, data mining and similarity search applications. In the case of p\ell_p norms, the problem is well understood, and optimal data structures are known for most values of pp. Our main contribution is a fast (1+ε)(1+\varepsilon) distance oracle for any symmetric norm l\|\cdot\|_l. This class includes p\ell_p norms and Orlicz norms as special cases, as well as other norms used in practice, e.g. top-kk norms, max-mixture and sum-mixture of p\ell_p norms, small-support norms and the box-norm. We propose a novel data structure with O~(n(d+mmc(l)2))\tilde{O}(n (d + \mathrm{mmc}(l)^2 ) ) preprocessing time and space, and tq=O~(d+Smmc(l)2)t_q = \tilde{O}(d + |S| \cdot \mathrm{mmc}(l)^2) query time, for computing distances to a subset SS of data points, where mmc(l)\mathrm{mmc}(l) is a complexity-measure (concentration modulus) of the symmetric norm. When l=pl = \ell_{p} , this runtime matches the aforementioned state-of-art oracles.