NeurIPS2022

Fast Distance Oracles for Any Symmetric Norm

Yichuan Deng, Zhao Song, Omri Weinstein, Ruizhe Zhang

被引用 10 次

摘要

In the Distance Oracle problem, the goal is to preprocess nn vectors x1,x2,,xnx_1, x_2, \cdots, x_n in a dd-dimensional metric space (Xd,l)(\mathbb{X}^d, \| \cdot \|_l) into a cheap data structure, so that given a query vector qXdq \in \mathbb{X}^d and a subset S[n]S\subseteq [n] of the input data points, all distances qxil\| q - x_i \|_l for xiSx_i\in S can be quickly approximated (faster than the trivial dS\sim d|S| query time). This primitive is a basic subroutine in machine learning, data mining and similarity search applications. In the case of p\ell_p norms, the problem is well understood, and optimal data structures are known for most values of pp. Our main contribution is a fast (1+ε)(1+\varepsilon) distance oracle for any symmetric norm l\|\cdot\|_l. This class includes p\ell_p norms and Orlicz norms as special cases, as well as other norms used in practice, e.g. top-kk norms, max-mixture and sum-mixture of p\ell_p norms, small-support norms and the box-norm. We propose a novel data structure with O~(n(d+mmc(l)2))\tilde{O}(n (d + \mathrm{mmc}(l)^2 ) ) preprocessing time and space, and tq=O~(d+Smmc(l)2)t_q = \tilde{O}(d + |S| \cdot \mathrm{mmc}(l)^2) query time, for computing distances to a subset SS of data points, where mmc(l)\mathrm{mmc}(l) is a complexity-measure (concentration modulus) of the symmetric norm. When l=pl = \ell_{p} , this runtime matches the aforementioned state-of-art oracles.