NeurIPS 2024

Euclidean distance compression via deep random features

Brett Leroux, Luis Rademacher

Abstract

Motivated by the problem of compressing point sets into as few bits as possible while maintaining information about approximate distances between points, we construct random nonlinear maps $\varphi_\ell$ that compress point sets in the following way. For a point set $S$, the map $\varphi_\ell:\mathbb{R}^d \to N^{-1/2}\{-1,1\}^N$ has the property that storing $\varphi_\ell(S)$ (a sketch of $S$) allows one to report pairwise squared distances between points in $S$ up to some multiplicative $(1\pm \epsilon)$ error with high probability, as long as the minimum distance is not too small compared to $\epsilon$. The maps $\varphi_\ell$ are the $\ell$-fold composition of a certain type of random feature mapping. Moreover, we determine how large $N$ needs to be as a function of $\epsilon$ and other parameters of the point set. Compared to existing techniques, our maps offer several advantages. The standard method for compressing point sets by random mappings relies on the Johnson-Lindenstrauss lemma, which implies that if a set of $n$ points is mapped by a Gaussian random matrix to $\mathbb{R}^k$ with $k = \Theta(\epsilon^{-2}\log n)$, then pairwise distances between points are preserved up to a multiplicative $(1\pm \epsilon)$ error with high probability. The main advantage of our maps $\varphi_\ell$ over random linear maps is that ours map point sets directly into the discrete cube $N^{-1/2}\{-1,1\}^N$, so no additional step is needed to convert the sketch to bits. For some ranges of parameters, our maps $\varphi_\ell$ produce sketches that require fewer bits of storage space.
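The Johnson-Lindenstrauss baseline described above can be illustrated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not code from the paper and not an implementation of $\varphi_\ell$: the constant `C = 8` standing in for the one hidden in $\Theta(\cdot)$, and the helper names `jl_sketch`, `pairwise_sq_dists`, `max_relative_error` and `pack_cube_sketch`, are all hypothetical. The first helpers project points with a Gaussian matrix whose entries are scaled by $1/\sqrt{k}$ and measure the worst relative error on squared distances. The last helper shows why a sketch that already lies in $N^{-1/2}\{-1,1\}^N$ is cheap to store: each coordinate is determined by its sign, so a point costs exactly $N$ bits, whereas the real-valued JL output would still need a separate quantization step.

```python
import math

import numpy as np


def jl_sketch(points: np.ndarray, eps: float, C: float = 8.0, seed: int = 0) -> np.ndarray:
    """Map n points in R^d to R^k with a scaled Gaussian random matrix.

    k = ceil(C * eps^-2 * log n). The constant C is an assumed, illustrative
    choice: the JL lemma only fixes k up to a constant factor.
    """
    n, d = points.shape
    k = math.ceil(C * eps ** -2 * math.log(n))
    rng = np.random.default_rng(seed)
    # Entries are N(0, 1/k), so E||Gx||^2 = ||x||^2 for every fixed x.
    G = rng.standard_normal((k, d)) / math.sqrt(k)
    return points @ G.T


def pairwise_sq_dists(X: np.ndarray) -> np.ndarray:
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
    sq = np.sum(X * X, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)


def max_relative_error(S: np.ndarray, Y: np.ndarray) -> float:
    """Largest |d_Y / d_S - 1| over distinct pairs, with d the squared distance."""
    iu = np.triu_indices(S.shape[0], k=1)
    d_orig = pairwise_sq_dists(S)[iu]
    d_sketch = pairwise_sq_dists(Y)[iu]
    return float(np.max(np.abs(d_sketch / d_orig - 1.0)))


def pack_cube_sketch(Z: np.ndarray) -> np.ndarray:
    """Store each row of a sketch in N^{-1/2}{-1,1}^N as N bits (positive -> 1).

    Returns uint8 rows of ceil(N / 8) bytes; no quantization step is needed
    because each coordinate is fully determined by its sign.
    """
    return np.packbits(Z > 0, axis=1)
```

For example, with 50 standard Gaussian points in $\mathbb{R}^{1000}$ and $\epsilon = 0.25$, `jl_sketch` uses $k = \lceil 8 \cdot 16 \cdot \ln 50 \rceil = 501$ real-valued coordinates per point, and `max_relative_error` reports how far the sketched squared distances drift from the originals. By contrast, `pack_cube_sketch` turns a cube-valued sketch with $N = 64$ into 8 bytes per point.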