NeurIPS2023

Improved Frequency Estimation Algorithms with and without Predictions

Anders Aamand, Justin Y. Chen, Huy Lê Nguyen, Sandeep Silwal, Ali Vakilian

13 citations

Abstract

Estimating frequencies of elements appearing in a data stream is a key task in largescale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically bound the error of the estimated frequencies for any possible input. The work of Hsu et al. ( 2019 ) introduced the idea of using machine learning to tailor sketching algorithms to the specific data distribution they are being run on. In particular, their learning-augmented frequency estimation algorithm uses a learned heavy-hitter oracle which predicts which elements will appear many times in the stream. We give a novel algorithm, which in some parameter regimes, already theoretically outperforms the learning based algorithm of Hsu et al. without the use of any predictions. Augmenting our algorithm with heavy-hitter predictions further reduces the error and improves upon the state of the art. Empirically, our algorithms achieve superior performance in all experiments compared to prior approaches. On the other hand, in the low space regime of B = poly(log n), our algorithm, without predictions, already archives close to a logarithmic factor improvement over even learned CS. Furthermore, our learning-augmented algorithm achieves a logarithmic factor improvement over classical CS across all space regimes, whereas the learned CS only achieves a logarithmic factor improvement in the regime B = n 1-o(1) . Furthermore, our learned version outperforms or matches learned CS in all space regimes.