KDD2025
The Adaptive Use of Count-Min Sketch: What is Safe and What is Not?
Dragos-Florian Ristache, Krzysztof Onak
Abstract
Small-space frequency estimators play a crucial role in a multitude of settings related to both data science and machine learning in the context of big data processing. Many frequency estimators use internal randomness to compress the information about the frequencies of items to a small sketch that can be used to provide estimates. Historically, these types of estimators were designed without considering the danger of invalid estimates caused by their adaptive use. However, this kind of scenario naturally occurs when they are used as a subroutine in a more complicated algorithm or when an adversary maliciously attempts to corrupt estimates. The reason why the classic way of analyzing these types of algorithms does not provide satisfying guarantees is that it often assumes that the queries and updates are independent of the previously generated estimates. In the adaptive setting, each provided estimate has the potential of leaking information about the estimator's internal randomness, which can, in turn, be used to craft queries and updates on which the frequency estimator does not perform well.