S&P 2025

Hash-Prune-Invert: Improved Differentially Private Heavy-Hitter Detection in the Two-Server Model

Borja Balle, James Bell-Clark, Albert Cheu, Adrià Gascón, Jonathan Katz, Mariana Raykova, Phillipp Schoppmann, Thomas Steinke


Abstract

Differentially private (DP) heavy-hitter detection is an important primitive for data analysis. Given a threshold <tex> $t$ </tex> and a dataset of <tex> $n$ </tex> items from a domain of size <tex> $d$ </tex>, such detection algorithms ignore items occurring fewer than <tex> $t$ </tex> times while identifying items occurring more than <tex> $t+\Delta$ </tex> times; we call <tex> $\Delta$ </tex> the error margin. In the central model, where a curator holds the entire dataset, <tex> $(\varepsilon, \delta)$ </tex>-DP algorithms can achieve error margin <tex> $\Theta\left(\frac{1}{\varepsilon} \log \frac{1}{\delta}\right)$ </tex>, which is optimal when <tex> $d\gg 1/\delta$ </tex>. Several works, e.g., Poplar (S&P 2021), have proposed protocols in which two or more non-colluding servers jointly compute the heavy hitters from inputs held by <tex> $n$ </tex> clients. Unfortunately, existing protocols suffer from an undesirable dependence on <tex> $\log d$ </tex> in terms of both server efficiency (computation, communication, and round complexity) and accuracy (i.e., error margin), making them unsuitable for large domains (e.g., when items are kB-long strings, <tex> $\log d\approx 10^{4}$ </tex>). We present hash-prune-invert (HPI), a technique for compiling any heavy-hitter protocol with the <tex> $\log d$ </tex> dependencies mentioned above into a new protocol with improvements across the board: computation, communication, and round complexity depend (roughly) on <tex> $\log n$ </tex> rather than <tex> $\log d$ </tex>, and the error margin is independent of <tex> $d$ </tex>. Our transformation preserves privacy against an active adversary corrupting at most one of the servers and any number of clients. We apply HPI to an improved version of Poplar, also introduced in this work, which improves Poplar's error margin by roughly a factor of <tex> $\sqrt{n}$ </tex> (regardless of <tex> $d$ </tex>). Our experiments confirm that the resulting protocol improves efficiency and accuracy for large <tex> $d$ </tex>.
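To make the central-model baseline concrete, the following is a minimal Python sketch of one standard approach, not necessarily the algorithm the paper analyzes. It adds Laplace noise of scale <tex> $1/\varepsilon$ </tex> to the count of every item that occurs at least once and reports items whose noisy count exceeds the midpoint <tex> $t + \Delta/2$ </tex>. The helper names `laplace`, `error_margin`, and `central_heavy_hitters`, and the specific choice <tex> $\Delta = \frac{2}{\varepsilon}\ln\frac{1}{\delta}$ </tex> (which is of order <tex> $\frac{1}{\varepsilon}\log\frac{1}{\delta}$ </tex>), are illustrative assumptions. Assuming <tex> $t \ge 1$ </tex> and add/remove-one neighboring datasets, an item present in only one neighbor (count 1) is reported with probability at most <tex> $\frac{1}{2}e^{-\varepsilon\Delta/2} = \delta/2$ </tex>, which yields <tex> $(\varepsilon, \delta)$ </tex>-DP by the usual stability-based-histogram argument. By the same tail bound, each item below <tex> $t$ </tex> is reported, and each item above <tex> $t+\Delta$ </tex> is missed, with probability at most <tex> $\delta/2$ </tex>. The sampler uses ordinary floating point and is not hardened against known floating-point attacks on the Laplace mechanism.

```python
import math
import random
from collections import Counter
from typing import Hashable, Iterable, Set


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def error_margin(epsilon: float, delta: float) -> float:
    """Illustrative (assumed) margin Delta = (2/epsilon) * ln(1/delta)."""
    return 2.0 * math.log(1.0 / delta) / epsilon


def central_heavy_hitters(
    items: Iterable[Hashable],
    t: int,
    epsilon: float,
    delta: float,
    rng: random.Random,
) -> Set[Hashable]:
    """Hypothetical central-model (epsilon, delta)-DP heavy-hitter sketch.

    Only items that actually occur get a noisy count, so the error margin
    does not depend on the domain size d.
    """
    if t < 1:
        raise ValueError("this sketch assumes t >= 1")
    if epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    # Midpoint between the 'ignore' threshold t and the 'report' threshold t + Delta.
    cutoff = t + error_margin(epsilon, delta) / 2.0
    counts = Counter(items)
    # Report an item if its count plus Laplace(1/epsilon) noise clears the cutoff.
    return {x for x, c in counts.items() if c + laplace(1.0 / epsilon, rng) > cutoff}
```

For example, with <tex> $\varepsilon = 1$ </tex> and <tex> $\delta = 10^{-6}$ </tex> the assumed margin is <tex> $2\ln 10^{6} \approx 27.6$ </tex>, so with <tex> $t = 10$ </tex> an item occurring 100 times is reported and one occurring 5 times is ignored, each except with probability at most <tex> $\delta/2$ </tex>.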