SIGMOD2025

Perfect Sampling in Turnstile Streams Beyond Small Moments

David P. Woodruff, Shenghao Xie, Samson Zhou

Abstract

Given a vector x ∈ ℝ n induced by a turnstile stream S , a non-negative function G: ℝ → ℝ, a perfect G -sampler outputs an index i with probability G(x i )/Σ j∈[n] + 1/poly(n). Jayaram and Woodruff (FOCS 2018) introduced a perfect L p -sampler, where G(z)=|z| p , for p ∈(0,2]. In this paper, we solve this problem for p>2 by a sampling-and-rejection method. Our algorithm runs in n 1-2/p • polylog (n) bits of space, which is tight up to polylogarithmic factors in n . Our algorithm also provides a (1+ε)-approximation to the sampled item x i with high probability using an additional ε -2 n 1-2/p • polylog (n) bits of space. Interestingly, we show our techniques can be generalized to perfect polynomial samplers on turnstile streams, which is a class of functions that is not scale-invariant, in contrast to the existing perfect L p samplers. We also achieve perfect samplers for the logarithmic function G(z)=log(1+|z|) and the cap function G(z)=min(T,|z| p ). Finally, we give an application of our results to the problem of norm/moment estimation for a subset Q of coordinates of a vector, revealed only after the data stream is processed, e.g., when the set Q represents a range query, or the set n Q represents a collection of entities who wish for their information to be expunged from the dataset.