SIGMOD2025

Representation Obliviousness and Pseudodeterminism in Streaming Algorithms

Michael Chen, Sourav Chakraborty, Aduri Pavan, N. V. Vinodchandran

摘要

In this work, we study the notion of representation obliviousness in the context of pseudodeterministic streaming algorithms. A (randomized) streaming algorithm A is pseudodeterministic, if for every stream D, there is a ''canonical value'' g(D) so that with probability at least 2/3, A on input stream D outputs g(D). Intuitively, a randomized algorithm is representation oblivious if the output distribution of the algorithm does not depend on the representation of the input. We investigate this notion in the context of streaming algorithms, more specifically, distinct elements estimation (F 0 estimation) in data streams. In this context, representation obliviousness captures the idea that the output distribution of an algorithm for estimating F 0 should only depend on the set of distinct elements of the stream. This is a natural notion, as we note that standard streaming algorithms are representation-oblivious in this sense. We prove that any representation oblivious pseudodeterministic streaming algorithm for estimating F 0 must use Ω(n) space, where [n] is the universe. More generally, we prove that any representation oblivious pseudodeterministic t(n)-pass streaming algorithm requires Ω(n/t(n)) space. This lower bound matches the space requirement of the straightforward multi-pass deterministic algorithm that exactly computes F 0 .