STOC2025

A Framework for Building Data Structures from Communication Protocols

Alexandr Andoni, Shunhua Jiang, Omri Weinstein

Abstract

We present a general framework for designing efficient data structures for high-dimensional pattern-matching problems ( $\exists \;? i\in[n], f(x_i,y)=1$ ) through communication models in which $f(x,y)$ admits sublinear communication protocols with exponentially-small error. Specifically, we reduce the data structure problem to the Unambiguous Arthur-Merlin (UAM) communication complexity of $f(x,y)$ under product distributions. We apply our framework to the Partial Match problem (a.k.a, matching with wildcards), whose underlying communication problem is sparse set-disjointness. When the database consists of $n$ points in dimension $d$ , and the number of $\star$ 's in the query is at most $w = c\log n \;(\ll d)$ , the fastest known linear-space data structure (Cole, Gottlieb and Lewenstein, STOC'04) had query time $t \approx 2^w = n^c$ , which is nontrivial only when $c<1$ . By contrast, our framework produces a data structure with query time $n^{1-1/(c \log^2 c)}$ and space close to linear. To achieve this, we develop a one-sided $ε$ -error communication protocol for Set-Disjointness under product distributions with $\tildeΘ(\sqrt{d\log(1/ε)})$ complexity, improving on the classical result of Babai, Frankl and Simon (FOCS'86). Building on this protocol, we show that the Unambiguous AM communication complexity of $w$ -Sparse Set-Disjointness with $ε$ -error under product distributions is $\tilde{O}(\sqrt{w \log(1/ε)})$ , independent of the ambient dimension $d$ , which is crucial for the Partial Match result. Our framework sheds further light on the power of data-dependent data structures, which is instrumental for reducing to the (much easier) case of product distributions.