STOC2025

A Framework for Building Data Structures from Communication Protocols

Alexandr Andoni, Shunhua Jiang, Omri Weinstein

Abstract

We present a general framework for designing efficient data structures for high-dimensional pattern-matching problems (  ?i[n],f(xi,y)=1\exists \;? i\in[n], f(x_i,y)=1) through communication models in which f(x,y)f(x,y) admits sublinear communication protocols with exponentially-small error. Specifically, we reduce the data structure problem to the Unambiguous Arthur-Merlin (UAM) communication complexity of f(x,y)f(x,y) under product distributions. We apply our framework to the Partial Match problem (a.k.a, matching with wildcards), whose underlying communication problem is sparse set-disjointness. When the database consists of nn points in dimension dd, and the number of \star's in the query is at most w=clogn  (d)w = c\log n \;(\ll d), the fastest known linear-space data structure (Cole, Gottlieb and Lewenstein, STOC'04) had query time t2w=nct \approx 2^w = n^c, which is nontrivial only when c<1c<1. By contrast, our framework produces a data structure with query time n11/(clog2c)n^{1-1/(c \log^2 c)} and space close to linear. To achieve this, we develop a one-sided εε-error communication protocol for Set-Disjointness under product distributions with Θ~(dlog(1/ε))\tildeΘ(\sqrt{d\log(1/ε)}) complexity, improving on the classical result of Babai, Frankl and Simon (FOCS'86). Building on this protocol, we show that the Unambiguous AM communication complexity of ww-Sparse Set-Disjointness with εε-error under product distributions is O~(wlog(1/ε))\tilde{O}(\sqrt{w \log(1/ε)}), independent of the ambient dimension dd, which is crucial for the Partial Match result. Our framework sheds further light on the power of data-dependent data structures, which is instrumental for reducing to the (much easier) case of product distributions.