STOC 2024

Near-Optimal Mean Estimation with Unknown, Heteroskedastic Variances

Spencer Compton, Gregory Valiant

Abstract

Given data drawn from a collection of Gaussian variables with a common mean but different and unknown variances, what is the best algorithm for estimating their common mean? We present an intuitive and efficient algorithm for this task. As different closed-form guarantees can be hard to compare, the Subset-of-Signals model serves as a benchmark for “heteroskedastic” mean estimation: given n Gaussian variables with an unknown subset of m variables having variance bounded by 1, what is the optimal estimation error as a function of n and m? Our algorithm resolves this open question up to logarithmic factors, improving upon the previous best known estimation error by polynomial factors when m = n^c for all 0 < c < 1. Of particular note, we obtain error o(1) with m = Õ(n^{1/4}) variance-bounded samples, whereas previous work required m = Ω(n^{1/2}). Finally, we show that in the multi-dimensional setting, even for d = 2, our techniques enable rates comparable to knowing the variance of each sample.
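
To make the Subset-of-Signals setup concrete, here is a minimal Python sketch that draws one instance of the model and evaluates two naive baselines, the sample mean and the sample median. It is not the algorithm from the paper. The function name `sample_subset_of_signals`, the single noise level `big_sigma` shared by all remaining samples, and the specific values of n, m, and the true mean are illustrative assumptions; the model itself only requires that an unknown subset of m samples has variance at most 1.

```python
import random
import statistics


def sample_subset_of_signals(n, m, mu, big_sigma, rng):
    """Draw n Gaussian samples with common mean mu.

    Exactly m samples have standard deviation 1 (variance bounded by 1).
    The other n - m samples use a hypothetical standard deviation
    big_sigma; the model allows these variances to be arbitrary.
    """
    sigmas = [1.0] * m + [big_sigma] * (n - m)
    rng.shuffle(sigmas)  # the estimator is not told which samples are "good"
    return [rng.gauss(mu, s) for s in sigmas]


# Illustrative parameters (assumptions, not taken from the paper).
TRUE_MEAN = 3.0
rng = random.Random(0)
xs = sample_subset_of_signals(n=10_000, m=100, mu=TRUE_MEAN,
                              big_sigma=1000.0, rng=rng)

# Two naive baselines that ignore the heteroskedastic structure.
mean_err = abs(statistics.fmean(xs) - TRUE_MEAN)
median_err = abs(statistics.median(xs) - TRUE_MEAN)

print(f"sample mean error:   {mean_err:.3f}")
print(f"sample median error: {median_err:.3f}")
```

Varying `m` relative to `n` in this sketch gives a feel for the regime the abstract describes, where only a small, unidentified fraction of the samples is informative.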