ICLR2026

Mean Estimation from Coarse Data: Characterizations and Efficient Algorithms

Alkis Kalavasis, Anay Mehrotra, Manolis Zampetakis, Felix Zhou, Ziyu Zhu

Abstract

Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs naturally through measurement rounding, sensor limitations, and lag in economic systems. We study Gaussian mean estimation from coarse data, where each true sample x is drawn from a d-dimensional Gaussian distribution with identity covariance, but is revealed only through the set of a partition containing x. When the coarse samples, roughly speaking, have "low" information, the mean cannot be uniquely recovered from observed samples (i.e., the problem is not identifiable). Recent work by Fotakis, Kalavasis, Kontonis, and Tzamos [FKKT21] established that sample-efficient mean estimation is possible when the unknown mean is identifiable and the partition consists of only convex sets. Moreover, they showed that without convexity, mean estimation becomes NP-hard. However, two fundamental questions remained open: (1) When is the mean identifiable under convex partitions? (2) Is computationally efficient estimation possible under identifiability and convex partitions? This work resolves both questions. We provide a geometric characterization of when a convex partition is identifiable, showing it depends on whether the convex sets form "slabs" in a direction. Second, we give the first polynomial-time algorithm for finding ε-accurate estimates of the Gaussian mean given coarse samples from an unknown convex partition, matching the optimal O(d/ε 2 ) sample complexity. Our results have direct applications to robust machine learning, particularly robustness to observation rounding. As a concrete example, we derive a sample-and computationally-efficient algorithm for linear regression with market friction, a canonical problem in using ML in economics, where exact prices are unobserved and one only sees a range containing the price [Ros59]. ▶ Efficient Algorithm (Theorem 3.2): We provide the first polynomial-time algorithm for estimating µ ⋆ for any convex, identifiable partition P. The algorithm matches the sample complexity of Fotakis, Kalavasis, Kontonis, and Tzamos [FKKT21], while also being computationally efficient. These results settle the computational complexity of coarse Gaussian mean estimation under convex partitions. As a concrete application of our techniques, we obtain a sample-and computationallyefficient algorithm for linear regression with market friction, a canonical setting where coarsening arises from lag in market participants' responses (Theorem 3.3). Characterization Techniques. Our first result is a geometric characterization of the sets of the partition P that allows for identifying the Gaussian mean from coarse observations. Returning to Figure 1 (a), it is not difficult to see that partitions that consist of parallel slabs are not identifiable. This gives a very natural and intuitive sufficient condition. It remains to argue that this is also necessary, i.e., that any non-identifiable partition should look like a collection of parallel slabs. ▶ Necessity of Parallel Slabs: Our main technical contribution in characterization is to show that parallel slabs are also necessary for non-identifiability. To show this, our proof combines tools from optimal transport and variance reduction inequalities [Har04; Vem10]; we believe this combination will have broader applications in estimation with coarse samples. Algorithmic Techniques. Our computationally efficient algorithm performs stochastic gradient descent (SGD) on a variant of the log-likelihood objective (Section 4). Obtaining provable guarantees for this approach requires several key innovations (compared to, e.g., Fotakis, Kalavasis, Kontonis, and Tzamos [FKKT21]):