NeurIPS2021

Pointwise Bounds for Distribution Estimation under Communication Constraints

Wei-Ning Chen, Peter Kairouz, Ayfer Özgür

被引用 7 次

摘要

We consider the problem of estimating a d-dimensional discrete distribution from its samples observed under a b-bit communication constraint. In contrast to most previous results that largely focus on the global minimax error, we study the local behavior of the estimation error and provide pointwise bounds that depend on the target distribution p. In particular, we show that the 2 error decays with O max when n is sufficiently large, hence it is governed by the half-norm of p instead of the ambient dimension d. For the achievability result, we propose a two-round sequentially interactive estimation scheme that achieves this error rate uniformly over all p. This two-round scheme extends to q loss with q ≥ 1, and hence gives pointwise upper bounds on q error. We also develop a new local minimax lower bound with (almost) matching 2 error, showing that any interactive scheme must admit a Ω p (1+δ)/2 n2 b 2 error for any δ > 0. Our upper and lower bounds together indicate that the H 1/2 (p) log( p 1/2 ) bits of communication is both sufficient and necessary to achieve the optimal (centralized) performance, where H 1/2 (p) is the Rényi entropy of order 2. Therefore, under the 2 loss, the correct measure of the local communication complexity at p is its Rényi entropy. negative side, however, the 2 estimation error achieved by these schemes scales as O( d n2 b ) under a b-bit communication constraint on each sample, where d is the alphabet size of the unknown discrete distribution p. This suggests that without additional assumptions on p the error scales linearly in d, i.e. the introduction of communication constraints introduces a penalty d on the estimation accuracy. This is true even if we allow for interaction between clients [8, 9] . A recent work [10] has moved a step forward from the "global minimax regime" by restricting the target distribution p to be s-sparse and showing that the 2 error can be reduced to O( s log(d/s) n2 b ) in this case, i.e. the error depends on the sparsity s rather than the ambient dimension d. More recently, [11] has improved on this result by developing a group-testing based scheme that reduces the 2 error to O( s n2 b ), therefore completely removing the dependence on the ambient dimension d and matching the information-theoretic lower bound. However, both of these schemes and their performance guarantees are limited to the s-sparse setting. Little is known when the target distribution deviates slightly from being exactly s-sparse. Moreover, these schemes are essentially still worst-case schemes as they are designed to be minimax optimal, albeit over a smaller class of distributions, i.e. s-sparse. In this paper, we argue that all these results can be overly pessimistic, as worst-case notions of complexity and schemes designed to optimize these worst-case notions can be too conservative. Instead, we seek a measure of local complexity that captures the hardness of estimating a specific instance p. Ideally, we want a scheme that adapts to the hardness of the problem instead of being tuned to the worst-case scenario; that is, a scheme achieving smaller error when p is "simpler." Our contributions Motivated by these observations, in this work we consider the local minimax complexity of distribution estimation and quantify the hardness of estimating a specific p under communication constraints. In particular, under the 2 loss, we show that the local complexity of estimating p is captured by its half-norm 1 p 1 2 : we propose a two-round interactive scheme that uniformly achieves the O max p 1/2 n2 b , 1 n error under 2 loss 2 which requires no prior information on p. On the impossibility side, we also show that for any (arbitrarily) interactive scheme, the local minimax error (which is formally defined in Theorem 2.4) must be at least Ω max