NeurIPS2020

Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Distributions

Yi Hao, Alon Orlitsky

4 citations

Abstract

The profile of a sample is the multiset of its symbol frequencies. We show that for samples of discrete distributions, profile entropy is a fundamental measure unifying the concepts of estimation, inference, and compression. Specifically, profile entropy: a) determines the speed of estimating the distribution relative to the best natural estimator; b) characterizes the rate of inferring all symmetric properties compared with the best estimator over any label-invariant distribution collection; c) serves as the limit of profile compression, for which we derive optimal nearlinear-time block and sequential algorithms. To further our understanding of profile entropy, we investigate its attributes, provide algorithms for approximating its value, and determine its magnitude for numerous structural distribution families. Recent research in statistical machine learning, ranging from neural-network training and online learning, to density estimation and property testing, has advanced evaluation criteria beyond worstcase analysis. New performance measures apply more refined metrics relating the algorithm's accuracy and efficiency to the problem's inherent structure. Consider for example learning an unknown discrete distribution from its i.i.d. samples (see also Section 2.2). The classical worst-case analysis states that in the worst case, the number of samples required to estimate a distribution to a given KL-divergence grows linearly in the alphabet size. However, this formulation is pessimistic, since distributions are rarely the worst possible, and many practical distributions can be estimated with significantly smaller samples. Furthermore, once the sample is drawn, it reveals the distribution's complexity and hence the hardness of the learning task. Going beyond worst-case analysis, one can design an adaptive learning algorithm whose theoretical guarantees vary according to the problem's simplicity. For example, Orlitsky and Suresh [2015] recently proposed an estimator that instance-by-instance achieves nearly the same performance as a genie algorithm designed with prior knowledge of the underlying distribution. We introduce profile entropy, a fundamental measure for the complexity of discrete distributions, and show that it connects three vital scientific tasks: estimation, inference, and compression. The resulting algorithms have guarantees directly relating to the sample profile entropy, hence also adapt to the intrinsic simplicity of the tasks at hand. The next subsection formalizes relevant concepts and useful notation. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.