STOC2023

Lifting Uniform Learners via Distributional Decomposition

Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan

1 citation

Abstract

We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an arbitrary and unknown distribution D. The efficiency of our transformation scales with the inherent complexity of D, running in poly(n, (md) d ) time for distributions over ±1 n whose pmfs are computed by depthd decision trees, where m is the sample complexity of the original algorithm. For monotone distributions our transformation uses only samples from D, and for general ones it uses subcube conditioning samples. A key technical ingredient is an algorithm which, given the aforementioned access to D, produces an optimal decision tree decomposition of D: an approximation of D as a mixture of uniform distributions over disjoint subcubes. With this decomposition in hand, we run the uniform-distribution learner on each subcube and combine the hypotheses using the decision tree. This algorithmic decomposition lemma also yields new algorithms for learning decision tree distributions with runtimes that exponentially improve on the prior state of the art-results of independent interest in distribution learning. This work These two settings, distribution-free and uniform-distribution learning, correspond to two extremes in terms of distributional assumptions. Distribution-free algorithms make no assumptions about the distribution, but the design of such algorithms that are computationally efficient has proven correspondingly challenging. On the other hand, we have a wealth of techniques and positive results in the uniform-distribution setting, but the guarantees of such algorithms only hold under a strong, often unrealistic assumption on the data distribution. Due to these vast differences, research in these two settings has mostly proceeded independently with few known connections. The motivation for our work is the search for fruitful middle grounds between these two extremes, where efficient algorithms can be obtained for expressive concept classes and where their guarantees hold for broad classes of distributions. More generally, we ask: Can we interpolate between the two extremes via notions of distribution complexity, and design learning algorithms whose efficiency scale with the inherent complexity of D? Relatedly, can the large body of existing results and techniques for learning under the uniform distribution be lifted, ideally in a blackbox fashion, to non-uniform but nevertheless "simple" distributions? Our results We provide affirmative answers to these questions via a natural notion of distribution complexity: Definition 1 (Decision tree complexity of D). The decision tree complexity of a distribution D over ±1 n , denoted depth(D), is the smallest integer d such that its probability mass function (pmf ) can be computed by a depth-d decision tree.