ICML2022

TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm

Yi Hao, Ayush Jain, Alon Orlitsky, Vaishakh Ravindrakumar

Abstract

Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an 1 distance essentially at most a constant times larger than its closest t-piece degree-d polynomial, where t ≥ 1 and d ≥ 0. Letting c t,d denote the smallest such factor, clearly c 1,0 = 1, and it can be shown that c t,d ≥ 2 for all other t and d. Yet current computationally efficient algorithms show only c t,1 ≤ 2.25 and the bound rises quickly to c t,d ≤ 3 for d ≥ 9. We derive a near-linear-time and essentially sample-optimal estimator that establishes c t,d = 2 for all (t, d) = (1, 0). Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with vastly varying number of pieces. We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.