NeurIPS2024

Optimal Hypothesis Selection in (Almost) Linear Time

Maryam Aliakbarpour, Mark Bun, Adam Smith

Abstract

Hypothesis selection, also known as density estimation, is a fundamental problem in statistics and learning theory. Given a sample set from an unknown distribution P and a finite class of candidate distributions (hypotheses) H := H 1 , H 2 , . . . , H n , the goal is to design an algorithm that selects a distribution ˆ H from H that best describes P . The accuracy of the algorithm is measured by the distance between ˆ H and P , compared to the distance between the closest distribution in H and P (denoted by OPT ). Specifically, we aim for ∥ ˆ H − P ∥ TV to be at most α · OPT + ϵ for some small ϵ and α . While the value of ϵ can be reduced with an increasing number of samples, α is an inherent characteristic of the algorithm. Achieving α < 3 is impossible, even with only two candidate hypotheses, unless the number of samples is proportional to the domain size of P [Bousquet, Kane, Moran ’19]. Finding a computationally efficient algorithm that achieves the optimal α has been a primary focus of research since the early work of [Devroye, Lugosi ’01]. Before our work, the algorithms achieving α < 5 required time Ω( n 2 ) . We present the first algorithm that operates in almost linear time ( ˜ O ( n/ϵ 3 ) ) and achieves α = 3 . This result improves upon a long line of hypothesis selection research. Previously known algorithms had either worse time complexity, a larger α factor, or additional assumptions about the problem setting. Additionally, we provide another (almost) linear-time algorithm with better dependency on the additive accuracy parameter ϵ , albeit with a slightly worse accuracy parameter of α = 4 .