NeurIPS2021

Sequential Algorithms for Testing Closeness of Distributions

Aadil Oufkir, Omar Fawzi, Nicolas Flammarion, Aurélien Garivier

被引用 4 次

摘要

What advantage do sequential procedures provide over batch algorithms for testing properties of unknown distributions? Focusing on the problem of testing whether two distributions D 1 and D 2 on 1, . . . , n are equal or ε-far, we give several answers to this question. We show that for a small alphabet size n, there is a sequential algorithm that outperforms any batch algorithm by a factor of at least 4 in terms sample complexity. For a general alphabet size n, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance TV(D 1 , D 2 ) between D 1 and D 2 is larger than ε. As a corollary, letting ε go to 0, we obtain a sequential algorithm for testing closeness when no a priori bound on TV(D 1 , D 2 ) is given that has a sample complexity Õ( TV(D1,D2) 4/3 ): this improves over the Õ( n/ log n TV(D1,D2) 2 ) tester of Daskalakis and Kawase [2017] and is optimal up to multiplicative constants. We also establish limitations of sequential algorithms for the problem of testing closeness: they can improve the worst case number of samples by at most a constant factor.