NeurIPS2024

Entropy testing and its application to testing Bayesian networks

Clément L. Canonne, Joy Qiping Yang

摘要

This paper studies the problem of entropy identity testing : given sample access to a distribution p and a fully described distribution q (both discrete distributions over a domain of size k ), and the promise that either p = q or | H ( p ) − H ( q ) | ⩾ ε , where H ( · ) denotes the Shannon entropy, a tester needs to distinguish between the two cases with high probability. We establish a near-optimal sample complexity bound of ˜Θ( √ k/ε + 1 /ε 2 ) for this problem, and show how to apply it to the problem of identity testing for in-degree-d n -dimensional Bayesian networks, obtaining an upper bound of ˜ O (2 d/ 2 n 3 / 2 /ε 2 + n 2 /ε 4 ) . This improves on the sample complexity bound of ˜ O (2 d/ 2 n 2 /ε 4 ) from [CDKS20], which required an additional assumption on the structure of the (unknown) Bayesian network.