NeurIPS 2023

Kernel-Based Tests for Likelihood-Free Hypothesis Testing

Patrik Róbert Gerber, Tianze Jiang, Yury Polyanskiy, Rui Sun

Abstract

Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to one of the two classes. Special cases of this problem are well known: with complete knowledge of the class distributions ($n=\infty$) the problem is solved optimally by the likelihood-ratio test; when $m=1$ it corresponds to binary classification; and when $m\approx n$ it is equivalent to two-sample testing. The intermediate settings occur in the field of likelihood-free inference, where labeled samples are obtained by running forward simulations and the unlabeled sample is collected experimentally. Recent work discovered a fundamental trade-off between $m$ and $n$: increasing the size $m$ of the experimental data sample reduces the amount $n$ of training/simulation data needed. In this work we (a) introduce a generalization where unlabeled samples come from a mixture of the two classes, a case often encountered in practice; (b) study the minimax sample complexity for non-parametric classes of densities under maximum mean discrepancy (MMD) separation; and (c) investigate the empirical performance of kernels parameterized by neural networks on two tasks: detection of the Higgs boson and detection of planted DDPM-generated images amidst CIFAR-10 images. For both problems we confirm the existence of the theoretically predicted asymmetric $m$ vs $n$ trade-off.
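To make the setting concrete, the following is a minimal sketch of an MMD-based decision rule for the basic (non-mixture) problem. It is an illustration under stated assumptions, not the test statistic analyzed in the paper: it uses a fixed-bandwidth Gaussian kernel rather than a neural-network-parameterized one, a biased (V-statistic) MMD estimate, an uncalibrated threshold of zero, and synthetic Gaussian data with $n$ simulated points per class. The function names `gaussian_kernel`, `mmd2_biased`, and `lfht_decide` are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(a, b, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples a and b."""
    return (
        gaussian_kernel(a, a, bandwidth).mean()
        + gaussian_kernel(b, b, bandwidth).mean()
        - 2.0 * gaussian_kernel(a, b, bandwidth).mean()
    )


def lfht_decide(x, y, z, bandwidth=1.0):
    """Return 0 if z is closer (in estimated MMD) to sample x, else 1.

    x, y: labeled (simulated) samples from the two classes.
    z:    unlabeled (experimental) sample, assumed to come from one class.
    The zero threshold is an assumption made for illustration only.
    """
    stat = mmd2_biased(z, x, bandwidth) - mmd2_biased(z, y, bandwidth)
    return int(stat > 0.0)


# Synthetic example (assumed data, not from the paper): two Gaussian classes.
rng = np.random.default_rng(0)
n, m = 500, 50
x = rng.normal(0.0, 1.0, size=(n, 2))  # simulated sample, class 0
y = rng.normal(1.0, 1.0, size=(n, 2))  # simulated sample, class 1
z = rng.normal(1.0, 1.0, size=(m, 2))  # experimental sample, drawn from class 1
decision = lfht_decide(x, y, z)
print("decided class:", decision)
```

Varying `n` and `m` in this sketch is one way to explore the trade-off described above empirically. For the mixture setting, the zero threshold would presumably need to be replaced by a calibrated one (for example via permutation), which this sketch does not attempt.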