NeurIPS2020

Private Identity Testing for High-Dimensional Distributions

Clément L. Canonne, Gautam Kamath, Audra McMillan, Jonathan R. Ullman, Lydia Zakynthinou

40 citations

Abstract

In this work we present novel differentially private identity (goodness-of-fit) testers for natural and widely studied classes of multivariate product distributions: product distributions over ±1 d and Gaussians in R d with known covariance. Our testers have improved sample complexity compared to those derived from previous techniques, and are the first testers whose sample complexity matches the order-optimal minimax sample complexity of O(d 1/2 /α 2 ) in many parameter regimes. We construct two types of testers, exhibiting tradeoffs between sample complexity and computational complexity. Finally, we provide a two-way reduction between testing a subclass of multivariate product distributions and testing univariate distributions, and thereby obtain upper and lower bounds for testing this subclass of product distributions. of this work is to give novel algorithms for hypothesis testing problems on high-dimensional distributions with improved sample complexity. In particular, we give differentially private algorithms for the following natural and fundamental problems: 1. Given samples from a multivariate Gaussian P in R d whose covariance is known to be the identity, decide if P is N (0, I d×d ) or is α-far from N (0, I d×d ) in total variation distance. Or, equivalently, decide if 2. Given samples from a product distribution P over ±1 d , decide if P is the uniform distribution or is α-far from the uniform distribution in total variation distance. Or, equivalently, decide if 3. Given samples from a product distribution P over 0, 1 d , decide if P is equal to some given extremely biased distribution or is α-far from Q in total variation distance. In this case our tester achieves the provably optimal sample complexity. The main challenge in solving these high-dimensional testing problems privately is that the only known non-private test statistics for these problems have high worst-case sensitivity. That is, these test statistics can potentially be highly brittle to changing even a single one of the samples. We overcome this challenge by identifying two methods for reducing the sensitivity of the test statistic without substantially changing its average-case behavior on typical datasets sampled from the distributions we consider. The first is based on a novel private filtering method, which gives a computationally efficient tester. The second combines the method of Lipschitz extensions [BBDS13, KNRS13] with recursive preconditioning, which yields an exponential-time tester, but with improved sample complexity. Background: Private Hypothesis Testing We start by giving some background on private hypothesis testing. First, when we say that we want a differentially private hypothesis tester for a pair H 0 , H 1 over domain X , we mean that we seek an algorithm A : X * → 0, 1 such that