NeurIPS2024

Sample-Efficient Private Learning of Mixtures of Gaussians

Hassan Ashtiani, Mahbod Majid, Shyam Narayanan

Abstract

We study the problem of learning mixtures of Gaussians with approximate differential privacy. We prove that roughly kd2+k1.5d1.75+k2dkd^2 + k^{1.5} d^{1.75} + k^2 d samples suffice to learn a mixture of kk arbitrary dd-dimensional Gaussians up to low total variation distance, with differential privacy. Our work improves over the previous best result [AAL24b] (which required roughly k2d4k^2 d^4 samples) and is provably optimal when dd is much larger than k2k^2. Moreover, we give the first optimal bound for privately learning mixtures of kk univariate (i.e., 11-dimensional) Gaussians. Importantly, we show that the sample complexity for privately learning mixtures of univariate Gaussians is linear in the number of components kk, whereas the previous best sample complexity [AAL21] was quadratic in kk. Our algorithms utilize various techniques, including the inverse sensitivity mechanism [AD20b, AD20a, HKMN23], sample compression for distributions [ABDH+20], and methods for bounding volumes of sumsets.