ICML2020

Differentially Private Set Union

Sivakanth Gopi, Pankaj Gulhane, Janardhan Kulkarni, Judy Hanwen Shen, Milad Shokouhi, Sergey Yekhanin

被引用 37 次

摘要

We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe UU of items, possibly of infinite size, and a database DD of users. Each user ii contributes a subset WiUW_i \subseteq U of items. We want an (ϵ\epsilon,δ\delta)-differentially private algorithm which outputs a subset SiWiS \subset \cup_i W_i such that the size of SS is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, nn-grams etc., from private text data belonging to users is an instance of the set union problem.Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a dependent fashion, guided by a policy. In this new setting ensuring privacy is significantly delicate. We prove that any policy which has certain contractive properties would result in a differentially private algorithm. We design two new algorithms for differentially private set union, one using Laplace noise and other Gaussian noise, which use 1\ell_1-contractive and 2\ell_2-contractive policies respectively and provide concrete examples of such policies. Our experiments show that the new algorithms in combination with our policies significantly outperform previously known mechanisms for the problem.