VLDB2020

Set-valued Data Publication with Local Privacy: Tight Error Bounds and Efficient Mechanisms

Shaowei Wang, Yuqiu Qian, Jiachun Du, Wei Yang, Liusheng Huang, Hongli Xu

28 citations

Abstract

Most user-generated data in online services are presented as set-valued data, e.g., visited website URLs, recently used Apps by a person, and etc. These data are of great value to service providers, but also bring privacy concerns if collected and analyzed directly. To tackle potential privacy threatens, local differential privacy (LDP) attracts increasing attention nowadays. However, existing approaches only provide sub-optimal error bound for set-valued data distribution estimation with LDP. Besides, it is computational expensive and communication expensive to use for high dimensional set-valued data, considering large domains in real scenarios. Thus, existing approaches are unpractical to use on resource-constrained user-side devices (e.g., smartphones and wearable devices). In this paper, we propose a utility-optimal and efficient set-valued data publication method (i.e., wheel mechanism). On the user side, each user contributes only one numerical value to represent their privatized data. The computational complexity is O(minm log m, me ) and communication cost is O(log(me )) bits, while existing approaches usually depend on O(d) or O(log d), where m is the number of items in the set-valued data (m ≡ 1 for categorical data), d is the domain size (usually d m) and is the privacy budget. On the server side, the estimator takes numerical values from users as input and derives an unbiased distribution estimation. Theoretical results show that estimation error bounds are improved from previously known Θ( m 2 d n 2 ) to the optimal rate Θ( md n 2 ). Results on extensive experiments demonstrate that our proposed wheel mechanism is 3-100x faster than existing approaches, meanwhile has optimal statistical efficiency.