ICML 2024

Tuning-free Estimation and Inference of Cumulative Distribution Function under Local Differential Privacy

Yi Liu, Qirui Hu, Linglong Kong

Cited by 3

Abstract

There is soaring demand for statistical analysis tools that are robust against contamination and that preserve individual data owners' privacy. Although both topics host a rich body of literature, to the best of our knowledge we are the first to systematically study the connections between optimality under Huber's contamination model and local differential privacy (LDP) constraints. In this paper, we start with a general minimax lower bound result, which disentangles the costs of being robust against Huber contamination and of preserving LDP. We further study four concrete examples: a two-point testing problem, a potentially diverging mean estimation problem, a nonparametric density estimation problem and a univariate median estimation problem. For each problem, we demonstrate procedures that are optimal in the presence of both contamination and LDP constraints, comment on the connections with state-of-the-art methods that have been studied under only one of the contamination or privacy constraints, and unveil the connections between robustness and LDP by partially answering whether LDP procedures are robust and whether robust procedures can be efficiently privatised. Overall, our work showcases a promising prospect for the joint study of robustness and local differential privacy.

Introduction

In modern data collection and analysis, the privacy of individuals is a key concern. There has been a surge of interest in developing data analysis methodologies that yield strong statistical performance without compromising individuals' privacy, largely driven by applications in modern technology, including at Google (e.g., Erlingsson, Pihur and Korolova (2014)), Apple (e.g., Tang et al. (2017)) and Microsoft (e.g., Ding, Kulkarni and Yekhanin (2017)), and by pressure from regulatory bodies (e.g., Forti (2021), Aridor, Che and Salz (2021)).
The prevailing framework for the development of private methodology is that of differential privacy (Dwork et al. (2006)). Although this framework originates in cryptography, there is a growing body of statistical literature that aims to explore its constraints and to provide procedures that make optimal use of the available data (e.g., Wasserman and Zhou (2010), Duchi, Jordan and Wainwright (2018), Rohde and Steinberger (2020), Cai, Wang and Zhang (2021)). Work in this area is split between central models of privacy, where a trusted third party collects and analyses the data before releasing privatised results, and local models of privacy, where data are randomised before collection. In this paper, we consider the local differential privacy constraint, to be formally defined in Section 1.2. While classical methods for locally private analysis are restricted to the estimation of the parameter of a binomial distribution (Warner (1965)), modern research has resulted in mechanisms for many other statistical problems, including various hypothesis testing problems (e.g., Kairouz, Oh and Viswanath (2016), Joseph et al. (2019), Berrett and Butucea (2020), Acharya et al. (2022), Lam-Weil, Laurent and Loubes (2022)), mean and median estimation (e.g., Duchi, Jordan and Wainwright (2018)), nonparametric estimation problems (e.g., Rohde and Steinberger (2020), Butucea et al. (2020), Berrett, Györfi and Walk (2021)), and change point analysis (e.g., Berrett and Yu (2021), Li, Berrett and Yu (2022)), to name but a few.
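
To make the local model concrete, the following is a minimal sketch of binary randomised response in the spirit of Warner (1965), written in the modern parametrisation where each individual reports their true bit with probability e^ε / (1 + e^ε). It is an illustration only and not a procedure from this paper; the function names `randomized_response` and `estimate_proportion`, the privacy level ε = 1, the sample size and the true proportion 0.3 are all assumptions chosen for the example.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Privatise one bit: keep it with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_proportion(reports, epsilon):
    """Debias the mean of the privatised bits to estimate the true proportion.

    If pi is the true proportion, E[report] = (1 - p_truth) + pi * (2 * p_truth - 1),
    so we invert this affine map.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


# Hypothetical data: 100,000 individuals, roughly 30% of whom hold the bit 1.
rng = random.Random(0)
true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]

# Each individual randomises locally; the analyst only ever sees `reports`.
reports = [randomized_response(b, 1.0, rng) for b in true_bits]
estimate = estimate_proportion(reports, 1.0)
```

The analyst never observes `true_bits`, yet the debiased estimate concentrates around the true proportion. The price of privacy shows up in the division by 2 * p_truth - 1, which inflates the variance as ε shrinks.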