WWW2023

Sanitizing Sentence Embeddings (and Labels) for Local Differential Privacy

Minxin Du, Xiang Yue, Sherman S. M. Chow, Huan Sun

被引用 26 次

摘要

Differentially private (DP) learning, notably DP stochastic gradient descent (DP-SGD), has limited applicability in fine-tuning gigantic pre-trained language models (LMs) for natural language processing tasks. The culprit is the perturbation of gradients (as gigantic as entire models), leading to significant efficiency and accuracy drops. We show how to achieve metric-based local DP (LDP) by sanitizing (high-dimensional) sentence embedding, extracted by LMs and much smaller than gradients. For potential utility improvement, we impose a consistency constraint on the sanitization. We explore two approaches: One is brand new and can directly output consistent noisy embeddings; the other is an upgradation with post-processing. To further mitigate "the curse of dimensionality, " we introduce two trainable linear maps for mediating dimensions without hurting privacy or utility. Our protection can effectively defend against privacy threats on embeddings. It also naturally extends to inference. Our experiments 1 show that we reach the non-private accuracy under properly configured parameters, e.g., 0.92 for SST-2 with a privacy budget 𝜖 = 10 and the reduced dimension as 16. We also sanitize the label for LDP (with another small privacy budget) with limited accuracy losses to fully protect every sequence-label pair.