CCS2022

Are Attribute Inference Attacks Just Imputation?

Bargav Jayaraman, David Evans

Abstract

Models can expose sensitive information about their training data. In an attribute inference attack, an adversary has partial knowledge of some training records and access to a model trained on those records, and infers the unknown values of a sensitive feature of those records. We study a fine-grained variant of attribute inference we call sensitive value inference, where the adversary's goal is to identify with high confidence some records from a candidate set where the unknown attribute has a particular sensitive value. We explicitly compare attribute inference with data imputation that captures the training distribution statistics, under various assumptions about the training data available to the adversary. Our main conclusions are: (1) previous attribute inference methods do not reveal more about the training data from the model than can be inferred by an adversary without access to the trained model, but with the same knowledge of the underlying distribution as needed to train the attribute inference attack; (2) black-box attribute inference attacks rarely learn anything that cannot be learned without the model; but (3) white-box attacks, which we introduce and evaluate in the paper, can reliably identify some records with the sensitive attribute value that would not be predicted without access to the model. Furthermore, we show that proposed defenses such as differentially private training and removing vulnerable records from training do not mitigate this privacy risk. The code for our experiments is available at https://github.com/bargavj/EvaluatingDPML.

Contributions. To better understand attribute inference risks, we consider threat models where an adversary has limited prior knowledge of the training distribution (Section 3.1) and study a finer-grained notion of attribute inference that considers the privacy risk of identifying, with high confidence, individuals with a particular sensitive attribute value from a candidate set (Section 3.2).
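To make the sensitive value inference setting concrete, the following is a minimal sketch of an imputation-style adversary that has no access to the trained model. It is not the authors' implementation. It assumes binary known attributes, uses a Bernoulli naive Bayes estimate as a stand-in for whatever imputation method is actually used, and measures success as the fraction of the k most confident candidates that truly have the sensitive value. The names `naive_bayes_imputation` and `top_k_ppv` are hypothetical.

```python
import numpy as np

def naive_bayes_imputation(known_X, known_s, cand_X, alpha=1.0):
    """Score each candidate by an estimate of P(s = 1 | x).

    Assumption: binary features and a Bernoulli naive Bayes fit on the
    adversary's own data (known_X, known_s). No trained model is queried.
    alpha is a Laplace smoothing constant.
    """
    known_X = np.asarray(known_X, dtype=float)
    known_s = np.asarray(known_s, dtype=int)
    cand_X = np.asarray(cand_X, dtype=float)
    log_joint = []
    for c in (0, 1):
        rows = known_X[known_s == c]
        prior = (len(rows) + alpha) / (len(known_X) + 2 * alpha)
        theta = (rows.sum(axis=0) + alpha) / (len(rows) + 2 * alpha)
        log_joint.append(
            np.log(prior)
            + cand_X @ np.log(theta)
            + (1 - cand_X) @ np.log(1 - theta)
        )
    # Posterior for the sensitive value (class 1) via the log-odds.
    return 1.0 / (1.0 + np.exp(log_joint[0] - log_joint[1]))

def top_k_ppv(scores, is_sensitive, k):
    """Fraction of the k highest-scoring candidates with the sensitive value."""
    scores = np.asarray(scores, dtype=float)
    is_sensitive = np.asarray(is_sensitive, dtype=bool)
    top = np.argsort(-scores, kind="stable")[:k]
    return float(is_sensitive[top].mean())

# Toy data (made up for illustration): feature 0 tracks the sensitive value.
known_X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]]
known_s = [1, 1, 0, 0, 1, 0]
cand_X = [[1, 0], [0, 1], [1, 1], [0, 0]]
cand_true = [1, 0, 1, 0]

scores = naive_bayes_imputation(known_X, known_s, cand_X)
print(top_k_ppv(scores, cand_true, k=2))
```

Scores produced by an attack that does query the model can be passed to the same `top_k_ppv` function, so the two adversaries are compared on identical candidates. Under this framing, an attack reveals something about the model only if it ranks sensitive records more reliably than the imputation baseline does.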
We propose a novel white-box attack that identifies neurons in a model that are most correlated with the sensitive value for a target attribute (Section 5). We perform an extensive experimental evaluation of sensitive value inference on two large real-world data sets with both imputation and black-box attribute inference attacks (Section 7) and with our novel white-box attacks (Section 8).

Key findings. Our experiments show that trained models leak considerable information about the underlying training distribution, which can be exploited to infer sensitive attributes about individuals. While prior attribute inference attacks do not learn anything from the model that could not be learned without it, our white-box attacks are able to confidently infer sensitive value records, even