NeurIPS 2023

What Distributions are Robust to Indiscriminate Poisoning Attacks for Linear Learners?

Fnu Suya, Xiao Zhang, Yuan Tian, David Evans

Cited 3 times

Abstract

We study indiscriminate poisoning for linear learners, where an adversary injects a few crafted examples into the training data with the goal of forcing the induced model to incur higher test error. Inspired by the observation that linear learners on some datasets are able to resist the best known attacks even without any defenses, we further investigate whether datasets can be inherently robust to indiscriminate poisoning attacks for linear learners. For theoretical Gaussian distributions, we rigorously characterize the behavior of an optimal poisoning attack, defined as the poisoning strategy that attains the maximum risk of the induced model at a given poisoning budget. Our results prove that linear learners can indeed be robust to indiscriminate poisoning if the class-wise data distributions are well-separated with low variance and the size of the constraint set containing all permissible poisoning points is also small. These findings largely explain the drastic variation in empirical attack performance of the state-of-the-art poisoning attacks on linear learners across benchmark datasets, making an important initial step towards understanding the underlying reasons some learning tasks are vulnerable to data poisoning attacks.

One commonly studied class of poisoning attacks in the literature is indiscriminate poisoning attacks [4, 54, 35, 45, 5, 46, 25, 31, 10], in which the attackers aim to make the induced models incur larger test errors than a model trained on a clean dataset. Other poisoning goals, including targeted [42, 56, 24, 21, 18] and subpopulation [22, 46] attacks, are also worth studying and may correspond to more realistic attack goals. We focus on indiscriminate poisoning attacks because these attacks interfere with the fundamental statistical properties of the learning algorithm [45, 25], but we include a summary of prior work on understanding the limits of poisoning attacks in other settings in the related work section.
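
To make the threat model concrete, the following is a minimal sketch of indiscriminate poisoning on a one-dimensional Gaussian mixture. It is an illustration under simplifying assumptions, not the analysis or the attack from the paper: the learner is a population-level least-squares classifier (swapped in for a hinge-loss learner because it has a closed form), the adversary is restricted to placing all poison mass on a single point `(x_p, y_p)` with `x_p` in an assumed constraint set `[-u, u]`, and the "attack" is a plain grid search over that restricted family. The names `fit_poisoned`, `clean_risk`, `best_single_point_attack`, and all parameter values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_poisoned(mu, sigma, eps, x_p, y_p):
    """Population least-squares fit of y on x (assumed learner).

    Clean data: y = +/-1 with equal probability, x | y ~ N(y * mu, sigma^2).
    Poison: a mass of eps (relative to the clean data) at the point (x_p, y_p).
    Returns (w, b) of the linear model w * x + b.
    """
    a = 1.0 / (1.0 + eps)   # weight of the clean distribution
    c = eps / (1.0 + eps)   # weight of the poison point
    ex = c * x_p
    ey = c * y_p
    exy = a * mu + c * x_p * y_p
    exx = a * (mu ** 2 + sigma ** 2) + c * x_p ** 2
    w = (exy - ex * ey) / (exx - ex ** 2)
    b = ey - w * ex
    return w, b


def clean_risk(w, b, mu, sigma):
    """Exact 0-1 risk of sign(w * x + b) on the clean Gaussian mixture."""
    if w == 0.0:
        return 0.5  # a constant predictor misclassifies one whole class
    t = -b / w  # decision threshold
    pos_below = normal_cdf((t - mu) / sigma)        # P(x < t | y = +1)
    neg_above = 1.0 - normal_cdf((t + mu) / sigma)  # P(x > t | y = -1)
    if w > 0:
        return 0.5 * pos_below + 0.5 * neg_above
    # w < 0: the model predicts +1 below the threshold
    return 0.5 * (1.0 - pos_below) + 0.5 * (1.0 - neg_above)


def best_single_point_attack(mu, sigma, eps, u, grid=201):
    """Grid search over x_p in [-u, u] and y_p in {-1, +1} for the highest clean risk."""
    best = 0.0
    for k in range(grid):
        x_p = -u + 2.0 * u * k / (grid - 1)
        for y_p in (-1.0, 1.0):
            w, b = fit_poisoned(mu, sigma, eps, x_p, y_p)
            best = max(best, clean_risk(w, b, mu, sigma))
    return best


# Assumed illustrative parameters: well-separated classes, 3% poisoning ratio.
mu, sigma, eps = 2.0, 1.0, 0.03
clean = clean_risk(*fit_poisoned(mu, sigma, 0.0, 0.0, 1.0), mu, sigma)
risk_small_u = best_single_point_attack(mu, sigma, eps, u=1.0)
risk_large_u = best_single_point_attack(mu, sigma, eps, u=10.0)
print(f"clean risk:            {clean:.4f}")
print(f"poisoned risk (u=1):   {risk_small_u:.4f}")
print(f"poisoned risk (u=10):  {risk_large_u:.4f}")
```

Under these assumed parameters, the restricted attacker gains little when the constraint set is small and does more damage when it is large, which is the qualitative dependence on the constraint-set size described in the abstract. Because the search covers only single-point attacks against a least-squares learner, the reported numbers should not be read as the optimal attack risk characterized in the paper.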
Indiscriminate poisoning attack methods have been developed that achieve empirically strong poisoning attacks in many settings [45, 46, 25, 31], but these works do not explain why the proposed attacks are sometimes ineffective. In addition, the evaluations of these attacks can be deficient in

37th Conference on Neural Information Processing Systems (NeurIPS 2023).