ICML 2025

Not All Wrong is Bad: Using Adversarial Examples for Unlearning

Ali Ebrahimpour Boroojeny, Hari Sundaram, Varun Chandrasekaran

Abstract

The main difference between the model's predictions on $D_T$ (unseen samples) and $D_R$ (observed samples) is that the model is much more confident on samples it has observed than on unseen samples.

Motivation (cont.)

Key Observation 2: Fine-tuning a model on adversarial examples does not lead to catastrophic forgetting.

Figure: ResNet-18 model trained on CIFAR-10. From left to right, Adv shows fine-tuning on:
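The confidence gap between observed ($D_R$) and unseen ($D_T$) samples can be quantified in several ways. The sketch below assumes one common proxy, the mean maximum softmax probability over a set of samples, and reports the difference between the two sets. The helper names (`softmax`, `mean_confidence`, `confidence_gap`) and the logit values are hypothetical and for illustration only. They are not the authors' measurement and not results from the paper.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_confidence(batch: Sequence[Sequence[float]]) -> float:
    """Average max-softmax probability, i.e. the confidence in the top prediction.

    Assumption: max-softmax probability is used as the confidence proxy.
    """
    return sum(max(softmax(row)) for row in batch) / len(batch)


def confidence_gap(
    logits_observed: Sequence[Sequence[float]],
    logits_unseen: Sequence[Sequence[float]],
) -> float:
    """Positive when the model is more confident on observed (D_R) than unseen (D_T) samples."""
    return mean_confidence(logits_observed) - mean_confidence(logits_unseen)


# Hypothetical 3-class logits, not data from the paper.
observed = [[8.0, 0.5, 0.1], [0.2, 7.5, 0.3]]  # stand-ins for samples in D_R
unseen = [[2.0, 1.2, 0.4], [0.9, 1.8, 1.1]]    # stand-ins for samples in D_T

print(f"confidence gap: {confidence_gap(observed, unseen):.3f}")
```

In practice the logits would come from the trained model, for example a ResNet-18 on CIFAR-10 as in the figure. A large positive gap is the kind of signal an unlearning method would need to remove for forgotten samples.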