NeurIPS 2024

DiffHammer: Rethinking the Robustness of Diffusion-Based Adversarial Purification

Kaibo Wang, Xiaowen Fu, Yuxuan Han, Yang Xiang

Abstract

Diffusion-based purification has demonstrated impressive robustness as an adversarial defense. However, it remains unclear whether this robustness is genuine or an artifact of insufficient evaluation. Our research shows that attacks based on expectation over transformation (EOT) face a gradient dilemma caused by global gradient averaging, which makes their evaluations ineffective. Additionally, 1-evaluation underestimates the resubmit risk in stochastic defenses. To address these issues, we propose an effective and efficient attack named DiffHammer. This method bypasses the gradient dilemma through selective attacks on vulnerable purifications, incorporates N-evaluation into its loops, and uses gradient grafting for comprehensive and efficient evaluation. Our experiments validate that DiffHammer achieves effective results within 10-30 iterations, outperforming other methods. These findings call into question the reliability of diffusion-based purification once the gradient dilemma is mitigated and the resubmit risk is scrutinized.
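The gap between 1-evaluation and N-evaluation can be illustrated with a minimal sketch. It assumes, purely for illustration, that each resubmission of the same adversarial example fools the stochastic defense independently with a fixed probability `p_single`. Under that assumption, N-evaluation is read here as "the attack succeeds if at least one of N submissions succeeds". The function names and the independence assumption are hypothetical and are not taken from the paper.

```python
import random


def resubmit_risk(p_single: float, n: int) -> float:
    """Closed-form probability that at least one of n submissions succeeds.

    Assumes independent submissions with identical success probability
    p_single (an illustrative simplification).
    """
    return 1.0 - (1.0 - p_single) ** n


def simulate_resubmit_risk(p_single: float, n: int,
                           trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity.

    Each trial stands for one adversarial example submitted n times to a
    randomized defense; the trial counts as a success if any submission
    is misclassified.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < p_single for _ in range(n)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    p = 0.1  # hypothetical per-submission attack success rate
    for n in (1, 5, 10):
        print(n, round(resubmit_risk(p, n), 4),
              round(simulate_resubmit_risk(p, n), 4))
```

With a per-submission success rate of 0.1, the closed form gives a risk of about 0.65 after 10 resubmissions, which is why a single evaluation can substantially understate the exposure of a stochastic defense under this simplified model.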