ICML 2024

Saliency strikes back: How filtering out high frequencies improves white-box explanations

Sabine Muzellec, Thomas Fel, Victor Boutin, Léo Andéol, Rufin VanRullen, Thomas Serre

Cited 4 times

Abstract

Attribution methods are a class of explainable AI (XAI) methods that aim to assess how individual inputs contribute to a model's decision-making process. We have identified a significant limitation in one type of attribution method, known as "white-box" methods. Although highly efficient, as we will show, these methods rely on a gradient signal that is often contaminated by high-frequency artifacts. To overcome this limitation, we introduce a new approach called "FORGrad." This simple method effectively filters out these high-frequency artifacts using optimal cut-off frequencies tailored to the unique characteristics of each model architecture. Our findings show that FORGrad consistently enhances the performance of existing white-box methods, enabling them to compete effectively with more accurate yet computationally more demanding "black-box" methods. We anticipate that, because of its effectiveness, the proposed method will foster the broader adoption of straightforward and efficient white-box methods for explainability, providing a better balance between faithfulness and computational efficiency.
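
The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' FORGrad implementation: it assumes an ideal (hard) circular low-pass filter applied to a 2D attribution map in the Fourier domain, and the function name `low_pass_filter` and its `cutoff` parameter (in cycles per pixel) are hypothetical. How the cut-off is chosen for each architecture is not covered here.

```python
import numpy as np


def low_pass_filter(attribution: np.ndarray, cutoff: float) -> np.ndarray:
    """Remove spatial frequencies above `cutoff` from a 2D attribution map.

    Hypothetical helper for illustration only. `cutoff` is a radial
    frequency in cycles per pixel (0.5 is the Nyquist limit).
    """
    height, width = attribution.shape

    # Frequency coordinates of each FFT bin along the two axes.
    freq_y = np.fft.fftfreq(height)[:, None]
    freq_x = np.fft.fftfreq(width)[None, :]

    # Radial distance of each bin from the zero-frequency (DC) component.
    radius = np.sqrt(freq_x ** 2 + freq_y ** 2)

    # Ideal low-pass mask: keep bins at or below the cut-off, zero the rest.
    mask = radius <= cutoff

    # Filter in the Fourier domain and return to the spatial domain.
    spectrum = np.fft.fft2(attribution)
    return np.real(np.fft.ifft2(spectrum * mask))


# Usage: a smooth pattern contaminated by a pixel-level checkerboard,
# standing in for a gradient map with high-frequency artifacts.
rows, cols = np.mgrid[0:32, 0:32]
smooth = np.cos(2 * np.pi * 2 * cols / 32)   # low frequency: 0.0625 cycles/pixel
artifact = (-1.0) ** (rows + cols)           # highest representable frequency
filtered = low_pass_filter(smooth + artifact, cutoff=0.1)
```

In this toy case the checkerboard lies entirely above the cut-off and the smooth pattern entirely below it, so the filtered map recovers the smooth pattern. Real gradient maps do not separate this cleanly, which is why the choice of cut-off matters.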