36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations
Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
Abstract
A critical problem in the field of post hoc explainability is the lack of a common foundational goal among methods. For example, some methods are motivated by function approximation, some by game theoretic notions, and some by obtaining clean visualizations. This fragmentation of goals causes not only an inconsistent conceptual understanding of explanations but also the practical challenge of not knowing which method to use when. In this work, we begin to address these challenges by unifying eight popular post hoc explanation methods (LIME, C-LIME, KernelSHAP, Occlusion, Vanilla Gradients, Gradients × Input, SmoothGrad, and Integrated Gradients). We show that these methods all perform local function approximation of the black-box model, differing only in the neighbourhood and loss function used to perform the approximation. This unification enables us to (1) state a no free lunch theorem for explanation methods, demonstrating that no method can perform optimally across all neighbourhoods, and (2) provide a guiding principle to choose among methods based on faithfulness to the black-box model. We empirically validate these theoretical results using various real-world datasets, model classes, and prediction tasks. By bringing diverse explanation methods into a common framework, this work (1) advances the conceptual understanding of these methods, revealing their shared local function approximation objective, properties, and relation to one another, and (2) guides the use of these methods in practice, providing a principled approach to choose among methods and paving the way for the creation of new ones.
Introduction
As machine learning models become increasingly complex and are increasingly deployed in high-stakes settings (e.g., medicine [1], law [2], and finance [3]), there is a growing emphasis on understanding how models make predictions so that decision-makers (e.g., doctors, judges, and loan officers) can assess the extent to which they can trust model predictions.
To this end, several post hoc explanation methods have been developed, including LIME, C-LIME, KernelSHAP, Occlusion, Vanilla Gradients, Gradients × Input, SmoothGrad, and Integrated Gradients [11]. However, different methods have different goals. Such differences lead to both conceptual and practical challenges to understanding and using explanation methods, thwarting progress in the field.
From a conceptual standpoint, the misalignment of goals among methods leads to an inconsistent view of explanations. What is an explanation? This is unclear as different methods have different notions of explanation. Depending on the method, explanations may be local function approximations (LIME and C-LIME), Shapley values (SHAP), raw gradients (Vanilla Gradients), raw gradients