ICLR 2023

Concept Gradient: Concept-based Interpretation Without Linear Assumption

Andrew Bai, Chih-Kuan Yeh, Neil Y. C. Lin, Pradeep Kumar Ravikumar, Cho-Jui Hsieh

Abstract

Concept-based interpretations of black-box models are often more intuitive for humans to understand than their feature-based counterparts. The most widely adopted approach for concept-based gradient interpretation is the Concept Activation Vector (CAV). CAV relies on learning linear relations between some latent representations of a given model and concepts. The premise that meaningful concepts lie in a linear subspace of model layers is usually assumed implicitly but does not hold in general. In this work we propose Concept Gradient (CG), which extends concept-based gradient interpretation methods to non-linear concept functions. We show that for a general (potentially non-linear) concept, we can mathematically measure how a small change in the concept affects the model's prediction, which extends gradient-based interpretation to the concept space. We demonstrate empirically that CG outperforms CAV in evaluating concept importance on real-world datasets, and we perform a case study on a medical dataset. The code is available at https://github.com/jybai/concept-gradients.
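
To make the idea of "how a small change in the concept affects the model's prediction" concrete, here is a minimal sketch. It is not taken from the authors' repository. It assumes one common way to realize the idea: apply the chain rule and multiply the model's input Jacobian by the pseudo-inverse of the concept function's input Jacobian. The names `numerical_jacobian`, `concept_gradient`, `concept_fn`, and `model_fn` are hypothetical, and the toy functions are chosen so that the expected answer is known in advance.

```python
import numpy as np


def numerical_jacobian(fn, x, eps=1e-6):
    """Central-difference Jacobian of fn at x, shape (outputs, inputs)."""
    x = np.asarray(x, dtype=float)
    n_out = np.atleast_1d(fn(x)).size
    jac = np.zeros((n_out, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        upper = np.atleast_1d(fn(x + step))
        lower = np.atleast_1d(fn(x - step))
        jac[:, i] = (upper - lower) / (2 * eps)
    return jac


def concept_gradient(model_fn, concept_fn, x):
    """Sketch (assumption): sensitivity of the model output to the concept.

    If f(x) = h(g(x)), the chain rule gives J_f = h'(g(x)) J_g, so
    multiplying J_f by the pseudo-inverse of J_g recovers h'(g(x)).
    """
    jac_f = numerical_jacobian(model_fn, x)    # shape (1, d)
    jac_g = numerical_jacobian(concept_fn, x)  # shape (k, d)
    return jac_f @ np.linalg.pinv(jac_g)       # shape (1, k)


# Hypothetical non-linear concept over a 3-dimensional input.
def concept_fn(x):
    return x[0] ** 2 + np.sin(x[1])


# Hypothetical model whose output depends on the input only via the concept.
def model_fn(x):
    return 3.0 * concept_fn(x)


x = np.array([0.5, 1.0, -2.0])
cg = concept_gradient(model_fn, concept_fn, x)
```

Because the toy model is exactly three times the concept, the result should be close to 3 even though the concept is a non-linear function of the input. In practice the Jacobians would come from automatic differentiation of trained networks rather than finite differences.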