KDD 2025

Beyond Input Attribution: A Hands-On Tutorial to Concept-Based Explainable AI and Mechanistic Interpretability

Eliana Pastor, Eleonora Poeta, André Panisson, Alan Perotti, Gabriele Ciravegna

Cited by 2

Abstract

As deep learning systems become pervasive, the demand for trustworthy and transparent AI continues to grow. Traditional feature attribution methods, however, often lack robustness and alignment with human reasoning. This tutorial moves beyond feature attribution by introducing participants to two complementary interpretability paradigms: Concept-Based Explainable AI (C-XAI) and Mechanistic Interpretability. C-XAI provides explanations grounded in high-level, human-interpretable concepts, bridging the gap between model reasoning and human understanding. In parallel, mechanistic interpretability, a rapidly emerging field, focuses on reverse-engineering neural networks to uncover and disentangle the internal mechanisms that give rise to human-understandable representations. Through interactive coding sessions and hands-on exercises, attendees will gain practical experience implementing, evaluating, and comparing a variety of C-XAI and mechanistic interpretability techniques. By the end of the tutorial, participants will be equipped with a modern interpretability toolbox and a deeper understanding of how to apply it in real-world scenarios.