ACL 2025
Mamba Knockout for Unraveling Factual Information Flow
Nir Endy, Idan Daniel Grosbard, Yuval Ran-Milo, Yonatan Slutzky, Itay Tshuva, Raja Giryes
Abstract
This paper investigates the flow of factual information in Mamba-based language models. We rely on theoretical and empirical connections to Transformer-based architectures and their attention mechanisms. Exploiting this relationship, we adapt attentional interpretability techniques originally developed for Transformers (specifically, the Attention Knockout methodology) to both Mamba-1 and Mamba-2. Using them, we trace how information is transmitted and localized across tokens and layers, revealing patterns of subject-token information emergence and layer-wise dynamics. Notably, some phenomena vary between Mamba models and Transformer-based models, while others appear universally across all models inspected, hinting that these may be inherent to LLMs in general. By further leveraging Mamba's structured factorization, we disentangle how distinct "features" either enable token-to-token information exchange or enrich individual tokens, thus offering a unified lens to understand Mamba's internal operations. Our code can be found at https://github.com/nirendy/mamba-knockout.
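To make the idea of knocking out token-to-token flow in a recurrent model concrete, the following is a minimal sketch on a toy scalar-state selective recurrence, h_t = a_t h_{t-1} + b_t x_t with y_t = c_t h_t. It is not the released implementation, and it is not Mamba itself. Unrolling the recurrence yields an attention-like mixing matrix M[t][s] = c_t (a_{s+1} ... a_t) b_s for s <= t, and a knockout zeroes selected entries so that chosen target tokens can no longer read from chosen source tokens. All function names, the sequence length, and the choice of "subject" positions are illustrative assumptions.

```python
import random


def recurrent_scan(a, b, c, x):
    """Run the toy recurrence h_t = a_t*h_{t-1} + b_t*x_t, y_t = c_t*h_t."""
    h, y = 0.0, []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t
        y.append(c_t * h)
    return y


def implicit_attention(a, b, c):
    """Unrolled mixing matrix: M[t][s] = c[t] * prod(a[s+1..t]) * b[s], s <= t."""
    T = len(a)
    M = [[0.0] * T for _ in range(T)]
    for t in range(T):
        decay = 1.0  # empty product when s == t
        for s in range(t, -1, -1):
            M[t][s] = c[t] * decay * b[s]
            decay *= a[s]  # extend the product for the next, earlier source
    return M


def knockout(M, targets, sources):
    """Return a copy of M with the flow from `sources` to `targets` zeroed."""
    M_ko = [row[:] for row in M]
    for t in targets:
        for s in sources:
            M_ko[t][s] = 0.0
    return M_ko


def apply_mixing(M, x):
    """Compute y = M x for a list-of-lists matrix."""
    return [sum(m * x_s for m, x_s in zip(row, x)) for row in M]


if __name__ == "__main__":
    random.seed(0)
    T = 6
    a = [random.uniform(0.5, 1.0) for _ in range(T)]  # per-token decay
    b = [random.gauss(0, 1) for _ in range(T)]        # per-token input gate
    c = [random.gauss(0, 1) for _ in range(T)]        # per-token output gate
    x = [random.gauss(0, 1) for _ in range(T)]

    M = implicit_attention(a, b, c)
    # Hypothetical setup: positions 1-2 play the subject, the last token is the target.
    M_ko = knockout(M, targets=[T - 1], sources=[1, 2])

    print("last-token output, clean:   ", apply_mixing(M, x)[-1])
    print("last-token output, knockout:", apply_mixing(M_ko, x)[-1])
```

In this toy setting, the matrix form reproduces the recurrent scan, and the knockout changes only the outputs of the targeted rows. In a multi-layer model the intervention would be applied within a chosen window of layers and its effect measured on the final prediction, which this sketch does not attempt.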