ICLR 2026

Precise and Interpretable Editing of Code Knowledge in Large Language Models

Min Xue, Nikolai Bolik, Lennart Stöpler, Erik Imgrund, Janik Schmid, Artur Andrzejak

Abstract

Large Language Models (LLMs) have demonstrated outstanding capabilities in various code-related tasks, including code completion, translation, and summarization. However, these pretrained models are static, which makes it challenging to incorporate new knowledge into an LLM to correct erroneous behavior. Approaches such as retraining or fine-tuning demand extensive labeled datasets and can be computationally expensive, while prompt engineering fails to change models permanently. Knowledge Editing (KE) techniques (Wang et al., 2024) offer a more efficient alternative, enabling model updates with minimal data, even just a single example. Nevertheless, existing KE methods often manipulate parameters within the Transformer's multi-layer perceptrons (MLPs), where neuronal polysemanticity hinders both the precision and the interpretability of the edits. To address these limitations, we exploit TransCoder (Dunefsky et al., 2024), an MLP-like model component with a wide and sparsely activated hidden feature vector. Specifically, we introduce TransCoder-based Precise Editing (TCPE), a novel method that leverages the sparsity and monosemanticity of the TransCoder's neurons for highly localized knowledge editing. TCPE exhibits neuron-level mechanistic interpretability, revealing the correspondence between the edited neurons and the specific code-related knowledge. Furthermore, we present KECode, a new evaluation benchmark for code-to-code translation based on functional equivalence (Wei et al., 2025). Using KECode, we conduct a systematic evaluation of representative KE methods in the context of code-to-code translation. Our experimental results demonstrate that TCPE outperforms existing KE methods, substantially improving the translation accuracy of CodeLlama-7b-Instruct from 57.5% to 64.0% in a low-resource scenario of Java-to-D translation.

INTRODUCTION

Large Language Models (LLMs) have proved highly impactful in a multitude of fields within