ICML 2025

Resolving Lexical Bias in Model Editing

Hammad Rizwan, Domenic Rosati, Ga Wu, Hassan Sajjad

Abstract

Model editing aims to modify the knowledge of a pre-trained language model. Previous approaches often alter model weights directly, which can degrade the model. Recent weight-preserving techniques leave the weights untouched and instead implement edits through an adapter with auxiliary components. These adapters rely heavily on scoping mechanisms that apply distance functions in the model's representation space to decide when to trigger an edit. We demonstrate that current adapter methods are critically vulnerable to strong lexical biases, which cause problems such as applying edits to irrelevant prompts whose wording overlaps with the edit. This paper presents a principled approach to learning a disentangled representation space that enables precise localization of edits by keeping irrelevant prompts far apart while keeping paraphrases close together. Our empirical study shows that our method, Projector Editor Networks for Model Editing (PENME), achieves state-of-the-art model editing results, is more computationally efficient at inference than previous methods, and adapts across different architectures. The PENME codebase is available at https://github.com/hammadrizwan/PENME.git
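To make the two central ideas concrete, consider a toy example. Adapter methods decide whether to apply an edit by thresholding a distance in representation space, so a lexically similar but irrelevant prompt can fall inside that threshold. A training objective that pulls paraphrases together and pushes irrelevant prompts apart can separate the two. What follows is a minimal illustration, assuming Euclidean distance, a single global threshold `tau`, and a generic contrastive margin loss. The function names, two-dimensional vectors, and values are hypothetical, and this is not PENME's actual implementation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def in_scope(query: Vector, edit_key: Vector, tau: float) -> bool:
    """Hypothetical scoping rule: trigger the edit when the query's
    representation lies within distance tau of the stored edit key."""
    return euclidean(query, edit_key) <= tau


def contrastive_margin_loss(anchor: Vector, paraphrase: Vector,
                            irrelevant: Vector, margin: float) -> float:
    """Generic contrastive loss (assumed form, not PENME's exact objective):
    pull the paraphrase toward the anchor and push the irrelevant prompt
    at least `margin` away from it."""
    pos = euclidean(anchor, paraphrase) ** 2
    neg = max(0.0, margin - euclidean(anchor, irrelevant)) ** 2
    return pos + neg


if __name__ == "__main__":
    edit_key = [0.0, 0.0]  # hypothetical representation of the edited prompt
    tau = 1.0              # hypothetical scope threshold
    margin = 2.0           # hypothetical contrastive margin

    # Hypothetical "before" space: the irrelevant prompt shares words with the
    # edit, so it lands closer to the key than a genuine paraphrase does.
    para_before, irr_before = [0.9, 0.0], [0.5, 0.0]
    # Hypothetical "after" space, e.g. following a learned projection.
    para_after, irr_after = [0.2, 0.0], [3.0, 0.0]

    print("before: paraphrase in scope:", in_scope(para_before, edit_key, tau))
    print("before: irrelevant in scope:", in_scope(irr_before, edit_key, tau))
    print("after:  paraphrase in scope:", in_scope(para_after, edit_key, tau))
    print("after:  irrelevant in scope:", in_scope(irr_after, edit_key, tau))
    print("loss before:", contrastive_margin_loss(edit_key, para_before, irr_before, margin))
    print("loss after: ", contrastive_margin_loss(edit_key, para_after, irr_after, margin))
```

With the hypothetical "before" vectors, the irrelevant prompt sits closer to the edit key than the paraphrase, so any threshold that admits the paraphrase also admits it. After the assumed projection, a single threshold separates the two and the contrastive loss is lower. This is the kind of disentanglement the abstract describes.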