ICLR 2026
Learning-Time Encoding Shapes Unlearning in LLMs
Ruihan Wu, Konstantin Garov, Kamalika Chaudhuri
Abstract
As large language models (LLMs) are increasingly deployed in the real world, the ability to "unlearn", or remove specific pieces of knowledge post hoc, has become essential for reasons ranging from complying with privacy regulations to correcting outdated or harmful content. Prior work has proposed unlearning benchmarks and algorithms, and has typically assumed that the training process and the target model are fixed. In this work, we empirically investigate how learning-time choices in knowledge encoding impact the effectiveness of unlearning factual knowledge. Our experiments reveal two key findings: (1) learning with paraphrased descriptions improves unlearning performance, and (2) unlearning individual pieces of knowledge from a chunk of text is challenging. Our results suggest that learning-time knowledge encoding may play a central role in enabling reliable post-hoc unlearning.

* Equal contribution.
The code is publicly released at https://github.com/wrh14/learning_time_shapes_unlearning.
Preprint. Under review.

An example of the textual description for a piece of knowledge: Reid Perry has Richard Perry as his father.

An example of the paraphrased description: The father of Reid Perry is Richard Perry.

An example of the text chunk that implies the same piece of knowledge: Richard Perry, born in 1956 in Maryland, works as an airline pilot. He is married to Parker Ross and is the father of Reid, Reed, Raymond, and Quentin Perry. Richard's parents are…
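To make the three encodings in the examples above concrete, the following is a minimal sketch of how one fact could be rendered as a single description, as paraphrased descriptions, and as part of a multi-fact chunk. It is an illustration only, not the authors' data pipeline: the `Fact` class, the `TEMPLATES` table, and all function names are hypothetical, and the templates are simply copied from the two example sentences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A single piece of knowledge as a (subject, relation, object) triple."""
    subject: str
    relation: str
    obj: str


# Hypothetical templates, modeled on the two example sentences above.
TEMPLATES = {
    "father": [
        "{s} has {o} as his father.",
        "The father of {s} is {o}.",
    ],
}


def single_description(fact: Fact) -> str:
    """Render the fact with one fixed template."""
    return TEMPLATES[fact.relation][0].format(s=fact.subject, o=fact.obj)


def paraphrased_descriptions(fact: Fact) -> list[str]:
    """Render the same fact with every available template."""
    return [t.format(s=fact.subject, o=fact.obj) for t in TEMPLATES[fact.relation]]


def chunk_description(parent: str, children: list[str]) -> str:
    """Pack several father facts into one sentence, as a text chunk would.

    Assumes every child shares the surname of the first child.
    """
    if not children:
        raise ValueError("at least one child is required")
    first_names = [name.split()[0] for name in children]
    surname = children[0].split()[-1]
    if len(first_names) == 1:
        names = first_names[0]
    else:
        names = ", ".join(first_names[:-1]) + ", and " + first_names[-1]
    return f"{parent} is the father of {names} {surname}."


fact = Fact("Reid Perry", "father", "Richard Perry")
print(single_description(fact))
print(paraphrased_descriptions(fact))
print(chunk_description(
    "Richard Perry",
    ["Reid Perry", "Reed Perry", "Raymond Perry", "Quentin Perry"],
))
```

In the chunk form, the fact about Reid Perry shares a sentence with facts about his siblings, which gives some intuition for why removing one fact without disturbing its neighbors may be harder than in the single-description case.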