WWW2026

BalDRO: A Distributionally Robust Optimization based Framework for Large Language Model Unlearning

Pengyang Shao, Naixin Zhai, Lei Chen, Yonghui Yang, Fengbin Zhu, Xun Yang, Meng Wang

Cited by 8

Abstract

As Large Language Models (LLMs) increasingly shape online content, removing targeted information from well-trained LLMs (also known as LLM unlearning) has become critical for web governance. A key challenge in LLM unlearning is the sample-wise imbalance within the forget set: samples vary widely in unlearning difficulty, which leads to asynchronous forgetting speeds where some knowledge remains insufficiently erased while other knowledge is over-forgotten. To address this challenge, we propose BalDRO, a novel and efficient framework for balanced LLM unlearning. BalDRO formulates unlearning as a min-sup process: the inner process identifies a worst-case data distribution that adaptively emphasizes hard-to-unlearn samples, and the outer process updates the model parameters under that worst-case distribution. We instantiate this formulation through two efficient variants: BalDRO-G, a discrete GroupDRO-based approximation that focuses on high-loss subsets, and BalDRO-DV, a continuous Donsker-Varadhan dual method that enables smooth, adaptive weighting within standard LLM training pipelines. Extensive experiments on the TOFU and MUSE benchmarks demonstrate the effectiveness of BalDRO, yielding significant improvements in both forgetting quality and model utility over existing methods. For reproducibility, we have released the code for BalDRO.

CCS Concepts

• Security and privacy → Privacy protections; • Computing methodologies → Natural language processing; Machine learning.
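
The min-sup idea in the abstract can be illustrated with a small sketch. It is not the authors' released code, and the abstract does not give the exact objective, so everything below is an assumption made for illustration: each forget-set sample has a scalar unlearning loss where a higher value means "harder to unlearn", the reference distribution is uniform over the batch, and the function names and the temperature `tau` are hypothetical. Under those assumptions, a KL-regularized inner supremum has the standard Donsker-Varadhan closed form, `tau * log(mean(exp(loss / tau)))`, attained by softmax weights. A GroupDRO-style discrete counterpart is sketched here as uniform weight on the `k` highest-loss samples.

```python
import math

def dv_weights(losses, tau=1.0):
    """Softmax weights q_i proportional to exp(loss_i / tau).

    Assumption: these maximize E_q[loss] - tau * KL(q || uniform),
    i.e. a KL-regularized worst-case distribution over the batch.
    """
    m = max(losses)  # subtract the max for numerical stability
    exps = [math.exp((l - m) / tau) for l in losses]
    z = sum(exps)
    return [e / z for e in exps]

def dv_objective(losses, tau=1.0):
    """Closed-form value of the inner sup: tau * log(mean(exp(loss / tau)))."""
    m = max(losses)
    mean_exp = sum(math.exp((l - m) / tau) for l in losses) / len(losses)
    return m + tau * math.log(mean_exp)

def topk_group_weights(losses, k):
    """Discrete variant (hypothetical): uniform weight on the k highest losses."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    top = set(order[:k])
    return [1.0 / k if i in top else 0.0 for i in range(len(losses))]

# Hypothetical per-sample unlearning losses for a batch of four forget samples.
losses = [0.2, 1.5, 0.9, 3.0]
q = dv_weights(losses, tau=1.0)
weighted_loss = sum(qi * li for qi, li in zip(q, losses))
```

In a training loop, the outer minimization would backpropagate through either `dv_objective` or the weighted loss computed with the weights held fixed. A small `tau` concentrates weight on the hardest samples and a large `tau` approaches plain averaging, which is one way to read the "smooth, adaptive weighting" the abstract mentions.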