ICLR 2026
IDEAL: Data Equilibrium Adaptation for Multi-Capability Language Model Alignment
Chenlin Ming, Chendi Qu, Qizhi Pei, Zhuoshi Pan, Yu Li, Xiaoming Duan, Lijun Wu, Conghui He
Cited by 3
Abstract
Large Language Models (LLMs) have achieved impressive performance through Supervised Fine-tuning (SFT) on diverse instructional datasets. When training on multiple capabilities simultaneously, the mixture training dataset, governed by volumes of data from different domains, is a critical factor that directly impacts the final model's performance. Unlike many studies that focus on enhancing the quality of training datasets through data selection methods, few works explore the intricate relationship between the compositional quantity of mixture training datasets and the emergent capabilities of LLMs. Given the availability of a high-quality multi-domain training dataset, understanding the impact of data from each domain on the model's overall capabilities is crucial for preparing SFT data and training a well-balanced model that performs effectively across diverse domains. In this work, we introduce IDEAL, an innovative data equilibrium adaptation framework designed to effectively optimize volumes of data from different domains within mixture SFT datasets, thereby enhancing the model's alignment and performance across multiple capabilities. IDEAL employs a gradient-based approach to iteratively refine the training data distribution, dynamically adjusting the volumes of domain-specific data based on their impact on downstream task performance. By leveraging this adaptive mechanism, IDEAL ensures a balanced dataset composition, enabling the model to achieve robust generalization and consistent proficiency across diverse tasks. Experiments across different capabilities demonstrate that IDEAL outperforms conventional uniform data allocation strategies, achieving a comprehensive improvement of approximately 7% in multi-task evaluation scores.
Introduction
Recent advancements in LLMs have demonstrated their remarkable ability to master diverse capabilities [14, 67, 29, 41, 33] through Supervised Fine-tuning (SFT) on instruction-aligned datasets [37, 38, 1, 60].
By training on heterogeneous tasks such as mathematical reasoning [39, 26, 48, 45], code generation [13, 54], and creative writing [59, 21, 18], models like Claude [3] achieve promising performance across various domains. However, empirical studies reveal that naively merging datasets for multi-objective fine-tuning often degrades performance compared to single-task specialization [57, 52, 14]. To mitigate this issue, a common approach is to adjust the training data distribution [62, 64], thereby regulating the volume of data from each domain within the mixed dataset. However, critical challenges persist: the optimal mixture proportions of these domains are poorly understood and how to adjust the
† Work during internship at Shanghai AI Lab.