EMNLP 2024
DEM: Distribution Edited Model for Training with Mixed Data Distributions
Dhananjay Ram, Aditya Rawal, Momchil Hardalov, Nikolaos Pappas, Sheng Zha
Abstract
Training with mixed data distributions is a common and important part of creating multi-task and instruction-following models. The diversity of the data distributions and the cost of joint training make the optimization procedure extremely challenging. Data mixing methods partially address this problem, but they yield sub-optimal performance across data sources and require multiple expensive training runs. In this paper, we propose a simple and efficient alternative for better optimization of the data sources: combining models individually trained on each data source with the base model using basic element-wise vector operations. The resulting model, the Distribution Edited Model (DEM), is 11× cheaper than standard data mixing and outperforms strong baselines on a variety of benchmarks, yielding up to 6.2% improvement on MMLU, 11.5% on BBH, 16.1% on DROP, 6% on MathQA, and 9.3% on HELM with models of size 3B to 13B. Notably, DEM does not require full re-training when modifying a single data source, making it flexible and scalable for training with diverse data sources. The code is available at https://github.com/amazon-science/demdistribution-edited-model.
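To make the idea of combining individually trained models with the base model through element-wise operations more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the exact combination rule, so the sketch assumes one plausible form: each fine-tuned model contributes its element-wise difference from the base model, scaled by a per-source weight, and the scaled differences are added back to the base. The names `Params`, `distribution_vector`, and `combine`, the toy parameter values, and the weights are all hypothetical.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def distribution_vector(finetuned: Params, base: Params) -> Params:
    """Element-wise difference between a fine-tuned model and the base model."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def combine(base: Params, finetuned_models: List[Params], weights: List[float]) -> Params:
    """Add weighted element-wise differences to the base model.

    Assumed rule (not stated in the abstract):
        merged = base + sum_i weights[i] * (finetuned_models[i] - base)
    """
    if len(finetuned_models) != len(weights):
        raise ValueError("one weight is required per fine-tuned model")
    merged = {name: list(values) for name, values in base.items()}
    for model, weight in zip(finetuned_models, weights):
        delta = distribution_vector(model, base)
        for name in merged:
            merged[name] = [m + weight * d for m, d in zip(merged[name], delta[name])]
    return merged


# Toy example: one parameter tensor and two data sources.
base = {"layer.weight": [1.0, 2.0]}
model_a = {"layer.weight": [2.0, 2.0]}  # hypothetically trained on data source A
model_b = {"layer.weight": [1.0, 4.0]}  # hypothetically trained on data source B

merged = combine(base, [model_a, model_b], [0.5, 0.5])
print(merged)
```

Under this assumed rule, changing one data source only requires retraining that source's model and recomputing the sum, which is consistent with the claim that a full re-training is not needed when a single data source is modified.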