EMNLP 2024

MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic

Yuyan Zhou, Liang Song, Bingning Wang, Weipeng Chen

Cited by 6

Abstract

The advent of large language models (LLMs) such as GPT-4 has catalyzed the exploration of multi-task learning (MTL), in which a single model demonstrates proficiency across diverse tasks. Task arithmetic has emerged as a cost-effective approach to MTL: it improves performance across multiple tasks by adding their corresponding task vectors to a pre-trained model. However, no current method achieves optimal performance at low computational cost while also protecting data privacy, which limits the application of task arithmetic to LLMs. In this paper, we propose Model Exclusive Task Arithmetic for merging GPT-scale models (MetaGPT), which formalizes the objective of model merging within a multi-task learning framework, aiming to minimize the average loss difference between the merged model and each individual task model. Since data privacy restricts the use of multi-task training data, we leverage LLMs' local linearity and the orthogonality of task vectors to separate the data term from the scaling-coefficient term, and derive a model-exclusive task arithmetic method. MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs. Extensive experiments demonstrate that MetaGPT improves upon task arithmetic and achieves state-of-the-art performance on multiple tasks.
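The abstract describes task arithmetic only in words. A task vector is the difference between a fine-tuned model's weights and the pre-trained weights, and merging adds scaled task vectors back onto the pre-trained model, i.e. θ_merged = θ_0 + Σ_t λ_t (θ_t − θ_0). The following is a minimal sketch of that generic step. It assumes that models are plain dicts mapping parameter names to flat lists of floats, and that the caller supplies the scaling coefficients λ_t. It does not implement MetaGPT's own rule for deriving the coefficients, which the abstract does not state. `task_vector` and `merge_task_arithmetic` are hypothetical helper names.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """Compute tau_t = theta_t - theta_0 for every parameter tensor."""
    return {
        name: [ft - pt for ft, pt in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def merge_task_arithmetic(
    pretrained: Params, finetuned_models: List[Params], lambdas: List[float]
) -> Params:
    """Return theta_0 + sum_t lambda_t * tau_t.

    The lambdas are caller-supplied assumptions here; this sketch does not
    derive them the way MetaGPT does.
    """
    if len(lambdas) != len(finetuned_models):
        raise ValueError("need exactly one scaling coefficient per task model")
    # Start from a copy of the pre-trained weights so the input is not mutated.
    merged = {name: list(values) for name, values in pretrained.items()}
    for model, lam in zip(finetuned_models, lambdas):
        tau = task_vector(model, pretrained)
        for name, delta in tau.items():
            merged[name] = [m + lam * d for m, d in zip(merged[name], delta)]
    return merged


# Usage: two toy "task models" whose task vectors are orthogonal.
theta_0 = {"w": [1.0, 2.0]}
theta_a = {"w": [2.0, 2.0]}  # tau_a = [1.0, 0.0]
theta_b = {"w": [1.0, 4.0]}  # tau_b = [0.0, 2.0]
merged = merge_task_arithmetic(theta_0, [theta_a, theta_b], [0.5, 0.5])
print(merged["w"])
```

Keeping λ_t as explicit inputs separates the merge itself from the choice of coefficients. That choice is exactly what the abstract says MetaGPT derives without data or a search process.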