ACL 2024

Multimodal Instruction Tuning with Conditional Mixture of LoRA

Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang

Abstract

Parameter-efficient fine-tuning (PEFT) enables efficient adaptation of pre-trained large models to various downstream applications by fine-tuning only a small fraction of the model's parameters.

PEFT in Multimodal Instruction Tuning: Fine-tuning a limited portion of shared parameters on diverse multimodal instruction tasks simultaneously is likely to lead to task interference.

Task Interference: Sharing all parameters among different tasks results in performance degradation for a subset of tasks.

Our research seeks to explore and address task interference in parameter-efficient multimodal instruction tuning. Specifically, we aim to answer two critical research questions: (1) Does task interference exist in parameter-efficient multimodal instruction tuning? (2) How can we effectively mitigate this issue for robust performance across various tasks?

Task Interference in Multimodal Instruction Tuning with LoRA

We investigate task interference in parameter-efficient multimodal instruction tuning by analyzing gradient direction conflicts between task pairs. We compute the task interference matrix $I \in \mathbb{R}^{M \times M}$ for the LoRA decomposition matrices A and B, where M is the number of tasks and $I_{i,j}$ quantifies the interference of task j on task i. Our results demonstrate significant task interference in both shallow and deep Transformer layers for LoRA A and B: negative values (blue) indicate that learning one task can hinder another, whereas positive values (red) indicate that learning one task may enhance another's performance. These findings highlight notable task interference in parameter-efficient multimodal instruction tuning and reinforce the need for effective adaptation methods that ensure robust performance across diverse multimodal tasks.
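As a rough illustration of how a pairwise interference matrix can be derived from gradient direction conflicts, the sketch below scores the effect of task j on task i as the dot product of their gradients, normalized by task i's own squared gradient norm. This normalization is an assumption made for illustration, since the exact definition of $I_{i,j}$ is not given in this text. The function name `interference_matrix` and the toy gradients are hypothetical; in practice each row would be the flattened gradient of one task's loss with respect to a LoRA matrix (A or B) at a given layer.

```python
import numpy as np

def interference_matrix(task_grads):
    """Return an M x M matrix scoring the effect of task j on task i.

    task_grads: array-like of shape (M, D), one flattened gradient per task.
    Assumed score (illustrative, not taken from the paper):
        I[i, j] = (g_i . g_j) / (g_i . g_i)
    Negative entries mean the two gradients point in conflicting directions.
    """
    G = np.asarray(task_grads, dtype=np.float64)
    dots = G @ G.T                       # dots[i, j] = g_i . g_j
    self_dots = np.diag(dots).copy()     # g_i . g_i for each task i
    self_dots[self_dots == 0.0] = np.nan # avoid dividing by a zero gradient
    return dots / self_dots[:, None]     # normalize each row by task i

# Toy gradients for three hypothetical tasks in a 2-D parameter space.
grads = np.array([
    [1.0, 0.0],    # task 0
    [1.0, 1.0],    # task 1: partially aligned with task 0
    [-1.0, 0.0],   # task 2: directly opposed to task 0
])
I = interference_matrix(grads)
print(I)
```

Under this assumed score the diagonal is 1 by construction, a positive off-diagonal entry means a gradient step for task j also moves task i's loss downhill to first order, and a negative entry marks a conflict. Because each row is normalized by task i's own gradient, the matrix is generally not symmetric.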