KDD 2026

Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-task Learning

Ziyu Zhao, Yixiao Zhou, Xin Yu, Zhi Zhang, Didi Zhu, Tao Shen, Zexi Li, Jinluan Yang, Xuwu Wang, Jing Su, Kun Kuang, Zhongyu Wei, Fei Wu, Yu Cheng

Cited by 13

Abstract

Low-Rank Adaptation (LoRA) is widely used for adapting large language models (LLMs) to specific domains due to its efficiency and modularity. However, vanilla LoRA struggles with task conflicts in multi-task scenarios. Recent works adopt Mixture of Experts (MoE) by treating each LoRA module as an expert, thereby mitigating task interference through multiple specialized LoRA modules. While effective, these methods often isolate knowledge within individual tasks, failing to fully exploit the shared knowledge across related tasks. In this paper, we establish a connection between single LoRA and multi-LoRA MoE, integrating them into a unified framework. We demonstrate that the dynamic routing of multiple LoRAs is functionally equivalent to rank partitioning and block-level activation within a single LoRA. To systematically study the role of expert granularity in multi-task learning, we conduct an in-depth investigation within our unified framework. Our empirical results show that a finer-grained expert partitioning not only yields significant performance gains but also captures more diverse parameter patterns. These empirical findings are supported by our theoretical analysis, which proves that finer granularity expands parameter space diversity and tightens the model's error bound. Building on these findings, we propose Single-ranked Mixture of Experts LoRA (SMoRA), which embeds MoE into LoRA by treating each rank as an independent expert. With a dynamic rank-wise activation mechanism, SMoRA facilitates a flexible composition of knowledge, enabling the model to learn deeper and more diverse features while mitigating task conflicts. Experiments demonstrate that SMoRA activates fewer parameters yet achieves better performance in multi-task scenarios.
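
The equivalence claimed in the abstract can be illustrated numerically. The following is a minimal NumPy sketch, not the authors' implementation: it checks that a gated sum of several LoRA experts equals a single LoRA whose rank dimension is partitioned into blocks, with one gate per block. It then shrinks the block size to one rank, which is the granularity the abstract describes for SMoRA. All names (multi_lora_moe, single_lora_block_gated, topk_rank_gates, W_router) are hypothetical, and the top-k softmax router is an assumption made only for illustration; the abstract does not specify the gating function.

```python
import numpy as np

def multi_lora_moe(x, A_list, B_list, gates):
    """Gated sum of LoRA experts: sum_i g_i * B_i @ (A_i @ x)."""
    return sum(g * (B @ (A @ x)) for g, A, B in zip(gates, A_list, B_list))

def single_lora_block_gated(x, A, B, gates, block_size):
    """One LoRA with its rank dimension split into blocks that share a gate."""
    rank_gates = np.repeat(gates, block_size)  # per-expert gate -> per-rank gate
    return B @ (rank_gates * (A @ x))

def topk_rank_gates(x, W_router, k):
    """Assumed router: softmax over the k highest rank scores, zero elsewhere."""
    scores = W_router @ x
    top = np.argsort(scores)[-k:]
    gates = np.zeros_like(scores)
    e = np.exp(scores[top] - scores[top].max())
    gates[top] = e / e.sum()
    return gates

rng = np.random.default_rng(0)
d_in, d_out, n_experts, r = 16, 12, 4, 2  # illustrative sizes only

A_list = [rng.normal(size=(r, d_in)) for _ in range(n_experts)]
B_list = [rng.normal(size=(d_out, r)) for _ in range(n_experts)]
x = rng.normal(size=d_in)
gates = np.array([0.7, 0.0, 0.3, 0.0])  # two of four experts active

# Stack the experts into one LoRA of rank n_experts * r.
A = np.concatenate(A_list, axis=0)  # shape (n_experts * r, d_in)
B = np.concatenate(B_list, axis=1)  # shape (d_out, n_experts * r)

y_moe = multi_lora_moe(x, A_list, B_list, gates)
y_single = single_lora_block_gated(x, A, B, gates, block_size=r)
equivalent = np.allclose(y_moe, y_single)

# Finest granularity: every rank is its own expert with its own gate.
W_router = rng.normal(size=(n_experts * r, d_in))
rank_gates = topk_rank_gates(x, W_router, k=3)
y_rankwise = B @ (rank_gates * (A @ x))
```

With block_size equal to the per-expert rank, the single-LoRA form reproduces the multi-LoRA MoE output. Setting the block size to 1 gives each rank its own gate, so the router can compose ranks freely across what were previously separate experts.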