AAAI 2026

Hybrid Routing for a Mixture of LoRA Experts

Yitong Huang, Ziqi Yang, Zihui Wang, Jianzhong Qi, Rongshan Yu, Xiaoliang Fan, Cheng Wang

Abstract

Combining Mixture of Experts (MoE) with Low-Rank Adaptation (LoRA) has shown promising efficiency in multi-task instruction tuning for Large Language Models (LLMs). Existing routing schemes for such MoE systems employ auxiliary functions to ensure both expert selection certainty and workload balance among experts, but they face two critical challenges: (1) they overlook the evolving cross-expert relationships across layers, leading to inefficient expert utilization; and (2) their auxiliary functions fail to incorporate cross-task semantic characteristics during expert assignment, leading to suboptimal task adaptation. To address these challenges, we propose Hybrid routing for a Mixture of LoRA Experts (HotMoE), a novel multi-task instruction tuning framework that adapts hierarchical routing to the distinct characteristics of different LLM layers. First, we design a hybrid routing module: in lower layers, expert-expert attention facilitates cross-task collaboration and generalization, while in higher layers, token-expert attention enables precise alignment between task semantics and specialized experts. Second, we introduce a similarity-guided auxiliary loss module that regularizes routing decisions by exploiting hidden state similarities. By promoting cohesive activation patterns among semantically related tasks while sharpening distinctions between conflicting ones, this loss reinforces expert specialization without sacrificing the certainty of expert selection. Experiments across two multi-task instruction tuning scenarios covering seven NLP benchmarks demonstrate that HotMoE consistently outperforms all baselines, improving Mean Relative Difference by up to 1.68% with only 3.1% of trainable parameters.
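For readers unfamiliar with the setting, the following is a minimal sketch of a generic softmax-gated mixture of LoRA experts applied to a single token. It is not HotMoE's hybrid routing or its similarity-guided loss, neither of which the abstract specifies in detail. All names (`moe_lora_forward`, `router`, `experts`) and the plain linear gate are illustrative assumptions.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    # Multiply a row-major matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def moe_lora_forward(x, W0, experts, router):
    """Assumed generic form: y = W0 x + sum_i g_i * B_i (A_i x).

    x       : input token vector of length d_in
    W0      : frozen base weight, shape (d_out, d_in)
    experts : list of (A, B) pairs, A is (r, d_in), B is (d_out, r)
    router  : hypothetical linear gate, shape (num_experts, d_in)
    """
    gates = softmax(matvec(router, x))      # one weight per expert
    y = matvec(W0, x)                       # frozen base projection
    for g, (A, B) in zip(gates, experts):
        delta = matvec(B, matvec(A, x))     # low-rank update B (A x)
        y = [yi + g * di for yi, di in zip(y, delta)]
    return y, gates

# Toy usage: two rank-1 experts and a zero router (uniform gates).
x = [1.0, 2.0]
W0 = [[1.0, 0.0], [0.0, 1.0]]
experts = [
    ([[1.0, 0.0]], [[1.0], [0.0]]),  # expert 0 copies x[0] into y[0]
    ([[0.0, 1.0]], [[0.0], [1.0]]),  # expert 1 copies x[1] into y[1]
]
router = [[0.0, 0.0], [0.0, 0.0]]
y, gates = moe_lora_forward(x, W0, experts, router)
```

In this sketch only the `(A, B)` pairs and the router would be trained while `W0` stays frozen, which is the usual reason the trainable-parameter fraction of such methods is small.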