WWW2026
BARouter: A Budget-adaptive Online Large Language Model Router Framework
Lingkai Zu, Xiyue Peng, Xin Liu
Abstract
With the rapid advancement of large language models (LLMs), a diverse ecosystem of models with different scales and domain specializations has emerged, including LLM-based web agents and online multi-LLM serving systems. LLM routing, which opportunistically leverages this diversity to balance response quality against computational cost, has become a central problem in optimizing the performance of LLM serving systems. We propose the Budget-Adaptive Router (BARouter), an online routing framework for LLM serving systems. BARouter dynamically adjusts its routing policy for each incoming query based on the estimated response quality, the query cost, and real-time budget consumption. This lets it adapt seamlessly to varying initial budgets and evolving input distributions without manual hyperparameter tuning. Theoretically, we prove that BARouter achieves regret that is sublinear in the time horizon T. Extensive experiments show that BARouter effectively allocates budget to the right queries at the right time, consistently outperforms baseline algorithms, and remains robust to varying budget levels and shifting query distributions.
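The abstract does not state BARouter's exact update rule. To make the general idea of routing on estimated quality, per-query cost, and real-time budget consumption concrete, the following is a minimal sketch that assumes a standard online primal-dual pacing heuristic. This heuristic is swapped in purely for illustration and is not claimed to be BARouter's algorithm. A dual "price" on cost rises when spending runs ahead of the per-step budget rate and falls when it lags behind. Each query goes to the affordable model with the best quality-minus-priced-cost score. The class name `BudgetPacedRouter`, the `step_size` parameter, and the convention of returning -1 when no model fits the remaining budget are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BudgetPacedRouter:
    """Hypothetical primal-dual budget-pacing router (illustration only, not BARouter)."""

    total_budget: float
    horizon: int
    step_size: float = 0.1  # hypothetical learning rate for the dual price
    price: float = 0.0      # dual variable: current "price" of one unit of cost
    spent: float = 0.0      # budget consumed so far

    def route(self, quality: Sequence[float], cost: Sequence[float]) -> int:
        """Pick a model index given estimated qualities and costs; -1 means reject."""
        remaining = self.total_budget - self.spent
        # Only models whose cost fits in the remaining budget are eligible.
        feasible = [i for i, c in enumerate(cost) if c <= remaining]
        if not feasible:
            return -1  # hypothetical convention: no affordable model
        # Primal step: maximize estimated quality minus priced cost.
        choice = max(feasible, key=lambda i: quality[i] - self.price * cost[i])
        self.spent += cost[choice]
        # Dual step: raise the price if this query spent more than the
        # uniform per-step rate, lower it (but not below zero) otherwise.
        target_rate = self.total_budget / self.horizon
        self.price = max(0.0, self.price + self.step_size * (cost[choice] - target_rate))
        return choice


if __name__ == "__main__":
    router = BudgetPacedRouter(total_budget=10.0, horizon=10)
    # Two hypothetical models: a cheap, weaker one and an expensive, stronger one.
    for t in range(10):
        pick = router.route(quality=[0.5, 0.9], cost=[0.2, 3.0])
        print(t, pick, round(router.spent, 2), round(router.price, 3))
```

Running the demo, the router alternates between the strong model and the cheap model as the price adjusts, and it never spends more than `total_budget`. The price-based design avoids a hand-tuned fixed cost penalty because the price adapts to the observed spending rate. This is the same motivation the abstract gives for avoiding manual hyperparameter tuning, though `step_size` itself remains a tunable assumption here.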