AAAI2026

Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing

Hong Xie, Haoran Gu, Yanying Huang, Tao Tan, Defu Lian

Abstract

This paper proposes a variant of multiple-play stochastic bandits tailored to resource allocation problems arising from LLM applications, edge intelligence applications, etc. The proposed model is composed of M arms and K plays. Each arm has a stochastic number of capacities, and each unit of capacity is associated with a reward function. Each play is associated with a priority weight. When multiple plays compete for the arm capacity, the arm capacity is allocated in a larger priority weight first manner. Instance independent and instance dependent regret lower bounds of Ω(α1σ √ KM T ) and Ω(α1σ 2 M ∆ ln T ) are proved, where α1 is the largest priority weight and σ characterizes the reward tail. When model parameters are given, we design an algorithm named MSB-PRS-OffOpt to locate the optimal play allocation policy with a computational complexity of O(M 3 K 3 ). Utilizing MSB-PRS-OffOpt as a subroutine, an approximate upper confidence bound (UCB) based algorithm is designed, which has instance independent and instance dependent regret upper bounds matching the corresponding lower bound up to factors of √ K ln KT and α1K 2 respectively. To this end, we address nontrivial technical challenges arising from optimizing and learning under a special nonlinear combinatorial utility function induced by the prioritized resource sharing mechanism.