ICML2025
Optimal Algorithm for Max-Min Fair Bandit
Zilong Wang, Zhiyao Zhang, Shuai Li
摘要
Multi-player multi-armed bandit (MP-MAB) has been widely studied owing to its diverse applications across numerous domains. We consider an MP-MAB problem where N players compete for K arms in T rounds. The reward distributions are heterogeneous where each player has a different expected reward for the same arm. When multiple players select the same arm, they collide and obtain zero rewards. In this paper, our target is to find the max-min fairness matching that maximizes the reward of the player who receives the lowest reward. This paper improves the existing max-min regret upper bound of O(exp(1/∆) + K 3 log T log log T ). More specifically, our decentralized fair elimination algorithm (DFE) deals with heterogeneity and collision carefully and attains a regret upper bound of O((N 2 + K) log T /∆), where ∆ is the minimum reward gap between max-min value and sub-optimal arms. In addition, this paper also provides an Ω(maxN 2 , K log T /∆) regret lower bound for this problem, which indicates that our algorithm is optimal with respect to key parameters T, N, K, and ∆. Additional numerical experiments also show the efficiency and improvement of our algorithms.