NeurIPS2024

Improved Analysis for Bandit Learning in Matching Markets

Fang Kong, Zilong Wang, Shuai Li

摘要

A rich line of works study the bandit learning problem in two-sided matching markets, where one side of market participants (players) are uncertain about their preferences and hope to find a stable matching during iterative matchings with the other side (arms). The state-of-the-art analysis shows that the player-optimal stable regret is of order O ( K log T/ ∆ 2 ) where K is the number of arms, T is the horizon and ∆ is the players’ minimum preference gap. However, this result may be far from the lower bound Ω(max N log T/ ∆ 2 , K log T/ ∆ ) since the number K of arms (workers, publisher slots) may be much larger than that N of players (employers in labor markets, advertisers in online advertising, respectively). In this paper, we propose a new algorithm and show that the regret can be upper bounded by O ( N 2 log T/ ∆ 2 + K log T/ ∆) . This result removes the dependence on K in the main order term and improves the state-of-the-art guarantee in common cases where N is much smaller than K . Such an advantage is also verified in experiments. In addition, we provide a refined analysis for the existing centralized UCB algorithm and show that, under α -condition, it achieves an improved O ( N log T/ ∆ 2 + K log T/ ∆) regret.