STOC2025
Faster Rates for No-Regret Learning in General Games via Cautious Optimism
Ashkan Soleymani, Georgios Piliouras, Gabriele Farina
被引用 1 次
摘要
We establish the first uncoupled learning algorithm that attains O(n log 2 d log T ) per-player regret in multi-player general-sum games, where n is the number of players, d is the number of actions available to each player, and T is the number of repetitions of the game. Our results exponentially improve the dependence on d compared to the O(n d log T ) regret attainable by , and also reduce the dependence on the number of iterations T from log 4 T to log T compared to Optimistic Hedge, the previously well-studied algorithm with O(n log d log 4 T ) regret [DFG21]. Our algorithm is obtained by combining the classic Optimistic Multiplicative Weights Update (OMWU) with an adaptive, non-monotonic learning rate that paces the learning process of the players, making them more cautious when their regret becomes too negative.