NeurIPS2022

Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games

Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm

42 citations

Abstract

In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after $T$ repetitions of the game is bounded by $O(\log T)$ , improving over the prior best bounds of $O(\log^4 (T))$ . At the same time, we guarantee optimal $O(\sqrt{T})$ swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a time-invariant learning rate, the second-order path lengths of the dynamics up to time $T$ are bounded by $O(\log T)$ , a fundamental property which could have further implications beyond near-optimally bounding the (swap) regret. Our proposed learning dynamics combine in a novel way optimistic regularized learning with the use of self-concordant barriers. Further, our analysis is remarkably simple, bypassing the cumbersome framework of higher-order smoothness recently developed by Daskalakis, Fishelson, and Golowich (NeurIPS'21).