NeurIPS2022

Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games

Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm

42 citations

Abstract

In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after TT repetitions of the game is bounded by O(logT)O(\log T), improving over the prior best bounds of O(log4(T))O(\log^4 (T)). At the same time, we guarantee optimal O(T)O(\sqrt{T}) swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a time-invariant learning rate, the second-order path lengths of the dynamics up to time TT are bounded by O(logT)O(\log T), a fundamental property which could have further implications beyond near-optimally bounding the (swap) regret. Our proposed learning dynamics combine in a novel way optimistic regularized learning with the use of self-concordant barriers. Further, our analysis is remarkably simple, bypassing the cumbersome framework of higher-order smoothness recently developed by Daskalakis, Fishelson, and Golowich (NeurIPS'21).