NeurIPS2021

Near-Optimal No-Regret Learning in General Games

Constantinos Daskalakis, Maxwell Fishelson, Noah Golowich

被引用 130 次

摘要

We show that Optimistic Hedge – a common variant of multiplicative-weights-updates with recency bias – attains poly(log T ) regret in multi-player general-sum games. In particular, when every player of the game uses Optimistic Hedge to iteratively update her strategy in response to the history of play so far, then after T rounds of interaction, each player experiences total regret that is poly(log T ) . Our bound improves, exponentially, the O ( T 1 / 2 ) regret attainable by standard no-regret learners in games, the O ( T 1 / 4 ) regret attainable by no-regret learners with recency bias [SALS15], and the O ( T 1 / 6 ) bound that was recently shown for Optimistic Hedge in the special case of two-player games [CP20]. A corollary of our bound is that Optimistic Hedge converges to coarse correlated equilibrium in general games at a rate of ˜ O (cid:0) 1 T (cid:1) .