NeurIPS2023

Alternation makes the adversary weaker in two-player games

Volkan Cevher, Ashok Cutkosky, Ali Kavis, Georgios Piliouras, Stratis Skoulakis, Luca Viano

5 citations

Abstract

Motivated by alternating game-play in two-player games, we study an altenating variant of the Online Linear Optimization (OLO). In alternating OLO, a learner at each round t ∈ [ n ] selects a vector x t and then an adversary selects a cost-vector c t ∈ [ − 1 , 1] n . The learner then experiences cost ( c t + c t − 1 ) ⊤ x t instead of ( c t ) ⊤ x t as in standard OLO. We establish that under this small twist, the Ω( √ T ) lower bound on the regret is no longer valid. More precisely, we present two online learning algorithms for alternating OLO that respectively admit O ((log n ) 4 / 3 T 1 / 3 ) regret for the n -dimensional simplex and O ( ρ log T ) regret for the ball of radius ρ > 0 . Our results imply that in alternating game-play, an agent can always guarantee ˜ O ((log n ) 4 / 3 T 1 / 3 ) regardless the strategies of the other agent while the regret bound improves to O (log T ) in case the agent admits only two actions.