ICML2023

Adapting to game trees in zero-sum imperfect information games

Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

13 citations

Abstract

Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn εoptimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound O(H(A X + B Y )/ε 2 ) on the required number of realizations to learn these strategies with high probability, where H is the length of the game, A X and B Y are the total number of actions for the two players. We also propose two Follow the Regularized leader (FTRL) algorithms for this setting: Balanced FTRL which matches this lower bound, but requires the knowledge of the information set structure beforehand to define the regularization; and Adaptive FTRL which needs O(H 2 (A X + B Y )/ε 2 ) realizations without this requirement by progressively adapting the regularization to the observations. Algorithm Sample complexity Structure-free MCCFR (Farina et al., 2020; Bai et al., 2022) O(H 4 (A X + B Y )/ε 2 ) IXOMD (Kozuno et al., 2021) O(H 2 (XA X + Y B Y )/ε 2 ) Balanced OMD (Bai et al., 2022) O(H Table 1 . Sample complexity for episodic, finite, two-player, zero-sum IIGs. Structure-free algorithm designates an algorithm that does not need to know the structure of the information set spaces in advance. The symbol O hides dependencies logarithmic in AX , BY , ε and δ. Note that for algorithms that only work with fixed action sets of size A and B we have AX = AX and BY = BY .