NeurIPS2023

Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games

Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen McAleer, Andreas A. Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm

被引用 16 次

DOI arXiv 出版方

摘要

We introduce a new approach for computing optimal equilibria and mechanisms via learning in games. It applies to extensive-form settings with any number of players, including mechanism design, information design, and solution concepts such as correlated, communication, and certification equilibria. We observe, via Lagrangian reformulation, that optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum game. In essence, this is the game in which one player selects an equilibrium (or mechanism), while the other player attempts to find a profitable deviation. This reformulation allows us to apply techniques for learning in zero-sum games, yielding the first learning dynamics that converge to optimal equilibria, not only in empirical averages, but also in iterates. By avoiding the use of an objective, our zero-sum game formulation always has bounded reward range and is not dependent on the selection of a large-enough Lagrange multiplier. This property allows us to solve the zero-sum game-and hence compute optimal equilibria, mechanisms, and so on, in both single-step and sequential settings-using out-of-the-box deep reinforcement learning algorithms for zero-sum games (e.g., deep PSRO). We demonstrate the practical scalability and flexibility of our approach by attaining state-of-the-art performance in benchmark tabular games, and by computing an optimal mechanism for a sequential auction design problem using deep reinforcement learning. Our experiments also demonstrate that, in the deep reinforcement learning setting, the aforementioned bounded reward range property is crucial: experimental results with a more natural Lagrangian without the bounded reward property led to significantly worse performance. * Equal contribution.