ICML 2025

Explicit Exploration for High-Welfare Equilibria in Game-Theoretic Multiagent Reinforcement Learning

Austin A. Nguyen, Anri Gu, Michael P. Wellman

Abstract

Iterative extension of empirical game models through deep reinforcement learning (RL) has proved an effective approach for finding equilibria in complex games. When multiple equilibria exist, we may have preferences among solutions. We address this equilibrium selection issue in the context of Policy Space Response Oracles (PSRO), a flexible game-solving framework based on deep RL, by skewing strategy generation towards higher-welfare solutions. At each iteration, we create an exploration policy that imitates high-welfare-yielding behavior and train a response to the current solution, regularized to be similar to the exploration policy. With no additional simulation expense, our approach, named Ex²PSRO, tends to find higher-welfare equilibria than vanilla PSRO in two benchmarks: a sequential bargaining game and a social dilemma game. Further experiments demonstrate Ex²PSRO's composability with other PSRO variants and illuminate the relationship between exploration policy choice and algorithmic performance.