NeurIPS2023
Convex-Concave Zero-Sum Stochastic Stackelberg Games
Denizalp Goktas, Arjun Prakash, Amy Greenwald
Abstract
Zero-sum stochastic games have found important applications in a variety of fields, from machine learning to economics. Work on this model has primarily focused on the computation of Nash equilibrium due to its effectiveness in solving adversarial board and video games. Unfortunately, a Nash equilibrium is not guaranteed to exist in zero-sum stochastic games when the payoffs at each state are not convexconcave in the players' actions. A Stackelberg equilibrium, however, is guaranteed to exist. Consequently, in this paper, we study zero-sum stochastic Stackelberg games. Going beyond known existence results for (non-stationary) Stackelberg equilibria, we prove the existence of recursive (i.e., Markov perfect) Stackelberg equilibria (recSE) in these games, provide necessary and sufficient conditions for a policy profile to be a recSE, and show that recSE can be computed in (weakly) polynomial time via value iteration. Finally, we show that zero-sum stochastic Stackelberg games can model the problem of pricing and allocating goods across agents and time. More specifically, we propose a zero-sum stochastic Stackelberg game whose recSE correspond to the recursive competitive equilibria of a large class of stochastic Fisher markets. We close with a series of experiments that showcase how our methodology can be used to solve the consumption-savings problem in stochastic Fisher markets. Min-max optimization has paved the way for recent progress in a variety of fields, from machine learning to economics. These applications require computing solutions to a constrained min-max optimization problem i.e., min x2X max y2Y f (x, y), where the objective function f : X ⇥ Y ! R is continuous, and the constraint sets X ⇢ R n and Y ⇢ R m are nonempty and compact. When f is convex-concave, and the constraint sets X and Y are convex, the seminal minimax theorem [1, 2] holds, i.e., min x2X max y2Y f (x, y) = max y2Y min x2X f (x, y), and such problems can be interpreted as computing a Nash equilibrium of a simultaneous-move min-max (or zero-sum) game between an outer player x and an inner player y with respective payoff functions f , f and respective action sets X , Y, where the solutions (x ⇤ , y ⇤ ) 2 X ⇥ Y are best responses to one another. More generally, one can consider zero-sum stochastic games, played over an infinite discrete time horizon N + . The game starts at some initial state S (0) ⇠ µ (0) . At each subsequent time-step t 2 N + , players encounter a new state s (t) 2 S. After taking their respective actions (x (t) , y (t) ) from their respective action spaces X (s (t) ) ✓ R n and Y(s (t) ) ✓ R m , they receive payoffs r(s (t) , x (t) , y (t) ), and then either transition to a new state S (t+1) ⇠ p(• | s (t) , x (t) , y (t) ) with probability , or the game ends with the remaining probability. The goal of the outer (resp. inner) player is to play a sequence 36th Conference on Neural Information Processing Systems (NeurIPS 2022).