NeurIPS2022

Zero-Sum Stochastic Stackelberg Games

Denizalp Goktas, Sadie Zhao, Amy Greenwald

19 citations

Abstract

Zero-sum stochastic games have found important applications in a variety of fields, from machine learning to economics. Work on this model has primarily focused on the computation of Nash equilibrium due to its effectiveness in solving adversarial board and video games. Unfortunately, a Nash equilibrium is not guaranteed to exist in zero-sum stochastic games when the payoffs at each state are not convexconcave in the players' actions. A Stackelberg equilibrium, however, is guaranteed to exist. Consequently, in this paper, we study zero-sum stochastic Stackelberg games. Going beyond known existence results for (non-stationary) Stackelberg equilibria, we prove the existence of recursive (i.e., Markov perfect) Stackelberg equilibria (recSE) in these games, provide necessary and sufficient conditions for a policy profile to be a recSE, and show that recSE can be computed in (weakly) polynomial time via value iteration. Finally, we show that zero-sum stochastic Stackelberg games can model the problem of pricing and allocating goods across agents and time. More specifically, we propose a zero-sum stochastic Stackelberg game whose recSE correspond to the recursive competitive equilibria of a large class of stochastic Fisher markets. We close with a series of experiments that showcase how our methodology can be used to solve the consumption-savings problem in stochastic Fisher markets. Min-max optimization has paved the way for recent progress in a variety of fields, from machine learning to economics. In a constrained min-max optimization problem, min x∈X max y∈Y f (x, y), the objective function f : X ×Y → R is continuous, and the constraint sets X ⊂ R n and Y ⊂ R m are nonempty and compact. When f is convex-concave, and the constraint sets X and Y are convex, the seminal minimax theorem [1, 2] holds, i.e., min x∈X max y∈Y f (x, y) = max y∈Y min x∈X f (x, y), and such problems can be interpreted as solving a min-max (or zero-sum) one-shot simultaneousmove game between an outer player x and an inner player y with respective payoff functions -f , f and respective action sets X , Y, where the solutions (x * , y * ) ∈ X × Y are Nash equilibria: i.e., best responses to one another with x * ∈ arg min x∈X f (x, y * ) and y * ∈ arg max y∈Y f (x * , y). More generally, one can consider zero-sum stochastic games played over an infinite discrete time horizon N + . The game starts at some initial state S (0) ∼ µ (0) . At each subsequent time-step t ∈ N + , players encounter a new state s (t) ∈ S. After taking their respective actions (x (t) , y (t) ) from their respective action spaces X (s (t) ) ⊆ R n and Y(s (t) ) ⊆ R m , they receive payoffs r(s (t) , x (t) , y (t) ), and then either transition to a new state S (t+1) ∼ p(• | s (t) , x (t) , y (t) ) with probability γ, or the game ends with the remaining probability. The goal of the outer (resp. inner) player is 36th Conference on Neural Information Processing Systems (NeurIPS 2022).