NeurIPS2024

Solving Zero-Sum Markov Games with Continuous State via Spectral Dynamic Embedding

Chenhao Zhou, Zebang Shen, Zhang Chao, Hanbin Zhao, Hui Qian

Abstract

In this paper, we propose a provably efﬁcient natural policy gradient algorithm called Spectral Dynamic Embedding Policy Optimization ( SDEPO ) for two-player zero-sum stochastic Markov games with continuous state space and ﬁnite action space. In the policy evaluation procedure of our algorithm, a novel kernel embedding method is employed to construct a ﬁnite-dimensional linear approximations to the state-action value function. We explicitly analyze the approximation error in policy evaluation, and show that SDEPO achieves an ˜ O ( 1 (1 − γ ) 3 ϵ ) last-iterate convergence to the ϵ − optimal Nash equilibrium, which is independent of the cardinality of the state space. The complexity result matches the best-known results for global convergence of policy gradient algorithms for single agent setting. Moreover, we also propose a practical variant of SDEPO to deal with continuous action space and empirical results demonstrate the practical superiority of the proposed method.