ICLR 2022
Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game
Haobo Fu, Weiming Liu, Shuang Wu, Yijia Wang, Tao Yang, Kai Li, Junliang Xing, Bin Li, Bo Ma, Qiang Fu, Wei Yang
Abstract
An optimal solution to a 2-player zero-sum imperfect-information game (IIG) usually refers to a Nash Equilibrium (NE), where no player can improve their expected payoff by unilaterally deviating to a different policy. For instance, in the 2-player Rock-Paper-Scissors game, the NE is for both players to play the uniform random policy: $[\frac{1}{3}, \frac{1}{3}, \frac{1}{3}]$.
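The Rock-Paper-Scissors claim can be checked numerically. The following is a minimal sketch, not taken from the paper: the payoff matrix uses the standard win = +1, loss = -1, tie = 0 convention, and the names `PAYOFF`, `expected_payoffs`, and `best_response_value` are hypothetical helpers introduced here for illustration. It shows that against the uniform policy every pure action earns the same expected payoff, so no unilateral deviation helps, whereas a non-uniform policy can be exploited. Because the game is symmetric, checking one player is enough.

```python
from fractions import Fraction

# Row player's payoff (assumed convention: win = +1, loss = -1, tie = 0).
# Rows are the row player's action, columns the opponent's action,
# both ordered Rock, Paper, Scissors.
PAYOFF = [
    [0, -1, 1],   # Rock: ties Rock, loses to Paper, beats Scissors
    [1, 0, -1],   # Paper: beats Rock, ties Paper, loses to Scissors
    [-1, 1, 0],   # Scissors: loses to Rock, beats Paper, ties Scissors
]


def expected_payoffs(opponent_policy):
    """Expected payoff of each pure action against a fixed opponent policy."""
    return [
        sum(Fraction(payoff) * prob for payoff, prob in zip(row, opponent_policy))
        for row in PAYOFF
    ]


def best_response_value(opponent_policy):
    """Best expected payoff achievable by deviating against the opponent."""
    return max(expected_payoffs(opponent_policy))


uniform = [Fraction(1, 3)] * 3
biased = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # plays Rock too often

print(expected_payoffs(uniform))    # every action earns the same payoff
print(best_response_value(biased))  # a positive value means the policy is exploitable
```

Exact rational arithmetic via `fractions.Fraction` avoids floating-point noise, so the equality of the three payoffs against the uniform policy can be checked exactly.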