ICLR 2022
Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game
Haobo Fu, Weiming Liu, Shuang Wu, Yijia Wang, Tao Yang, Kai Li, Junliang Xing, Bin Li, Bo Ma, Qiang Fu, Wei Yang
Cited 32 times
Abstract
An optimal solution to a 2-player zero-sum imperfect-information game (IIG) usually refers to a Nash Equilibrium (NE), in which no player can improve its expected payoff by unilaterally deviating to a different policy. For instance, in the 2-player Rock-Paper-Scissors game, the NE is for both players to play the uniform random policy: $\left[\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right]$.
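The Rock-Paper-Scissors claim can be checked numerically. The following is a minimal sketch, not taken from the paper: it assumes the conventional payoffs of +1 for a win, -1 for a loss, and 0 for a tie, and the names `PAYOFF`, `expected_payoff`, `best_response_value`, and `biased` are hypothetical, introduced only for illustration. Because the game is symmetric, it is enough to check the row player's best deviation.

```python
from fractions import Fraction

# Row player's payoff matrix (assumed convention: win = +1, loss = -1, tie = 0).
# Rows and columns are ordered Rock, Paper, Scissors. The game is zero-sum,
# so the column player's payoff is the negation of each entry.
PAYOFF = [
    [0, -1, 1],   # Rock: ties Rock, loses to Paper, beats Scissors
    [1, 0, -1],   # Paper: beats Rock, ties Paper, loses to Scissors
    [-1, 1, 0],   # Scissors: loses to Rock, beats Paper, ties Scissors
]


def expected_payoff(row_policy, col_policy):
    """Expected payoff to the row player when both players use mixed policies."""
    return sum(
        row_policy[i] * PAYOFF[i][j] * col_policy[j]
        for i in range(3)
        for j in range(3)
    )


def best_response_value(col_policy):
    """Largest expected payoff the row player can reach with any pure action."""
    return max(
        sum(PAYOFF[i][j] * col_policy[j] for j in range(3))
        for i in range(3)
    )


uniform = [Fraction(1, 3)] * 3
# Hypothetical non-equilibrium policy that over-plays Rock.
biased = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

# Against the uniform policy, no deviation beats the value of the game (0).
print(expected_payoff(uniform, uniform))  # 0
print(best_response_value(uniform))       # 0

# Against the biased policy, always playing Paper gains 1/4 per round.
print(best_response_value(biased))        # 1/4
```

Exact fractions are used instead of floats so that the equilibrium check compares against 0 without rounding error.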