ICLR 2022

Actor-Critic Policy Optimization in a Large-Scale Imperfect-Information Game

Haobo Fu, Weiming Liu, Shuang Wu, Yijia Wang, Tao Yang, Kai Li, Junliang Xing, Bin Li, Bo Ma, Qiang Fu, Wei Yang

Cited 32 times

Abstract

An optimal solution to a 2-player zero-sum imperfect-information game (IIG) usually refers to a Nash Equilibrium (NE), where no player can improve their payoff by unilaterally deviating to a different policy. For instance, in the 2-player Rock-Paper-Scissors game, the NE is for both players to play the uniform random policy $[\frac{1}{3}, \frac{1}{3}, \frac{1}{3}]$.
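The Rock-Paper-Scissors claim can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the payoff matrix uses the conventional +1 / 0 / -1 scoring for win / draw / loss, and the names `PAYOFF`, `expected_payoff`, and `best_deviation_gain` are hypothetical helpers introduced here. It measures how much the row player could gain by switching to their best pure action. A gain of zero means no unilateral deviation helps, which is the NE condition for that player; because the game is symmetric, the same holds for the column player.

```python
from fractions import Fraction

# Row player's payoff; rows and columns are ordered Rock, Paper, Scissors.
# Assumed scoring: +1 for a win, 0 for a draw, -1 for a loss (zero-sum).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def expected_payoff(row_policy, col_policy):
    """Expected payoff to the row player under two mixed policies."""
    return sum(
        row_policy[i] * PAYOFF[i][j] * col_policy[j]
        for i in range(3)
        for j in range(3)
    )


def best_deviation_gain(row_policy, col_policy):
    """Gain available to the row player from switching to the best pure action."""
    current = expected_payoff(row_policy, col_policy)
    pure_values = [
        sum(PAYOFF[i][j] * col_policy[j] for j in range(3)) for i in range(3)
    ]
    return max(pure_values) - current


uniform = [Fraction(1, 3)] * 3

# Both players uniform: no pure action does better than the current policy.
gain_at_uniform = best_deviation_gain(uniform, uniform)

# Opponent over-plays Rock: the row player can now profit by deviating to Paper,
# so (uniform, biased) is not an equilibrium.
biased = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
gain_vs_biased = best_deviation_gain(uniform, biased)

print(gain_at_uniform, gain_vs_biased)
```

Exact fractions are used instead of floats so that the zero-gain check is not affected by rounding.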