ICML 2021
Trajectory Diversity for Zero-Shot Coordination
Andrei Lupu, Brandon Cui, Hengyuan Hu, Jakob N. Foerster
157 citations
Abstract
We study the problem of zero-shot coordination (ZSC), where agents must independently produce strategies for a collaborative game that are compatible with novel partners not seen during training. Our first contribution is to consider the need for diversity in generating such agents. Because self-play agents control their own trajectory distribution during training, their policies perform well only on this exact distribution. As a result, they achieve low scores in ZSC, since playing with another agent is likely to put them in situations they have not encountered during training. To address this issue, we train a common best response (BR) to a population of agents, which we regulate to be as diverse as possible. For that purpose, we introduce Trajectory Diversity (TrajeDi), a differentiable objective for generating diverse reinforcement learning (RL) policies. We present TrajeDi as a generalization of the Jensen-Shannon divergence (JSD) between policies and motivate it experimentally in a simple matrix game, where it allows us to find the unique ZSC-optimal solution.
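For readers unfamiliar with the quantity that TrajeDi is said to generalize, the following is a minimal sketch of the standard Jensen-Shannon divergence between discrete distributions, for example the action distributions of several policies in a single state. It is not the TrajeDi objective itself, which the abstract does not specify. The function names `entropy` and `jensen_shannon_divergence` and the uniform weighting over policies are assumptions made for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list."""
    # Terms with zero probability contribute nothing and are skipped.
    return -sum(x * math.log(x) for x in p if x > 0.0)


def jensen_shannon_divergence(distributions):
    """Standard JSD of several discrete distributions with uniform weights.

    Computed as the entropy of the average distribution minus the
    average of the individual entropies. Uniform weighting is an
    assumption of this sketch.
    """
    n = len(distributions)
    size = len(distributions[0])
    # Average (mixture) distribution over all inputs.
    mixture = [sum(d[i] for d in distributions) / n for i in range(size)]
    return entropy(mixture) - sum(entropy(d) for d in distributions) / n


# Two identical policies: no diversity, so the divergence is zero.
same = jensen_shannon_divergence([[0.5, 0.5], [0.5, 0.5]])

# Two deterministic policies choosing different actions: the divergence
# reaches its maximum for two distributions, log(2) nats.
different = jensen_shannon_divergence([[1.0, 0.0], [0.0, 1.0]])
```

Under this measure, a population of identical policies scores zero, while policies that never choose the same action score the maximum, which is the sense in which such a divergence can serve as a diversity signal.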