NeurIPS 2023

Diverse Conventions for Human-AI Collaboration

Bidipta Sarkar, Andy Shih, Dorsa Sadigh

Cited by 17

Abstract

Conventions are crucial for strong performance in cooperative multi-agent games, because they allow players to coordinate on a shared strategy without explicit communication. Unfortunately, standard multi-agent reinforcement learning techniques, such as self-play, converge to conventions that are arbitrary and non-diverse, leading to poor generalization when interacting with new partners. In this work, we present a technique for generating diverse conventions by (1) maximizing their rewards during self-play, while (2) minimizing their rewards when playing with previously discovered conventions (cross-play), stimulating conventions to be semantically different. To ensure that learned policies act in good faith despite the adversarial optimization of cross-play, we introduce mixed-play, where an initial state is randomly generated by sampling self-play and cross-play transitions and the player learns to maximize the self-play reward from this initial state. We analyze the benefits of our technique on various multi-agent collaborative games, including Overcooked, and find that our technique can adapt to the conventions of humans, surpassing human-level performance when paired with real users.¹

Unfortunately, self-play results in incredibly brittle policies that cannot work well with humans who may have different conventions [29], including conventions that are more intuitive for people to use. In the cooking task, self-play could converge to a convention that expects player 2 to bring onions to AI player 1 from the right side. If a human player instead decides to bring onions from the left side to the AI player 1, the AI's policy may not react appropriately since this is not an interaction ...

¹ Supplemental videos can be found on our website along with source code and anonymized user study data.

37th Conference on Neural Information Processing Systems (NeurIPS 2023).
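
As a rough illustration of the self-play versus cross-play trade-off described in the abstract, the following sketch scores candidate conventions on a toy one-step matching game. Everything here is an assumption made for illustration and is not taken from the paper: the payoff matrix, the helper names (`expected_reward`, `self_play`, `cross_play`, `diversity_objective`), the weight `alpha`, and the choice to penalize the highest cross-play reward against earlier conventions. Mixed-play is omitted because a one-step game has no intermediate states to sample.

```python
import numpy as np

# Hypothetical toy game: both players receive reward 1 when their actions match.
PAYOFF = np.eye(3)

def one_hot(i, n=3):
    """Deterministic policy that always picks action i."""
    v = np.zeros(n)
    v[i] = 1.0
    return v

def expected_reward(p1, p2, payoff=PAYOFF):
    """Expected shared reward when player 1 samples from p1 and player 2 from p2."""
    return float(p1 @ payoff @ p2)

def self_play(conv):
    """Reward when both halves of the same convention play together."""
    p1, p2 = conv
    return expected_reward(p1, p2)

def cross_play(conv_a, conv_b):
    """Average reward when each half of conv_a is paired with the other half of conv_b."""
    return 0.5 * (expected_reward(conv_a[0], conv_b[1])
                  + expected_reward(conv_b[0], conv_a[1]))

def diversity_objective(candidate, previous, alpha=1.0):
    """Self-play reward minus alpha times the best cross-play reward with earlier conventions.

    Using the maximum over earlier conventions and a scalar alpha are
    assumptions of this sketch.
    """
    xp = max((cross_play(candidate, prev) for prev in previous), default=0.0)
    return self_play(candidate) - alpha * xp

# A previously discovered convention: both players pick action 0.
first = (one_hot(0), one_hot(0))

# Candidate that repeats the old convention vs. one that coordinates on action 1.
same = (one_hot(0), one_hot(0))
different = (one_hot(1), one_hot(1))

score_same = diversity_objective(same, [first])
score_diff = diversity_objective(different, [first])
```

In this toy setting the repeated convention earns full self-play reward but is fully compatible with the earlier one, so its score drops, while the convention that coordinates on a different action keeps its self-play reward and incurs no cross-play penalty.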