NeurIPS 2025
In-Context Fully Decentralized Cooperative Multi-Agent Reinforcement Learning
Chao Li, Bingkun Bao, Yang Gao
Abstract
This paper studies fully decentralized cooperative multi-agent reinforcement learning, in which each agent observes only the states, its own local actions, and the shared team rewards. Because an agent cannot access the other agents' actions, it typically suffers from non-stationarity during value function updates and relative overgeneralization during value function estimation, which together impede effective cooperative policy learning. Existing works fail to address both issues simultaneously because they cannot model the joint policy of the other agents in a fully decentralized setting. To overcome this limitation, we propose a novel method termed Dynamics-Aware Context (DAC), which formalizes the task, as locally perceived by each agent, as a Contextual Markov Decision Process, and addresses both non-stationarity and relative overgeneralization through dynamics-aware context modeling. Specifically, DAC attributes the non-stationary local task dynamics to switches among unobserved contexts, each corresponding to a distinct joint policy of the other agents. DAC then models the step-wise dynamics distribution with latent variables and treats these latent variables as the contexts. Accordingly, DAC learns a context-based value function to address non-stationarity during per-agent value function updates. For value function estimation, DAC derives an optimistic marginal value that promotes the selection of cooperative actions, thereby addressing relative overgeneralization. Empirically, we evaluate DAC on various cooperative tasks, and the results show that DAC consistently outperforms multiple baselines.
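To make the relative overgeneralization issue and the role of an optimistic marginal value concrete, the following is a minimal illustrative sketch and not the DAC implementation. It makes several simplifying assumptions that do not come from the paper: the task is a single-state matrix game (the well-known "climbing game" payoff matrix), each context is simply the other agent's fixed action and is given directly rather than inferred as a latent variable from dynamics, and the optimistic marginal is taken to be a plain maximum over contexts. All function names are hypothetical.

```python
# Illustrative sketch only; this is NOT the DAC algorithm from the paper.
# Assumptions: one state, two agents, and each "context" is the other
# agent's fixed action, observed directly instead of inferred.

# Climbing game payoffs: PAYOFF[own_action][context].
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def learn_context_values(payoff, lr=0.5, sweeps=50):
    """Learn Q(a, c) with a simple running-average update per context.

    Conditioning on the context keeps each target stationary, because the
    reward for a fixed (action, context) pair never changes.
    """
    n_actions, n_contexts = len(payoff), len(payoff[0])
    q = [[0.0] * n_contexts for _ in range(n_actions)]
    for _ in range(sweeps):
        for a in range(n_actions):
            for c in range(n_contexts):
                q[a][c] += lr * (payoff[a][c] - q[a][c])
    return q


def average_marginal(q):
    """Average over contexts, as if the other agent acted uniformly."""
    return [sum(row) / len(row) for row in q]


def optimistic_marginal(q):
    """Assumed form of optimism: best value over contexts per action."""
    return [max(row) for row in q]


def greedy(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


q = learn_context_values(PAYOFF)
avg_action = greedy(average_marginal(q))
opt_action = greedy(optimistic_marginal(q))
print("average marginal picks action", avg_action)
print("optimistic marginal picks action", opt_action)
```

In this toy game, averaging over contexts favors the safe action with payoff at most 5, because the cooperative action is heavily penalized whenever the other agent miscoordinates. The optimistic marginal instead selects the action that achieves the best joint payoff of 11.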