ICML2022

Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis

Ziyi Chen, Yi Zhou, Rong-Rong Chen, Shaofeng Zou

被引用 35 次

摘要

Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either need to share agents' sensitive information or lack communication-efficiency. In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents' local information and are sample and communication-efficient. In both algorithms, agents share only noisy rewards and use mini-batch local policy gradient updates to improve sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove that the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities O( -2 ln -1 ) and O( -3 ln -1 ), respectively, and achieve an improved communication complexity O( -1 ln -1 ). Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms. sizes and number of local averaging steps. Second, when using decentralized Markovian SGD to compute the inverse Fisher information matrix, we need to use an exponentially increasing batch size to achieve an optimized sample complexity bound. Such a Markovian SGD with adaptive batch size has not been studied before and can be of independent interest. Related Work Convergence analysis of AC and NAC. In the centralized setting, the AC algorithm was firstly proposed by [9] and later developed into the natural actor-critic (NAC) algorithm [10, 11] . Then, [30, 31] and [32, 33, 11] establish the asymptotic convergence rate of centralized AC and NAC, respectively. Furthermore, [34, 25, 24, 26, 27] and [34] establish the finite-time convergence rate of centralized AC and NAC, respectively. Moreover, [28] improve the finite-time sample complexities of the above works to the state-of-the-art result for both centralized AC and NAC by leveraging mini batch sampling, and our sample complexities match these state-of-the-art results. In the decentralized setting, a few works have established the almost sure convergence result of AC [21, 17, 29, 22 ], but they do not characterize the finite-time convergence rate and the sample complexity.To the best of our knowledge, there is no formally developed decentralized NAC algorithm. Decentralized TD-type algorithms. The finite-time convergence of decentralized TD(0) has been obtained using i.i.d samples [35, 36, 37, 38] and Markovian samples [39, 37] , respectively, without revealing the agents' local actions, policies and rewards. Decentralized off-policy TD-type algorithms have been studied in [40, 41, 42, 43] .