ICML2022

Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling

Sajad Khodadadian, Pranay Sharma, Gauri Joshi, Siva Theja Maguluri

被引用 46 次

摘要

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling observations from the environment is usually split across multiple agents. However, transferring these observations from the agents to a central location can be prohibitively expensive in terms of communication cost, and it can also compromise the privacy of each agent's local behavior policy. Federated reinforcement learning is a framework in which agents collaboratively learn a global model, without sharing their individual data and policies. This global model is the unique fixed point of the average of local operators, corresponding to the agents. Each agent maintains a local copy of the global model and updates it using locally sampled data. In this paper, we show that by careful collaboration of the agents in solving this joint fixed point problem, we can find the global model times faster, also known as linear speedup. We first propose a general framework for federated stochastic approximation with Markovian noise and heterogeneity, showing linear speedup in convergence. We then apply this framework to federated reinforcement learning algorithms, examining the convergence of federated on-policy TD, off-policy TD, and -learning. of these algorithms employing the convergence result of the general federated stochastic approximation. We aim to demonstrate that our approach achieves linear speedup with respect to the number of agents involved. By addressing the challenges of heterogeneity among agents, our results can be applied to a wide range of off-policy and on-policy algorithms with function approximation. This contribution is expected to bridge the gap between stochastic approximation theory and practical federated RL applications, providing a robust framework for future research and development in the field [35, 49] . The main contributions and organization of the paper are summarized in the following. • In Section 3.2, we propose and study a general Federated HeteroGeneous Stochastic Approximation framework with Markovian noise (FeGSAM). Considering Markovian sampling noise and heterogeneity poses a significant challenge in the analysis of this algorithm. The convergence result for FeGSAM serves as a workhorse that enables us to analyze both federated TD-learning and federated -learning. We characterize the convergence of FeGSAM with a refined analysis of general stochastic approximation algorithms, fundamentally improving on prior work. • In the on-policy setting, in Section 5 we propose and analyze federated TD-learning with linear function approximation, where the agents' goal is to evaluate a common policy using on-policy samples collected in parallel from their environments. The agents only share the updated value function (not data) with the central server, thus saving communication cost. We prove a linear convergence speedup with the number of agents and also characterize the impact of communication frequency on the convergence. • In the tabular off-policy setting, in Section 6 we propose and analyze the federated off-policy TD-learning and federated -learning algorithms. Again, we establish a linear speedup in their convergence with respect to the number of agents, along with constant communication cost. Since every agent samples data using a private policy and only communicates the updated value or -function, the off-policy FRL helps to keep both the data and the policy private. • In the off-policy setting with function approximation, in Section 7 we propose and analyze the federated off-policy TD-learning with linear function approximation. Considering heterogeneity in the analysis of FeGSAM is essential to establish the sample complexity of this algorithm, and show a linear speedup with respect to the number of agents. However, we will observe that this setting requires more than a constant (error-dependent) number of communications. • We finally provide a sketch of the proof of the convergence of FeGSAM in Section 8. Related Work. In the related work section, we merge the results on stochastic approximation with their corresponding applications in RL. Single node TD-learning and Q-learning. Most existing RL literature is focused on designing and analyzing algorithms that run at a single computing node. In the on-policy setting, the asymptotic convergence of TD-learning was established in [11, 82, 85] , and the finite-sample bounds were studied in [10, 18, 23, 33, 45, 75] . In the off-policy setting, [56, 101] study the asymptotic and [16, 18] characterize the finite time bound of TD-learning. The -learning algorithm was first proposed in [93] . There has been a long line of work to establish the convergence properties of -learning. In particular, [9, 11, 12, 34, 84] characterize the asymptotic convergence of -learning, [4, 5, 16, 18, 87] study the finite-sample convergence bound in the mean-square sense, and [25, 48, 64] study the high-probability convergence bounds of -learning. Federated Learn