NeurIPS2021

Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee

Flint Xiaofeng Fan, Yining Ma, Zhongxiang Dai, Wei Jing, Cheston Tan, Bryan Kian Hsiang Low

被引用 93 次

摘要

The growing literature of Federated Learning (FL) has recently inspired Federated Reinforcement Learning (FRL) to encourage multiple agents to federatively build a better decision-making policy without sharing raw trajectories. Despite its promising applications, existing works on FRL fail to I) provide theoretical analysis on its convergence, and II) account for random system failures and adversarial attacks. Towards this end, we propose the first FRL framework the convergence of which is guaranteed and tolerant to less than half of the participating agents being random system failures or adversarial attackers. We prove that the sample efficiency of the proposed framework is guaranteed to improve with the number of agents and is able to account for such potential failures or attacks. All theoretical results are empirically verified on various RL benchmark tasks. Our code is available at https://github.com/flint-xf-fan/Byzantine-Federeated-RL . Background Stochastic Variance-Reduced Gradient aims to solve min θ∈R d [J(θ) Under the common assumption of all function components J i being smooth and convex in θ, gradient descent (GD) achieves linear convergence in the number of iterations of parameter updates [25, 26] . However, every iteration of GD requires B gradient computations, which can be expensive for large B. To overcome this problem, stochastic GD (SGD) [27, 28] samples a single data point per iteration, which incurs lower per-iteration cost yet results in a sub-linear convergence rate [29] . For a better trade-off between convergence rate and per-iteration computational cost, the stochastic variancereduced gradient (SVRG) method has been proposed, which reuses past gradient computations to reduce the variance of the current gradient estimate [30] [31] [32] [33] . More recently, stochastically controlled stochastic gradient (SCSG) has been proposed for convex [34] or smooth non-convex objective function [35] , to further reduce the computational cost of SVRG especially when required is small in finding -approximate solution. Refer to Appendix A.1 for more details on SVRG and SCSG.