ICLR 2021
Byzantine-Resilient Non-Convex Stochastic Gradient Descent
Zeyuan Allen-Zhu, Faeze Ebrahimianghazani, Jerry Li, Dan Alistarh
18 citations
Abstract
We study adversary-resilient stochastic distributed optimization, in which m machines independently compute stochastic gradients and cooperate to jointly optimize over their local objective functions. However, an α-fraction of the machines are Byzantine: they may behave in arbitrary, adversarial ways. We consider a variant of this procedure in the challenging non-convex case. Our main result is a new algorithm, SafeguardSGD, which can provably escape saddle points and find approximate local minima of the non-convex objective. The algorithm is based on a new concentration filtering technique, and its sample and time complexity bounds match the best known theoretical bounds in the stochastic, distributed setting when no Byzantine machines are present. Our algorithm is very practical: it improves upon the performance of all prior methods when training deep neural networks, it is relatively lightweight, and it is the first method to withstand two recently proposed Byzantine attacks.

* V1 appears on this date on OpenReview, V1.5 polishes the writing, and V2 rewrites the experiments more carefully. V2 is to appear as the camera-ready version for ICLR 2021. We would like to thank Chi Jin and Dong Yin for very insightful discussions on this subject, and an anonymous reviewer who suggested a simpler proof. F. E. and D. A.
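The setting in the abstract (m machines, an α-fraction of them Byzantine, and a filter that relies on honest gradients concentrating around each other) can be illustrated with a small simulation. The sketch below is not the paper's SafeguardSGD. It is a simplified stand-in that assumes a single accumulation window, a hand-picked distance threshold, a two-dimensional toy objective with a saddle point at the origin, and a fixed "negate and shift" attack. The names `run`, `grad`, `dist`, `window`, and `threshold` are hypothetical and chosen only for this example.

```python
import math
import random


def grad(x):
    # Gradient of the toy non-convex objective
    # f(x, y) = x^4/4 - x^2/2 + y^2/2, which has a saddle point at the
    # origin and minima at (+1, 0) and (-1, 0).
    return [x[0] ** 3 - x[0], x[1]]


def dist(a, b):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def run(m=10, n_byz=3, steps=400, lr=0.1, sigma=0.05,
        window=10, threshold=1.0, seed=0):
    rng = random.Random(seed)
    byzantine = set(range(n_byz))   # ground truth, hidden from the filter
    good = set(range(m))            # workers the filter still trusts
    acc = [[0.0, 0.0] for _ in range(m)]
    x = [0.0, 0.0]                  # start exactly at the saddle point

    for t in range(steps):
        if t % window == 0:
            # Assumption: one window, reset periodically.
            acc = [[0.0, 0.0] for _ in range(m)]

        g = grad(x)
        reports = {}
        for i in sorted(good):
            if i in byzantine:
                # Hypothetical attack: negate the gradient and shift it.
                reports[i] = [-g[0] + 5.0, -g[1] + 5.0]
            else:
                # Honest worker: true gradient plus Gaussian noise.
                reports[i] = [gj + rng.gauss(0.0, sigma) for gj in g]
            acc[i] = [a + r for a, r in zip(acc[i], reports[i])]

        # Pick a "median" worker: one whose accumulated gradient is within
        # `threshold` of more than m/2 trusted workers (itself included).
        median = None
        for i in sorted(good):
            close = sum(1 for j in good if dist(acc[i], acc[j]) <= threshold)
            if close > m / 2:
                median = i
                break

        # Permanently drop workers whose accumulator strays too far.
        if median is not None:
            good = {j for j in good
                    if dist(acc[median], acc[j]) <= 2 * threshold}

        # SGD step using only the gradients of still-trusted workers.
        avg = [sum(reports[i][k] for i in good) / len(good) for k in range(2)]
        x = [xk - lr * ak for xk, ak in zip(x, avg)]

    return x, good, byzantine


if __name__ == "__main__":
    x, good, byzantine = run()
    print("final iterate:", x)
    print("trusted workers:", sorted(good))
```

In this toy run the shifted reports sit far from the honest cluster, so the filter should discard the Byzantine workers early, and the gradient noise alone is enough to push the iterate off the saddle toward one of the two minima. A subtler attack that stays inside the threshold would need the more careful analysis the paper provides.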