NeurIPS2023

Improved Convergence in High Probability of Clipped Gradient Methods with Heavy Tailed Noise

Ta Duy Nguyen, Thien Hang Nguyen, Alina Ene, Huy L. Nguyen

43 citations

Abstract

In this work, we study the convergence in high probability of clipped gradient 1 methods when the noise distribution has heavy tails, i.e., with bounded p th mo-2 ments, for some 1 < p ≤ 2 . Prior works in this setting follow the same recipe of 3 using concentration inequalities and an inductive argument with union bound to 4 bound the iterates across all iterations. This method results in an increase in the 5 failure probability by a factor of T , where T is the number of iterations. We in-6 stead propose a new analysis approach based on bounding the moment generating 7 function of a well chosen supermartingale sequence. We improve the dependency 8 on T in the convergence guarantee for a wide range of algorithms with clipped 9 gradients, including stochastic (accelerated) mirror descent for convex objectives 10 and stochastic gradient descent for nonconvex objectives. Our high probability 11 bounds achieve the optimal convergence rates and match the best currently known 12 in-expectation bounds. Our approach naturally allows the algorithms to use time-13 varying step sizes and clipping parameters when the time horizon is unknown, 14 which appears difﬁcult or even impossible using the techniques from prior works. 15 Furthermore, we show that in the case of clipped stochastic mirror descent, several 16 problem constants, including the initial distance to the optimum, are not required 17 when setting step sizes and clipping parameters. 18