ICLR 2026

High-Probability Bounds for the Last Iterate of Clipped SGD

Savelii Chezhegov, Daniela Angela Parletta, Andrea Paudice, Eduard Gorbunov

Abstract

We study the problem of minimizing a convex objective when only noisy gradient estimates are available. Assuming that stochastic gradients have finite $\alpha$-th moments for some $\alpha \in (1,2]$, we establish, for the first time, a high-probability convergence guarantee for the last iterate of clipped stochastic gradient descent (Clipped-SGD) on smooth objectives. In particular, we prove a rate of $1/K^{(2\alpha-2)/(3\alpha)}$ with only polylogarithmic dependence on the confidence parameter. In addition, we introduce a new technique for deriving in-expectation convergence guarantees from high-probability bounds for methods with almost surely bounded updates, and apply it to obtain expectation guarantees for Clipped-SGD. Finally, we complement our theoretical analysis with empirical results that support and illustrate our findings.
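For orientation, the following is a minimal sketch of the kind of method the abstract refers to. It assumes the commonly used form of the Clipped-SGD update, $x_{k+1} = x_k - \gamma \, \mathrm{clip}(g_k, \lambda)$ with $\mathrm{clip}(g, \lambda) = \min\{1, \lambda / \|g\|\} \, g$. The function names, the constant stepsize and clipping level, and the toy quadratic objective with Student-t noise are illustrative assumptions, not the parameter choices analyzed in the paper.

```python
import numpy as np


def clip(g, lam):
    """Rescale g so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(g)
    if norm <= lam:
        return g
    return g * (lam / norm)


def clipped_sgd(grad_oracle, x0, stepsize, clip_level, num_steps):
    """Run Clipped-SGD and return the last iterate (no averaging).

    grad_oracle(x) is assumed to return a noisy gradient estimate at x.
    Constant stepsize and clip_level are a simplification for illustration.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(num_steps):
        g = grad_oracle(x)
        # Each update has norm at most stepsize * clip_level, regardless
        # of how heavy-tailed the noise in g is.
        x = x - stepsize * clip(g, clip_level)
    return x


if __name__ == "__main__":
    # Toy problem (assumed for illustration): f(x) = 0.5 * ||x||^2, whose
    # gradient is x, perturbed by Student-t noise with 2 degrees of freedom.
    # This noise has infinite variance but finite alpha-th moments for alpha < 2.
    rng = np.random.default_rng(0)

    def noisy_grad(x):
        return x + rng.standard_t(df=2.0, size=x.shape)

    x_last = clipped_sgd(noisy_grad, x0=np.ones(10), stepsize=0.01,
                         clip_level=5.0, num_steps=5000)
    print("distance of last iterate to the minimizer:", np.linalg.norm(x_last))
```

Because every step moves the iterate by at most `stepsize * clip_level`, the updates are bounded almost surely, which is the property the abstract relies on when converting high-probability bounds into in-expectation guarantees.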