ICLR2026
High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Shaojie Li, Pengwei Tang, Bowei Zhu, Yong Liu
100 citations
Abstract
Stochastic gradient descent with momentum (SGDM) is widely used in machine learning, yet high-probability learning bounds for SGDM in non-convex settings remain scarce. In this paper, we provide high-probability convergence bounds and generalization bounds for SGDM. First, we establish such bounds for the gradient norm in the general non-convex case. The resulting convergence bounds are tighter than existing theoretical results, and the obtained generalization bounds seem to be the first for SGDM. Next, under the Polyak-ojasiewicz condition, we derive bounds for the function-value error instead of the gradient norm, and the corresponding learning rates are faster than in the general non-convex case. Finally, by additionally assuming a mild Bernstein condition on the gradient, we obtain even sharper generalization bounds whose learning rates can reach in the low-noise regime, where is the sample size. Overall, we provide a systematic study of high-probability learning bounds for non-convex SGDM.