ICLR2026

Derandomized Online-to-Non-convex Conversion for Stochastic Weakly Convex Optimization

Fanfan Ji, Xiaotong Yuan

Abstract

Online-to-non-convex conversion (O2NC) is an online updates learning framework for producing Goldstein (δ,ϵ)(\delta,\epsilon)-stationary points of non-smooth non-convex functions with optimal oracle complexity O(δ1ϵ3)\mathcal{O}(\delta^{-1} \epsilon^{-3}). Subject to auxiliary random interpolation or scaling, O2NC recapitulates the stochastic gradient descent with momentum (SGDM) algorithm popularly used for training neural networks. Randomization, however, introduces deviations from practical SGDM. So a natural question arises: Can we derandomize O2NC to achieve the same optimal guarantees while resembling SGDM? On the negative side, the general answer is no due to the impossibility results of , showing that no dimension-free rate can be achieved by deterministic algorithms. On the positive side, as the primary contribution of the present work, we show that O2NC can be naturally derandomized for weakly convex functions. Remarkably, our deterministic algorithm converges at an optimal rate as long as the weak convexity parameter is no larger than O(δ1ϵ1)\mathcal{O}(\delta^{-1}\epsilon^{-1}). In other words, the stronger stationarity is expected, the higher non-convexity can be tolerated by our optimizer. Additionally, we develop a periodically restarted variant of our method to enable more progressive updates when the iterates are far from stationarity. The resulting algorithm, which can be viewed as a momentum-restarted variant of SGDM, has been empirically demonstrated to be effective and efficient for training ResNet and ViT models.