ICML 2025

Toward a Unified Theory of Gradient Descent under Generalized Smoothness

Alexander Tyurin

Abstract

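We study the classical optimization problem $\min_{x \in \mathbb{R}^d} f(x)$ and analyze the gradient descent (GD) method in both nonconvex and convex settings. It is well known that, under the $L$-smoothness assumption ($\|\nabla^2 f(x)\| \le L$), the point minimizing the quadratic upper bound $f(x_k) + \langle \nabla f(x_k), x_{k+1} - x_k \rangle + \frac{L}{2} \|x_{k+1} - x_k\|^2$ is $x_{k+1} = x_k - \gamma_k \nabla f(x_k)$ with step size $\gamma_k = \frac{1}{L}$. Surprisingly, a similar result can be derived under the $\ell$-generalized smoothness assumption ($\|\nabla^2 f(x)\| \le \ell(\|\nabla f(x)\|)$). In this case, we derive the step size $\gamma_k = \int_0^1 \frac{dv}{\ell(\|\nabla f(x_k)\| + \|\nabla f(x_k)\| v)}$. Using this step size rule, we improve upon existing theoretical convergence rates and obtain new results in several previously unexplored setups.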
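As a rough illustration, the sketch below runs GD with a step size of the integral form $\gamma_k = \int_0^1 dv / \ell(\|\nabla f(x_k)\| + \|\nabla f(x_k)\| v)$, which reduces to $1/L$ when $\ell \equiv L$. Everything in it is an assumption made for illustration rather than the paper's implementation: the function names, the midpoint quadrature, the choice $\ell(s) = L_0 + L_1 s$, and the test function $f(x) = \cosh(x)$, for which $f''(x) = \sqrt{1 + f'(x)^2} \le 1 + |f'(x)|$, so $L_0 = L_1 = 1$ is admissible.

```python
import math


def step_size_numeric(grad_norm, ell, n=1000):
    """Midpoint-rule approximation of int_0^1 dv / ell(g + g*v), with g = grad_norm."""
    h = 1.0 / n
    return sum(h / ell(grad_norm + grad_norm * (i + 0.5) * h) for i in range(n))


def step_size_l0_l1(grad_norm, L0, L1):
    """Closed form of the same integral for ell(s) = L0 + L1*s, assuming L0 > 0."""
    b = L1 * grad_norm
    if b < 1e-12:
        # Limit as the gradient vanishes: the classical 1/L0 step.
        return 1.0 / L0
    return math.log((L0 + 2.0 * b) / (L0 + b)) / b


def gd_cosh(x0, iters, L0=1.0, L1=1.0):
    """Gradient descent on the illustrative function f(x) = cosh(x), f'(x) = sinh(x)."""
    x = x0
    for _ in range(iters):
        grad = math.sinh(x)
        gamma = step_size_l0_l1(abs(grad), L0, L1)
        x -= gamma * grad
    return x


if __name__ == "__main__":
    g = 3.0
    print(step_size_numeric(g, lambda s: 1.0 + s), step_size_l0_l1(g, 1.0, 1.0))
    print(gd_cosh(5.0, 30))
```

For this choice of $\ell$ the integral has the closed form $\gamma_k = \frac{1}{L_1 g} \ln \frac{L_0 + 2 L_1 g}{L_0 + L_1 g}$ with $g = \|\nabla f(x_k)\|$, so the step is small where the gradient is large and approaches $1/L_0$ near a stationary point.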