NeurIPS2021

A Geometric Structure of Acceleration and Its Role in Making Gradients Small Fast

Jongmin Lee, Chanwoo Park, Ernest K. Ryu

23 citations

Abstract

Since Nesterov's seminal 1983 work, many accelerated first-order optimization methods have been proposed, but their analyses lacks a common unifying structure. In this work, we identify a geometric structure satisfied by a wide range of firstorder accelerated methods. Using this geometric insight, we present several novel generalizations of accelerated methods. Most interesting among them is a method that reduces the squared gradient norm with O(1/K 4 ) rate in the prox-grad setup, faster than the O(1/K 3 ) rates of Nesterov's FGM or Kim and Fessler's FPGM-m. 1 Obtaining an iterate xK with ∇LF (xK ) 2 ≤ requires K ≥ 66L(F (x 0 )-F ) 1 2 iterations for FISTA-G, and K ≥ 2 132L 2 x 0 -x 2 1 4 iterations for FISTA+FISTA-G, where K is a positive even integer.