ICML 2020

A new regret analysis for Adam-type algorithms

Ahmet Alacaoglu, Yura Malitsky, Panayotis Mertikopoulos, Volkan Cevher

50 citations

Abstract

In this paper, we focus on a theory-practice gap for Adam and its variants (AMSgrad, AdamNC, etc.). In practice, these algorithms are used with a constant first-order moment parameter $\beta_1$ (typically between 0.9 and 0.99). In theory, regret guarantees for online convex optimization require a rapidly decaying $\beta_1 \to 0$ schedule. We show that this is an artifact of the standard analysis and propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $\beta_1$, without further assumptions. We also demonstrate the flexibility of our analysis on a wide range of different algorithms and settings.
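
To make the setting concrete, the following is a minimal sketch (not the paper's code or experiments) of AMSGrad run with a constant first-moment parameter on a toy online convex problem, with regret measured against the best fixed point in hindsight. The function name `amsgrad_regret`, the quadratic losses, the box constraint, the step size $\alpha/\sqrt{t}$, and all constants are assumptions chosen for illustration only.

```python
import numpy as np


def amsgrad_regret(T=5000, d=5, alpha=0.1, beta1=0.9, beta2=0.999,
                   eps=1e-8, seed=0):
    """Run AMSGrad with a constant beta1 on toy online quadratic losses.

    Assumed setup (illustrative, not from the paper):
      f_t(x) = 0.5 * ||x - z_t||^2, with z_t uniform in [-1, 1]^d,
      and the feasible set is the same box [-1, 1]^d.
    Returns (regret against the best fixed point in hindsight, last iterate).
    """
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1.0, 1.0, size=(T, d))

    x = np.ones(d)          # start at a corner of the box
    m = np.zeros(d)         # first-moment (momentum) estimate
    v = np.zeros(d)         # second-moment estimate
    v_hat = np.zeros(d)     # running maximum of v (the AMSGrad modification)
    total_loss = 0.0

    for t in range(1, T + 1):
        z_t = z[t - 1]
        total_loss += 0.5 * np.sum((x - z_t) ** 2)  # suffer loss f_t(x_t)
        g = x - z_t                                 # gradient of f_t at x_t

        m = beta1 * m + (1.0 - beta1) * g           # beta1 stays constant
        v = beta2 * v + (1.0 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)

        step = (alpha / np.sqrt(t)) * m / (np.sqrt(v_hat) + eps)
        # For a box and a diagonal scaling, the weighted projection
        # reduces to coordinate-wise clipping.
        x = np.clip(x - step, -1.0, 1.0)

    # For these quadratic losses the best fixed point is the mean of z_t,
    # which already lies inside the box.
    x_star = z.mean(axis=0)
    best_loss = 0.5 * np.sum((z - x_star) ** 2)
    return total_loss - best_loss, x
```

Calling `regret, x_last = amsgrad_regret()` and inspecting `regret / T` for increasing `T` gives a rough empirical view of average regret under a fixed `beta1`. It is a sanity check on a single synthetic problem, not evidence for the bounds stated above.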