NeurIPS2020

Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

Chaobing Song, Yong Jiang, Yi Ma

被引用 25 次

摘要

In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named Stochastic Variance Reduction via Accelerated Dual Averaging (SVR-ADA). In the nonstrongly convex and smooth setting, SVR-ADA can attain an O(1n)O\big(\frac{1}{n}\big)-accurate solution in O(nloglogn)O(n\log\log n) number of stochastic gradient evaluations, where nn is the number of samples; meanwhile, SVR-ADA matches the lower bound of this setting up to a loglogn\log\log n factor. In the strongly convex and smooth setting, SVR-ADA matches the lower bound in the regime nO(κ)n\le O(\kappa) while it improves the rate in the regime nκn\gg \kappa to O(nloglogn+nlog(1/(nϵ))log(n/κ))O(n\log\log n +\frac{n\log(1/(n\epsilon))}{\log(n/\kappa)}), where κ\kappa is the condition number. SVR-ADA improves complexity of the best known methods without use of any additional strategy such as optimal black-box reduction, and it leads to a unified convergence analysis and simplified algorithm for both the nonstrongly convex and strongly convex settings. Through experiments on real datasets, we also show the superior performance of SVR-ADA over existing methods for large-scale machine learning problems.