AAAI2024

Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization

Jin Liu, Xiaokang Pan, Junwen Duan, Hongdong Li, Youqi Li, Zhe Qu

4 citations

Abstract

This paper delves into the realm of stochastic optimization for compositional minimax optimization-a pivotal challenge across various machine learning domains, including deep AUC and reinforcement learning policy evaluation. Despite its significance, the problem of compositional minimax optimization is still under-explored. Adding to the complexity, current methods of compositional minimax optimization are plagued by sub-optimal complexities or heavy reliance on sizable batch sizes. To respond to these constraints, this paper introduces a novel method, called Nested STOchastic Recursive Momentum (NSTORM), which can achieve the optimal sample complexity of O(κ 3 /ϵ 3 ) to obtain the ϵ-accuracy solution. We also demonstrate that NSTORM can achieve the same sample complexity under the Polyak-Łojasiewicz (PL)-condition-an insightful extension of its capabilities. Yet, NSTORM encounters an issue with its requirement for low learning rates, potentially constraining its real-world applicability in machine learning. To overcome this hurdle, we present ADAptive NSTORM (ADA-NSTORM) with adaptive learning rates. We demonstrate that ADA-NSTORM can achieve the same sample complexity but the experimental results show its more effectiveness. All the proposed complexities indicate that our proposed methods can match lower bounds to existing minimax optimizations, without requiring a large batch size in each iteration. Extensive experiments support the efficiency of our proposed methods. Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization [2020], Chen et al. [2020], Rafique et al. [2022] across diverse scenarios. Various methodologies have been devised to address these challenges. Approaches such as Stochastic Gradient Descent Ascent (SGDA) have been proposed Lin, Jin, and Jordan [2020], accompanied by innovations like variance-reduced SGDA Luo et al. [2020], Xu et al. [2020] that aim to expedite convergence rates. Moreover, the application of Riemannian manifold-based optimization has been explored Huang, Gao, and Huang [2020] across different minimax scenarios, showcasing the breadth of methodologies available. However, all of these methods are only designed for the non-compositional problem. It indicates that the stochastic gradient can be assumed as an unbiased estimation of the full gradient of both the two sub-problems. Because it is too difficult to get an unbiased estimation in compositional optimization, these methods cannot be directly used to optimize the compositional minimax problem. Recent efforts have yielded just two studies on the nonconvex compositional minimax optimization problem (1), named Stochastic Compositional Gradient Descent Ascent (SCGDA) Gao et al. [2021] and Primal-Dual Stochastic Compositional Adaptive (PDSCA) Yuan et al. [2022]. However, they only can obtain the sample complexity O(κ 4 /ϵ 4 ) for achieving the ϵ-accuracy solution, which limits the applicability in many machine learning scenarios. Consequently, there is a pressing need to devise a more streamlined approach capable of tackling this challenge. In addition, some compositional minimization optimizations have been proposed, such as SCGD Wang, Fang, and Liu [2017] , STORM Cutkosky and Orabona [2019], and RECOVER Qi et al. [2021]. They may not be directly utilized for the minimax problem (1), because the minimization of objective f (g(x)) depends on the maximization of objective f (g(x), •) for any x ∈ X . Furthermore, the combined errors from Jacobian and gradient estimators worsen challenges in both sub-problems. We aim to develop an approach effectively tackling the compositional minimax problem (1), optimizing sample complexities efficiently, without requiring a large batch size. In this paper, to address the aforementioned challenges, we first develop a novel Nested STOchastic Recursive Momentum (NSTORM) method for the problem (1). The NSTORM method leverages the variance reduction technique Cutkosky and Orabona [2019] to estimate the inner/outer functions and their gradients. The theoretical result shows that our proposed NSTORM method can achieve the optimal sample complexity of O(κ 3 /ϵ 3 ). To the best of our knowledge, NSTORM is the first method to match the best sample complexity in existing minimax optimization studies Huang, Wu, and Hu [2023] , Luo et al. [2020] without requiring a large batch size. We also demonstrate that NSTORM can achieve the same sample complexity under the Polyak-Łojasiewicz (PL)-condition, which indicates an insightful extension of NSTORM. In particular, the central idea of our proposed NSTORM method and analysis has two aspects: 1) the variance reduction is applied to both function and gradient values, which is different from Gao et al. [2021] , Yuan et al. [2022] and 2) the estimator of the inner gradient ∇g(x) is updated with a projection to ensure that the error can be bounded regardless of the minimization sub-problem. Furthermore, because NSTORM requires a s