NeurIPS2023
Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization
Quanqi Hu, Dixian Zhu, Tianbao Yang
12 citations
Abstract
This paper investigates new families of compositional optimization problems, called non-smooth weakly-convex finite-sum coupled compositional optimization (NSWC FCCO). There has been a growing interest in FCCO due to its wide-ranging applications in machine learning and AI, as well as its ability to address the shortcomings of stochastic algorithms based on empirical risk minimization. However, current research on FCCO presumes that both the inner and outer functions are smooth, limiting their potential to tackle a more diverse set of problems. Our research expands on this area by examining non-smooth weakly-convex FCCO, where the outer function is weakly convex and non-decreasing, and the inner function is weakly-convex. We analyze a single-loop algorithm and establish its complexity for finding an ϵ-stationary point of the Moreau envelop of the objective function. Additionally, we also extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, which feature a nested arrangement of three functions. Lastly, we explore the applications of our algorithms in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms. many applications; iv) these approaches are not applicable to NSWC FCCO/TCCO with weakly convex f i . In fact, the double loop algorithm has been leveraged and extended to solving the two-way partial AUC maximization problem, a special case of NSWC FCCO [44], by sampling and updating a batch of coordinates of π at each iteration. However, it is less practical thus not implemented and its analysis did not explicitly show the convergence rate dependency on n + , n -and the block batch size. A special case of NSWC SCO problem was considered in [46] , which is given by They proposed two methods, SCS for smooth g(x) and SCS with SPIDER for non-smooth g(x). For both proposed methods, they proved a sample complexity of O(1/ϵ 6 ) for achieving an ϵ-stationary point of the objective's Moreau envelope 2 . We would like to remark that the above problem with a non-smooth g(x) is a special case of NSWC FCCO with only a convex outer function, one block and no coupled structure. Nevertheless, their algorithm for non-smooth g(•) suffers a limitation of requiring a large batch size in the order of O(1/ϵ 2 ) for achieving the same convergence. Finally, we would like to mention that non-smooth convex or strongly convex SCO problems have been considered in [27, 42, 26] , which, however, are out of scope of the present work. Preliminaries Let ∥ • ∥ be the Euclidean norm of a vector and spectral norm of a matrix. We use Π C [•] to denote the Euclidean projection onto v ∈ R m : ∥v∥ ≤ C. For vectors, inequality notations including ≤, ≥, >, < are used to denote element-wise inequality. For an expectation function f We recall the definition of general subgradient and subdifferential following [6, 24] . Definition 3.1 (subgradient and subdifferential). Consider a function f : R n → R ∪ ∞ and a point with f (x) finite. A vector v ∈ R n is a general subgradient of f at x, if f (y) ≥ f (x) + ⟨v, y -x⟩ + o(∥y -x∥), as y → x. The subdifferential ∂f (x) is the set of subgradients of f at point x.