NeurIPS2020

Finding Second-Order Stationary Points Efficiently in Smooth Nonconvex Linearly Constrained Optimization Problems

Songtao Lu, Meisam Razaviyayn, Bo Yang, Kejun Huang, Mingyi Hong

19 citations

Abstract

This paper proposes two efficient algorithms for computing approximate secondorder stationary points (SOSPs) of problems with generic smooth non-convex objective functions and generic linear constraints. While finding (approximate) SOSPs for the class of smooth non-convex linearly constrained problems is computationally intractable, we show that generic problem instances in this class can be solved efficiently. Specifically, for a generic problem instance, we show that certain strict complementarity (SC) condition holds for all Karush-Kuhn-Tucker (KKT) solutions. Based on this condition, we design an algorithm named Successive Negative-curvature grAdient Projection (SNAP), which performs either conventional gradient projection or some negative curvature based projection steps to find SOSPs. SNAP is a second-order algorithm that requires O(max1/ϵ 2 G , 1/ϵ 3 H ) iterations to compute an (ϵ G , ϵ H )-SOSP, where O hides the iteration complexity for eigenvalue-decomposition. Building on SNAP, we propose a first-order algorithm, named SNAP + , that requires O(1/ϵ 2.5 ) iterations to compute (ϵ, √ ϵ)-SOSP. The per-iteration computational complexities of our algorithms are polynomial in the number of constraints and problem dimension. To the best of our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and global sublinear rate are designed to find SOSPs of the important class of non-convex problems with linear constraints (almost surely). * contributed to this work when he was working as a Ph.D. student at the University of Minnesota. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada. matrix M ∈ R n×p is to be factorized into two nonnegative matrices . Further, for non-convex problems with ℓ 1 regularizer (such as sparse PCA [2]),we need to solve min g(x) + ∥x∥ 1 , which can be equivalently written as (2) Applications having these linear inequality constraints include neural networks training with the nonnegative constraint [3], nonnegative tensor factorization [4], statistical learning with simplex constraints [5], and portfolio optimization under budget constraints [6], to name just a few. Recently, algorithms that can escape strict saddle points (stationary points at which there exist directions of negative curvature) for unconstrained non-convex problems have attracted considerable research attention, and they find applications in tensor decomposition [7], phase retrieval [8], lowrank matrix factorization [9], etc. However, it is not straightforward to extend these "saddle-point escaping" algorithms to problems with simple constraints or non-smooth regularizers. As will be seen shortly, even checking that a solution is a second-order stationary point (SOSP) for linearly constrained problems can be daunting. The main task of this paper is to identify situations in which finding a SOSP for problem (1) is easy and to design efficient algorithms for this task. Related work. For unconstrained smooth problems, there has been a line of work that develops algorithms by using both the gradient direction and negative curvature so that SOSPs can be attained within a certain number of iterations [10, 11, 12] . For example, based on the fast top eigenvector computation, an accelerated gradient method [11] is guaranteed to converge to SOSPs with a quantifiable rate. By exploiting the eigendecomposition, the convergence rate of some variants of the Newton method to SOSPs has been shown in [13, 14, 15, 16] , where the algorithm is assumed to be able to access the (inexact) Hessian matrix around strict saddle points. Another way of finding negative curvature is to occasionally add noise to the iterates. Algorithms in this class include a perturbed version of (accelerated) gradient descent (GD) [17, 18] and NEgative-curvature-Originatedfrom-Noise (NEON) [19] , and their extensions [19, 20, 21, 22] . In practice, there may be some constraints, such as equality and inequality constraints. For the equality constrained case (e.g., linear/manifolds constraints), negative curvature methods, including both first-and second-order ones, have been proposed for which SOSPs can be obtained either asymptotically [23, 24] or with provable convergence rates [25, 26] under some mild conditions. Despite these recent exciting developments, the aforementioned methods are unable to incorporate even the simplest linear inequality constraints. Existing methods either directly rely on secondorder descent directions of the objective function [27, 10] , or use this information together with other methods such as the trust region method [28], cubic regularized Newton's method [29], primaldual algorithm [30], etc. These algorithms are generally undesirable for large-scale problems due to the scalability issues when computing the second-order information. However, to the best of our knowledge, there has been no first-order method that can provably compute SOS