NeurIPS2022

CCCP is Frank-Wolfe in disguise

Alp Yurtsever, Suvrit Sra

21 citations

Abstract

This paper uncovers a simple but rather surprising connection: it shows that the well-known convex-concave procedure (CCCP) and its generalization to constrained problems are both special cases of the Frank-Wolfe (FW) method. This connection not only provides insight of deep (in our opinion) pedagogical value, but also transfers the recently discovered convergence theory of nonconvex Frank-Wolfe methods immediately to CCCP, closing a long-standing gap in its non-asymptotic convergence theory. We hope the viewpoint uncovered by this paper spurs the transfer of other advances made for FW to both CCCP and its generalizations. Main contributions In light of this short background, we summarize now the key contributions of this paper. We recognize the basic (unconstrained) CCCP method to be a special case of Frank-Wolfe. By the same argument, even its generalization to convex constrained instances of (1.1) is a special case of FW. This realization allows us to immediately transfer recently discovered non-asymptotic convergence theory of FW to CCCP and convex constrained CCCP. Building on the above connection, we subsequently propose a new variant of FW (called FW+) that applies to the most general form of CCCP (called CCCP+), where the constraint set D in (1.1) is specified via several DC constraints. The variant FW+ not only allows us to syntactically view CCCP+ as a special case, but more importantly allows us to transfer non-asymptotic convergence guarantees that we prove for FW+ directly to CCCP+. While the FW+ variant and its analysis are slightly more subtle than basic FW, the connection between CCCP and FW uncovered by our work is remarkably simple, which makes it all the more surprising that it was missed in the vast body of previous work on FW and CCCP. The introduction of FW+ itself is an immediate consequence of the connection uncovered. It is also quite plausible that the connections described in this paper will have various further consequences, including development of new variants of CCCP and other related methods. For a more concrete discussion of implications, see the discussion in Section 5. Finally, we note a few additional connections of the view uncovered in this paper in the appendix. In particular, we show that various other methods such as the proximal point method, mirror descent and mirror prox can also be seen as special instances of the FW method. A deeper exploration of these connections is left as future work. Related work CCCP. The CCCP method of Yuille and Rangarajan ( 2003 ) is a special case of the more general DC algorithm (DCA) (Tao, 1997; Le Thi and Pham Dinh, 2018) . CCCP often provides a strong baseline method for tackling non-convex problems where a DC structure is already known or relatively easy to obtain-see (Lipp and Boyd, 2016) for some recent variations on the CCCP method. As already noted, asymptotic convergence of CCCP follows from the more general convergence theory of DCA (Tao, 1997), while a simplified, more direct convergence analysis was obtained in (Lanckriet and Sriperumbudur, 2009) , who analyzed convergence of both function values and iterates; moreover, these convergence results apply to the most general CCCP+ formulation (DC objective with DC constraints). A more recent work (Khamaru and Wainwright, 2018) considers gradient and subgradient based alternatives to DCA and CCCP, establishing their corresponding non-asymptotic convergence; however, its focus is on alternative and its guarantees do not transfer to CCCP, and thus a fortiori not to CCCP+. Both CCCP and DCA have received a large number of applications, in a variety of areas including statistics, machine learning, signal processing, operations research, and many more-we refer the reader to the extensive list documented in the survey (Le Thi and Pham Dinh, 2018) and to examples in (Yuille and Rangarajan, 2003) . After this work was completed, we discovered an important item of related work (Abbaszadehpeivasti et al., 2021) . The authors therein propose non-asymptotic convergence rate guarantees for unconstrained DC optimization. They present two main results: (i) Under the assumption that f and g are convex and at least one of them is smooth, they show min τ∈ [k]