NeurIPS2023
Continuous-time Analysis of Anchor Acceleration
Jaewook J. Suh, Jisun Park, Ernest K. Ryu
14 citations
Abstract
Recently, the anchor acceleration, an acceleration mechanism distinct from Nesterov's, has been discovered for minimax optimization and fixed-point problems, but its mechanism is not understood well, much less so than Nesterov acceleration. In this work, we analyze continuous-time models of anchor acceleration. We provide tight, unified analyses for characterizing the convergence rate as a function of the anchor coefficient β(t), thereby providing insight into the anchor acceleration mechanism and its accelerated O(1/k 2 )-convergence rate. Finally, we present an adaptive method inspired by the continuous-time analyses and establish its effectiveness through theoretical analyses and experiments. Then E is a constant function. The proof of Proposition 3.2 uses dilated coordinate W (t) = C(t)(X(t) -X 0 ) to derive its conservation law in the style of Suh et al. [67] . We provide the details in Appendix E.2. Recall from (1) that d ds Ã(X(s)), Ẋ(s) ≥ 0, the integrand of the last term of E is nonnegative. This motivates us to define as our Lyapunov function. Corollary 3.3. Let 𝔸 be maximal monotone and β(t) = γ t p with p > 0, γ > 0. Let Ã(X(t)) be the selection of 𝔸(X(t)) as in Section 2.1. For t 0 ≥ 0, define V : [0, ∞) → R as for t > 0 and V (0) = lim t→0+ V (t). Then V (t) ≤ V (0) holds for t ≥ 0. A technical detail is that all terms involving d ds Ã(X(s)) have been excluded in the definition of V and this is what allows 𝔸 to not be Lipschitz continuous. We provide the details in Appendix E.3. Lemma 3.4. Consider the setup of Corollary 3.3. Assume Zer𝔸 ̸ = ∅. Then for t > 0 and Proof outline of Lemma 3.4. Define Then, from monotonicity of à and Young's inequality, Applying ( 8 ) and organizing, we can get the desired result. The details are provided in Appendix E.4. Proof outline of Theorem 3.1. It remains to show that last integral term of Lemma 3.4 is O 1 . The details are provided in Appendix E.5. Before we end this section, we observe how our analysis simplifies in the special case β(t) = 1 t . In this case, and this corresponds to the Lyapunov function of [59, Section 4] for the case γ = 1. As V (0) = 0, the conclusion of Lemma 3.4 becomes which to the best rate in Table 1 . Point convergence APPM is an instance of the Halpern method [54, Lemma 3.1], which iterates converge to the element in Zer𝔸 closest to X 0 [34, 73] . The anchor ODE also exhibits this behavior. Theorem 3.5. Let 𝔸 be a maximal monotone operator with Zer𝔸 ̸ = ∅ and X be the solution of (3). If lim t→∞ Ã(X(t)) = 0 and lim t→∞ 1/C(t) = 0, then, as t → ∞, We provide the proof in Appendix E.6. Tightness of analysis In this section, we show that the convergence rates of Table 1 are actually tight by considering the dynamics under the explicit example 𝔸 = 0 1 -1 0 . Throughout this section, we denote 𝔸 as A when when the operator is linear.