NeurIPS2024

Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity

Qihao Zhou, Haishan Ye, Luo Luo

Abstract

This paper considers the distributed convex-concave minimax optimization under the second-order similarity. We propose stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes the advantage of the finite-sum structure in the objective by involving the mini-batch client sampling and variance reduction. We prove SVOGS can achieve the ε\varepsilon-duality gap within communication rounds of O(δD2/ε){\mathcal O}(\delta D^2/\varepsilon), communication complexity of O(n+nδD2/ε){\mathcal O}(n+\sqrt{n}\delta D^2/\varepsilon), and local gradient calls of O~(n+(nδ+L)D2/εlog(1/ε))\tilde{\mathcal O}(n+(\sqrt{n}\delta+L)D^2/\varepsilon\log(1/\varepsilon)), where nn is the number of nodes, δ\delta is the degree of the second-order similarity, LL is the smoothness parameter and DD is the diameter of the constraint set. We can verify that all of above complexity (nearly) matches the corresponding lower bounds. For the specific μ\mu-strongly-convex-μ\mu-strongly-convex case, our algorithm has the upper bounds on communication rounds, communication complexity, and local gradient calls of O(δ/μlog(1/ε))\mathcal O(\delta/\mu\log(1/\varepsilon)), O((n+nδ/μ)log(1/ε)){\mathcal O}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon)), and O~(n+(nδ+L)/μ)log(1/ε))\tilde{\mathcal O}(n+(\sqrt{n}\delta+L)/\mu)\log(1/\varepsilon)) respectively, which are also nearly tight. Furthermore, we conduct the numerical experiments to show the empirical advantages of proposed method.