NeurIPS2023
Riemannian Projection-free Online Learning
Zihao Hu, Guanghui Wang, Jacob D. Abernethy
被引用 3 次
摘要
The projection operation is a critical component in a wide range of optimization algorithms, such as online gradient descent (OGD), for enforcing constraints and achieving optimal regret bounds. However, it suffers from computational complexity limitations in high-dimensional settings or when dealing with ill-conditioned constraint sets. Projection-free algorithms address this issue by replacing the projection oracle with more efficient optimization subroutines. But to date, these methods have been developed primarily in the Euclidean setting, and while there has been growing interest in optimization on Riemannian manifolds, there has been essentially no work in trying to utilize projection-free tools here. An apparent issue is that non-trivial affine functions are generally non-convex in such domains. In this paper, we present methods for obtaining sub-linear regret guarantees in online geodesically convex optimization on curved spaces for two scenarios: when we have access to (a) a separation oracle or (b) a linear optimization oracle. For geodesically convex losses, and when a separation oracle is available, our algorithms achieve O(T 1 /2 ), O(T 3 /4 ) and O(T 1 /2 ) adaptive regret guarantees in the full information setting, the bandit setting with one-point feedback and the bandit setting with two-point feedback, respectively. When a linear optimization oracle is available, we obtain regret rates of O(T 3 /4 ) for geodesically convex losses and O(T 2 /3 log T ) for strongly geodesically convex losses. • Given a separation oracle, we attain adaptive regret bounds of O(T 1 /2 ), O(T 3 /4 ) and O(T 1 /2 ) for gsc-convex losses in the full information setting, the bandit convex optimization setting with onepoint feedback 1 , and the bandit convex optimization setting with two-point feedback, respectively. • Assuming access to a linear optimization oracle, we provide algorithms that enjoy O(T 3 /4 ) adaptive regret for gsc-convex losses and O(T 2 /3 log T ) regret for strongly gsc-convex losses. • We highlight some key differences between convex sets on a Hadamard manifold and in Euclidean space. In particular, shrinking a gsc-convex set towards an interior point does not preserve convexity, and the Minkowski functional on a Hadamard manifold is non-convex. These results, which may not be well-known within the machine learning community, could be of independent interest. The technical challenges of this paper can be divided into two parts. Firstly, for the separation oracle, we need to bound the "thickness" of part of the feasible set cut by the separating hyperplane. While in Euclidean space, this can be achieved through a convex combination argument (Garber & Kretzu, 2022), the task becomes challenging on manifolds due to the varying metric at different points and the non-convex nature of the separating hyperplane. Fortunately, this problem can be addressed using the Jacobi field comparison technique. Also, unlike in Euclidean space, one cannot directly construct a separation oracle for (1δ)K using a separation oracle for K. This poses significant challenges in the bandit setting. Nonetheless, we have identified a novel solution to this issue in the two-point feedback setting. Secondly, in Euclidean space, the linear optimization oracle is typically invoked by online Frank-Wolfe (OFW) (Hazan & Kale, 2012; Kretzu & Garber, 2021) to achieve no-regret online learning. The analysis of OFW relies on the fact that the Hessian of a linear function is zero everywhere. But on Hadamard manifolds, such functions' existence implies that the manifold has zero sectional curvature everywhere (Kristály et al., 2016) . The algorithms in Garber & Kretzu (2022) do not require affinity and serve as a starting point for our results. However, the analysis still needs to be conducted carefully due to the non-convexity of the separating hyperplane on manifolds.