ICML2021

Optimal regret algorithm for Pseudo-1d Bandit Convex Optimization

Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain

被引用 5 次

摘要

We study online learning with bandit feedback (i.e. learner has access to only zeroth-order oracle) where cost/reward functions f t admit a "pseudo-1d" structure, i.e. f t (w) = t (g t (w)) where the output of g t is one-dimensional. At each round, the learner observes context x t , plays prediction g t (w t ; x t ) (e.g. g t (•) = x t , • ) for some w t ∈ R d and observes loss t (g t (w t )) where t is a convex Lipschitz-continuous function. The goal is to minimize the standard regret metric. This pseudo-1d bandit convex optimization problem (PBCO) arises frequently in domains such as online decision-making or parameter-tuning in large systems. For this problem, we first show a lower bound of min( √ dT , T 3/4 ) for the regret of any algorithm, where T is the number of rounds. We propose a new algorithm OPTPBCO that combines randomized online gradient descent with a kernelized exponential weights method to exploit the pseudo-1d structure effectively, guaranteeing the optimal regret bound mentioned above, up to additional logarithmic factors. In contrast, applying state-of-the-art online convex optimization methods leads to Õ min d 9.5 √ T , √ dT 3/4 regret, that is significantly suboptimal in d. t (g t (w; x t )) where g t : R d → R is a one-dimensional function. We formulate this Pseudo-1d Bandit Convex Optimization (in Section 2) as follows: given a data point, or context, x t ∈ X at round t, the prediction of the learner is given by g t (w t ; x t ) for some w t ∈ W ⊆ R d and known g t , e.g. g t (w t ; x t ) = w t , x t . The learner then receives t (g t (w t ; x t )) from the adversary for some unknown convex, Lipschitz-continuous loss t . The goal is to minimize regret, i.e. the excess cumulative loss suffered by the learner over the best, fixed, parameter w * ∈ W in hindsight. As mentioned above, the pseudo-1d structure arises naturally in online parameter