ICML 2025

Instance-Optimal Pure Exploration for Linear Bandits on Continuous Arms

Sho Takemori, Yuhei Umeda, Aditya Gopalan

Abstract

This paper studies a pure exploration problem with linear bandit feedback on continuous arm sets, aiming to identify an ε-optimal arm with high probability. Previous approaches for continuous arm sets have employed instance-independent methods due to technical challenges such as the infinite dimensionality of the space of probability measures and the non-smoothness of the objective function. This paper proposes a novel, tractable algorithm that addresses these challenges by leveraging a reparametrization of the sampling distribution and projected subgradient descent. However, this approach introduces new challenges related to the projection and to the reconstruction of the distribution from the reparametrization. We address these by focusing on the connection to the approximate Carathéodory problem. Compared to the original optimization problem on the infinite-dimensional space, our method is tractable, requiring only the solution of quadratic and fractional quadratic problems on the arm set. We establish instance-dependent optimality for our method, and empirical results on synthetic environments demonstrate its superiority over existing instance-independent baselines.
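
For orientation, the identification goal described above is commonly formalized as a PAC-style guarantee. The statement below is a sketch of that standard formulation, not the paper's exact definition: the symbols $\theta^*$ (unknown parameter), $\mathcal{X}$ (arm set), $\hat{x}$ (returned arm), $\varepsilon$ (accuracy), and $\delta$ (failure probability) are notation assumed here for illustration.

```latex
% Assumed notation (not taken from the source):
%   \theta^* \in \mathbb{R}^d     unknown parameter of the linear reward model
%   \mathcal{X} \subset \mathbb{R}^d   continuous arm set
%   \hat{x} \in \mathcal{X}       arm returned by the algorithm at stopping
%   \varepsilon > 0, \ \delta \in (0,1)   accuracy and failure probability
\[
  \mathbb{P}\!\left(
    \langle \theta^*, \hat{x} \rangle
    \;\ge\;
    \sup_{x \in \mathcal{X}} \langle \theta^*, x \rangle - \varepsilon
  \right)
  \;\ge\; 1 - \delta .
\]
```

Under this reading, "with high probability" corresponds to the $1 - \delta$ bound, and an instance-dependent guarantee is one whose sample complexity depends on the particular $\theta^*$ and $\mathcal{X}$ rather than only on worst-case quantities.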