NeurIPS2024
Nearly Minimax Optimal Submodular Maximization with Bandit Feedback
Artin Tajdini, Lalit Jain, Kevin Jamieson
摘要
We consider maximizing an unknown monotonic, submodular set function with cardinality constraint under stochastic bandit feedback. At each time the learner chooses a set with and receives reward where is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret with respect to an approximation of the maximum with , obtained through robust greedy maximization of . To date, the best regret bound in the literature scales as . And by trivially treating every set as a unique arm one deduces that is also achievable using standard multi-armed bandit algorithms. In this work, we establish the first minimax lower bound for this setting that scales like . For a slightly restricted algorithm class, we prove a stronger regret lower bound of . Moreover, we propose an algorithm Sub-UCB that achieves regret capable of matching the lower bound on regret for the restricted class up to logarithmic factors.