NeurIPS2024

Nearly Minimax Optimal Submodular Maximization with Bandit Feedback

Artin Tajdini, Lalit Jain, Kevin Jamieson

Abstract

We consider maximizing an unknown monotonic, submodular set function f:2[n][0,1]f: 2^{[n]} \rightarrow [0,1] with cardinality constraint under stochastic bandit feedback. At each time t=1,,Tt=1,\dots,T the learner chooses a set St[n]S_t \subset [n] with Stk|S_t| \leq k and receives reward f(St)+ηtf(S_t) + \eta_t where ηt\eta_t is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret with respect to an approximation of the maximum f(S)f(S_*) with S=k|S_*| = k, obtained through robust greedy maximization of ff. To date, the best regret bound in the literature scales as kn1/3T2/3k n^{1/3} T^{2/3}. And by trivially treating every set as a unique arm one deduces that (nk)T\sqrt{ {n \choose k} T } is also achievable using standard multi-armed bandit algorithms. In this work, we establish the first minimax lower bound for this setting that scales like Ω~(minLk(L1/3n1/3T2/3+(nkL)T))\tilde{\Omega}(\min_{L \le k}(L^{1/3}n^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T})). For a slightly restricted algorithm class, we prove a stronger regret lower bound of Ω~(minLk(Ln1/3T2/3+(nkL)T))\tilde{\Omega}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T})). Moreover, we propose an algorithm Sub-UCB that achieves regret O~(minLk(Ln1/3T2/3+(nkL)T))\tilde{\mathcal{O}}(\min_{L \le k}(Ln^{1/3}T^{2/3} + \sqrt{{n \choose k - L}T})) capable of matching the lower bound on regret for the restricted class up to logarithmic factors.