ICML2024

Matroid Semi-Bandits in Sublinear Time

Ruo-Chun Tzeng, Naoto Ohsaka, Kaito Ariu

2 citations

Abstract

We study the matroid semi-bandits problem, where at each round the learner plays a subset of K arms from a feasible set, and the goal is to maximize the expected cumulative linear rewards. Existing algorithms have per-round time complexity at least Ω(K), which becomes expensive when K is large. To address this computational issue, we propose FasterCUCB whose sampling rule takes time sublinear in K for common classes of matroids: O(D polylog (K) polylog (T )) for uniform matroids, partition matroids, and graphical matroids, and O(D √ Kpolylog (T )) for transversal matroids. Here, D is the maximum number of elements in any feasible subset of arms, and T is the horizon. Our technique is based on dynamic maintenance of an approximate maximum-weight basis over inner-product weights. Although the introduction of an approximate maximum-weight basis presents a challenge in regret analysis, we can still guarantee an upper bound on regret as tight as CUCB in the sense that it matches the gap-dependent lower bound by Kveton et al. (2014a) asymptotically. To develop a sublinear-time sampling rule, we present a dynamic algorithm for maintaining maximum-weight base