ICML2024

Testing the Feasibility of Linear Programs with Bandit Feedback

Aditya Gangrade, Aditya Gopalan, Venkatesh Saligrama, Clayton Scott

3 citations

Abstract

While the recent literature has seen a surge in the study of constrained bandit problems, all existing methods for these begin by assuming the feasibility of the underlying problem. We initiate the study of testing such feasibility assumptions, and in particular address the problem in the linear bandit setting, thus characterising the costs of feasibility testing for an unknown linear program using bandit feedback. Concretely, we test if ∃x : Ax ≥ 0 for an unknown A ∈ R m×d , by playing a sequence of actions xt ∈ R d , and observing Axt + noise in response. By identifying the hypothesis as determining the sign of the value of a minimax game, we construct a novel test based on low-regret algorithms and a nonasymptotic law of iterated logarithms. We prove that this test is reliable, and adapts to the 'signal level,' Γ, of any instance, with mean sample costs scaling as O(d 2 /Γ 2 ). We complement this by a minimax lower bound of Ω(d/Γ 2 ) for sample costs of reliable tests, dominating prior asymptotic lower bounds by capturing the dependence on d, and thus elucidating a basic insight missing in the extant literature on such problems. Concretely, we work in the linear bandit setting, i.e., in response to an action x ∈ X ⊂ R d , we observe scores S ∈ R m such that E[S|x] = Ax, where A is latent, and with the constraint structured as Ax ≥ α for a given tolerance vector α. We study the binary composite hypothesis testing problem of determining if there exists an x : Ax ≥ α or not, with the goal of designing a sequential test that ensures that the probability of error is smaller than some given δ. Such a test is carried out for some random time τ, corresponding directly to the sample costs, which we aim to minimise. Effectively we are testing if an unknown linear program (LP) is feasible, and we may equivalently phrase the problem as testing the sign of the minimax value Γ := max x∈X min i (Axα) i . Also note that by incorporating the objective as a constraint vector, and a proposed optimal value as a constraint level, this test also corresponds to solving the recognition (or decision) version of the underlying LP (e.g., Papadimitriou & Steiglitz, 1998, Ch. 15). This problem falls within the broad purview of pure exploration bandit problems, and specifically the socalled minimum threshold problem, which has been studied in the multi-armed case for a single constraint (e.g. Kaufmann et al., 2018 , also see §1.1). Most of this literature focuses on the asymptotic setting of δ ց 0, and the typical result is of the form if the instance is feasible, then there exist tests satisfying lim Γ 2 E [τ ] 2 log(1/δ) = 1. Prima facie this is good news, in that there is a well-developed body of methods with tight instance specific costs that do not depend on the dimension of the action set, d! However, this lack of dependence should give us pause, since it does not make sense: if, e.g., X were a simplex, and only one corner of it were feasible, then detecting this feasibility should require us to search along each of the axes of X to locate some evidence, and so cost at least Ω(d) samples. The catch here lies in the limit, which implicitly enforces the regime δ = e -ω (d) . Of course, even for modest d, such small a δ is practically irrelevant. Thus, even in the finite-armed case, the existing theory of feasibility testing does not offer a pertinent characterisation of the costs in scenarios of rich action spaces with rare informative actions. Our contributions address this, and more. Concretely, we • Design novel and simple tests for feasibility based on exploiting low-regret methods and laws of iterated logarithm to certify the sign of the minimax value Γ. • Analyse these tests, and show that they are reliable and well-adapted to Γ, with stopping times scaling as O(d 2 /Γ 2 + d log(m/δ)/Γ 2 ), thus demonstrating that the cost due to the number of constraints, m, is limited, and that testing is possible far more quickly than finding near-optimal points. • Demonstrate a minimax lower bound of Ω(d/Γ 2 ) samples on the stopping time of reliable tests over feasible instances, thus showing that this uncaptured dependence is necessary. We note that while the design approach of using low-regret methods for feasibility testing has appeared previously, their use arises either as subroutines in a complex method, or through modified versions of Thompson Sampling that are hard to even specify for the linear setting. Instead, our approach is directly motivated, and extremely simple, relying only on the standard technical tools of online linear regression and laws of iterated logarithms (LILs), employed in a new way to construct robust boundaries for our test statistics. Our results thus provide a new perspective on this testing problem, and more broadly on active hypothesis testing. Related Work Minimum Threshold testing. The single-objective finite-armed bandit setup (Lattimore & Szepesvári, 2020) posits K < ∞ actions, or 'arms,' and in e