AAAI2025

Quantum Best Arm Identification with Quantum Oracles

Xuchuang Wang, Yu-Zhen Janice Chen, Matheus Guedes de Andrade, Jonathan Allcock, Mohammad Hajiesmaili, John C. S. Lui, Don Towsley

4 citations

Abstract

Best arm identification (BAI) is a key problem in stochastic multi-armed bandits, where ๐พ arms each has an associated reward distribution, and the objective is to minimize the number of queries needed to identify the best arm with high confidence. In this paper, we explore BAI using quantum oracles. For the case where each query probes only one arm (๐‘š = 1), we devise a quantum algorithm with a query complexity upper bound of ร• (๐พฮ” -1 log(1/๐›ฟ)), where ๐›ฟ is the confidence parameter and ฮ” is the reward gap between best and second best arms. This improves on the classical bound by a factor of ฮ” -1 . For the general case where a single query can probe ๐‘š arms (1 โฉฝ ๐‘š โฉฝ ๐พ) simultaneously, we propose an algorithm with an upper bound of ร• ((๐พ/ โˆš ๐‘š)ฮ” -1 log(1/๐›ฟ)), improving by a factor of โˆš ๐‘š compared to the ๐‘š = 1 case. We also provide query complexity lower bounds for both scenarios, which match the upper bounds up to logarithmic factors, and validate our theoretical results with Qiskit-based simulations.