ICML2025

Quantum Algorithms for Finite-horizon Markov Decision Processes

Bin Luo, Yuwen Huang, Jonathan Allcock, Xiaojun Lin, Shengyu Zhang, John C. S. Lui

Abstract

In this work, we design quantum algorithms that are more efficient than classical algorithms to solve time-dependent and finite-horizon Markov Decision Processes (MDPs) in two distinct settings: (1) In the exact dynamics setting, where the agent has full knowledge of the environment's dynamics (i.e., transition probabilities), we prove that our Quantum Value Iteration (QVI) algorithm QVI-1 achieves a quadratic speedup in the size of the action space (A) compared with the classical value iteration algorithm for computing the optimal policy (π * ) and the optimal V-value function (V * 0 ). Furthermore, our algorithm QVI-2 provides an additional speedup in the size of the state space (S) when obtaining near-optimal policies and V-value functions. Both QVI-1 and QVI-2 achieve quantum query complexities that provably improve upon classical lower bounds, particularly in their dependences on S and A. (2) In the generative model setting, where samples from the environment are accessible in quantum superposition, we prove that our algorithms QVI-3 and QVI-4 achieve improvements in sample complexity over the state-of-the-art (SOTA) classical algorithm in terms of A, estimation error (ϵ), and time horizon (H). More importantly, we prove quantum lower bounds to show that QVI-3 and QVI-4 are asymptotically optimal, up to logarithmic factors, assuming a constant time horizon. This is the full version of (Luo et al., 2025) , which was presented at ICML 2025.