ICML2021

Quantum algorithms for reinforcement learning with a generative model

Daochen Wang, Aarthi Sundaram, Robin Kothari, Ashish Kapoor, Martin Roetteler

被引用 38 次

摘要

Reinforcement learning studies how an agent should interact with an environment to maximize its cumulative reward. A standard way to study this question abstractly is to ask how many samples an agent needs from the environment to learn an optimal policy for a γ\gamma-discounted Markov decision process (MDP). For such an MDP, we design quantum algorithms that approximate an optimal policy (π\pi^*), the optimal value function (vv^*), and the optimal QQ-function (qq^*), assuming the algorithms can access samples from the environment in quantum superposition. This assumption is justified whenever there exists a simulator for the environment; for example, if the environment is a video game or some other program. Our quantum algorithms, inspired by value iteration, achieve quadratic speedups over the best-possible classical sample complexities in the approximation accuracy (ϵ\epsilon) and two main parameters of the MDP: the effective time horizon (11γ\frac{1}{1-\gamma}) and the size of the action space (AA). Moreover, we show that our quantum algorithm for computing qq^* is optimal by proving a matching quantum lower bound.