NeurIPS2020

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen

145 citations

Abstract

This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider γ-discounted infinite-horizon Markov decision processes (MDPs) with state space S and action space A. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least |S||A| (1-γ) 2 . The current paper overcomes this barrier by certifying the minimax optimality of two algorithms -a perturbed modelbased algorithm and a conservative model-based algorithm -as soon as the sample size exceeds the order of |S||A| 1-γ (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study timeinhomogeneous finite-horizon MDPs, and prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible).