NeurIPS 2023

Provably (More) Sample-Efficient Offline RL with Options

Xiaoyan Hu, Ho-fung Leung

Abstract

The options framework has achieved empirical success in long-horizon planning problems in reinforcement learning (RL). Recent work shows that options improve sample efficiency in online RL, where the learner can actively explore the environment. However, these results no longer apply to scenarios where exploring the environment online is risky, e.g., automated driving and healthcare. In this paper, we provide the first analysis of the sample complexity of offline RL with options, where the agent learns from a dataset without further interaction with the environment. We propose the PEssimistic Value Iteration for Learning with Options (PEVIO) algorithm and establish near-optimal suboptimality bounds (with respect to a novel information-theoretic lower bound for offline RL with options) for two popular data-collection procedures: the first collects state-option transitions and the second collects state-action transitions. We show that, compared to offline RL with actions, using options not only enjoys a faster finite-time convergence rate (to the optimal value) but also attains better performance (when either the options are carefully designed or the offline data is limited). Based on these results, we analyze the pros and cons of the two data-collection procedures, which may help practitioners choose between them.