ICML 2021

On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP

Tianhao Wu, Yunchang Yang, Simon S. Du, Liwei Wang

13 citations

Abstract

We study reinforcement learning (RL) in episodic tabular MDPs with adversarial corruptions, where some episodes can be adversarially corrupted. When the total number of corrupted episodes is known, we propose an algorithm, Corruption Robust Monotonic Value Propagation (CR-MVP), which achieves a regret bound of Õ , where S is the number of states, A is the number of actions, H is the planning horizon, K is the number of episodes, and C is the known corruption level. We also provide a novel lower bound, which indicates that our upper bound is nearly tight. Finally, as an application, we study RL with rich observations in the block MDP model. We provide the first algorithm that achieves a √K-type regret in this setting and is oracle-efficient.