ICML 2021
On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP
Tianhao Wu, Yunchang Yang, Simon S. Du, Liwei Wang
Abstract
We study reinforcement learning (RL) in episodic tabular MDPs with adversarial corruptions, where some episodes can be adversarially corrupted. When the total number of corrupted episodes is known, we propose an algorithm, Corruption Robust Monotonic Value Propagation (CR-MVP), which achieves a regret bound of Õ(…), where S is the number of states, A is the number of actions, H is the planning horizon, K is the number of episodes, and C is the known corruption level. We also provide a novel lower bound, which indicates that our upper bound is nearly tight. Finally, as an application, we study RL with rich observations in the block MDP model. We provide the first algorithm that achieves a √K-type regret in this setting and is oracle efficient.