NeurIPS 2022

Exploration-Guided Reward Shaping for Reinforcement Learning under Sparse Rewards

Rati Devidze, Parameswaran Kamalaruban, Adish Singla

99 citations

Abstract

We study the problem of reward shaping to accelerate the training process of a reinforcement learning agent. Existing works have considered a number of different reward shaping formulations; however, they either require external domain knowledge or fail in environments with extremely sparse rewards. In this paper, we propose a novel framework, Exploration-Guided Reward Shaping (ExploRS), that operates in a fully self-supervised manner and can accelerate an agent’s learning even in sparse-reward environments. The key idea of ExploRS is to learn an intrinsic reward function in combination with exploration-based bonuses to maximize the agent’s utility w.r.t. extrinsic rewards. We theoretically showcase the usefulness of our reward shaping framework in a special family of MDPs. Experimental results on several environments with sparse/noisy reward signals demonstrate the effectiveness of ExploRS.
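To make the key idea concrete, the following is a minimal sketch of how an extrinsic reward, an intrinsic reward term, and an exploration bonus might be combined into one shaped training signal. It is an illustration under stated assumptions, not the method from the paper: the class name `ShapedReward`, the tabular `intrinsic` placeholder, the `bonus_scale` parameter, and the count-based bonus of the form `bonus_scale / sqrt(N(s))` are all hypothetical choices, and the procedure that would actually learn the intrinsic term is not shown.

```python
import math
from collections import defaultdict


class ShapedReward:
    """Hypothetical sketch: extrinsic + intrinsic + count-based bonus."""

    def __init__(self, bonus_scale=1.0):
        # Assumed hyperparameter controlling the size of the exploration bonus.
        self.bonus_scale = bonus_scale
        # Visit counts N(s) per state, used for the exploration bonus.
        self.visit_counts = defaultdict(int)
        # Placeholder table for an intrinsic reward per (state, action).
        # A real method would learn these values; here they default to 0.
        self.intrinsic = defaultdict(float)

    def bonus(self, state):
        # Assumed bonus form: decays as the state is visited more often.
        count = self.visit_counts[state]
        return self.bonus_scale / math.sqrt(count) if count > 0 else self.bonus_scale

    def shaped(self, state, action, extrinsic):
        # Record the visit, then add the three terms together.
        self.visit_counts[state] += 1
        return extrinsic + self.intrinsic[(state, action)] + self.bonus(state)


if __name__ == "__main__":
    shaper = ShapedReward(bonus_scale=1.0)
    # With a zero extrinsic reward (the sparse case), the bonus alone
    # provides a signal that shrinks on repeated visits to the same state.
    for _ in range(3):
        print(shaper.shaped("s0", "right", extrinsic=0.0))
```

In this sketch the agent would be trained on the value returned by `shaped`, while its performance would still be judged on the extrinsic reward alone, which matches the abstract's framing of maximizing utility with respect to extrinsic rewards.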