NeurIPS 2023

Recurrent Hypernetworks are Surprisingly Strong in Meta-RL

Jacob Beck, Risto Vuorio, Zheng Xiong, Shimon Whiteson

Abstract

Deep reinforcement learning (RL) is notoriously impractical to deploy due to sample inefficiency. Meta-RL directly addresses this sample inefficiency by learning to perform few-shot learning when a distribution of related tasks is available for meta-training. While many specialized meta-RL methods have been proposed, recent work suggests that end-to-end learning in conjunction with an off-the-shelf sequential model, such as a recurrent network, is a surprisingly strong baseline. However, such claims have been controversial due to limited supporting evidence, particularly in the face of prior work establishing precisely the opposite. In this paper, we conduct an empirical investigation. While we likewise find that a recurrent network can achieve strong performance, we demonstrate that the use of hypernetworks is crucial to maximizing its potential. Surprisingly, when combined with hypernetworks, the recurrent baselines, which are far simpler than existing specialized methods, actually achieve the strongest performance of all methods evaluated. We provide code at https://github.com/jacooba/hyper.

Recent work has shown the simpler recurrent methods to be a competitive baseline relative to task-inference methods [Ni et al., 2022]. However, such claims are contentious for three reasons: the supporting experiments compare to only one task-inference method designed for meta-RL, the experiments provide additional compute to the recurrent baseline, and the results still show similar or inferior performance to more complicated methods on the majority of difficult domains. In particular, they consider two toy domains and four challenging domains, with RNNs significantly outperformed on two of the four challenging domains, and superior to the single task-inference baseline on only one.

37th Conference on Neural Information Processing Systems (NeurIPS 2023).
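
To make the combination named in the abstract concrete, the following is a minimal sketch of a recurrent network paired with a hypernetwork. It is not the authors' implementation (see the linked repository for that). It assumes a plain tanh RNN, a linear hypernetwork, and a linear policy head over discrete actions; the class name `RecurrentHypernetPolicy`, the helper functions, and all dimensions are hypothetical choices made for illustration. The idea shown is only the general one: the recurrent hidden state summarizes the interaction history, and the hypernetwork maps that state to the parameters of the policy head rather than feeding it to the policy as an extra input.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Small random matrix; stands in for learned parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentHypernetPolicy:
    """Toy recurrent hypernetwork policy (illustrative only, untrained)."""

    def __init__(self, obs_dim, act_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        self.obs_dim, self.act_dim, self.hidden_dim = obs_dim, act_dim, hidden_dim
        # RNN input: observation, one-hot previous action, previous reward.
        in_dim = obs_dim + act_dim + 1
        self.W_in = rand_matrix(rng, hidden_dim, in_dim)
        self.W_h = rand_matrix(rng, hidden_dim, hidden_dim)
        # Linear hypernetwork: hidden state -> weights and biases of the policy head.
        self.H = rand_matrix(rng, act_dim * obs_dim + act_dim, hidden_dim)

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, h, obs, prev_action, prev_reward):
        one_hot = [1.0 if a == prev_action else 0.0 for a in range(self.act_dim)]
        x = list(obs) + one_hot + [prev_reward]
        # Recurrent update: the hidden state summarizes the history so far.
        pre = [a + b for a, b in zip(matvec(self.W_in, x), matvec(self.W_h, h))]
        h = [math.tanh(p) for p in pre]
        # The hypernetwork generates the policy head's parameters from h.
        params = matvec(self.H, h)
        split = self.act_dim * self.obs_dim
        W = [params[i * self.obs_dim:(i + 1) * self.obs_dim] for i in range(self.act_dim)]
        b = params[split:]
        # The generated head maps the current observation to action probabilities.
        logits = [l + bias for l, bias in zip(matvec(W, obs), b)]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return h, [e / total for e in exps]


# Usage: one step at the start of an episode (no previous action or reward yet).
policy = RecurrentHypernetPolicy(obs_dim=3, act_dim=2)
h = policy.initial_state()
h, probs = policy.step(h, [1.0, 0.0, -1.0], prev_action=None, prev_reward=0.0)
```

In this sketch the parameters are random and no training loop is shown; in a meta-RL setting the RNN and hypernetwork weights would be learned end-to-end across the task distribution, while the policy head's weights are regenerated at every step from the current hidden state.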