ICLR 2022

Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto

457 citations

Abstract

We present DrQ-v2, a model-free reinforcement learning (RL) algorithm for visual continuous control. DrQ-v2 builds on DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements that yield state-of-the-art results on the DeepMind Control Suite. Notably, DrQ-v2 solves complex humanoid locomotion tasks directly from pixel observations, which model-free RL had not previously achieved. DrQ-v2 is conceptually simple, easy to implement, and has a significantly smaller computational footprint than prior work, with the majority of tasks taking just 8 hours to train on a single GPU. Finally, we publicly release DrQ-v2's implementation to provide RL practitioners with a strong and computationally efficient baseline.

[Figure: Episode return versus frames (×10^6) on 12 DMC tasks (sample efficiency), and training hours for 3 × 10^6 frames, comparing SAC, CURL, DrQ, and DrQ-v2.]
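To make "data augmentation to learn directly from pixels" concrete, the following is a minimal sketch of a random-shift augmentation (pad the image, then crop back to the original size at a random offset), a common choice in pixel-based RL. The abstract does not specify the augmentation or its settings, so the function name `random_shift`, the pad size of 4, and the batch shape (8 stacked-frame observations of 9 × 84 × 84) are assumptions for illustration only, not the released implementation.

```python
import numpy as np


def random_shift(images, pad=4, rng=None):
    """Randomly shift each image in a batch by up to `pad` pixels.

    Hypothetical helper: pads by replicating edge pixels, then takes a
    random crop of the original height and width for each image.
    `images` has shape (batch, channels, height, width).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, _, h, w = images.shape
    # Pad only the two spatial axes.
    padded = np.pad(
        images, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="edge"
    )
    out = np.empty_like(images)
    for i in range(n):
        # Offsets are drawn from 0..2*pad inclusive; offset == pad is "no shift".
        top = rng.integers(0, 2 * pad + 1)
        left = rng.integers(0, 2 * pad + 1)
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out


# Assumed example batch: 8 observations, 9 channels, 84 x 84 pixels.
batch = np.random.default_rng(0).integers(
    0, 256, size=(8, 9, 84, 84), dtype=np.uint8
)
augmented = random_shift(batch, pad=4, rng=np.random.default_rng(1))
```

In an off-policy actor-critic setup, an augmentation of this kind would typically be applied to observations sampled from the replay buffer before they are passed to the encoder, so that each update sees a slightly different view of the same transition.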