ICML2022

The Unsurprising Effectiveness of Pre-Trained Vision Models for Control

Simone Parisi, Aravind Rajeswaran, Senthil Purushwalkam, Abhinav Gupta

233 citations

Abstract

Recent years have seen the emergence of pretrained representations as a powerful abstraction for AI applications in computer vision, natural language, and speech. However, policy learning for control is still dominated by a tabularasa learning paradigm, with visuo-motor policies often trained from scratch using data from deployment environments. In this context, we revisit and study the role of pre-trained visual representations for control, and in particular representations trained on large-scale computer vision datasets. Through extensive empirical evaluation in diverse control domains (Habitat, Deep-Mind Control, Adroit, Franka Kitchen), we isolate and study the importance of different representation training methods, data augmentations, and feature hierarchies. Overall, we find that pre-trained visual representations can be competitive or even better than ground-truth state representations to train control policies. This is in spite of using only out-of-domain data from standard vision datasets, without any in-domain data from the deployment environments. Source code and more at https://sites.google. com/view/pvr-control .