ICLR 2025

STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning

Marius Memmel, Jacob Berg, Bingqing Chen, Abhishek Gupta, Jonathan Francis

Abstract

Robot learning is experiencing a surge in the size, diversity, and complexity of pre-collected datasets, paralleling trends in NLP and computer vision. Many methods treat these datasets as multi-task expert data to train generalist policies. However, while generalist policies improve average performance, they often underperform specialist policies on individual tasks because of negative transfer. In this work, we advocate for training policies during deployment by non-parametrically retrieving relevant data at test time and training models on it, rather than relying on zero-shot pre-trained policies. We show that many robotics tasks share low-level behaviors and that retrieval at the sub-trajectory granularity significantly improves data utilization, generalization, and robustness when adapting policies to novel problems. In contrast, existing retrieval methods tend to underutilize the data and miss shared cross-task content. Our proposed method, STRAP, uses vision foundation models and dynamic time warping to retrieve subsequences from large training corpora. STRAP outperforms prior retrieval algorithms in both simulated and real-world experiments, scales to larger datasets, and learns robust control policies from minimal real-world demonstrations.
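To make the retrieval idea concrete, the following is a minimal sketch of subsequence dynamic time warping over per-frame embeddings. It is an illustration under stated assumptions, not the authors' implementation: the embeddings are assumed to be precomputed by some vision model, the Euclidean frame distance is an arbitrary choice, and the names `subsequence_dtw` and `retrieve_top_k` are hypothetical.

```python
import numpy as np


def subsequence_dtw(query, reference):
    """Find the span of `reference` that best aligns with `query`.

    query: (n, d) array of frame embeddings (assumed precomputed).
    reference: (m, d) array of frame embeddings.
    Returns (cost, start, end) with inclusive 0-based indices into reference.
    """
    n, m = len(query), len(reference)
    # Pairwise Euclidean distance between every query and reference frame.
    cost = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=-1)

    # Accumulated cost; row 0 is all zeros so a match may start anywhere.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, :] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # The match may end anywhere: pick the cheapest cell in the last row.
    end = int(np.argmin(acc[n, 1:])) + 1

    # Backtrack to the first query frame to recover the start of the span.
    i, j = n, end
    while i > 1:
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(acc[n, end]), j - 1, end - 1


def retrieve_top_k(query, corpus, k):
    """Rank corpus trajectories by their best sub-trajectory match (hypothetical helper)."""
    matches = []
    for idx, trajectory in enumerate(corpus):
        match_cost, start, end = subsequence_dtw(query, trajectory)
        matches.append((match_cost, idx, start, end))
    return sorted(matches)[:k]


# Toy 1-D "embeddings": the query occurs exactly inside the first trajectory.
query = np.array([[0.0], [1.0], [2.0], [3.0]])
traj_a = np.array([[5.0], [5.0], [0.0], [1.0], [2.0], [3.0], [5.0], [5.0]])
traj_b = np.array([[9.0], [9.0], [9.0], [9.0], [9.0]])
best = retrieve_top_k(query, [traj_a, traj_b], k=1)
```

In this toy setup the best match is the span of `traj_a` covering indices 2 through 5. The retrieved spans, rather than whole trajectories, would then serve as additional training data alongside the target demonstrations.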