ICML 2025

Learning the RoPEs: Better 2D and 3D Position Encodings with STRING

Connor Schenck, Isaac Reid, Mithun George Jacob, Alex Bewley, Joshua Ainslie, David Rendleman, Deepali Jain, Mohit Sharma, Kumar Avinava Dubey, Ayzaan Wahid, Sumeet Singh, René Wagner, Tianli Ding, Chuyuan Fu, Arunkumar Byravan, Jake Varley, Alexey A. Gritsenko, Matthias Minderer, Dmitry Kalashnikov, Jonathan Tompson, Vikas Sindhwani, Krzysztof Marcin Choromanski

Abstract

Figure 1. Top: Successful diffusion policy conditioned on a STRING-enhanced Transformer vision encoder, attempting the double-insertion task on Aloha-sim. Bottom: Same experiment, but with a regular vision encoder, for which the policy fails. STRING provides strong improvements for training dexterous robotics policies, outperforming previous position encoding algorithms such as RoPE.