ICML2025
LieRE: Lie Rotational Positional Encodings
Sophie Ostmeier, Brian Axelrod, Maya Varma, Michael E. Moseley, Akshay S. Chaudhari, Curtis Langlotz
摘要
Transformer architectures depend on explicit position encodings to capture token positional information. Rotary Position Encoding (RoPE) has emerged as a popular choice in language models due to its efficient encoding of relative position information through key-query rotations. However, RoPE faces significant limitations beyond language processing: it is constrained to one-dimensional sequence data and, even with learnable phases, offers limited representational capacity. We address these challenges with Lie Relative Encodings (LieRE), which generalizes RoPE to high-dimensional rotation matrices by leveraging their Lie group structure. Through extensive evaluation on three image datasets across 2D and 3D classification tasks, LieRE achieves 1.5% improvement over state-of-the-art baselines on 2D tasks and 1% on 3D tasks, while demonstrating superior generalization to higher resolutions. Our implementation is computationally efficient, with results reproducible on 4 A100 GPUs in 30 minutes on CIFAR100. Our code is available at https://github.com/ StanfordMIMI/LieRE .