ICLR 2026

Group Representational Position Encoding

Yifan Zhang, Zixiang Chen, Yifeng Liu, Zhen Qin, Huizhuo Yuan, Kangping Xu, Yang Yuan, Quanquan Gu, Andrew C Yao

Abstract

We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multiplicative rotations (Multiplicative GRAPE) in $\operatorname{SO}(d)$ and (ii) additive logit biases (Additive GRAPE) arising from unipotent actions in the general linear group $\mathrm{GL}$. In Multiplicative GRAPE, a position $n\in\mathbb{Z}$ (or $t\in\mathbb{R}$) acts as $\mathbf{G}(n)=\exp(n\,\omega\,\mathbf{L})$ with a rank-2 skew generator $\mathbf{L} \in \mathbb{R}^{d \times d}$, yielding a relative, compositional, norm-preserving map with a closed-form matrix exponential. RoPE is recovered exactly when the $d/2$ planes are the canonical coordinate pairs with a log-uniform spectrum. Learned commuting subspaces and compact non-commuting mixtures strictly extend this geometry to capture cross-subspace feature coupling at $O(d)$ and $O(rd)$ cost per head, respectively. In Additive GRAPE, additive logits arise as rank-1 (or low-rank) unipotent actions, recovering ALiBi and the Forgetting Transformer (FoX) as exact special cases while preserving an exact relative law and streaming cacheability. Altogether, GRAPE supplies a principled design space for positional geometry in long-context models, subsuming RoPE and ALiBi as special cases.
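The multiplicative action described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one particular parameterization of the rank-2 skew generator, $\mathbf{L} = \mathbf{a}\mathbf{b}^\top - \mathbf{b}\mathbf{a}^\top$ with orthonormal $\mathbf{a}, \mathbf{b}$, for which $\mathbf{L}^3 = -\mathbf{L}$ and hence $\exp(\theta\mathbf{L}) = \mathbf{I} + \sin\theta\,\mathbf{L} + (1-\cos\theta)\,\mathbf{L}^2$. The function names (`dot`, `rotate`) and all numeric values are hypothetical and chosen only for illustration.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def rotate(x, a, b, omega, n):
    """Apply G(n) = exp(n * omega * L) to x in O(d) time.

    Assumption (not from the abstract): L = a b^T - b a^T with a, b
    orthonormal, so exp(theta L) = I + sin(theta) L + (1 - cos(theta)) L^2.
    """
    theta = n * omega
    xa, xb = dot(a, x), dot(b, x)
    s, c = math.sin(theta), math.cos(theta)
    # L x   = a (b.x) - b (a.x)
    # L^2 x = -(a (a.x) + b (b.x))
    return [
        xi + s * (ai * xb - bi * xa) - (1.0 - c) * (ai * xa + bi * xb)
        for xi, ai, bi in zip(x, a, b)
    ]

# Illustrative values only: a non-canonical orthonormal pair in d = 4.
a = [0.5, 0.5, 0.5, 0.5]
b = [0.5, -0.5, 0.5, -0.5]
omega = 0.1
q = [1.0, -2.0, 0.5, 3.0]   # query at position m
k = [0.3, 1.2, -0.7, 2.0]   # key at position n
m, n = 7, 19

# Relative law: <G(m) q, G(n) k> depends only on the offset n - m.
absolute_logit = dot(rotate(q, a, b, omega, m), rotate(k, a, b, omega, n))
relative_logit = dot(q, rotate(k, a, b, omega, n - m))

# Norm preservation: G(m) is orthogonal, so |G(m) q|^2 = |q|^2.
norm_before = dot(q, q)
norm_after = dot(rotate(q, a, b, omega, m), rotate(q, a, b, omega, m))
```

Under these assumptions, `absolute_logit` and `relative_logit` agree up to floating-point error, as do `norm_before` and `norm_after`. Choosing `a` and `b` as two canonical basis vectors reduces `rotate` to a plane rotation of one coordinate pair, which is the RoPE-style special case mentioned in the abstract.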