ACL 2024

Context Length Extension via Generalized Extrapolation Scale

Linhan Li, Huaping Zhang

Abstract

Extending the context length of transformer models is a key challenge, especially when the context at inference time exceeds the length seen during training. In this paper, we propose Generalized extrapolatioN scalE (GeNE), a straightforward and effective method applied to the interpolation function of positional embeddings to achieve "train short, test long". Experimental results show that GeNE notably improves long-context language modeling. By randomly scaling the extrapolation ratio during fine-tuning, GeNE achieves stable extrapolation on 64k contexts while training on sequences of length 16k. Further, an instruction-following Llama2 model based on GeNE achieves competitive results compared with other open-source models of the same parameter scale. Our code is available at https://github.com/LhLi-QED/GeNE .
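
As a rough illustration of randomly scaling the extrapolation ratio, the sketch below applies a sampled scale to rotary position angles. It is not the GeNE implementation: the use of rotary embeddings with plain linear position interpolation, the uniform sampling range, and the helper names `rope_angles` and `sample_scale` are all assumptions made for this example.

```python
import math
import random


def rope_angles(position, dim, scale=1.0, base=10000.0):
    """Rotary angles for one position, with the position divided by `scale`.

    Assumption: linear position interpolation stands in for the
    interpolation function mentioned in the abstract.
    """
    half = dim // 2
    return [(position / scale) * base ** (-2.0 * i / dim) for i in range(half)]


def sample_scale(train_len, target_len, rng):
    """Draw a scale uniformly from [1, target_len / train_len].

    Assumption: the uniform distribution is hypothetical, chosen only
    to show the idea of a per-step random scale during fine-tuning.
    """
    return rng.uniform(1.0, target_len / train_len)


rng = random.Random(0)
train_len, target_len = 16_384, 65_536

# During fine-tuning: a fresh scale could be drawn for each training step.
scale = sample_scale(train_len, target_len, rng)

# At inference: a fixed scale maps a 64k position back into the 16k range.
angles = rope_angles(position=65_535, dim=8, scale=target_len / train_len)
assert math.isclose(angles[0], 65_535 / 4)
```

With a scale of 4, position 65,535 is mapped to 16,383.75, which lies inside the 16k range covered during training.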