NeurIPS 2025
Reasoning is Periodicity? Improving Large Language Models Through Effective Periodicity Modeling
Yihong Dong, Ge Li, Xue Jiang, Yongding Tao, Kechi Zhang, Lecheng Wang, Hao Zhu, Huanyu Liu, Jiazheng Ding, Jia Li, Jinliang Deng, Hong Mei
Abstract
Periodicity, one of the most important basic characteristics, lays the foundation for structured knowledge acquisition and systematic cognitive processes in human learning. However, the potential flaws of periodicity modeling in the Transformer affect the learning efficiency of large language models (LLMs) built upon it and their ability to establish underlying principles from data. In this paper, we demonstrate that integrating effective periodicity modeling can improve the learning efficiency and performance of LLMs. We introduce FANformer, which adapts the Fourier Analysis Network (FAN) into the attention mechanism to achieve efficient periodicity modeling by modifying the attention mechanism's feature projection process. Extensive experimental results on language modeling show that FANformer consistently outperforms the Transformer when scaling up model size and training tokens, underscoring its superior learning efficiency. Our pretrained FANformer-1B exhibits marked improvements on downstream tasks compared to open-source LLMs with similar model parameters or training tokens. Moreover, we reveal that FANformer exhibits a superior ability to learn and apply rules for reasoning compared to the Transformer. The results position FANformer as an effective and promising architecture for advancing LLMs. Our code is available at https://github.com/YihongDong/FANformer.
... LLMs.
❷ We propose FANformer, a novel LLM architecture that uses a simple yet effective approach to adapt FAN into the attention mechanism for efficient periodicity modeling, consistently outperforming Transformers as model parameters and training tokens scale.
❸ We pretrain and open-source FANformer-1B, which surpasses SOTA publicly available LLMs with similar parameter counts or training token budgets on downstream tasks.
Motivation
In this section, we combine formalization with illustrative cases to demonstrate why periodicity modeling facilitates language modeling and reasoning, thereby elucidating the motivation behind the development of FANformer.
The essence of periodicity lies in the repeated manifestation of an invariance under transformations, which can be defined rigorously through invariance under group actions in abstract algebra (Dummit and Foote [2004]). Let X be a set and G be a group acting on X. An element x ∈ X is said to be periodic with respect to the action of G if there exists a non-identity element p ∈ G such that p • x = x, where • denotes the group action. The element p is called a period of x. Periodicity implies that x is invariant under the action of the cyclic subgroup generated by p, denoted by ⟨p⟩. For example, f(a) = f(a + T) can be seen as a specific instance of the abstract definition p • x = x, where x = f, p = T, and the group action is translation. When the input a and the group G are extended to higher dimensions or non-temporal domains, the manifestation of the period T changes accordingly.
Crucially, for many reasoning tasks, the underlying operation or inference rule remains invariant across structurally similar subproblems: for all inputs belonging to a given equivalence class, the same functional rule is applied, which is precisely what periodic invariance describes. Consider addition as an illustrative case: let f represent the addition operation, let a denote the digit position index, and let the period T = 1 correspond to the positional shift in place value. The reasoning proceeds as: