WWW2026

ONE-PASS: Single Forward Pass Decoding for Listwise Reranking

Yingpeng Du, Zhu Sun, Tianjun Wei, Jie Zhang

Abstract

Large Language Models (LLMs) have been widely adopted in ranking systems, particularly for reranking tasks. Despite their effectiveness, the auto-regressive decoding of LLMs incurs high inference latency because each decoding step is memory-bandwidth-bound. To alleviate this bottleneck, prior studies have explored single-token decoding as an approximation, but these approaches suffer from performance degradation at the tail positions of the ranking. In this paper, we propose a Single Forward Pass (SFP)-based method that pre-verifies multiple rankings using tree attention, approximating auto-regressive decoding with the relevant sub-rankings at each step. However, verifying all possible ranking permutations requires factorial-level token computation ($N!$), which is intractable within a single forward pass. To this end, we first reduce item ranking permutations to combinations ($2^{N}$), based on the empirical observation that the LLM's next-item generation is less sensitive to the exact ordering of the preceding items. Furthermore, we divide the full ranking into $K$ sub-rankings and aggregate their individual probabilities, which further reduces the verification space to $2^{\lceil N/K \rceil} \cdot K \ll 2^{N}$. However, naïvely aggregating sub-ranking probabilities yields inaccurate estimates in listwise ranking. To overcome this, we introduce a Möbius inversion model that explicitly decomposes the individual contributions of subsets within a complete lattice, which are verified by tree attention. We then learn their higher-order effects with a hierarchical self-attention model to reconstruct the full ranking probability. Experiments on both information retrieval and recommendation tasks demonstrate the effectiveness of the proposed method.
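The arithmetic and the lattice decomposition mentioned above can be illustrated with a small, hedged Python sketch. It counts the verification space under the three strategies using the formulas stated in the abstract ($N!$, $2^{N}$, and $2^{\lceil N/K \rceil} \cdot K$), and it shows the classical Möbius inversion on the Boolean lattice of subsets, where a set function $f(S) = \sum_{T \subseteq S} g(T)$ is decomposed into per-subset contributions $g$. The function names (`verification_space`, `subsets`, `zeta_transform`, `mobius_inversion`) and the toy scores are hypothetical illustrations, not the authors' implementation; in the proposed method the subset probabilities come from tree-attention verification and the higher-order effects are learned by a hierarchical self-attention model, neither of which is modeled here.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil, factorial


def verification_space(n: int, k: int) -> dict:
    """Number of candidates to verify for N items (hypothetical helper)."""
    return {
        "permutations": factorial(n),           # every ordering: N!
        "combinations": 2 ** n,                 # order-insensitive prefixes: 2^N subsets
        "k_subrankings": 2 ** ceil(n / k) * k,  # K blocks of ceil(N/K) items each
    }


def subsets(items) -> list:
    """All subsets of `items` as frozensets, i.e. the Boolean lattice below `items`."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def zeta_transform(g: dict) -> dict:
    """f(S) = sum of g(T) over all T subset of S."""
    return {s: sum(g[t] for t in subsets(s)) for s in g}


def mobius_inversion(f: dict) -> dict:
    """Invert the zeta transform: g(S) = sum over T subset of S of (-1)^(|S|-|T|) f(T)."""
    return {s: sum((-1) ** (len(s) - len(t)) * f[t] for t in subsets(s)) for s in f}


# Hypothetical toy scores on the lattice of subsets of two items {a, b};
# exact fractions avoid floating-point noise in the decomposition.
f = {
    frozenset(): Fraction(0),
    frozenset({"a"}): Fraction(3, 10),
    frozenset({"b"}): Fraction(1, 2),
    frozenset({"a", "b"}): Fraction(1),
}
g = mobius_inversion(f)        # individual contribution of each subset
assert zeta_transform(g) == f  # summing the contributions reconstructs f exactly

print(verification_space(20, 4))
print(g[frozenset({"a", "b"})])  # pairwise interaction term → 1/5
```

For $N = 20$ and $K = 4$, the three counts are $20! \approx 2.4 \times 10^{18}$, $2^{20} = 1{,}048{,}576$, and $2^{5} \cdot 4 = 128$, which makes the $\ll$ in the abstract concrete. The nonzero term $g(\{a, b\}) = 1/5$ is the part of $f(\{a, b\})$ that a naïve sum of the singleton contributions would miss.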