ACL 2025
When Will the Tokens End? Graph-Based Forecasting for LLMs Output Length
Grzegorz Piotrowski, Mateusz Bystronski, Mikolaj Holysz, Jakub Binkowski, Grzegorz Chodak, Tomasz Kajdanowicz
Abstract
Large Language Models (LLMs) are typically trained to predict the next token in a sequence. However, their internal representations often encode signals that go beyond immediate next-token prediction. In this work, we investigate whether these hidden states also carry information about the remaining length of the generated output, an implicit form of foresight (Pal et al., 2023). Accurately estimating how many tokens are left in a response has both theoretical and practical relevance. From an interpretability perspective, it can reveal whether the model internally tracks its progress through a generation. From a systems perspective, it enables more efficient inference strategies, such as output-length-aware scheduling of LLM requests (Shahout et al., 2024). We show that a graph-based approach can predict the length of the generated text after the prefilling stage. These findings may be particularly valuable for organizations that provide LLM-based services and seek to manage and forecast inference costs more effectively.
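To make the systems motivation concrete, here is a toy simulation of why output-length forecasts help a scheduler. This is a minimal sketch under stated assumptions. It is neither the paper's graph-based predictor nor the scheduler of Shahout et al. (2024). It assumes some predictor already supplies a per-request length forecast; the `predicted_len` values below are made up for illustration. It also assumes a single server with no batching and all requests arriving at once. The names `Request`, `mean_completion_time`, `fifo`, and `shortest_predicted_first` are hypothetical. The policy shown is the classic shortest-job-first rule, driven by predicted rather than true lengths.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """Hypothetical LLM request with a forecast and an actual output length (tokens)."""
    request_id: str
    predicted_len: int  # assumed output of some length predictor (illustrative values)
    actual_len: int     # tokens actually generated; known only after decoding


def mean_completion_time(order: List[Request], time_per_token: float = 1.0) -> float:
    """Average completion time when requests are served one after another.

    Toy model (assumption): a single server, no batching, all requests arrive
    at t=0, and decoding cost is proportional to the actual generated length.
    """
    clock = 0.0
    total = 0.0
    for req in order:
        clock += req.actual_len * time_per_token  # this request finishes at `clock`
        total += clock
    return total / len(order)


def fifo(requests: List[Request]) -> List[Request]:
    # First-in, first-out: serve requests in arrival order.
    return list(requests)


def shortest_predicted_first(requests: List[Request]) -> List[Request]:
    # Output-length-aware policy: serve the smallest forecast first.
    return sorted(requests, key=lambda r: r.predicted_len)


if __name__ == "__main__":
    queue = [
        Request("a", predicted_len=900, actual_len=1000),
        Request("b", predicted_len=40, actual_len=50),
        Request("c", predicted_len=250, actual_len=200),
    ]
    print("FIFO:", mean_completion_time(fifo(queue)))
    print("Shortest-predicted-first:", mean_completion_time(shortest_predicted_first(queue)))
```

With these illustrative numbers, FIFO yields a mean completion time of 1100 time units. Shortest-predicted-first yields about 517, even though the forecasts are not exact. The ordering only needs to be approximately right, which is why a length predictor that runs after prefilling can be useful to a scheduler.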