ICLR 2026

FutureFill: Fast Generation from Convolutional Sequence Models

Naman Agarwal, Xinyi Chen, Evan Dogariu, Devan Shah, Hubert Strauss, Vladimir Feinberg, Daniel Suo, Peter Bartlett, Elad Hazan

Abstract

We address the challenge of efficient auto-regressive generation in sequence prediction models by introducing FutureFill, a general-purpose fast generation method for any sequence prediction algorithm based on convolutional operators. FutureFill reduces generation time from quadratic to quasilinear in the context length. Moreover, when generating from a prompt, it requires a prefill cache whose size grows only with the number of tokens to be generated, which is often much smaller than the caches required by standard convolutional or attention-based models. We validate our theoretical claims with experiments on synthetic tasks and demonstrate substantial efficiency gains when generating from a deep convolutional sequence prediction model.
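To make the idea concrete, the following is a minimal NumPy sketch of one way to precompute how past tokens contribute to future outputs of a causal convolution. It is an illustration written for this summary, not the authors' implementation. It assumes a single scalar channel, a fixed filter `phi`, and a hypothetical feedback rule `step_fn` that turns each output into the next input. The names `future_fill`, `generate`, `generate_naive`, and the `epoch` parameter are all illustrative. The sketch refreshes its cache with an FFT once per epoch, which makes it subquadratic for a well-chosen epoch length but does not by itself reach the quasilinear bound stated in the abstract.

```python
import numpy as np

def future_fill(phi, u_past, horizon):
    """Contribution of the inputs seen so far to the next `horizon` outputs.

    Returns c with c[s] = sum_i u_past[i] * phi[t + s - i], t = len(u_past).
    Computed with one FFT-based linear convolution.
    """
    t = len(u_past)
    if t == 0:
        return np.zeros(horizon)
    n_fft = 2 * (t + horizon)  # long enough to avoid circular wrap-around
    spec = np.fft.rfft(u_past, n_fft) * np.fft.rfft(phi[: t + horizon], n_fft)
    full = np.fft.irfft(spec, n_fft)
    return full[t : t + horizon]

def generate(phi, step_fn, u0, length, epoch):
    """Generate y[t] = sum_{j<=t} phi[j] * u[t-j] with u[t+1] = step_fn(y[t]).

    Every `epoch` steps the effect of all earlier inputs on the coming epoch
    is cached; within an epoch only the most recent inputs are summed directly.
    """
    phi = np.pad(phi, (0, max(0, length - len(phi))))  # assumption: zero tail
    u = np.zeros(length)
    y = np.zeros(length)
    u[0] = u0
    start, cache = 0, np.zeros(epoch)
    for t in range(length):
        if t % epoch == 0:
            start = t
            cache = future_fill(phi, u[:t], min(epoch, length - t))
        recent = u[start : t + 1]            # inputs produced in this epoch
        weights = phi[t - start :: -1]       # phi[t-start], ..., phi[0]
        y[t] = cache[t - start] + np.dot(recent, weights)
        if t + 1 < length:
            u[t + 1] = step_fn(y[t])
    return y

def generate_naive(phi, step_fn, u0, length):
    """Reference: recompute the full convolution sum at every step (quadratic)."""
    phi = np.pad(phi, (0, max(0, length - len(phi))))
    u = np.zeros(length)
    y = np.zeros(length)
    u[0] = u0
    for t in range(length):
        y[t] = np.dot(u[: t + 1], phi[t::-1])
        if t + 1 < length:
            u[t + 1] = step_fn(y[t])
    return y
```

With epoch length K and sequence length L, this sketch performs about L/K FFTs of size O(L) plus O(K) direct work per step, so choosing K near the square root of L already beats the naive loop. Both functions should agree up to floating-point error, which is a convenient sanity check when experimenting with other filters or feedback rules.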