ICLR 2026
DPad: Efficient Diffusion Language Models with Suffix Dropout
Xinhua Chen, Sitao Huang, Cong Guo, Chiyue Wei, Yintao He, Jianyi Zhang, Hai Helen Li, Yiran Chen
Abstract
Diffusion-based Large Language Models (dLLMs) parallelize text generation by framing decoding as a denoising process, but they incur high computational overhead: at each step they predict all future suffix tokens yet retain only a small fraction of them. We propose DPad, a training-free method that restricts attention to a structured subset of suffix tokens, preserving fidelity while eliminating redundancy. DPad integrates two strategies: (i) a windowing strategy, which maintains a fixed-length suffix window, and (ii) a suffix dropout strategy, which deterministically removes distant suffix tokens before attention computation. This concise design is compatible with existing optimizations such as parallel decoding and prefix caching, and lends itself to a lightweight implementation. Comprehensive evaluations across multiple benchmarks and dLLM models demonstrate that DPad delivers a substantial speedup over vanilla dLLMs while maintaining comparable accuracy, highlighting its potential for efficient and scalable long-sequence inference.
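The two strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the selection rule, so everything below is an assumption made for illustration: the function name `select_suffix_indices`, its parameters, and in particular the gap-doubling rule used to thin out distant suffix positions are hypothetical and are not taken from the paper. The sketch shows only the general shape of the idea: cap the suffix at a fixed-length window, then deterministically keep fewer positions as distance from the current block grows, before any attention is computed.

```python
def select_suffix_indices(block_end, seq_len, window, stride_growth=2):
    """Return the suffix positions kept for attention (illustrative only).

    block_end:     index of the first suffix token (one past the current block)
    seq_len:       total sequence length
    window:        fixed length of the suffix window, in tokens
    stride_growth: hypothetical factor by which the gap between kept
                   positions grows with distance (an assumption, not
                   the paper's rule)
    """
    # Strategy (i): only positions inside a fixed-length suffix window
    # are candidates; everything beyond it is dropped outright.
    window_end = min(seq_len, block_end + window)

    # Strategy (ii): deterministically thin out distant positions.
    # Nearby suffix tokens are kept densely, distant ones sparsely.
    kept = []
    pos = block_end
    gap = 1
    while pos < window_end:
        kept.append(pos)
        pos += gap
        gap *= stride_growth
    return kept


if __name__ == "__main__":
    # Current block ends at position 8 in a 64-token sequence,
    # with a 16-token suffix window.
    print(select_suffix_indices(block_end=8, seq_len=64, window=16))
```

Under these assumptions, attention at a given step would be computed over the prefix, the current block, and only the returned suffix positions, so the suffix cost depends on the window length rather than on the full remaining sequence.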