NeurIPS2025

DINGO: Constrained Inference for Diffusion LLMs

Tarun Suresh, Debangshu Banerjee, Shubham Ugare, Sasa Misailovic, Gagandeep Singh

Abstract

Diffusion LLMs have emerged as a promising alternative to conventional autoregressive LLMs, offering substantial potential for improving runtime efficiency. However, existing diffusion models fail to provably enforce user-specified formal constraints, such as regular expressions, which makes them unreliable for tasks that require structured outputs, such as fixed-schema JSON generation. Unlike autoregressive models, which generate tokens sequentially, diffusion LLMs predict a block of tokens in parallel. This parallelism makes traditional constrained decoding algorithms, designed to enforce constraints with sequential token prediction, ineffective at preserving the true output distribution. To address this limitation, we propose DINGO, a dynamic programming-based constrained decoding strategy that is both efficient and provably distribution-preserving. DINGO enables sampling of output strings with the highest probability under the model's predicted distribution while strictly adhering to any user-specified regular expression. On standard symbolic math and JSON generation benchmarks, DINGO achieves up to a 68% points of improvement over unconstrained inference. * Equal contributing authors ordered randomly Preprint. Under review. any constrained decoding algorithm for diffusion LLMs should also ensure that enforcing formal constraints does not come at the cost of distorting the true output distribution. Key Challenges: Diffusion LLMs generate a block of tokens starting from a fully masked string composed of special mask tokens ⊥, and iteratively unmask one or more tokens at each step until producing a fully unmasked output. Each unmasking step (referred to as a diffusion step) can unmask tokens at arbitrary positions in the block, with no left-to-right sequential dependency across steps. As a result, designing constrained decoding for diffusion LLMs requires addressing the following: • RQ1: Efficiently detecting invalid tokens and restricting token choices at each diffusion step to ensure the final unmasked string is always structurally correct. • RQ2: Ensuring the generated token block maximizes the probability under the output distribution. Contributions: We present the first constrained decoding algorithm for diffusion LLMs, making the following contributions: • We introduce DINGO, the first constrained decoding algorithm for diffusion LLMs that supports any user-specified regular expression. DINGO provably ensures that the output string is always a valid prefix of some string in the target regular language. • DINGO uses dynamic programming to ensure that the output string achieves the maximum probability among all valid strings over the output block with respect to the true output distribution. This approach guarantees scalability while maintaining optimality (e.g., maximizing the probability), in contrast to existing methods such as [Park et al., 2024b], which rely on repeated resampling. Resampling-based methods are computationally expensive and unsuitable for practical deployment. • Extensive experiments on multiple open-source diffusion LLMs and benchmarks show that DINGO significantly outperforms standard unconstrained decoding, achieving up to a 68% improvement on challenging tasks such as the GSM-symbolic benchmark for symbolic reasoning [Mirzadeh et al., 2024] and a JSON generation benchmark [NousResearch, 2024]. Roadmap: We provide the necessary background in Section 2, formalize constrained decoding for diffusion LLMs in Section 3, describe the DINGO algorithm along with its correctness and optimality proofs in Section 4, and present experimental results in Section 5.