ICLR 2026

What Exactly Does Guidance Do in Masked Discrete Diffusion Models

Ye He, Kevin Rojas, Molei Tao

Abstract

Masked discrete diffusion models have been gaining popularity recently, and classifier-free guidance, like its continuous counterpart, has been proposed to enable effective conditional generation with discrete diffusion. To quantify the precise effect of discrete guidance, this article considers masked discrete diffusion with an arbitrary data distribution in low dimension, so that both the distribution that guided masked discrete diffusion samples from and the sampling dynamics can be analytically and exactly quantified and interpreted. When the full data distribution is a mixture over classes and the goal is to sample from a specific class, guidance amplifies class-specific regions while suppressing regions shared with other classes. This effect depends on the guidance strength $w$ and induces distinct covariance structures in the sampled distribution. Notably, we observe quantitatively different behaviors in 1D and 2D. We also show that for large $w$, the decay rate of the total variation (TV) along the reverse dynamics is double-exponential in $w$ for both 1D and 2D. These findings highlight the role of guidance, not just in shaping the output distribution, but also in controlling the dynamics of the sampling trajectory. Our theoretical analysis is supported by experiments that illustrate the geometric effects of guidance and its impact on convergence.
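
The amplify-and-suppress effect described above can be illustrated on a toy example. The sketch below is not the paper's derivation: it assumes the commonly used geometric tilt form of classifier-free guidance, in which the target is proportional to p(x | class)^(1+w) · p(x)^(-w), and applies it to a hypothetical three-state, two-class mixture. The function name `tilt` and the toy probabilities are made up for illustration, and the abstract does not state that the guided sampler's output matches this form in every dimension.

```python
def tilt(p_class, p_full, w):
    """Normalize p_class^(1+w) * p_full^(-w) over a finite state space.

    Assumed form of guidance (geometric tilt); not taken from the abstract.
    """
    weights = [
        (pc ** (1 + w)) * (pf ** (-w)) if pc > 0 else 0.0
        for pc, pf in zip(p_class, p_full)
    ]
    total = sum(weights)
    return [x / total for x in weights]


# Hypothetical toy data: two equally weighted classes on three states.
# State 0 belongs only to class A, state 1 is shared, state 2 only to class B.
p_a = [0.5, 0.5, 0.0]
p_b = [0.0, 0.5, 0.5]
p_full = [0.5 * a + 0.5 * b for a, b in zip(p_a, p_b)]

# Print the tilted class-A distribution for increasing guidance strength w.
for w in (0.0, 1.0, 4.0):
    print(w, [round(x, 4) for x in tilt(p_a, p_full, w)])
```

In this toy setting, w = 0 recovers the class-conditional distribution, and the ratio of mass on the class-specific state to mass on the shared state grows as 2^w, so increasing w moves mass away from the region shared with the other class.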