ICML 2025

Accelerated Diffusion Models via Speculative Sampling

Valentin De Bortoli, Alexandre Galashov, Arthur Gretton, Arnaud Doucet

*Equal contribution. ¹Google DeepMind.

Abstract

Speculative sampling is a popular technique for accelerating inference in large language models (LLMs). A fast draft model generates candidate tokens, which are then accepted or rejected based on the target model's distribution. While speculative sampling was previously limited to discrete sequences, we extend it to diffusion models, which generate samples via continuous, vector-valued Markov chains. In this context, the target model is a high-quality but computationally expensive diffusion model. We propose various drafting strategies, including a simple and effective approach that does not require training a draft model and is applicable out-of-the-box to any diffusion model. We demonstrate significant generation speedups on various diffusion models, halving the number of function evaluations while still generating exact samples from the target model. Finally, we show how this procedure can also be used to accelerate Langevin diffusions for sampling from unnormalized distributions.

Motivation

Denoising diffusion models (DDMs) are generative models that achieve state-of-the-art performance in a wide variety of domains. They were introduced by Sohl-Dickstein et al. (2015) and further developed by Ho et al. (2020) and Song et al. (2021). The core idea behind DDMs is to progressively transform a data distribution into a Gaussian distribution by adding noise. Samples are then generated by simulating an approximation of the time-reversal of this noising process. This requires repeated evaluations of a neural network that approximates the scores of the noising process, and it typically involves simulating a Markov chain over hundreds of steps. Since sample generation is computationally expensive, sev-