ICCV2023
A Complete Recipe for Diffusion Generative Models
Kushagra Pandey, Stephan Mandt
14 citations
Abstract
Score-based Generative Models (SGMs) have demonstrated exceptional synthesis outcomes across various tasks. However, the current design landscape of the forward diffusion process remains largely untapped and often relies on physical heuristics or simplifying assumptions. Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. Our approach reveals that several existing SGMs can be seen as specific manifestations of our framework. Building upon this method, we introduce Phase Space Langevin Diffusion (PSLD), which relies on score-based modeling within an augmented space enriched by auxiliary variables akin to physical phase space. Empirical results exhibit the superior sample quality and improved speed-quality trade-off of PSLD compared to various competing approaches on established image synthesis benchmarks. Remarkably, PSLD achieves sample quality akin to state-of-the-art SGMs (FID: 2.10 for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of PSLD in conditional synthesis using pre-trained score networks, offering an appealing alternative as an SGM backbone for future advancements. Code and model checkpoints can be accessed at https://github.com/mandt-lab/PSLD.
We include the proofs for Theorems 2.1 and 2.2 in Appendix A.1 for completeness. These results provide a general recipe for designing forward processes in SGMs. For such a process to be useful as the forward process of an SGM, it must converge to a simple factorized distribution that serves as the initialization point of the backward (generative) process. Consequently, we consider the following form of the stationary distribution $p_s(z)$:
$$p_s(z) = \mathcal{N}(x; 0_{d_x}, I_{d_x})\, \mathcal{N}(m; 0_{d_m}, M I_{d_m}). \tag{5}$$
This form results from setting $U(x) = \frac{x^\top x}{2}$ in Eqn. 3.
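As a rough numerical illustration of this convergence requirement, the sketch below simulates a process of the recipe form and compares the empirical covariance of the final samples with the target in Eqn. 5. Eqns. 3 and 4 are not reproduced in this passage, so several things here are assumptions: the Hamiltonian is taken to be $H(z) = \frac{x^2}{2} + \frac{m^2}{2M}$ (so that $p_s(z) \propto \exp(-H(z))$ matches Eqn. 5 for scalar $x$ and $m$), the dynamics are taken to be $dz = -(D + Q)\nabla H(z)\,dt + \sqrt{2D}\,dw$ with constant $D$ and $Q$, and the function name `simulate_recipe_sde` and all numerical values are hypothetical. It is not the PSLD implementation from the paper's repository.

```python
import numpy as np


def simulate_recipe_sde(D, Q, M, z0, n_particles=20000, n_steps=2000, dt=5e-3, seed=0):
    """Euler-Maruyama simulation of dz = -(D + Q) grad_H(z) dt + sqrt(2 D) dw.

    Assumes scalar x and m, H(z) = x^2 / 2 + m^2 / (2 M), and constant D, Q.
    Returns an array of shape (n_particles, 2) holding the final (x, m) pairs.
    """
    D = np.asarray(D, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if not np.allclose(D, D.T):
        raise ValueError("D must be symmetric")
    eigvals, eigvecs = np.linalg.eigh(D)
    if eigvals.min() < -1e-12:
        raise ValueError("D must be positive semidefinite")
    if not np.allclose(Q, -Q.T):
        raise ValueError("Q must be skew-symmetric")

    # grad_H(z) = (x, m / M) is linear in z, so the drift is the linear map A z.
    grad_H = np.diag([1.0, 1.0 / M])
    A = -(D + Q) @ grad_H

    # Noise factor L with L L^T = 2 D, built from the eigendecomposition of D.
    L = eigvecs @ np.diag(np.sqrt(2.0 * np.clip(eigvals, 0.0, None)))

    rng = np.random.default_rng(seed)
    # Every particle starts at the same point z0 (a stand-in for a data point).
    z = np.tile(np.asarray(z0, dtype=float), (n_particles, 1))
    for _ in range(n_steps):
        noise = rng.standard_normal(z.shape)
        z = z + dt * (z @ A.T) + np.sqrt(dt) * (noise @ L.T)
    return z


if __name__ == "__main__":
    M = 0.25                          # illustrative value, not from the paper
    D = [[0.1, 0.0], [0.0, 1.0]]      # positive semidefinite (illustrative)
    Q = [[0.0, -1.0], [1.0, 0.0]]     # skew-symmetric (illustrative)
    z = simulate_recipe_sde(D, Q, M, z0=(3.0, 0.0))
    print("empirical mean:", z.mean(axis=0))
    print("empirical covariance:\n", np.cov(z.T))
    print("target covariance:\n", np.diag([1.0, M]))
```

Under these assumptions, the empirical mean should be near zero and the empirical covariance near $\mathrm{diag}(1, M)$, up to discretization and Monte Carlo error. Changing $D$ or $Q$ (while keeping them positive semidefinite and skew-symmetric, respectively) changes how quickly the particles mix but not the distribution they settle into.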
Therefore, for a positive semidefinite matrix $D(z)$ and a skew-symmetric matrix $Q(z)$, the most general class of forward processes that lead to the invariant distribution $p_s(z)$ can be specified by substituting the form of $\nabla H(z)$ (corresponding to $p_s(z)$ defined in Eqn. 5) in Eqn. 4. A similar characterization of forward processes has also been explored in concurrent work [27] in the context of likelihood estimation (see Section 5).
Additional constraints on D and Q