ICLR2025

Learning to Discretize Denoising Diffusion ODEs

Vinh Tong, Dung-Trung Hoang, Anji Liu, Guy Van den Broeck, Mathias Niepert

Abstract

Diffusion Probabilistic Models (DPMs) are generative models that achieve competitive performance in various domains, including image synthesis and 3D point cloud generation. Sampling from pre-trained DPMs requires multiple neural function evaluations (NFEs) to transform Gaussian noise samples into images, which results in higher computational costs than single-step generative models such as GANs or VAEs. Reducing the number of NFEs while preserving generation quality is therefore crucial. To address this, we propose LD3, a lightweight framework designed to learn the optimal time discretization for sampling. LD3 can be combined with various samplers and consistently improves generation quality without retraining resource-intensive neural networks. We demonstrate analytically and empirically that LD3 improves sampling efficiency with much less computational overhead. We evaluate our method with extensive experiments on 7 pre-trained models, covering unconditional and conditional sampling in both pixel-space and latent-space DPMs. We achieve FIDs of 2.38 and 2.27 with 10 NFEs on unconditional CIFAR10 and AFHQv2, respectively, after 5-10 minutes of training. LD3 offers an efficient approach to sampling from pre-trained diffusion models. Code is available at https://github.com/vinhsuhi/LD3.

… on multi-step sampling, selecting an appropriate time discretization strategy is crucial. Current approaches often rely on handcrafted schedules, which may not be optimal. Recent work has focused on optimizing time schedules. Xue et al. (2024) formulate an optimization problem aimed at identifying the optimal time discretization. They derive an upper bound on the global truncation error under the assumption that the score prediction error of the pretrained model is uniformly bounded by a small constant.
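To make the notions of a time discretization and its global truncation error concrete, the Python sketch below is a hypothetical toy illustration of our own, not the LD3 method and not code from any of the cited works. The setup is chosen purely for illustration: one-dimensional Gaussian data with standard deviation s = 0.5 and the variance-exploding parameterization x_σ = x_0 + σε. In this setting the ideal denoiser is D(x, σ) = s²/(s² + σ²) · x, and the probability-flow ODE dx/dσ = (x - D(x, σ))/σ has the closed-form solution x(σ) = x(σ_max) · sqrt((s² + σ²)/(s² + σ_max²)). The sketch solves this ODE with a first-order Euler solver on two grids, a uniform grid in σ and the handcrafted polynomial schedule of Karras et al. (2022) with σ_min = 0.002, σ_max = 80 and ρ = 7, and reports the error against the exact solution. All function names are hypothetical helpers.

```python
import math

# Toy setup chosen for illustration (not from the paper): 1-D Gaussian data
# with standard deviation S_DATA and the variance-exploding parameterization
# x_sigma = x_0 + sigma * eps, whose ideal denoiser is known in closed form.
S_DATA = 0.5


def velocity(x, sigma, s=S_DATA):
    """Probability-flow ODE dx/dsigma = (x - D(x, sigma)) / sigma, where the
    ideal denoiser is D(x, sigma) = s^2 / (s^2 + sigma^2) * x. Simplified to
    sigma * x / (s^2 + sigma^2), which avoids dividing by sigma."""
    return sigma * x / (s * s + sigma * sigma)


def exact_solution(x_start, sigma_start, sigma_end, s=S_DATA):
    """Closed-form ODE solution: x(sigma) is proportional to sqrt(s^2 + sigma^2)."""
    return x_start * math.sqrt((s * s + sigma_end ** 2) / (s * s + sigma_start ** 2))


def uniform_sigmas(n_steps, sigma_min=0.002, sigma_max=80.0):
    """Naive grid: n_steps noise levels evenly spaced in sigma, then a final 0."""
    step = (sigma_min - sigma_max) / (n_steps - 1)
    return [sigma_max + i * step for i in range(n_steps)] + [0.0]


def karras_sigmas(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Handcrafted polynomial grid of Karras et al. (2022), then a final 0."""
    a, b = sigma_max ** (1.0 / rho), sigma_min ** (1.0 / rho)
    return [(a + i / (n_steps - 1) * (b - a)) ** rho for i in range(n_steps)] + [0.0]


def euler_solve(x_start, sigmas, s=S_DATA):
    """First-order Euler solver; one velocity (denoiser) evaluation per step."""
    x = x_start
    for sigma_cur, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x = x + (sigma_next - sigma_cur) * velocity(x, sigma_cur, s)
    return x


def global_error(sigmas, x_start=1.0, s=S_DATA):
    """Global truncation error |numerical - exact| at the final noise level."""
    x_num = euler_solve(x_start, sigmas, s)
    x_ref = exact_solution(x_start, sigmas[0], sigmas[-1], s)
    return abs(x_num - x_ref)


if __name__ == "__main__":
    for n in (5, 10, 20):
        # n noise levels plus the final 0 give n Euler steps, i.e. n NFEs.
        print(f"NFE={n:2d}  uniform={global_error(uniform_sigmas(n)):.2e}  "
              f"karras={global_error(karras_sigmas(n)):.2e}")
```

Because the toy ODE is linear in x, the error scales linearly with the starting value. In this toy setting, the Karras grid places more of its steps at small σ and gives a lower error than the uniform grid at each NFE printed. Neither grid is adapted to the particular solver or denoiser, which is the kind of information a learned discretization can exploit.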
However, the uniform-bound assumption of Xue et al. (2024) is quite strong: it leads to an optimization problem that depends solely on the noise schedule parameters, ignoring the influence of both the solver and the neural network. While this allows for a fast solution, typically found in a matter of seconds, it overlooks critical information about the pretrained model (and the dataset it was trained on) and about the solver design. Furthermore, minimizing an upper bound does not necessarily minimize the actual global error. Sabour et al. (2024) empirically observe this problem when they derive a bound on the divergence between the distribution of analytical ODE solutions and that of numerical solutions. Their objective is so difficult to optimize that they must simulate many sampling trajectories, use a large batch size to reduce variance, and apply early stopping to prevent divergence. Consequently, their approach is slow and hard to use. Instead of the global truncation error, Chen et al. (2024) optimize the local truncation errors. However, their method ignores the solver used to solve the ODE, and it is not guaranteed to minimize the global truncation error. Watson et al. (2022; 2021) propose the Differentiable Diffusion Sampler Search (DDSS) method, which uses the Kernel Inception Distance (KID) to guide the optimization of the time discretization and thereby enhance the quality of generated samples. However, their method requires a large number of training samples and more than 50k iterations with batch size 512 to converge. We summarize some key differences between LD3 and similar approaches in Table 1. Recent work by