CVPR 2024

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation

Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar

Cited by 11

Abstract

Large generative diffusion models have revolutionized text-to-image generation and offer immense potential for conditional generation tasks such as image enhancement, restoration, editing, and compositing. However, their widespread adoption is hindered by the high computational cost, which limits their real-time application. To address this challenge, we introduce a novel method, dubbed CoDi, that adapts a pre-trained latent diffusion model to accept additional image conditioning inputs while significantly reducing the sampling steps required to achieve high-quality results. Our method can leverage architectures such as ControlNet to incorporate conditioning inputs without compromising the model's prior knowledge gained during large-scale pre-training. Additionally, a conditional consistency loss enforces consistent predictions across diffusion steps, effectively compelling the model to generate high-quality images with conditions in a few steps. Our conditional-task learning and distillation approach outperforms previous distillation methods, achieving a new state-of-the-art in producing high-quality images with very few steps (e.g., 1–4) across multiple tasks, including super-resolution, text-guided image editing, and depth-to-image generation.
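
The abstract does not give a formula for the conditional consistency loss, so the following is only a toy sketch of the general idea of penalizing disagreement between predictions made at two noise levels under the same condition. Everything in it is an assumption made for illustration: the names `add_noise`, `toy_denoiser`, and `conditional_consistency_loss`, the additive noising form, the one-parameter stand-in for the network, and the squared-error distance. It is not the CoDi objective.

```python
import random


def add_noise(x0, noise, sigma):
    """Noise a clean signal as x_t = x0 + sigma * noise (assumed form)."""
    return [a + sigma * n for a, n in zip(x0, noise)]


def toy_denoiser(x_t, cond, sigma, weight):
    """Hypothetical stand-in for a conditional network.

    Blends the noisy input with the conditioning signal; the blend relies
    on the condition more heavily as the noise level sigma grows.
    """
    gate = weight * sigma / (1.0 + sigma)
    return [(1.0 - gate) * a + gate * c for a, c in zip(x_t, cond)]


def conditional_consistency_loss(x0, cond, sigma_hi, sigma_lo, weight, rng):
    """Mean squared gap between predictions at two noise levels.

    Both noisy inputs share the same noise sample and the same condition,
    so the only thing that differs is the diffusion step being evaluated.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_hi = add_noise(x0, noise, sigma_hi)
    x_lo = add_noise(x0, noise, sigma_lo)
    pred_hi = toy_denoiser(x_hi, cond, sigma_hi, weight)
    # In a real training loop this prediction would be held fixed as the target.
    pred_lo = toy_denoiser(x_lo, cond, sigma_lo, weight)
    return sum((a - b) ** 2 for a, b in zip(pred_hi, pred_lo)) / len(x0)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.2, -0.5, 0.9, 0.1]       # toy "image"
    condition = [0.0, -0.4, 1.0, 0.0]   # toy conditioning signal, e.g. a degraded input
    print(conditional_consistency_loss(clean, condition, 1.0, 0.5, 0.8, rng))
```

A lower value means the toy model's outputs at the two steps agree more closely given the condition. Driving such a gap toward zero across many step pairs is the intuition behind obtaining usable samples in very few steps.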