ICLR 2025

Consistency Models Made Easy

Zhengyang Geng, Ashwini Pokle, Weijian Luo, Justin Lin, J. Zico Kolter

Abstract

Consistency models (CMs) offer faster sampling than traditional diffusion models, but their training is resource-intensive. For example, as of 2024, training a state-of-the-art CM on CIFAR-10 takes one week on 8 GPUs. In this work, we identify the "curse of consistency" for training such models and propose an effective training scheme that largely mitigates this issue and improves the efficiency of building such models. Specifically, by expressing CM trajectories via a differential equation, we argue that diffusion models can be viewed as a special case of CMs. We can thus fine-tune a consistency model starting from a pretrained diffusion model and progressively approximate the full consistency condition to stronger degrees over the course of training. Our resulting method, which we term Easy Consistency Tuning (ECT), achieves vastly reduced training times while improving upon the quality of previous methods: for example, ECT achieves a 2-step FID of 2.73 on CIFAR-10 within 1 hour on a single A100 GPU, matching Consistency Distillation trained for hundreds of GPU hours. Owing to this computational efficiency, we investigate the scaling laws of CMs under ECT, showing that they obey classic power-law scaling, hinting at their ability to improve efficiency and performance at larger scales. Our code is available.